Overview

This section compares the performance of baseline binaries with BOLT-optimized versions. It highlights the impact of merged profile optimizations and shared library enhancements on overall system throughput and latency.

All tests used Sysbench with the flags --time=0 --events=10000. With the time limit disabled (--time=0), each run stops after 10,000 events in total, so every run executes the same amount of work and results stay comparable across runs.
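As a quick sanity check on this configuration, the wall-clock time of an event-limited run should be roughly the event count divided by throughput. The sketch below applies that to the baseline TPS values from the table that follows. It assumes one Sysbench event corresponds to one transaction, and the helper name expected_runtime_s is illustrative, not part of Sysbench.

```python
EVENTS = 10_000  # matches --events=10000


def expected_runtime_s(tps: float, events: int = EVENTS) -> float:
    """Approximate run time in seconds for an event-limited run.

    Assumes one event equals one transaction.
    """
    return events / tps


# Baseline TPS values copied from the baseline table.
baseline_tps = {
    "read-only": 1006.33,
    "write-only": 2113.03,
    "read + write": 649.15,
}

for workload, tps in baseline_tps.items():
    print(f"{workload}: {expected_runtime_s(tps):.2f} s")
```

The computed values land within about 0.01 s of the reported total times (9.93 s, 4.73 s, and 15.40 s), which is consistent with an event-limited run.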

Baseline performance (without BOLT)

| Metric | Read-only | Write-only | Read + write |
|---|---|---|---|
| Transactions/sec (TPS) | 1006.33 | 2113.03 | 649.15 |
| Queries/sec (QPS) | 16,101.24 | 12,678.18 | 12,983.09 |
| Latency avg (ms) | 0.99 | 0.47 | 1.54 |
| Latency 95th % (ms) | 1.04 | 0.83 | 1.79 |
| Total time (s) | 9.93 | 4.73 | 15.40 |
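The TPS and QPS rows are linked by the number of queries each transaction issues, which is why the write-only workload shows the highest TPS but the lowest QPS. A minimal sketch of that relationship, using only the baseline values above (the function name is illustrative):

```python
# Baseline TPS and QPS values copied from the table above.
baseline = {
    "read-only": {"tps": 1006.33, "qps": 16101.24},
    "write-only": {"tps": 2113.03, "qps": 12678.18},
    "read + write": {"tps": 649.15, "qps": 12983.09},
}


def queries_per_transaction(tps: float, qps: float) -> float:
    """Average number of queries issued per transaction."""
    return qps / tps


for workload, m in baseline.items():
    ratio = queries_per_transaction(m["tps"], m["qps"])
    print(f"{workload}: {ratio:.1f} queries per transaction")
```

The ratios come out at roughly 16, 6, and 20 queries per transaction, so TPS values are only directly comparable within the same workload type.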

Performance comparison: merged and non-merged instrumentation

| Metric | Regular BOLT (read + write, system libssl) | Merged BOLT (read + write + libssl) |
|---|---|---|
| Transactions/sec (TPS) | 850.32 | 879.18 |
| Queries/sec (QPS) | 17,006.35 | 17,583.60 |
| Latency avg (ms) | 1.18 | 1.14 |
| Latency 95th % (ms) | 1.52 | 1.39 |
| Total time (s) | 11.76 | 11.37 |

Second test run:

| Metric | Regular BOLT (read + write, system libssl) | Merged BOLT (read + write + libssl) |
|---|---|---|
| Transactions/sec (TPS) | 853.16 | 887.14 |
| Queries/sec (QPS) | 17,063.22 | 17,742.89 |
| Latency avg (ms) | 1.17 | 1.13 |
| Latency 95th % (ms) | 1.39 | 1.37 |
| Total time (s) | 239.9 | 239.9 |

Performance across BOLT optimizations

| Metric | BOLT read-only | BOLT write-only | Merged BOLT (read + write + libssl) | Merged BOLT (read + write + libcrypto) | Merged BOLT (read + write + libcrypto + libssl) |
|---|---|---|---|---|---|
| Transactions/sec (TPS) | 1348.47 | 3170.92 | 887.14 | 896.58 | 902.98 |
| Queries/sec (QPS) | 21575.45 | 19025.52 | 17742.89 | 17931.57 | 18059.52 |
| Latency avg (ms) | 0.74 | 0.32 | 1.13 | 1.11 | 1.11 |
| Latency 95th % (ms) | 0.77 | 0.55 | 1.37 | 1.34 | 1.34 |
| Total time (s) | 239.8 | 239.72 | 239.9 | 239.9 | 239.9 |
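To put these figures next to the baseline, the relative TPS gain for each workload can be computed directly from the two tables. This is a minimal sketch; percent_gain is an illustrative helper, and the read + write entry uses the libcrypto + libssl column.

```python
def percent_gain(baseline: float, optimized: float) -> float:
    """Relative improvement of optimized over baseline, in percent."""
    return (optimized - baseline) / baseline * 100.0


# (baseline TPS, BOLT TPS) pairs copied from the tables in this section.
tps_pairs = {
    "read-only": (1006.33, 1348.47),
    "write-only": (2113.03, 3170.92),
    "read + write (libcrypto + libssl)": (649.15, 902.98),
}

for workload, (base, bolt) in tps_pairs.items():
    print(f"{workload}: {percent_gain(base, bolt):+.1f}% TPS")
```

With these inputs the gains are roughly 34% for read-only, 50% for write-only, and 39% for read + write.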
Note

All Sysbench and .fdata file paths, as well as taskset usage, should match the conventions used in the previous steps: run Sysbench from PATH (not from a src/ directory), load Lua scripts from /usr/share/sysbench/, and use $HOME-based paths for all .fdata and library files. On an 8-core system, pin Sysbench to core 7 with taskset -c 7 so that it does not contend with mysqld for CPU time.
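The conventions in this note can be captured in a small helper that assembles the Sysbench command line. This is only a sketch: the script name oltp_read_write.lua is an assumption (substitute whichever Lua script the earlier steps used), database connection options are omitted, and the command is printed rather than executed.

```python
import shlex

SYSBENCH_CORE = 7                 # taskset -c 7 on an 8-core system
LUA_DIR = "/usr/share/sysbench"   # Lua script location from the note


def sysbench_command(script: str = "oltp_read_write.lua",
                     events: int = 10_000) -> list[str]:
    """Build the argv list for a pinned, event-limited Sysbench run.

    The default script name is an assumption; connection flags
    (host, user, database) are deliberately left out.
    """
    return [
        "taskset", "-c", str(SYSBENCH_CORE),
        "sysbench",               # resolved from PATH, not src/
        "--time=0",
        f"--events={events}",
        f"{LUA_DIR}/{script}",
        "run",
    ]


print(shlex.join(sysbench_command()))
```

Building the command as a list keeps each argument separate, so it can later be passed to subprocess.run without shell quoting issues.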

Key metrics to analyze

  • TPS (transactions per second) – higher is better
  • QPS (queries per second) – higher is better
  • Latency (average and 95th percentile) – lower is better
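Because throughput and latency move in opposite directions when performance improves, a comparison helper needs to know which way is "better" for each metric. The sketch below encodes the list above and applies it to the first regular versus merged BOLT comparison; the metric keys and function name are illustrative.

```python
# True means higher is better; False means lower is better.
HIGHER_IS_BETTER = {
    "tps": True,
    "qps": True,
    "latency_avg_ms": False,
    "latency_p95_ms": False,
}


def improvement_pct(metric: str, before: float, after: float) -> float:
    """Percent improvement, positive when 'after' is better than 'before'."""
    change = (after - before) / before * 100.0
    return change if HIGHER_IS_BETTER[metric] else -change


# Values from the first regular vs. merged BOLT table.
regular = {"tps": 850.32, "qps": 17006.35,
           "latency_avg_ms": 1.18, "latency_p95_ms": 1.52}
merged = {"tps": 879.18, "qps": 17583.60,
          "latency_avg_ms": 1.14, "latency_p95_ms": 1.39}

improvements = {m: improvement_pct(m, regular[m], merged[m]) for m in regular}
for metric, pct in improvements.items():
    print(f"{metric}: {pct:+.2f}%")
```

All four values come out positive for this pair, meaning the merged build is better on every listed metric.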

Conclusion

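  • BOLT-optimized binaries clearly outperform the baseline by improving instruction cache usage and shortening execution paths. TPS rises from 1006.33 to 1348.47 for read-only (about 34%), from 2113.03 to 3170.92 for write-only (about 50%), and from 649.15 to between 850.32 and 902.98 for read + write (about 31% to 39%).
  • Merging feature-specific profiles does not negatively affect performance. In both read + write runs, the merged configuration was about 3% to 4% faster than regular BOLT (879.18 vs. 850.32 TPS, and 887.14 vs. 853.16 TPS). Merged profiles capture a broader set of runtime behaviors, which allows better tuning for varied real-world workloads.
  • External library optimizations (for example, libssl and libcrypto) provide smaller incremental gains: read + write TPS is 887.14 with libssl optimized, 896.58 with libcrypto, and 902.98 with both (about 1.8% above libssl alone). Together they move the system toward a fully optimized execution environment.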