Record your results

The following results were collected on a Standard_D4ps_v5 Azure Cobalt virtual machine (VM).

Workload scaling (single thread)

| Command | Size | Threads | Throughput | Runtime |
|---|---|---|---|---|
| --size=1 --nProc=1 | 1 | 1 | 0.598422 tasks/s | 2m 25s |
| --size=5 --nProc=1 | 5 | 1 | 0.370434 tasks/s | 19m 34s |
| --size=8 --nProc=1 | 8 | 1 | 0.401196 tasks/s | 28m 55s |

Thread scaling (fixed workload size)

| Command | Size | Threads | Throughput | Runtime |
|---|---|---|---|---|
| --size=80 --nProc=1 | 80 | 1 | 0.372445 tasks/s | 5h 11m |
| --size=80 --nProc=2 | 80 | 2 | 0.775048 tasks/s | 2h 30m |
| --size=80 --nProc=4 | 80 | 4 | 1.55115 tasks/s | 1h 15m |

Understand workload scaling

When you increase the workload size while keeping the thread count fixed, runtime grows with the size, while throughput stays in a narrow band (about 0.37 to 0.60 tasks/s), with the smallest size reporting the highest value.

For example:

  • --size=1 completes in ~2 minutes 25 seconds
  • --size=8 completes in ~28 minutes 55 seconds

The increase in runtime shows that the benchmark is scaling the amount of work, not changing execution efficiency.
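To compare runtime growth with workload growth, it helps to convert the runtimes to seconds first. The following is a minimal sketch that uses only the single-thread results recorded above; the helper name `runtime_to_seconds` is hypothetical and not part of the benchmark.

```python
import re

def runtime_to_seconds(text: str) -> int:
    """Convert a runtime such as '2m 25s' or '5h 11m' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    pairs = re.findall(r"(\d+)\s*([hms])", text)
    return sum(int(value) * units[unit] for value, unit in pairs)

# (size, runtime) pairs copied from the single-thread results
runs = [(1, "2m 25s"), (5, "19m 34s"), (8, "28m 55s")]

base_size = runs[0][0]
base_seconds = runtime_to_seconds(runs[0][1])

for size, runtime in runs:
    seconds = runtime_to_seconds(runtime)
    runtime_ratio = seconds / base_seconds
    size_ratio = size / base_size
    print(f"size={size}: {seconds} s, "
          f"{runtime_ratio:.2f}x runtime at {size_ratio:.0f}x size")
```

With these numbers, runtime is about 8.10x the baseline at --size=5 and about 11.97x at --size=8, so runtime does not map one-to-one to the --size value. How --size translates into the number of tasks is not stated here, so treat the ratios as descriptive only.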

Understand thread scaling

When you increase the number of worker processes, runtime decreases and throughput increases almost linearly.

From the results:

  • Going from 1 to 2 workers, runtime drops from ~5h 11m to ~2h 30m, and throughput roughly doubles (0.372445 to 0.775048 tasks/s).
  • Going from 2 to 4 workers, runtime drops again to ~1h 15m, and throughput doubles again (0.775048 to 1.55115 tasks/s).

The decrease in runtime and the increase in throughput indicate near-linear scaling on this system.

Calculate speedup

Speedup compares performance relative to a single thread.

| Threads | Runtime | Speedup |
|---|---|---|
| 1 | 5h 11m | 1.0× |
| 2 | 2h 30m | ~2.08× |
| 4 | 1h 15m | ~4.16× |

This shows slightly better speedup than linear scaling, which can occur due to improved cache utilization or measurement variability.
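The speedup figures can be reproduced from the recorded results. The sketch below computes speedup two ways (from throughput and from runtime rounded to the minute) and adds parallel efficiency, defined here as speedup divided by thread count. The function name `scaling` and the efficiency column are illustrative additions, not benchmark output.

```python
# (threads, throughput in tasks/s, runtime in minutes) from the --size=80 runs
runs = [
    (1, 0.372445, 5 * 60 + 11),
    (2, 0.775048, 2 * 60 + 30),
    (4, 1.55115, 1 * 60 + 15),
]

def scaling(runs):
    """Return (threads, speedup_by_throughput, speedup_by_runtime, efficiency)."""
    _, base_throughput, base_minutes = runs[0]
    rows = []
    for threads, throughput, minutes in runs:
        speedup_throughput = throughput / base_throughput
        speedup_runtime = base_minutes / minutes
        efficiency = speedup_throughput / threads
        rows.append((threads, speedup_throughput, speedup_runtime, efficiency))
    return rows

for threads, s_tput, s_time, eff in scaling(runs):
    print(f"{threads} threads: {s_tput:.2f}x (throughput), "
          f"{s_time:.2f}x (runtime), efficiency {eff:.2f}")
```

The throughput ratios give 2.08× and 4.16×, matching the values listed above. The runtimes, which are rounded to the minute, give slightly lower ratios of about 2.07× and 4.15×. Differences of this size are within what rounding alone can explain.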

Key observations

From these results:

  • QuantLib scales well across multiple cores on Azure Cobalt
  • Throughput increases proportionally with thread count
  • Runtime grows with workload size, as expected
  • The system shows efficient usage of available cores

Note

Large benchmark sizes such as --size=80 can take several hours to complete on smaller virtual machines. For most use cases, smaller sizes such as 1, 5, or 8 are sufficient to demonstrate scaling behavior.

What you’ve accomplished

You’ve now analyzed benchmark results after building QuantLib from source on an Arm-based Azure Cobalt VM and running controlled tests. You’ve also recorded enough context to compare runs later: VM size, workload size, worker count, runtime, and throughput.

Next, you can use this workflow as a starting point for evaluating other C++ financial computing workloads on Arm cloud instances. For deeper comparisons, repeat the same benchmark process across VM sizes, compiler options, QuantLib versions, or cloud regions. Keep the command lines and environment details with the results.
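One way to keep command lines and environment details together with the results is to append each run to a CSV file. This is a minimal sketch, not part of the benchmark itself: the column names and the `append_result` helper are hypothetical, and an in-memory buffer stands in for a real file so the example is self-contained.

```python
import csv
import io

# Hypothetical column layout; adjust to the details you want to track.
FIELDS = ["vm_size", "command", "size", "threads",
          "throughput_tasks_per_s", "runtime"]

def append_result(stream, row, write_header=False):
    """Write one benchmark result as a CSV row to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow(row)

# In practice, replace the buffer with open("results.csv", "a", newline="").
buffer = io.StringIO()
append_result(buffer, {
    "vm_size": "Standard_D4ps_v5",
    "command": "--size=1 --nProc=1",
    "size": 1,
    "threads": 1,
    "throughput_tasks_per_s": 0.598422,
    "runtime": "2m 25s",
}, write_header=True)

print(buffer.getvalue())
```

Adding columns for compiler options, QuantLib version, or cloud region follows the same pattern and makes later comparisons easier to filter.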
