How to run Apache Spark benchmarks on Arm64 in GCP

Apache Spark includes internal micro-benchmarks to evaluate the performance of core components like SQL execution, aggregation, joins, and data source reads. These benchmarks are helpful for comparing performance on x86_64 vs Arm64 platforms.

Follow the steps below to run Spark's built-in SQL benchmarks using the SBT-based framework.

  1. Clone the Apache Spark source code

    git clone https://github.com/apache/spark.git

This clones the full Spark source code, including the internal test suites and the benchmarking tools.

  2. Check out the desired Spark version

    cd spark/ && git checkout v4.0.0

Switch to the stable Spark 4.0.0 release, which supports the latest internal benchmarking APIs.

  3. Build Spark with the benchmarks profile enabled

    ./build/sbt -Pbenchmarks clean package

This compiles Spark and its dependencies, enabling the benchmarks build profile for performance testing.

  4. Run a built-in benchmark suite

    ./build/sbt -Pbenchmarks "sql/test:runMain org.apache.spark.sql.execution.benchmark.AggregateBenchmark"

This executes the AggregateBenchmark, which compares the performance of SQL aggregation operations (for example, SUM and STDDEV) with and without WholeStageCodegen. WholeStageCodegen is a Spark SQL optimization that generates Java code for an entire query stage and compiles it to bytecode at runtime, instead of interpreting the stage's operators step by step.
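
Because the goal is to compare runs across platforms, it can help to keep each run's output in a log file named after the machine architecture. The following is a minimal sketch, not part of Spark: the `arch_label` helper and the log file name are hypothetical, and the benchmark command is left commented out so the snippet can be tried without starting a long run.

```shell
#!/bin/sh
# Hypothetical helper: map `uname -m` output to a short platform label.
arch_label() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64)  echo "x86_64" ;;
    *)             echo "unknown" ;;
  esac
}

# Hypothetical log file name; any name works.
LOG_FILE="aggregate-benchmark-$(arch_label "$(uname -m)").log"
echo "Benchmark output will be saved to: $LOG_FILE"

# Run from the Spark source root. Uncomment to run the benchmark and keep a
# copy of everything it prints.
# ./build/sbt -Pbenchmarks "sql/test:runMain org.apache.spark.sql.execution.benchmark.AggregateBenchmark" 2>&1 | tee "$LOG_FILE"
```

Using `tee` keeps the output visible in the terminal while also writing it to the file.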

Example Apache Spark benchmark output (Arm64)

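You should see output similar to the following. The repeated ERROR lines only mean that grep found no "model name" entry in /proc/cpuinfo on this Arm64 system, so the processor is reported as "Unknown processor"; the benchmark still completes and reports its results.

[info] Running benchmark: agg w/o group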
[info]   Running case: agg w/o group wholestage off
[info]   Stopped after 2 iterations, 66883 ms
[info]   Running case: agg w/o group wholestage on
[info]   Stopped after 5 iterations, 4283 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:36:00.495 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] agg w/o group:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] agg w/o group wholestage off                      32967          33442         672         63.6          15.7       1.0X
[info] agg w/o group wholestage on                         856            857           1       2451.2           0.4      38.5X
[info] Running benchmark: stddev
[info]   Running case: stddev wholestage off
[info]   Stopped after 2 iterations, 7538 ms
[info]   Running case: stddev wholestage on
[info]   Stopped after 5 iterations, 4357 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:36:18.982 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] stddev:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] stddev wholestage off                              3765           3769           5         27.8          35.9       1.0X
[info] stddev wholestage on                                870            872           2        120.6           8.3       4.3X
[info] Running benchmark: kurtosis
[info]   Running case: kurtosis wholestage off
[info]   Stopped after 2 iterations, 38309 ms
[info]   Running case: kurtosis wholestage on
[info]   Stopped after 5 iterations, 4729 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:37:24.198 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] kurtosis:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] kurtosis wholestage off                           19114          19155          58          5.5         182.3       1.0X
[info] kurtosis wholestage on                              943            946           3        111.2           9.0      20.3X
[info] Running benchmark: Aggregate w keys
[info]   Running case: codegen = F
[info]   Stopped after 2 iterations, 11018 ms
[info]   Running case: codegen = T, hashmap = F
[info]   Stopped after 3 iterations, 9331 ms
[info]   Running case: codegen = T, row-based hashmap = T
[info]   Stopped after 5 iterations, 5086 ms
[info]   Running case: codegen = T, vectorized hashmap = T
[info]   Stopped after 5 iterations, 3553 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:38:06.612 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] Aggregate w keys:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen = F                                        5401           5509         154         15.5          64.4       1.0X
[info] codegen = T, hashmap = F                           3103           3110           7         27.0          37.0       1.7X
[info] codegen = T, row-based hashmap = T                 1004           1017          11         83.5          12.0       5.4X
[info] codegen = T, vectorized hashmap = T                 707            711           3        118.7           8.4       7.6X
[info] Running benchmark: Aggregate w keys
[info]   Running case: codegen = F
[info]   Stopped after 2 iterations, 10796 ms
[info]   Running case: codegen = T, hashmap = F
[info]   Stopped after 3 iterations, 8988 ms
[info]   Running case: codegen = T, row-based hashmap = T
[info]   Stopped after 5 iterations, 6483 ms
[info]   Running case: codegen = T, vectorized hashmap = T
[info]   Stopped after 5 iterations, 4909 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:38:51.375 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] Aggregate w keys:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen = F                                        5374           5398          34         15.6          64.1       1.0X
[info] codegen = T, hashmap = F                           2918           2996          68         28.7          34.8       1.8X
[info] codegen = T, row-based hashmap = T                 1289           1297           8         65.1          15.4       4.2X
[info] codegen = T, vectorized hashmap = T                 978            982           4         85.8          11.7       5.5X
[info] Running benchmark: Aggregate w string key
[info]   Running case: codegen = F
[info]   Stopped after 2 iterations, 3882 ms
[info]   Running case: codegen = T, hashmap = F
[info]   Stopped after 3 iterations, 3624 ms
[info]   Running case: codegen = T, row-based hashmap = T
[info]   Stopped after 5 iterations, 4145 ms
[info]   Running case: codegen = T, vectorized hashmap = T
[info]   Stopped after 5 iterations, 3779 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:39:18.280 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] Aggregate w string key:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen = F                                        1938           1941           4         10.8          92.4       1.0X
[info] codegen = T, hashmap = F                           1208           1208           0         17.4          57.6       1.6X
[info] codegen = T, row-based hashmap = T                  820            829           5         25.6          39.1       2.4X
[info] codegen = T, vectorized hashmap = T                 756            756           0         27.8          36.0       2.6X
[info] Running benchmark: Aggregate w decimal key
[info]   Running case: codegen = F
[info]   Stopped after 2 iterations, 3771 ms
[info]   Running case: codegen = T, hashmap = F
[info]   Stopped after 2 iterations, 2231 ms
[info]   Running case: codegen = T, row-based hashmap = T
[info]   Stopped after 5 iterations, 2114 ms
[info]   Running case: codegen = T, vectorized hashmap = T
[info]   Stopped after 8 iterations, 2238 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:39:39.289 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] Aggregate w decimal key:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen = F                                        1878           1886          11         11.2          89.6       1.0X
[info] codegen = T, hashmap = F                           1116           1116           0         18.8          53.2       1.7X
[info] codegen = T, row-based hashmap = T                  411            423          11         51.0          19.6       4.6X
[info] codegen = T, vectorized hashmap = T                 278            280           2         75.4          13.3       6.8X
[info] Running benchmark: Aggregate w multiple keys
[info]   Running case: codegen = F
[info]   Stopped after 2 iterations, 6554 ms
[info]   Running case: codegen = T, hashmap = F
[info]   Stopped after 2 iterations, 3608 ms
[info]   Running case: codegen = T, row-based hashmap = T
[info]   Stopped after 2 iterations, 2936 ms
[info]   Running case: codegen = T, vectorized hashmap = T
[info]   Stopped after 2 iterations, 2569 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:40:06.514 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] Aggregate w multiple keys:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen = F                                        3272           3277           8          6.4         156.0       1.0X
[info] codegen = T, hashmap = F                           1802           1804           3         11.6          85.9       1.8X
[info] codegen = T, row-based hashmap = T                 1461           1468          10         14.4          69.7       2.2X
[info] codegen = T, vectorized hashmap = T                1283           1285           3         16.4          61.2       2.6X
[info] Running benchmark: max function bytecode size
[info]   Running case: codegen = F
[info]   Stopped after 8 iterations, 2146 ms
[info]   Running case: codegen = T, hugeMethodLimit = 10000
[info]   Stopped after 14 iterations, 2072 ms
[info]   Running case: codegen = T, hugeMethodLimit = 1500
[info]   Stopped after 16 iterations, 2112 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:40:19.258 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] max function bytecode size:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen = F                                         263            268           4          2.5         401.6       1.0X
[info] codegen = T, hugeMethodLimit = 10000                143            148           8          4.6         217.4       1.8X
[info] codegen = T, hugeMethodLimit = 1500                 129            132           3          5.1         196.6       2.0X
[info] Running benchmark: cube
[info]   Running case: cube wholestage off
[info]   Stopped after 2 iterations, 3164 ms
[info]   Running case: cube wholestage on
[info]   Stopped after 5 iterations, 4215 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:40:32.879 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] cube:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] cube wholestage off                                1572           1582          14          3.3         299.9       1.0X
[info] cube wholestage on                                  841            843           2          6.2         160.4       1.9X
[info] Running benchmark: BytesToBytesMap
[info]   Running case: UnsafeRowhash
[info]   Stopped after 15 iterations, 2052 ms
[info]   Running case: murmur3 hash
[info]   Stopped after 42 iterations, 2003 ms
[info]   Running case: fast hash
[info]   Stopped after 48 iterations, 2016 ms
[info]   Running case: arrayEqual
[info]   Stopped after 29 iterations, 2064 ms
[info]   Running case: Java HashMap (Long)
[info]   Stopped after 8 iterations, 2209 ms
[info]   Running case: Java HashMap (two ints)
[info]   Stopped after 8 iterations, 2217 ms
[info]   Running case: Java HashMap (UnsafeRow)
[info]   Stopped after 4 iterations, 2039 ms
[info]   Running case: LongToUnsafeRowMap (opt=false)
[info]   Stopped after 9 iterations, 2144 ms
[info]   Running case: LongToUnsafeRowMap (opt=true)
[info]   Stopped after 26 iterations, 2005 ms
[info]   Running case: BytesToBytesMap (off Heap)
[info]   Stopped after 5 iterations, 2368 ms
[info]   Running case: BytesToBytesMap (on Heap)
[info]   Stopped after 4 iterations, 2023 ms
[info]   Running case: Aggregate HashMap
[info]   Stopped after 87 iterations, 2011 ms
[info] OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 5.14.0-570.28.1.el9_6.aarch64
[info] 05:41:23.750 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
[info] Unknown processor
[info] BytesToBytesMap:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] UnsafeRowhash                                       137            137           0        153.6           6.5       1.0X
[info] murmur3 hash                                         48             48           0        440.6           2.3       2.9X
[info] fast hash                                            42             42           0        499.2           2.0       3.3X
[info] arrayEqual                                           71             71           0        296.8           3.4       1.9X
[info] Java HashMap (Long)                                 269            276           6         78.0          12.8       0.5X
[info] Java HashMap (two ints)                             273            277           2         76.7          13.0       0.5X
[info] Java HashMap (UnsafeRow)                            507            510           3         41.4          24.2       0.3X
[info] LongToUnsafeRowMap (opt=false)                      237            238           0         88.3          11.3       0.6X
[info] LongToUnsafeRowMap (opt=true)                        76             77           1        277.1           3.6       1.8X
[info] BytesToBytesMap (off Heap)                          472            474           2         44.4          22.5       0.3X
[info] BytesToBytesMap (on Heap)                           505            506           1         41.6          24.1       0.3X
[info] Aggregate HashMap                                    23             23           0        913.0           1.1       5.9X
[success] Total time: 669 s (11:09), completed Jul 24, 2025, 5:41:24 AM
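
If you saved this output to a file, the result rows can be separated from the progress and ERROR lines with a small filter. This is a sketch under the assumption that every result row ends in a relative-speed value such as `38.5X`, as in the output above; `extract_rows` is a hypothetical helper name, and the sample lines are copied from that output.

```shell
#!/bin/sh
# Hypothetical helper: keep only result-table rows, which end in a relative
# speed such as "38.5X", and strip the leading "[info] " prefix.
extract_rows() {
  grep -E '[0-9]+\.[0-9]+X$' "$1" | sed 's/^\[info\] //'
}

# Demo on a few lines copied from the benchmark output.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
[info] Running benchmark: agg w/o group
[info]   Stopped after 2 iterations, 66883 ms
[info] Unknown processor
[info] agg w/o group wholestage off                      32967          33442         672         63.6          15.7       1.0X
[info] agg w/o group wholestage on                         856            857           1       2451.2           0.4      38.5X
EOF

extract_rows "$sample_log"
rm -f "$sample_log"
```

The demo prints only the two `agg w/o group` result rows; the header, separator, and status lines do not match the pattern.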

        
    

Understanding Apache Spark benchmark metrics and results

  • Best Time (ms): Fastest execution time observed (in milliseconds).
  • Avg Time (ms): Average time across all iterations.
  • Stdev (ms): Standard deviation of execution times (lower is more stable).
  • Rate (M/s): Rows processed per second in millions.
  • Per Row (ns): Average time taken per row (in nanoseconds).
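  • Relative: Speed relative to the first case in each benchmark, which is the 1.0X baseline. Values above 1.0X are faster than the baseline and values below 1.0X are slower.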
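
The relationships between these columns can be sketched with a few awk one-liners. This is an illustration consistent with the numbers in the output above, not Spark's own code: the function names are hypothetical, and because the row count is not printed in the output, the rate and per-row examples use a made-up run of 1,000,000 rows in 50 ms.

```shell
#!/bin/sh
# Relative: baseline Best Time divided by this case's Best Time.
relative() {
  awk -v base="$1" -v best="$2" 'BEGIN { printf "%.1fX\n", base / best }'
}

# Rate in millions of rows per second, from a row count and a Best Time in ms.
rate_m_per_s() {
  awk -v rows="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", rows / (ms * 1000) }'
}

# Per Row in nanoseconds, from the same two inputs.
per_row_ns() {
  awk -v rows="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", ms * 1000000 / rows }'
}

relative 32967 856        # agg w/o group, wholestage off vs on: prints 38.5X
rate_m_per_s 1000000 50   # hypothetical run: prints 20.0
per_row_ns 1000000 50     # same hypothetical run: prints 50.0
```

Because the printed times are rounded to whole milliseconds, recomputing Rate or Per Row from the table can differ from the printed value in the last digit.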

Apache Spark performance benchmark results on x86_64

The following results were collected by running the same benchmark on a c3-standard-4 (4 vCPUs, 2 cores, 16 GB memory) x86_64 virtual machine in GCP, running RHEL 9.

| Benchmark Case | Sub-Case / Config | Best Time (ms) | Avg Time (ms) | Stdev (ms) | Rate (M/s) | Per Row (ns) | Relative |
|---|---|---|---|---|---|---|---|
| agg w/o group | wholestage off | 30044 | 32090 | 2892 | 69.8 | 14.3 | 1.0X |
| agg w/o group | wholestage on | 2728 | 2739 | 7 | 768.7 | 1.3 | 11.0X |
| stddev | wholestage off | 4097 | 4112 | 21 | 25.6 | 39.1 | 1.0X |
| stddev | wholestage on | 948 | 954 | 4 | 110.6 | 9.0 | 4.3X |
| kurtosis | wholestage off | 21658 | 21664 | 9 | 4.8 | 206.5 | 1.0X |
| kurtosis | wholestage on | 1327 | 1335 | 7 | 79.0 | 12.7 | 16.3X |
| Aggregate w keys | codegen = F | 7233 | 7234 | 1 | 11.6 | 86.2 | 1.0X |
| Aggregate w keys | codegen = T, hashmap = F | 4556 | 4570 | 21 | 18.4 | 54.3 | 1.6X |
| Aggregate w keys | codegen = T, row-based hashmap = T | 1201 | 1205 | 6 | 69.8 | 14.3 | 6.0X |
| Aggregate w keys | codegen = T, vectorized hashmap = T | 702 | 715 | 10 | 119.6 | 8.4 | 10.3X |
| Aggregate w keys | codegen = F | 6439 | 6524 | 119 | 13.0 | 76.8 | 1.0X |
| Aggregate w keys | codegen = T, hashmap = F | 4156 | 4170 | 12 | 20.2 | 49.5 | 1.5X |
| Aggregate w keys | codegen = T, row-based hashmap = T | 2113 | 2126 | 19 | 39.7 | 25.2 | 3.0X |
| Aggregate w keys | codegen = T, vectorized hashmap = T | 1310 | 1322 | 8 | 64.0 | 15.6 | 4.9X |
| Aggregate w string key | codegen = F | 2265 | 2268 | 4 | 9.3 | 108.0 | 1.0X |
| Aggregate w string key | codegen = T, hashmap = F | 1926 | 1941 | 20 | 10.9 | 91.8 | 1.2X |
| Aggregate w string key | codegen = T, row-based hashmap = T | 1280 | 1285 | 8 | 16.4 | 61.0 | 1.8X |
| Aggregate w string key | codegen = T, vectorized hashmap = T | 1118 | 1123 | 7 | 18.8 | 53.3 | 2.0X |
| Aggregate w decimal key | codegen = F | 2139 | 2167 | 40 | 9.8 | 102.0 | 1.0X |
| Aggregate w decimal key | codegen = T, hashmap = F | 1475 | 1488 | 18 | 14.2 | 70.3 | 1.5X |
| Aggregate w decimal key | codegen = T, row-based hashmap = T | 447 | 451 | 6 | 46.9 | 21.3 | 4.8X |
| Aggregate w decimal key | codegen = T, vectorized hashmap = T | 270 | 275 | 5 | 77.6 | 12.9 | 7.9X |
| Aggregate w multiple keys | codegen = F | 3788 | 3834 | 65 | 5.5 | 180.6 | 1.0X |
| Aggregate w multiple keys | codegen = T, hashmap = F | 2412 | 2423 | 16 | 8.7 | 115.0 | 1.6X |
| Aggregate w multiple keys | codegen = T, row-based hashmap = T | 1890 | 1895 | 6 | 11.1 | 90.1 | 2.0X |
| Aggregate w multiple keys | codegen = T, vectorized hashmap = T | 1739 | 1766 | 38 | 12.1 | 82.9 | 2.2X |
| max func bytecode size | codegen = F | 315 | 338 | 24 | 2.1 | 480.7 | 1.0X |
| max func bytecode size | codegen = T, hugeMethodLimit = 10000 | 178 | 200 | 13 | 3.7 | 272.3 | 1.8X |
| max func bytecode size | codegen = T, hugeMethodLimit = 1500 | 174 | 188 | 22 | 3.8 | 264.8 | 1.8X |
| cube | wholestage off | 1864 | 1867 | 5 | 2.8 | 355.5 | 1.0X |
| cube | wholestage on | 1060 | 1075 | 16 | 4.9 | 202.2 | 1.8X |
| BytesToBytesMap | UnsafeRowhash | 204 | 204 | 0 | 103.0 | 9.7 | 1.0X |
| BytesToBytesMap | murmur3 hash | 69 | 69 | 0 | 304.1 | 3.3 | 3.0X |
| BytesToBytesMap | fast hash | 41 | 42 | 1 | 517.4 | 1.9 | 5.0X |
| BytesToBytesMap | arrayEqual | 142 | 142 | 0 | 148.0 | 6.8 | 1.4X |
| BytesToBytesMap | Java HashMap (Long) | 65 | 72 | 5 | 323.6 | 3.1 | 3.1X |
| BytesToBytesMap | Java HashMap (two ints) | 89 | 93 | 2 | 235.4 | 4.2 | 2.3X |
| BytesToBytesMap | Java HashMap (UnsafeRow) | 544 | 546 | 2 | 38.5 | 26.0 | 0.4X |
| BytesToBytesMap | LongToUnsafeRowMap (opt=false) | 352 | 355 | 1 | 59.5 | 16.8 | 0.6X |
| BytesToBytesMap | LongToUnsafeRowMap (opt=true) | 74 | 75 | 1 | 284.6 | 3.5 | 2.8X |
| BytesToBytesMap | BytesToBytesMap (off Heap) | 623 | 628 | 7 | 33.7 | 29.7 | 0.3X |
| BytesToBytesMap | BytesToBytesMap (on Heap) | 624 | 627 | 3 | 33.6 | 29.8 | 0.3X |
| BytesToBytesMap | Aggregate HashMap | 31 | 31 | 0 | 680.7 | 1.5 | 6.6X |

Apache Spark performance benchmark results on Arm64

Results from the earlier run on the c4a-standard-4 (4 vCPU, 16 GB memory) Arm64 VM in GCP (RHEL 9):

| Benchmark Case | Sub-Case / Config | Best Time (ms) | Avg Time (ms) | Stdev (ms) | Rate (M/s) | Per Row (ns) | Relative |
|---|---|---|---|---|---|---|---|
| agg w/o group | wholestage off | 32967 | 33442 | 672 | 63.6 | 15.7 | 1.0X |
| agg w/o group | wholestage on | 856 | 857 | 1 | 2451.2 | 0.4 | 38.5X |
| stddev | wholestage off | 3765 | 3769 | 5 | 27.8 | 35.9 | 1.0X |
| stddev | wholestage on | 870 | 872 | 2 | 120.6 | 8.3 | 4.3X |
| kurtosis | wholestage off | 19114 | 19155 | 58 | 5.5 | 182.3 | 1.0X |
| kurtosis | wholestage on | 943 | 946 | 3 | 111.2 | 9.0 | 20.3X |
| Aggregate w/ keys | codegen = F | 5401 | 5509 | 154 | 15.5 | 64.4 | 1.0X |
| Aggregate w/ keys | codegen = T, hashmap = F | 3103 | 3110 | 7 | 27.0 | 37.0 | 1.7X |
| Aggregate w/ keys | row-based hashmap = T | 1004 | 1017 | 11 | 83.5 | 12.0 | 5.4X |
| Aggregate w/ keys | vectorized hashmap = T | 707 | 711 | 3 | 118.7 | 8.4 | 7.6X |
| Aggregate w/ string key | codegen = F | 1938 | 1941 | 4 | 10.8 | 92.4 | 1.0X |
| Aggregate w/ string key | codegen = T, hashmap = F | 1208 | 1208 | 0 | 17.4 | 57.6 | 1.6X |
| Aggregate w/ string key | row-based hashmap = T | 820 | 829 | 5 | 25.6 | 39.1 | 2.4X |
| Aggregate w/ string key | vectorized hashmap = T | 756 | 756 | 0 | 27.8 | 36.0 | 2.6X |
| Aggregate w/ decimal key | codegen = F | 1878 | 1886 | 11 | 11.2 | 89.6 | 1.0X |
| Aggregate w/ decimal key | codegen = T, hashmap = F | 1116 | 1116 | 0 | 18.8 | 53.2 | 1.7X |
| Aggregate w/ decimal key | row-based hashmap = T | 411 | 423 | 11 | 51.0 | 19.6 | 4.6X |
| Aggregate w/ decimal key | vectorized hashmap = T | 278 | 280 | 2 | 75.4 | 13.3 | 6.8X |
| Aggregate w/ multiple keys | codegen = F | 3272 | 3277 | 8 | 6.4 | 156.0 | 1.0X |
| Aggregate w/ multiple keys | codegen = T, hashmap = F | 1802 | 1804 | 3 | 11.6 | 85.9 | 1.8X |
| Aggregate w/ multiple keys | row-based hashmap = T | 1461 | 1468 | 10 | 14.4 | 69.7 | 2.2X |
| Aggregate w/ multiple keys | vectorized hashmap = T | 1283 | 1285 | 3 | 16.4 | 61.2 | 2.6X |
| Max function bytecode size | codegen = F | 263 | 268 | 4 | 2.5 | 401.6 | 1.0X |
| Max function bytecode size | hugeMethodLimit = 10000 | 143 | 148 | 8 | 4.6 | 217.4 | 1.8X |
| Max function bytecode size | hugeMethodLimit = 1500 | 129 | 132 | 3 | 5.1 | 196.6 | 2.0X |
| Cube | wholestage off | 1572 | 1582 | 14 | 3.3 | 299.9 | 1.0X |
| Cube | wholestage on | 841 | 843 | 2 | 6.2 | 160.4 | 1.9X |
| BytesToBytesMap | UnsafeRowhash | 137 | 137 | 0 | 153.6 | 6.5 | 1.0X |
| BytesToBytesMap | murmur3 hash | 48 | 48 | 0 | 440.6 | 2.3 | 2.9X |
| BytesToBytesMap | fast hash | 42 | 42 | 0 | 499.2 | 2.0 | 3.3X |
| BytesToBytesMap | Aggregate HashMap | 23 | 23 | 0 | 913.0 | 1.1 | 5.9X |

Apache Spark performance benchmarking comparison on Arm64 and x86_64

When you compare the benchmarking results you will notice that on the Google Axion C4A Arm-based instances:

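  • Whole-stage code generation significantly boosts performance: agg w/o group drops from 32967 ms to 856 ms (38.5×) when it is enabled. The same wholestage-on case takes 2728 ms on x86_64, so the Arm64 run is about 3.2× faster.
  • For aggregation with keys, the codegen variants without a hashmap and with a row-based hashmap deliver roughly 1.7× to 5.4× speedups over the non-codegen baseline, and the vectorized hashmap reaches 7.6×.
  • Hash performance is strong: relative to the UnsafeRowhash baseline, murmur3 and fast hash are about 2.9× and 3.3× faster, and the aggregate hashmap is about 5.9× faster. Compared with x86_64, UnsafeRowhash (137 ms vs 204 ms) and murmur3 (48 ms vs 69 ms) run faster on Arm64, while the fast hash path is roughly on par (42 ms vs 41 ms).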

Overall, when whole-stage codegen and vectorized hashmap paths are used, you should see multi-fold speedups on the Google Axion C4A Arm-based instances.
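
To put numbers on the cross-platform comparison, you can divide the x86_64 Best Time by the Arm64 Best Time for the same case. The sketch below does this with awk for a few rows taken from the two tables; `compare_best_times` and the pipe-separated input layout are illustrative choices, not something Spark provides.

```shell
#!/bin/sh
# Read "case|x86_64 best ms|Arm64 best ms" lines from stdin and print the
# x86_64/Arm64 ratio. A ratio above 1.00 means the Arm64 run was faster.
compare_best_times() {
  awk -F'|' '{ printf "%-36s %.2f\n", $1, $2 / $3 }'
}

compare_best_times <<'EOF'
agg w/o group, wholestage on|2728|856
kurtosis, wholestage on|1327|943
Aggregate w keys, vectorized hashmap|702|707
BytesToBytesMap, fast hash|41|42
EOF
```

For these rows the ratios come out to 3.19, 1.41, 0.99, and 0.98. The two wholestage-on aggregation cases are clearly faster on Arm64, while the vectorized hashmap and fast hash cases are effectively on par in this run.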
