Apache Spark baseline testing on Google Axion C4A Arm VM
With Apache Spark installed successfully on your GCP C4A Arm-based virtual machine, you can now perform simple baseline testing to validate that Spark runs correctly and produces the expected output.
Use a text editor of your choice to create a simple Spark job file:
nano ~/spark_baseline_test.scala
Add the following code to spark_baseline_test.scala:
val data = Seq(1, 2, 3, 4, 5)
val distData = spark.sparkContext.parallelize(data)
// Basic transformation and action
val squared = distData.map(x => x * x).collect()
println("Squared values: " + squared.mkString(", "))
This Scala example shows how to create an RDD (Resilient Distributed Dataset), apply a transformation, and collect results.
Here’s a step-by-step breakdown of the code:

val data = Seq(1, 2, 3, 4, 5): Creates a local Scala sequence of integers.
val distData = spark.sparkContext.parallelize(data): Converts the local sequence into a distributed RDD, so Spark can process it in parallel across CPU cores or cluster nodes.
val squared = distData.map(x => x * x).collect(): Squares each element using the map transformation, then gathers the results back to the driver program with the collect action, as illustrated in the sketch after this list.
println("Squared values: " + squared.mkString(", ")): Prints the squared values as a comma-separated list.
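The split between map and collect matters: transformations are lazy and only describe a computation, while actions trigger it. Here is a minimal sketch you can paste into the same spark-shell session to observe this (count, like collect, is a standard RDD action):

// Transformations such as map are lazy: this line builds a plan but runs nothing
val squaredRdd = distData.map(x => x * x)
// Actions such as count or collect trigger the actual computation
val n = squaredRdd.count()
println(s"Number of squared values: $n")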
Run the test file in the interactive Spark shell:

spark-shell < ~/spark_baseline_test.scala
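By default, spark-shell runs the job in local mode, with the driver and executors inside a single JVM. If you want to make the execution mode explicit, you can pass a master URL; local[*] uses one worker thread per available core:

spark-shell --master local[*] < ~/spark_baseline_test.scala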
Alternatively, you can start the Spark shell and then load the file from inside the shell:
spark-shell
:load spark_baseline_test.scala
You should see output similar to:
Squared values: 1, 4, 9, 16, 25
This confirms that Spark is running correctly in local mode, where the driver and executors run inside a single JVM process.
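The RDD test exercises Spark's core engine. If you also want a quick check of the DataFrame API, the following is a minimal sketch along the same lines, run in the same spark-shell session (spark.range, selectExpr, and show are standard Spark SQL methods):

// Same baseline check through the DataFrame API
val df = spark.range(1, 6)                                // one "id" column with values 1 through 5
val squaredDf = df.selectExpr("id", "id * id AS squared") // square each value with a SQL expression
squaredDf.show()                                          // action: prints a small table to the console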