About this Learning Path

Who is this for?

This is an advanced topic for data engineers, platform engineers, and developers who want to build and optimize high-performance Spark SQL workloads using native execution engines on Arm-based cloud environments.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Install and configure Hadoop, Spark, and Hive on Azure Cobalt 100 Arm64 virtual machines
  • Build and integrate Gluten with the Velox backend for native query execution
  • Configure Spark SQL for columnar and vectorized execution
  • Generate and load TPC-DS datasets for benchmarking
  • Run Spark SQL workloads and compare performance between vanilla Spark and Gluten with Velox

Prerequisites

Before starting, you will need the following:

  • A Microsoft Azure account with access to Cobalt 100-based instances (Dpsv6)
  • Basic understanding of distributed systems and Apache Spark