About this Learning Path

Skill level:	Advanced
Reading time:	2 hrs
Last updated:	17 Jun 2026

Author:	Pareena Verma, Arm
Arm IP:	Neoverse
Tags:	Performance and Architecture, Microsoft Azure, Linux, Apache Spark, Hadoop, Hive, Gluten, Velox

Who is this for?

This is an advanced topic for data engineers, platform engineers, and developers who want to build and optimize high-performance Spark SQL workloads using native execution engines on Arm-based cloud environments.

What will you learn?

Upon completion of this Learning Path, you will be able to:

Install and configure Hadoop, Spark, and Hive on Azure Cobalt 100 Arm64 virtual machines
Build and integrate Gluten with the Velox backend for native query execution
Configure Spark SQL for columnar and vectorized execution
Generate and load TPC-DS datasets for benchmarking
Run Spark SQL workloads and compare performance between vanilla Spark and Gluten with Velox

Prerequisites

Before starting, you will need the following:

A Microsoft Azure account with access to Cobalt 100-based instances (Dpsv6)
Basic understanding of distributed systems and Apache Spark

Run Apache Spark SQL workloads on Azure Cobalt 100 Arm64 using Gluten and Velox for accelerated analytics

Introduction

Understand Azure Cobalt 100 and Apache Spark with Gluten and Velox

Create an Azure Cobalt 100 Arm64 virtual machine

Deploy Apache Spark SQL with Gluten and Velox on Arm64

Run TPC-DS Benchmark on Spark with Gluten and Velox on Arm64

Next Steps
