About this Learning Path

Skill level:	Advanced
Reading time:	30 min
Last updated:	18 Dec 2025

Authors:	Alejandro Martinez Vicente (Arm), Mohamad Najem (Arm)
Arm IP:	Neoverse
Tags:	Performance and Architecture, Linux, macOS, C, C++, GCC, Clang

Who is this for?

This is an advanced topic for software developers who want to learn how to use the full range of features available in SVE, SVE2, and SME2 to improve software performance on Arm processors.

What will you learn?

Upon completion of this Learning Path, you will be able to:

Improve SIMD code performance using Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME)
Describe what SIMD Loops contains and how kernels are organized across scalar, NEON, SVE, SVE2, and SME2 variants
Build and run a selected kernel with the provided runner and validate correctness against the C reference
Choose the appropriate build target to compare NEON, SVE/SVE2, and SME2 implementations

Prerequisites

Before starting, you will need the following:

An AArch64 computer running Linux or macOS. You can use a cloud instance; see Get started with Arm-based cloud instances for a list of cloud service providers.
Some familiarity with SIMD programming and NEON intrinsics.
A recent toolchain that supports SVE and SME (GCC 13+ or Clang 16+ recommended).

Code kata: perfect your SVE and SME skills with SIMD Loops

Introduction

About Single Instruction, Multiple Data loops

Using SIMD Loops

Code example

How to learn with SIMD Loops

Next Steps
