About this Learning Path

Who is this for?

This is an advanced topic for software developers who want to learn how to use the full range of features available in SVE, SVE2, and SME2 to improve software performance on Arm processors.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Improve SIMD code performance using Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME)
  • Describe what SIMD Loops contains and how kernels are organized across scalar, NEON, SVE, SVE2, and SME2 variants
  • Build and run a selected kernel with the provided runner and validate correctness against the C reference
  • Choose the appropriate build target to compare NEON, SVE/SVE2, and SME2 implementations

Prerequisites

Before starting, you will need the following:

  • An AArch64 computer running Linux or macOS. You can use a cloud instance; refer to Get started with Arm-based cloud instances for a list of cloud service providers.
  • Some familiarity with SIMD programming and NEON intrinsics.
  • Recent toolchains that support SVE/SME (GCC 13+ or Clang 16+ recommended).