About this Learning Path

Who is this for?

This Learning Path is an advanced topic for developers who want to accelerate matrix multiplication using Arm's Scalable Matrix Extension Version 2 (SME2).

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Implement a baseline matrix multiplication kernel in C without SME2
  • Use SME2 assembly instructions to accelerate matrix multiplication
  • Use SME2 intrinsics to vectorize and optimize matrix multiplication
  • Compile code that uses SME2 intrinsics and assembly
  • Benchmark and validate SME2-accelerated matrix multiplication on Arm hardware or in a Linux-based emulation environment
  • Compare performance metrics between baseline and SME2-optimized implementations
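For reference, a baseline (non-SME2) kernel is usually the classic triple loop. The following is a minimal sketch, not the Learning Path's own code: the function name `matmul_baseline` and the row-major layout (A is M x K, B is K x N, C is M x N) are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical baseline kernel: C = A * B with all matrices stored row-major.
 * A is M x K, B is K x N, C is M x N. No SME2 or other SIMD is used. */
void matmul_baseline(size_t M, size_t K, size_t N,
                     const float *A, const float *B, float *C) {
    for (size_t i = 0; i < M; i++) {          /* rows of A and C */
        for (size_t j = 0; j < N; j++) {      /* columns of B and C */
            float acc = 0.0f;
            for (size_t k = 0; k < K; k++)    /* dot product of row i and column j */
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    }
}
```

For example, multiplying the 2 x 3 matrix {1, 2, 3, 4, 5, 6} by the 3 x 2 matrix {7, 8, 9, 10, 11, 12} gives {58, 64, 139, 154}. Each output element costs K scalar multiply-adds, so the kernel performs M·N·K of them in total; this is the work the SME2 versions aim to vectorize.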

Prerequisites

Before starting, you will need the following:

  • Working knowledge of Arm’s SVE and SME2 instruction sets
  • Intermediate proficiency with the C programming language and Armv9-A assembly language
  • A computer running Linux, macOS, or Windows
  • Git and Docker installed, for project setup and emulation
  • A platform that supports SME2 (see the list of devices with SME2 support), or an emulator to run code with SME2 instructions
  • Compiler support for SME2 instructions (for example, LLVM 17+ with SME2 backend support)
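To check whether a Linux machine exposes SME2 at runtime, you can query the kernel's hardware capability bits. This is a minimal sketch that covers Linux on AArch64 only and assumes kernel headers recent enough to define `HWCAP2_SME2`; the function name `sme2_available` is hypothetical. On any other platform, or with older headers, it returns 0, which means "not detected", not "definitely absent".

```c
#include <stdio.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP2 */
#include <asm/hwcap.h>  /* HWCAP2_SME2 (only in recent kernel headers) */
#endif

/* Hypothetical helper: returns 1 if the Linux kernel reports SME2 support,
 * 0 if SME2 was not detected or this platform is not covered by the sketch. */
int sme2_available(void) {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP2_SME2)
    return (getauxval(AT_HWCAP2) & HWCAP2_SME2) != 0;
#else
    return 0;
#endif
}
```

Calling `sme2_available()` early in a benchmark lets you skip the SME2 kernels cleanly on hardware without the extension instead of faulting on an illegal instruction.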