About this Learning Path

Who is this for?

This is an advanced topic for software developers, performance engineers, and AI practitioners.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Explain how a KleidiAI microkernel performs matrix multiplication (matmul) with quantized data
  • Identify how SME2 INT8 MOPA (matrix outer product accumulate) instructions map to matmul work
  • Trace how quantization and packing feed an SME2 matmul microkernel (using GGML Q4_0 and llama.cpp call stacks as a concrete example)
  • Perform basic hands-on checks (source inspection and optional disassembly) to confirm where SME2 instructions appear

Prerequisites

Before starting, you will need the following:

  • Basic understanding of general matrix multiplication (GEMM) and matmul operations
  • Basic understanding of quantization concepts for neural networks
  • (Optional) Access to an Arm CPU with SME2 support (Linux or Android) for hands-on verification steps