About this Learning Path

Who is this for?

This is an advanced topic for software developers, performance engineers, and AI practitioners.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Explain how a KleidiAI microkernel performs matrix multiplication (matmul) with quantized data
  • Identify how SME2 INT8 MOPA (matrix outer product accumulate) instructions map to matmul work
  • Trace how quantization and packing feed an SME2 matmul microkernel (using GGML Q4_0 and llama.cpp call stacks as a concrete example)
  • Perform basic hands-on checks (source inspection and optional disassembly) to confirm where SME2 instructions appear

Prerequisites

Before starting, you will need the following:

  • Basic understanding of general matrix multiplication (GEMM) and matmul operations
  • Basic understanding of quantization concepts for neural networks
  • (Optional) Access to an Arm CPU with SME2 support (Linux or Android) for hands-on verification steps