Who is this for?
This is an introductory topic for developers who want to learn how to use KleidiAI to accelerate Generative AI workloads on Arm-based hardware.
What will you learn?
Upon completion of this Learning Path, you will be able to:
- Describe how basic math operations power Large Language Models.
- Describe how the KleidiAI micro-kernels improve Generative AI inference performance.
- Run a basic C++ matrix multiplication example to showcase the speedup that KleidiAI micro-kernels can deliver.
Prerequisites
Before starting, you will need the following:
- An Arm-based Linux machine that implements the Int8 Matrix Multiplication (i8mm) architecture feature. The example in this Learning Path runs on an AWS Graviton3 instance. Instructions for setting up an Arm-based server are found here.
- A basic understanding of linear algebra terminology, such as dot product and matrix multiplication.