Who is this for?
This is an introductory topic for developers interested in running LLMs on Arm-based servers.
What will you learn?
Upon completion of this learning path, you will be able to:
- Download and build llama.cpp on your Arm server.
- Download a pre-quantized Llama 3.1 model from Hugging Face.
- Re-quantize the model weights to take advantage of the Arm KleidiAI kernels.
- Compare the inference performance of the pre-quantized Llama 3.1 model weights against the re-quantized weights on your Arm CPU.
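The steps above can be sketched as shell commands. This is a rough outline only: the Hugging Face repository and model file names are placeholders, and the build flags and quantization type shown are assumptions; the learning path walks through the exact commands.

```shell
# Clone and build llama.cpp (standard CMake build; flags are illustrative)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j$(nproc)

# Download a pre-quantized Llama 3.1 model from Hugging Face
# (<repo-id> and <model-file>.gguf are placeholders, not real names)
huggingface-cli download <repo-id> <model-file>.gguf --local-dir models

# Re-quantize the weights into a format the Arm-optimized kernels can use
# (the target quantization type is an assumption; see llama-quantize --help)
./build/bin/llama-quantize models/<model-file>.gguf models/requantized.gguf Q4_0

# Benchmark each set of weights and compare tokens/second
./build/bin/llama-bench -m models/requantized.gguf
```

Running `llama-bench` on both the original and re-quantized weights gives a direct throughput comparison on the same hardware.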
Prerequisites
Before starting, you will need the following:
- An AWS Graviton3 c7g.16xlarge instance to test Arm performance optimizations, or any Arm-based instance from a cloud service provider or an on-premises Arm server.