Deploy a Large Language Model (LLM) chatbot with llama.cpp using KleidiAI on Arm servers

About this Learning Path

Who is this for?

This is an introductory topic for developers interested in running LLMs on Arm-based servers.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Download and build llama.cpp on your Arm server.
  • Download a pre-quantized Llama 3.1 model from Hugging Face.
  • Run the pre-quantized model on your Arm CPU and measure its performance (a brief sketch of these steps follows this list).
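
For orientation, the outline below sketches what these steps typically look like on an Arm Linux server. It is a minimal sketch only: the exact commands, CMake options (including the option that enables the KleidiAI-accelerated kernels), and the specific model file to download are covered step by step later in this Learning Path, and the model filename shown here is a placeholder.

    # Clone and build llama.cpp with CMake (standard upstream build steps)
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build
    cmake --build build --config Release -j $(nproc)

    # Run a prompt and measure throughput with the built binaries.
    # "model.gguf" is a placeholder for the pre-quantized Llama 3.1 file
    # you will download from Hugging Face in a later section.
    ./build/bin/llama-cli -m model.gguf -p "Hello, how are you?"
    ./build/bin/llama-bench -m model.gguf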

Prerequisites

Before starting, you will need the following:

  • An AWS Graviton4 r8g.16xlarge instance to test Arm performance optimizations, any Arm-based instance from a cloud service provider, or an on-premises Arm server.