Who is this for?
This is an introductory topic for developers interested in running LLMs on Arm-based servers.
What will you learn?
Upon completion of this learning path, you will be able to:
- Download and build llama.cpp on your Arm server.
- Download a pre-quantized Llama 3.1 model from Hugging Face.
- Re-quantize the model weights to take advantage of the Arm KleidiAI kernels.
- Compare the inference performance of the pre-quantized Llama 3.1 model weights against the re-quantized weights on your Arm CPU.
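The steps above can be sketched as shell commands. This is a rough outline only: the Hugging Face repository and model file names are placeholders, and the build flags and quantization type shown are assumptions; the learning path walks through the exact commands.

```shell
# Clone and build llama.cpp (standard CMake build; flags are illustrative)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j$(nproc)

# Download a pre-quantized Llama 3.1 model from Hugging Face
# (<repo-id> and <model-file>.gguf are placeholders, not real names)
huggingface-cli download <repo-id> <model-file>.gguf --local-dir models

# Re-quantize the weights into a format the Arm-optimized kernels can use
# (the target quantization type is an assumption; see llama-quantize --help)
./build/bin/llama-quantize models/<model-file>.gguf models/requantized.gguf Q4_0

# Benchmark each set of weights and compare tokens/second
./build/bin/llama-bench -m models/requantized.gguf
```

Running `llama-bench` on both the original and re-quantized weights gives a direct throughput comparison on the same hardware.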
Prerequisites
Before starting, you will need the following:
- An AWS Graviton3 c7g.16xlarge instance to test Arm performance optimizations, or any Arm-based instance from a cloud service provider or an on-premises Arm server.