About this Learning Path

Who is this for?

This is an introductory topic for developers interested in running LLMs on Arm-based servers.

What will you learn?

Upon completion of this learning path, you will be able to:

  • Download and build llama.cpp on your Arm server.
  • Download a pre-quantized Llama 3.1 model from Hugging Face.
  • Re-quantize the model weights to take advantage of the Arm KleidiAI kernels.
  • Compare the performance of the pre-quantized Llama 3.1 model weights with that of the re-quantized weights on your Arm CPU.
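The objectives above correspond roughly to the command-line workflow sketched below. This is an illustrative outline only: the repository URL, model file names, and binary names (such as `llama-quantize` and the `Q4_0` target type) are assumptions that may differ from the exact versions and instructions given later in this learning path.

```shell
# Illustrative sketch of the workflow this learning path covers.
# Model names and quantization type are placeholders, not the
# exact values used in the detailed steps.

# 1. Download and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j"$(nproc)"

# 2. Download a pre-quantized Llama 3.1 model from Hugging Face
#    (gated Llama models require a Hugging Face account and token)
huggingface-cli download <repo-id> <model-file>.gguf --local-dir models

# 3. Re-quantize the weights to a format the Arm KleidiAI kernels accelerate
./build/bin/llama-quantize models/<model-file>.gguf \
    models/<model-file>-requant.gguf Q4_0
```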

Prerequisites

Before starting, you will need the following:

  • An AWS Graviton3 c7g.16xlarge instance to test Arm performance optimizations, or any other Arm-based instance from a cloud service provider, or an on-premises Arm server.