About this Learning Path

Who is this for?

This is an introductory topic for developers interested in running LLMs on Arm-based servers.

What will you learn?

Upon completion of this learning path, you will be able to:

  • Download and build llama.cpp on your Arm server
  • Download a pre-quantized Llama 2 model from Hugging Face
  • Re-quantize the model weights to take advantage of Arm improvements
  • Compare the pre-quantized Llama 2 model weights performance to the re-quantized weights on your Arm CPU

Prerequisites

Before starting, you will need the following:

  • An AWS Graviton3 c7g.2xlarge instance to test Arm performance optimizations, or any Arm based instance from a cloud service provider or an on-premise Arm server.
Next