Run distributed inference with llama.cpp on Arm-based AWS Graviton4 instances

About this Learning Path

Who is this for?

This introductory topic is for developers with some experience using llama.cpp who want to learn how to run distributed inference on Arm-based servers.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Set up a main host and worker nodes with llama.cpp
  • Run a large quantized model (for example, Llama 3.1 405B) with distributed CPU inference across Arm machines (see the sketch after this list)
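Later sections walk through each step in detail, but as a rough preview, llama.cpp distributes inference over its RPC backend: each worker node runs the rpc-server binary, and the main host lists the workers in llama-cli's --rpc option. The sketch below is a minimal outline, assuming llama.cpp is built with -DGGML_RPC=ON; the worker hostnames, port, and model file name are placeholders.

```bash
# Worker nodes: build llama.cpp with the RPC backend enabled, then
# start an RPC server exposing this machine's CPU backend.
# (Address and port are placeholders -- adjust for your network.)
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release
./build/bin/rpc-server -H 0.0.0.0 -p 50052

# Main host: run inference, offloading model layers to the workers
# listed in --rpc (worker hostnames and the GGUF file are placeholders).
./build/bin/llama-cli \
  -m llama-3.1-405b-q4_0.gguf \
  --rpc worker1:50052,worker2:50052 \
  -ngl 99 \
  -p "Hello from distributed llama.cpp"
```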

Prerequisites

Before starting, you will need the following:

  • Multiple Arm-based AWS Graviton4 instances with SSH access and network connectivity between them
  • Some familiarity with llama.cpp and the Linux command line
