Who is this for?
This Learning Path is for developers and ML engineers who want to deploy Arcee's AFM-4.5B small language model on AWS Graviton4 instances using Llama.cpp.
What will you learn?
Upon completion of this Learning Path, you will be able to:
- Launch an Arm-based EC2 instance on AWS Graviton4
- Build and install Llama.cpp from source
- Download and quantize the AFM-4.5B model from Hugging Face
- Run inference on the quantized model using Llama.cpp
- Evaluate model quality by measuring perplexity
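As a preview of the workflow, the sketch below shows the kind of Llama.cpp commands this Learning Path walks through. The Hugging Face repository name, file names, and the Q4_0 quantization type are illustrative assumptions here; each step is covered in detail in the following sections.

```bash
# Illustrative outline only -- repo name, file names, and quantization type are assumptions
# Build Llama.cpp from source
cmake -B build && cmake --build build --config Release

# Download AFM-4.5B from Hugging Face and convert it to GGUF (repo name assumed)
huggingface-cli download arcee-ai/AFM-4.5B --local-dir afm-4.5b
python convert_hf_to_gguf.py afm-4.5b --outfile afm-4.5b-f16.gguf

# Quantize the model and run a quick inference test
./build/bin/llama-quantize afm-4.5b-f16.gguf afm-4.5b-q4_0.gguf Q4_0
./build/bin/llama-cli -m afm-4.5b-q4_0.gguf -p "Hello" -n 64

# Evaluate model quality by measuring perplexity on a test file
./build/bin/llama-perplexity -m afm-4.5b-q4_0.gguf -f wiki.test.raw
```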
Prerequisites
Before starting, you will need the following:
- An AWS account with permission to launch Graviton4 (`c8g.4xlarge` or larger) instances
- Basic familiarity with Linux and SSH