AFM-4.5B deployment on Google Cloud Axion with Llama.cpp

Contents:

- Introduction
- Provision a Google Cloud Axion Arm64 environment
- Configure your Google Cloud Axion Arm64 environment
- Build Llama.cpp on Google Cloud Axion Arm64
- Install Python dependencies for Llama.cpp
- Download and optimize the AFM-4.5B model for Llama.cpp
- Run inference with AFM-4.5B using Llama.cpp
- Benchmark and evaluate AFM-4.5B quantized models on Axion
- Review your AFM-4.5B deployment on Axion
- Next Steps
| Skill level: | Introductory |
| Reading time: | 30 min |
| Last updated: | 21 Aug 2025 |
| Author: | Julien Simon |
This Learning Path is for developers and ML engineers who want to deploy Arcee's AFM-4.5B small language model on Google Cloud Axion instances using Llama.cpp.
Upon completion of this Learning Path, you will be able to:

- Provision and configure a Google Cloud Axion Arm64 instance
- Build Llama.cpp from source on Arm64 and install its Python dependencies
- Download AFM-4.5B and quantize it into GGUF formats suited to Axion
- Run inference with AFM-4.5B and benchmark the quantized models
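The modules above walk through each of these steps in detail. As a rough sketch of what the end-to-end workflow looks like on an Axion instance (the Hugging Face repository name and the GGUF file names below are assumptions, not the exact values used in the Learning Path):

```bash
# Build Llama.cpp from source; CMake enables the Arm NEON/SVE code paths automatically
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
pip install -r requirements.txt
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j"$(nproc)"

# Download the AFM-4.5B weights from Hugging Face
# (repository name is an assumption; check the Arcee AI organization on Hugging Face)
pip install "huggingface_hub[cli]"
huggingface-cli download arcee-ai/AFM-4.5B --local-dir AFM-4.5B

# Convert to GGUF, then quantize to 4-bit to shrink the memory footprint
python convert_hf_to_gguf.py AFM-4.5B --outfile afm-4.5b-f16.gguf
./build/bin/llama-quantize afm-4.5b-f16.gguf afm-4.5b-q4_0.gguf Q4_0

# Run a prompt using all 16 vCPUs of a c4a-standard-16
./build/bin/llama-cli -m afm-4.5b-q4_0.gguf -t 16 \
  -p "Summarize the benefits of Arm-based cloud instances in one paragraph."
```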
Before starting, you will need the following:
- A Google Cloud account with access to Axion-based (c4a-standard-16 or larger) instances
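If you have not launched an Axion instance before, provisioning one from the command line looks roughly like the sketch below; the instance name, zone, boot disk size, and Ubuntu Arm64 image family are assumptions, and the Learning Path covers the console and CLI options step by step:

```bash
# Create an Axion (Arm64) VM; pick a zone where C4A machine types are available
gcloud compute instances create afm-llamacpp \
  --zone=us-central1-a \
  --machine-type=c4a-standard-16 \
  --image-family=ubuntu-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB

# Connect to the instance once it is running
gcloud compute ssh afm-llamacpp --zone=us-central1-a
```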