AFM-4.5B deployment on Google Cloud Axion with Llama.cpp

Contents:
- Introduction
- Provision a Google Cloud Axion Arm64 environment
- Configure your Google Cloud Axion Arm64 environment
- Build Llama.cpp on Google Cloud Axion Arm64
- Install Python dependencies for Llama.cpp
- Download and optimize the AFM-4.5B model for Llama.cpp
- Run inference with AFM-4.5B using Llama.cpp
- Benchmark and evaluate AFM-4.5B quantized models on Axion
- Review your AFM-4.5B deployment on Axion
- Next Steps
Skill level: Introductory
Reading time: 30 min
Last updated: 21 Aug 2025
Author: Julien Simon
This Learning Path is for developers and ML engineers who want to deploy Arcee's AFM-4.5B small language model on Google Cloud Axion instances using Llama.cpp.
Upon completion of this Learning Path, you will be able to:

- Provision and configure a Google Cloud Axion Arm64 instance
- Build Llama.cpp from source on Axion and install its Python dependencies
- Download AFM-4.5B, then convert and quantize it for Llama.cpp
- Run inference with AFM-4.5B using Llama.cpp
- Benchmark and evaluate quantized AFM-4.5B models on Axion
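As a rough preview of the workflow covered in the later sections, the commands below sketch the end-to-end flow on an Axion instance. This is a minimal sketch only: the Hugging Face repository name, file names, quantization type, and build options are illustrative assumptions, not the exact values used later in this Learning Path.

```bash
# Sketch only: repository names, paths, and quantization types are assumptions.

# Build Llama.cpp from source (later sections cover Axion-specific build options)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

# Download AFM-4.5B from Hugging Face (repository name is an assumption)
pip install -U "huggingface_hub[cli]"
huggingface-cli download arcee-ai/AFM-4.5B --local-dir afm-4.5b

# Convert to GGUF and quantize (Q4_0 is used here only as an example)
python convert_hf_to_gguf.py afm-4.5b --outfile afm-4.5b-f16.gguf
./build/bin/llama-quantize afm-4.5b-f16.gguf afm-4.5b-q4_0.gguf Q4_0

# Run a quick interactive inference test
./build/bin/llama-cli -m afm-4.5b-q4_0.gguf -p "Hello from Axion"
```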
Before starting, you will need the following:
- A Google Cloud account with permission to create Axion-based Arm64 (c4a-standard-16 or larger) instances
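If you have not launched an Axion instance before, the sketch below shows one way to create a c4a-standard-16 VM with the gcloud CLI. The project defaults, zone, image family, and disk settings are assumptions and may need adjusting for your environment; the provisioning section of this Learning Path walks through the setup in detail.

```bash
# Sketch only: zone, image family, and disk settings are assumptions.
gcloud compute instances create afm-axion-test \
  --zone=us-central1-a \
  --machine-type=c4a-standard-16 \
  --image-family=ubuntu-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=200GB \
  --boot-disk-type=hyperdisk-balanced

# Connect to the instance once it is running
gcloud compute ssh afm-axion-test --zone=us-central1-a
```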