Introduction
AFM-4.5B deployment on Google Cloud Axion with Llama.cpp
Provision a Google Cloud Axion Arm64 environment
Configure your Google Cloud Axion Arm64 environment
Build Llama.cpp on Google Cloud Axion Arm64
Install Python dependencies for Llama.cpp
Download and optimize the AFM-4.5B model for Llama.cpp
Run inference with AFM-4.5B using Llama.cpp
Benchmark and evaluate AFM-4.5B quantized models on Axion
Review your AFM-4.5B deployment on Axion
Next Steps
| Skill level: | Introductory |
| Reading time: | 30 min |
| Last updated: | 20 Feb 2026 |
This Learning Path is for developers and ML engineers who want to deploy Arcee's AFM-4.5B small language model on Google Cloud Axion instances using Llama.cpp.
Upon completion of this Learning Path, you will be able to:

- Provision and configure a Google Cloud Axion Arm64 instance
- Build Llama.cpp from source on Axion
- Download and quantize the AFM-4.5B model for Llama.cpp
- Run inference with AFM-4.5B using Llama.cpp
- Benchmark and evaluate quantized AFM-4.5B models on Axion
Before starting, you will need the following:
- A Google Cloud account with permission to create Axion-based Arm64 (c4a-standard-16 or larger) instances
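As a rough sketch of the first step, an Axion (c4a) instance can be provisioned with the gcloud CLI. The instance name, zone, disk size, and Ubuntu Arm64 image below are illustrative assumptions, not values from this Learning Path; substitute any Axion-capable zone and a suitable Arm64 image for your project.

```shell
# Sketch: create a c4a-standard-16 Axion instance with the gcloud CLI.
# Name, zone, image, and disk size are assumptions -- adjust for your project.
gcloud compute instances create afm-llamacpp \
  --machine-type=c4a-standard-16 \
  --zone=us-central1-a \
  --image-family=ubuntu-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB
```

Once the instance is running, connect to it with `gcloud compute ssh afm-llamacpp --zone=us-central1-a` before continuing with the environment setup.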