AFM-4.5B deployment on Google Cloud Axion with Llama.cpp

Contents:

- Introduction
- Provision a Google Cloud Axion Arm64 environment
- Configure your Google Cloud Axion Arm64 environment
- Build Llama.cpp on Google Cloud Axion Arm64
- Install Python dependencies for Llama.cpp
- Download and optimize the AFM-4.5B model for Llama.cpp
- Run inference with AFM-4.5B using Llama.cpp
- Benchmark and evaluate AFM-4.5B quantized models on Axion
- Review your AFM-4.5B deployment on Axion
- Next Steps
| Skill level: | Introductory |
| Reading time: | 30 min |
| Last updated: | 21 Aug 2025 |
| Author: | Julien Simon |
This Learning Path is for developers and ML engineers who want to deploy Arcee's AFM-4.5B small language model on Google Cloud Axion instances using Llama.cpp.
Upon completion of this Learning Path, you will be able to:

- Provision and configure a Google Cloud Axion Arm64 instance
- Build Llama.cpp from source on Arm64 and install its Python dependencies
- Download AFM-4.5B and quantize it into GGUF formats suited to Axion
- Run inference with AFM-4.5B and benchmark the quantized models
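The modules above walk through each of these steps in detail. As a rough sketch of what the end-to-end workflow looks like on an Axion instance (the Hugging Face repository name and the GGUF file names below are assumptions, not the exact values used in the Learning Path):

```bash
# Build Llama.cpp from source; CMake enables the Arm NEON/SVE code paths automatically
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
pip install -r requirements.txt
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j"$(nproc)"

# Download the AFM-4.5B weights from Hugging Face
# (repository name is an assumption; check the Arcee AI organization on Hugging Face)
pip install "huggingface_hub[cli]"
huggingface-cli download arcee-ai/AFM-4.5B --local-dir AFM-4.5B

# Convert to GGUF, then quantize to 4-bit to shrink the memory footprint
python convert_hf_to_gguf.py AFM-4.5B --outfile afm-4.5b-f16.gguf
./build/bin/llama-quantize afm-4.5b-f16.gguf afm-4.5b-q4_0.gguf Q4_0

# Run a prompt using all 16 vCPUs of a c4a-standard-16
./build/bin/llama-cli -m afm-4.5b-q4_0.gguf -t 16 \
  -p "Summarize the benefits of Arm-based cloud instances in one paragraph."
```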
Before starting, you will need the following:
- A Google Cloud account with access to Axion-based (c4a-standard-16 or larger) instances
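If you have not launched an Axion instance before, provisioning one from the command line looks roughly like the sketch below; the instance name, zone, boot disk size, and Ubuntu Arm64 image family are assumptions, and the Learning Path covers the console and CLI options step by step:

```bash
# Create an Axion (Arm64) VM; pick a zone where C4A machine types are available
gcloud compute instances create afm-llamacpp \
  --zone=us-central1-a \
  --machine-type=c4a-standard-16 \
  --image-family=ubuntu-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB

# Connect to the instance once it is running
gcloud compute ssh afm-llamacpp --zone=us-central1-a
```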