Introduction
AFM-4.5B deployment on Google Cloud Axion with Llama.cpp
Provision a Google Cloud Axion Arm64 environment
Configure your Google Cloud Axion Arm64 environment
Build Llama.cpp on Google Cloud Axion Arm64
Install Python dependencies for Llama.cpp
Download and optimize the AFM-4.5B model for Llama.cpp
Run inference with AFM-4.5B using Llama.cpp
Benchmark and evaluate AFM-4.5B quantized models on Axion
Review your AFM-4.5B deployment on Axion
Next Steps
AFM-4.5B is a 4.5-billion-parameter foundation model designed to balance accuracy, efficiency, and broad language coverage. Trained on nearly 8 trillion tokens of carefully filtered data, it performs well across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish.
In this Learning Path, you’ll deploy AFM-4.5B using Llama.cpp on a Google Cloud Axion Arm64 instance. You’ll walk through the full workflow, from setting up your environment and compiling the runtime, to downloading, quantizing, and running inference on the model. You’ll also evaluate model quality using perplexity, a standard metric for how well a language model predicts text.
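Before diving in, it helps to know what perplexity actually computes. It is the exponential of the average negative log-likelihood the model assigns to each token: lower values mean the model finds the text less "surprising." The sketch below shows the calculation from a list of per-token log-probabilities (the function name and example values are illustrative, not part of Llama.cpp's API):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Illustrative example: a model that assigns probability 0.25 to every
# token is "as uncertain as" a uniform choice among 4 tokens, so its
# perplexity is 4.
logprobs = [math.log(0.25)] * 10
print(round(perplexity(logprobs), 6))
```

Llama.cpp's `llama-perplexity` tool reports this same quantity over a text corpus, which is how you will compare quantized variants of AFM-4.5B later in this Learning Path.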
This hands-on guide helps developers build cost-efficient, high-performance LLM applications on modern Arm server infrastructure using open-source tools and real-world deployment practices.
The examples in this Learning Path use a `c4a-standard-16` Axion instance.