AFM-4.5B model and deployment workflow

AFM-4.5B is a 4.5-billion-parameter foundation model designed to balance accuracy, efficiency, and broad language coverage. Trained on nearly 8 trillion tokens of carefully filtered data, it performs well across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish.

In this Learning Path, you’ll deploy AFM-4.5B using Llama.cpp on a Google Cloud Axion Arm64 instance. You’ll walk through the full workflow, from setting up your environment and compiling the runtime, to downloading, quantizing, and running inference on the model. You’ll also evaluate model quality using perplexity, a standard metric for how well a language model predicts text.
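
Perplexity comes up again in the final step, so it helps to recall the standard definition: it is the exponentiated average negative log-likelihood the model assigns to each token of a held-out text, so lower values mean better predictions. In LaTeX notation:

```latex
% Perplexity of a model p over a tokenized text x_1, ..., x_N (lower is better)
\mathrm{PPL}(x) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(x_i \mid x_{<i}\right)\right)
```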

This hands-on guide helps developers build cost-efficient, high-performance LLM applications on modern Arm server infrastructure using open-source tools and real-world deployment practices.

Deployment workflow for AFM-4.5B on Google Cloud Axion
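
The steps are summarized in the list below; illustrative example commands for each step follow the list.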

  • Provision compute: launch a Google Cloud instance using an Axion-based instance type (for example, c4a-standard-16)
  • Set up your environment: install build tools and dependencies (CMake, Python, Git)
  • Build the inference engine: clone the Llama.cpp repository and compile the project for your Arm-based environment
  • Prepare the model: download the AFM-4.5B model files from Hugging Face and use Llama.cpp’s quantization tools to reduce model size and optimize performance
  • Run inference: load the quantized model and run sample prompts using Llama.cpp
  • Evaluate model quality: calculate perplexity or use other metrics to assess performance
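
The sketches below show one plausible way to carry out each step. Instance names, zones, file names, and the Hugging Face repository ID are illustrative assumptions, not required values. Provisioning an Axion-based instance with the gcloud CLI might look like this:

```bash
# Create an Axion (Arm64) VM; name, zone, image family, and disk size are examples
gcloud compute instances create afm-demo \
  --machine-type=c4a-standard-16 \
  --zone=us-central1-a \
  --image-family=ubuntu-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=200GB
```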
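
On a Debian-based image such as Ubuntu, the build dependencies can be installed with apt:

```bash
# Install the compiler toolchain, CMake, Git, and Python
sudo apt update
sudo apt install -y build-essential cmake git python3 python3-pip python3-venv
```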
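
Cloning and compiling Llama.cpp follows the project's usual CMake flow; when built natively on the Axion instance, the build typically auto-detects the host CPU's Arm features:

```bash
# Clone the repository and build in Release mode using all available cores
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j"$(nproc)"
```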
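
Downloading and quantizing the model could then look like the following. The repository ID arcee-ai/AFM-4.5B and the output file names are assumptions to check against the actual Hugging Face listing, and the conversion step assumes the converter script supports the model's architecture:

```bash
# Run from the llama.cpp directory; install converter dependencies first
pip install -r requirements.txt
pip install -U "huggingface_hub[cli]"

# Download the model weights (repository ID is an assumption)
huggingface-cli download arcee-ai/AFM-4.5B --local-dir ./AFM-4.5B

# Convert the checkpoint to GGUF, then quantize to 4-bit
python3 convert_hf_to_gguf.py ./AFM-4.5B --outfile afm-4.5b-f16.gguf --outtype f16
./build/bin/llama-quantize afm-4.5b-f16.gguf afm-4.5b-q4_0.gguf Q4_0
```

Q4_0 is shown here as one common choice; Llama.cpp supports many other quantization types, and the right trade-off between size and quality depends on your workload.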
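
With the quantized file in place, a quick smoke test with the llama-cli binary might look like this (the prompt and token limit are arbitrary):

```bash
# Generate up to 256 tokens from a sample prompt using all available threads
./build/bin/llama-cli -m afm-4.5b-q4_0.gguf \
  -p "Explain the benefits of running LLMs on Arm servers." \
  -n 256 -t "$(nproc)"
```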
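
Finally, Llama.cpp ships a llama-perplexity tool that reports perplexity over a raw-text file; the corpus file below is a placeholder for whatever held-out text you evaluate against:

```bash
# Compute perplexity over a held-out text file (file name is a placeholder)
./build/bin/llama-perplexity -m afm-4.5b-q4_0.gguf -f wiki.test.raw
```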

Note: You can reuse this deployment flow with other models supported by Llama.cpp by swapping out the model file and adjusting the quantization settings.