Introduction
AFM-4.5B deployment on Google Cloud Axion with Llama.cpp
Provision a Google Cloud Axion Arm64 environment
Configure your Google Cloud Axion Arm64 environment
Build Llama.cpp on Google Cloud Axion Arm64
Install Python dependencies for Llama.cpp
Download and optimize the AFM-4.5B model for Llama.cpp
Run inference with AFM-4.5B using Llama.cpp
Benchmark and evaluate AFM-4.5B quantized models on Axion
Review your AFM-4.5B deployment on Axion
Next Steps
Congratulations! You have successfully deployed the AFM-4.5B foundation model on Google Cloud Axion Arm64.
Here’s a summary of what you built and how to extend it.
Using this Learning Path, you have:
Launched an Axion-powered Google Cloud instance – you set up a c4a
instance running Ubuntu 24.04 LTS, leveraging Arm-based compute for optimal price–performance.
Configured the development environment – you installed tools and dependencies, including Git, build tools, and Python packages for machine learning workloads.
Built Llama.cpp from source – you compiled the inference engine specifically for the Arm64 architecture to maximize performance on Axion.
Downloaded and optimized AFM-4.5B – you retrieved the 4.5-billion-parameter Arcee Foundation Model, converted it to the GGUF format, and created quantized versions (8-bit and 4-bit) to reduce memory usage and improve speed.
Ran inference and evaluation – you tested the model using interactive sessions and API endpoints, and benchmarked speed, memory usage, and model quality.
The benchmarking results demonstrate the power of quantization and Arm-based computing:
Google Cloud Axion processors, based on Arm Neoverse V2, provide:
Now that you have a working deployment, you can extend it further.
Production deployment:
Application development:
llama-server
APITogether, Arcee AI’s foundation models, Llama.cpp’s efficient runtime, and Google Cloud Axion provide a scalable, cost-efficient platform for AI.
From chatbots and content generation to research tools, this stack delivers a balance of performance, cost, and developer control.
For more information on Arcee AI, and how you can build high-quality, secure, and cost-efficient AI solutions, please visit www.arcee.ai .