Introduction
AFM-4.5B deployment on Google Cloud Axion with Llama.cpp
Provision a Google Cloud Axion Arm64 environment
Configure your Google Cloud Axion Arm64 environment
Build Llama.cpp on Google Cloud Axion Arm64
Install Python dependencies for Llama.cpp
Download and optimize the AFM-4.5B model for Llama.cpp
Run inference with AFM-4.5B using Llama.cpp
Benchmark and evaluate AFM-4.5B quantized models on Axion
Review your AFM-4.5B deployment on Axion
Next Steps
Before you begin, make sure you meet the following requirements:
A c4a-standard-16 instance (or larger).
If you’re new to Google Cloud, see the Learning Path Getting started with Google Cloud.
Confirm that your account has sufficient quota for Axion instances and enough storage capacity to host the AFM-4.5B model and dependencies.
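One way to check regional quota from the command line is `gcloud compute regions describe`, which prints each quota metric with its limit and current usage. The region below is only an example, and the exact quota metric names covering the C4A (Axion) family can vary by project, so treat this as a sketch.

```shell
# Print region details and filter for CPU quota entries (example region --
# substitute the region where you plan to launch the Axion instance).
gcloud compute regions describe us-central1 | grep -B1 -A1 "CPUS"
```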
In the left sidebar of the Compute Engine dashboard, select VM instances, and then Create instance.
Use the following settings:
Name: arcee-axion-instance
Series: c4a
Machine type: c4a-standard-16 or larger
In the left sidebar, select OS and storage.
Leave the other settings as they are.
When you’re ready, click Create to launch your Compute Engine instance.
After a few seconds, you should see your instance listed as Running.
If the launch fails, double-check your settings and permissions, and try again.
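If you prefer the command line, an equivalent instance can be provisioned with the gcloud CLI. The sketch below assumes the `ubuntu-2404-lts-arm64` image family, a 64 GB boot disk, and a zone with C4A capacity; adjust the zone and disk size to match your project.

```shell
# Zone is an assumption -- pick one that offers Axion (C4A) machine types.
ZONE="us-central1-a"

# Create the Arm64 instance used in this Learning Path.
gcloud compute instances create arcee-axion-instance \
  --zone="$ZONE" \
  --machine-type=c4a-standard-16 \
  --image-family=ubuntu-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=64GB
```

Once the command returns, the instance should appear as Running in the console, just as with the browser-based flow.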
Open the SSH dropdown list, and select Open in browser window.
Your browser may ask you to authenticate. Once you’ve done that, a terminal window will open.
You are now connected to your Ubuntu instance running on Google Cloud Axion.
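As an alternative to the browser-based terminal, you can open an SSH session with the gcloud CLI; the command below assumes the instance name used earlier and an example zone.

```shell
# Connect over SSH (zone is an assumption -- use the zone you launched in).
gcloud compute ssh arcee-axion-instance --zone=us-central1-a
```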