Install Python dependencies for Llama.cpp
In this step, you’ll create a Python virtual environment and install the dependencies required to run AFM-4.5B with Llama.cpp. This ensures a clean, isolated environment for model optimization on Google Cloud Axion.
virtualenv env-llama-cpp
This command creates a new Python virtual environment named env-llama-cpp. A virtual environment keeps this project's packages separate from the system Python installation, so the Llama.cpp dependencies you install next cannot conflict with other projects on the instance.
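If the virtualenv tool is not available on your instance, Python's standard-library venv module is a drop-in alternative. This is a sketch using venv rather than the virtualenv command from the guide:

```shell
# Create an equivalent isolated environment with Python's built-in venv module
python3 -m venv env-llama-cpp

# The new directory ships its own interpreter, pip, and activation script
ls env-llama-cpp/bin
```

Either tool produces the same directory layout, so the activation step that follows is identical.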
source env-llama-cpp/bin/activate
This command does the following:

- Prefixes your shell prompt with env-llama-cpp, indicating the environment is active
- Updates PATH so the shell uses the environment's Python interpreter
- Ensures pip commands install packages into the isolated environment

Before installing dependencies, upgrade pip:
pip install --upgrade pip
This command uses the --upgrade flag to fetch and install the newest release of pip.

Use the following command to install all required Python packages:
pip install -r requirements.txt
This command uses the -r flag to read the list of dependencies from the requirements.txt file provided in the llama.cpp repository and installs each one into the active virtual environment.
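To confirm the installation succeeded, you can ask pip to report what it installed and check for dependency conflicts. This is an optional sanity check, not a step from the original guide:

```shell
# Show every package pip installed into the active environment
python3 -m pip list --format=freeze

# Report any packages with missing or conflicting dependencies;
# a clean environment prints "No broken requirements found."
python3 -m pip check || echo "resolve the conflicts reported above"
```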
This step sets up everything you need to run AFM-4.5B in your Python environment.
After installation, your environment includes the Python packages required to download and optimize the AFM-4.5B model with Llama.cpp.
You can now run Python scripts that integrate with the compiled Llama.cpp binaries.
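The whole setup can be recapped as a short sketch. It uses the built-in venv module as a stand-in for the virtualenv command and assumes the env-llama-cpp name used throughout this step:

```shell
# Create and activate the environment, then confirm the shell now resolves
# python inside env-llama-cpp/bin rather than the system location
python3 -m venv env-llama-cpp
source env-llama-cpp/bin/activate
command -v python                          # a path ending in env-llama-cpp/bin/python
python -c 'import sys; print(sys.prefix)'  # the environment's root directory
```

When you are finished working, run deactivate to return the shell to the system Python.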