To follow the instructions for this Learning Path, you will need an Arm server running Ubuntu 24.04 LTS with at least 8 cores, 16GB of RAM, and 50GB of disk storage.
vLLM stands for Virtual Large Language Model. It is a fast and easy-to-use library for LLM inference and model serving.
You can use vLLM in batch mode, or by running an OpenAI-compatible server.
In this Learning Path, you will learn how to build vLLM from source and run LLM inference on an Arm-based server.
First, ensure your system is up-to-date and install the required tools and libraries:
sudo apt-get update -y
sudo apt-get install -y curl ccache git wget vim numactl gcc-12 g++-12 python3 python3-pip python3-venv python-is-python3 libtcmalloc-minimal4 libnuma-dev ffmpeg libsm6 libxext6 libgl1 libssl-dev pkg-config
Set the default GCC to version 12:
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
Next, install Rust. For more information, see the Rust install guide.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
Four environment variables are required. You can export them at the command line, or add them to your $HOME/.bashrc file and source it.
To set them at the command line, run:
export CCACHE_DIR=$HOME/.cache/ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
export VLLM_CPU_DISABLE_AVX512="true"
export LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libtcmalloc_minimal.so.4"
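To persist the variables instead, you can append the same exports to your $HOME/.bashrc and source the file; this sketch uses $HOME so it works for any user:

```shell
# Append the required variables to ~/.bashrc so new shells pick them up
cat >> "$HOME/.bashrc" <<'EOF'
export CCACHE_DIR=$HOME/.cache/ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
export VLLM_CPU_DISABLE_AVX512="true"
export LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libtcmalloc_minimal.so.4"
EOF
source "$HOME/.bashrc"
```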
Create and activate a Python virtual environment:
python -m venv env
source env/bin/activate
Your command-line prompt is now prefixed with (env), which indicates that you are in the Python virtual environment.
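To double-check that activation worked, confirm that python now resolves inside the environment. The sketch below recreates the env directory so it runs standalone; if you just activated it above, the last command alone is enough:

```shell
# Create and activate the environment, then confirm which interpreter is on PATH
python3 -m venv env
. env/bin/activate
command -v python   # prints the path to env/bin/python
```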
Now upgrade pip and install the required Python packages:
pip install --upgrade pip
pip install py-cpuinfo
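As a quick sanity check that you are building on a 64-bit Arm host, you can print the machine architecture using only the Python standard library:

```shell
# Print the CPU architecture; expect aarch64 on an Arm server (x86_64 elsewhere)
python3 -c "import platform; print(platform.machine())"
```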
First, clone the vLLM repository from GitHub:
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 72ff3a968682e6a3f7620ab59f2baf5e8eb2777b
The checkout pins a specific commit hash that is known to work for this example. Omit this command to use the latest code on the main branch.
Install the Python packages for vLLM:
pip install -r requirements-build.txt
pip install -v -r requirements-cpu.txt
Build the vLLM wheel and install it with pip:
VLLM_TARGET_DEVICE=cpu python3 setup.py bdist_wheel
pip install dist/*.whl
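If the compile step is killed on a machine with less memory, you can limit build parallelism and re-run the bdist_wheel command above. vLLM's source build honors the MAX_JOBS environment variable (an assumption based on vLLM's installation docs; tune the value to your available RAM):

```shell
# Limit parallel compile jobs to reduce peak memory usage during the build,
# then re-run: VLLM_TARGET_DEVICE=cpu python3 setup.py bdist_wheel
export MAX_JOBS=4
echo "MAX_JOBS set to $MAX_JOBS"
```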
When the build completes, clean up the build artifacts and navigate out of the repository:
rm -rf dist
cd ..
You are now ready to download an LLM and run vLLM.