Who is this for?
This is an introductory topic for software developers and AI engineers interested in learning how to use vLLM, an open-source inference and serving engine for large language models, on Arm servers.
What will you learn?
Upon completion of this learning path, you will be able to:
- Build vLLM from source on an Arm server.
- Download a Qwen LLM from Hugging Face.
- Run local batch inference with vLLM (previewed in the sketch below).
- Create and interact with an OpenAI-compatible server provided by vLLM on your Arm server.
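To give a feel for the batch inference step, here is a minimal sketch. It assumes vLLM is already installed and uses `Qwen/Qwen2.5-0.5B-Instruct` as an illustrative model name; the learning path may build vLLM differently and use a different Qwen checkpoint.

```python
# Minimal vLLM offline batch inference sketch.
# Assumes vLLM is installed; the model name is illustrative and may
# differ from the one used in this learning path.
from vllm import LLM, SamplingParams

prompts = [
    "What is an Arm server?",
    "Explain the benefits of batch inference.",
]

# Download the model from Hugging Face (cached on first run) and load it.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

# Sampling settings for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Run all prompts in a single batch.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}\n")
```

vLLM also ships an OpenAI-compatible HTTP server, which later sections of this learning path cover in detail.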
Prerequisites
Before starting, you will need the following:
- An Arm-based instance from a cloud service provider, or a local Arm Linux computer with at least 8 CPUs and 16 GB of RAM (a quick check script follows this list).
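As a quick way to confirm a machine meets these requirements, a small script like the following (a sketch, not part of the learning path itself) reports the architecture, CPU count, and total memory:

```python
# Quick prerequisite check (illustrative sketch, Linux only).
import os
import platform

# Expect an Arm 64-bit Linux machine: should print 'aarch64'.
print(f"Architecture: {platform.machine()}")

# At least 8 CPUs are recommended.
print(f"CPUs: {os.cpu_count()}")

# At least 16 GB of RAM is recommended; read total memory from /proc/meminfo,
# whose first line is 'MemTotal: <value> kB'.
with open("/proc/meminfo") as f:
    mem_kb = int(f.readline().split()[1])
print(f"RAM: {mem_kb / 1024**2:.1f} GB")
```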