About this Learning Path

Skill level:	Introductory
Reading time:	45 min
Last updated:	22 Jun 2026

Author:	Jason Andrews, Arm
Arm IP:	Neoverse
Tags:	ML, AWS, Microsoft Azure, Google Cloud, Oracle, Linux, vLLM, LLM, Generative AI, Python, Hugging Face

This is an introductory topic for software developers and AI engineers interested in learning how to use the vLLM library on Arm servers.

Upon completion of this Learning Path, you will be able to:

Build vLLM from source on an Arm server.
Use a Qwen LLM from Hugging Face.
Run local batch inference using vLLM.
Create and interact with an OpenAI-compatible server provided by vLLM on your Arm server.

Before starting, you will need the following:

An Arm-based Linux instance from a cloud service provider, or a local Arm Linux computer running Ubuntu 24.04 with at least 8 CPUs, 16 GB RAM, and 50 GB of disk storage.
A system that includes support for BFloat16.
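
One way to check these prerequisites before you begin is a short Python script like the sketch below. It is not part of the Learning Path itself: the function names `has_bf16` and `report_system` are hypothetical, and it assumes a Linux system where `/proc/cpuinfo` lists CPU capabilities on a `Features` line, with `bf16` marking BFloat16 support on Arm. It also assumes that free space on the root filesystem is the relevant disk figure.

```python
import os
import shutil

# Minimums taken from the prerequisites listed above.
MIN_CPUS = 8
MIN_RAM_GB = 16
MIN_DISK_GB = 50


def has_bf16(cpuinfo_text: str) -> bool:
    """Return True if any 'Features' line in the cpuinfo text lists the bf16 flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "features" and "bf16" in value.split():
            return True
    return False


def report_system(path: str = "/") -> dict:
    """Measure CPU count, physical RAM, free disk space, and BFloat16 support."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
    except OSError:
        cpuinfo = ""  # not Linux, or /proc is unavailable
    return {
        "cpus": os.cpu_count() or 0,
        "ram_gb": ram_bytes / 1e9,
        "free_disk_gb": shutil.disk_usage(path).free / 1e9,
        "bf16": has_bf16(cpuinfo),
    }


if __name__ == "__main__":
    info = report_system()
    print(f"CPUs:      {info['cpus']} (need {MIN_CPUS})")
    print(f"RAM:       {info['ram_gb']:.1f} GB (need {MIN_RAM_GB})")
    print(f"Free disk: {info['free_disk_gb']:.1f} GB (need {MIN_DISK_GB})")
    print(f"BFloat16:  {'yes' if info['bf16'] else 'no'}")
```

The script prints measured values rather than a pass or fail verdict, because the RAM an operating system reports is usually slightly below the nominal instance size.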