Who is this for?
This is an introductory topic for software developers and AI engineers interested in learning how to use vLLM, an open-source inference and serving engine for large language models, on Arm servers.
What will you learn?
Upon completion of this learning path, you will be able to:
- Build vLLM from source on an Arm server.
- Download a Qwen LLM from Hugging Face.
- Run local batch inference with vLLM (previewed in the sketch below).
- Create and interact with an OpenAI-compatible server provided by vLLM on your Arm server.
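To give a feel for the batch inference step, here is a minimal sketch. It assumes vLLM is already installed and uses `Qwen/Qwen2.5-0.5B-Instruct` as an illustrative model name; the learning path may build vLLM differently and use a different Qwen checkpoint.

```python
# Minimal vLLM offline batch inference sketch.
# Assumes vLLM is installed; the model name is illustrative and may
# differ from the one used in this learning path.
from vllm import LLM, SamplingParams

prompts = [
    "What is an Arm server?",
    "Explain the benefits of batch inference.",
]

# Download the model from Hugging Face (cached on first run) and load it.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

# Sampling settings for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Run all prompts in a single batch.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}\n")
```

vLLM also ships an OpenAI-compatible HTTP server, which later sections of this learning path cover in detail.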
Prerequisites
Before starting, you will need the following:
- An Arm-based instance from a cloud service provider, or a local Arm Linux computer with at least 8 CPUs and 16 GB of RAM (a quick check script follows this list).
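As a quick way to confirm a machine meets these requirements, a small script like the following (a sketch, not part of the learning path itself) reports the architecture, CPU count, and total memory:

```python
# Quick prerequisite check (illustrative sketch, Linux only).
import os
import platform

# Expect an Arm 64-bit Linux machine: should print 'aarch64'.
print(f"Architecture: {platform.machine()}")

# At least 8 CPUs are recommended.
print(f"CPUs: {os.cpu_count()}")

# At least 16 GB of RAM is recommended; read total memory from /proc/meminfo,
# whose first line is 'MemTotal: <value> kB'.
with open("/proc/meminfo") as f:
    mem_kb = int(f.readline().split()[1])
print(f"RAM: {mem_kb / 1024**2:.1f} GB")
```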