What you've learned

You should now know how to:

  • Build vLLM from source on an Arm server.
  • Download a Qwen LLM from Hugging Face.
  • Run local batch inference with vLLM.
  • Create and interact with an OpenAI-compatible server provided by vLLM on your Arm server.
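As a quick refresher on the last step, the sketch below builds the JSON body you would POST to the server's `/v1/chat/completions` endpoint (by default at `http://localhost:8000`). The model name is an assumption for illustration; use whichever model you actually served.

```python
import json

# Hedged sketch: a chat completion request body for a vLLM
# OpenAI-compatible server. The model name below is an example;
# substitute the model you launched the server with.
payload = {
    "model": "Qwen/Qwen2.5-0.5B-Instruct",  # assumed model name
    "messages": [
        {"role": "user", "content": "What is an Arm Neoverse core?"}
    ],
    "max_tokens": 128,
}

# Serialize to JSON before sending, e.g. with curl, urllib, or the
# official openai Python client pointed at the local server.
body = json.dumps(payload)
print(body)
```

You can send this body with any HTTP client; the server responds with the same schema as the OpenAI Chat Completions API.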

Knowledge Check

What is the primary purpose of vLLM?

Besides Python, which programming languages does the vLLM build system require?

What value should the VLLM_TARGET_DEVICE environment variable be set to when building vLLM for Arm CPUs?

