| Skill level: | Introductory |
| Reading time: | 1 hr |
| Last updated: | 29 May 2026 |
This is an introductory topic for developers interested in running inference on quantized models. In this Learning Path, you'll run inference on Llama 3.1-8B and Whisper, both with and without quantization. You'll then benchmark Llama's performance with vLLM's bench CLI and measure its accuracy with the LM Evaluation Harness.
Upon completion of this Learning Path, you will be able to:
Before starting, you will need the following: