About this Learning Path

Who is this for?

This is an introductory topic for ML engineers optimizing LLM inference performance on Arm CPUs.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Understand how PyTorch uses multiple threads for CPU inference
  • Measure the performance impact of thread count on LLM inference
  • Tune thread count to optimize inference for specific models and systems (see the sketch after this list)
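
As a preview of the tuning covered in this Learning Path, the sketch below shows how PyTorch exposes its CPU thread count through torch.get_num_threads() and torch.set_num_threads(). The matrix multiply is only a stand-in workload and the thread counts tried are illustrative assumptions, not a recommendation; the Learning Path itself measures a real LLM.

```python
# Minimal sketch: measure how intra-op thread count affects a CPU workload.
# The matmul below is a stand-in for LLM inference; thread counts are illustrative.
import time
import torch

print(f"Default intra-op threads: {torch.get_num_threads()}")

a = torch.randn(2048, 2048)
b = torch.randn(2048, 2048)

for num_threads in (1, 4, 8, 16):
    torch.set_num_threads(num_threads)  # controls intra-op CPU parallelism
    start = time.perf_counter()
    for _ in range(10):
        torch.mm(a, b)
    elapsed = time.perf_counter() - start
    print(f"{num_threads:2d} threads: {elapsed:.3f} s for 10 matmuls")
```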

Prerequisites

Before starting, you will need the following:

  • An Arm-based cloud instance or an Arm server with at least 16 cores
  • Basic understanding of Python and PyTorch
  • Ability to install Docker on your Arm system