About this Learning Path

Who is this for?

This is an advanced topic for software developers, performance engineers, and AI practitioners who want to optimize llama.cpp performance on Arm-based CPUs.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Describe the llama.cpp architecture and identify the roles of the Prefill and Decode stages
  • Integrate Streamline Annotations into llama.cpp for fine-grained performance insights
  • Capture and interpret profiling data with Streamline
  • Analyze specific operators during token generation using Annotation Channels
  • Evaluate multi-core, multi-threaded execution of llama.cpp on Arm CPUs

Prerequisites

Before starting, you will need the following:

  • Basic understanding of llama.cpp
  • Understanding of transformer models
  • Working knowledge of Arm Streamline
  • An Arm Neoverse or Cortex-A hardware platform running Linux or Android