About this Learning Path

Skill level:	Advanced
Reading time:	30 min
Last updated:	23 Feb 2026

Author:	Kieran Hejmadi
Arm IP:	Neoverse
Tags:	Performance and Architecture, AWS, Microsoft Azure, Google Cloud, Oracle, Linux, C++, Python, taskset, perf, Google Benchmark

Who is this for?

This is an advanced topic for developers, performance engineers, and system administrators who want to fine-tune the performance of their workloads on many-core Arm-based systems.

What will you learn?

Upon completion of this Learning Path, you will be able to:

Pin threads to specific CPU cores using taskset and source code modifications
Measure cache performance improvements from thread pinning using perf
Evaluate performance trade-offs between throughput and latency consistency
Implement CPU affinity strategies for co-located workloads
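
As a taste of the source-code approach listed above, here is a minimal sketch in Python, assuming a Linux system where `os.sched_setaffinity` is available. The helper name `pin_to_cpu` is hypothetical and chosen for illustration only; the Learning Path itself may use a different method.

```python
import os

def pin_to_cpu(cpu_id, pid=0):
    """Restrict a process (pid 0 means the caller) to a single CPU core.

    Hypothetical helper for illustration; Linux-only.
    """
    os.sched_setaffinity(pid, {cpu_id})
    return os.sched_getaffinity(pid)

# Cores the scheduler currently allows this process to run on.
original = os.sched_getaffinity(0)
allowed = sorted(original)

# Pin to the first permitted core, then report the new affinity mask.
pinned = pin_to_cpu(allowed[0])
print(f"allowed before: {allowed}")
print(f"pinned to: {sorted(pinned)}")

# Restore the original mask so later work is not restricted.
os.sched_setaffinity(0, original)
```

The sketch picks a core from the mask the process already has, rather than hard-coding core 0, so it still works inside containers or cgroups that expose only a subset of cores.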

Prerequisites

Before starting, you will need the following:

An Arm Linux system with four or more CPU cores
Experience with multi-threaded programming in C++ and Python
Understanding of build systems and computer architecture concepts
Familiarity with Linux command-line tools

Optimize application performance with CPU affinity

Introduction

Understand thread pinning and CPU affinity

Create a CPU-intensive program

Pin threads to cores with taskset

Set CPU affinity in source code

Next Steps