Introduction
Set up your environment
Cross-Compile ExecuTorch for the AArch64 platform
Accelerate ExecuTorch operators with KleidiAI micro-kernels
Create and quantize linear layer benchmark model
Create and quantize convolution layer benchmark model
Create matrix multiply layer benchmark model
Run model and generate the ETDump
Analyze ETRecord and ETDump
Next Steps
| Skill level: | Advanced |
|---|---|
| Reading time: | 30 min |
| Last updated: | 25 Nov 2025 |
This is an advanced topic for developers, performance engineers, and ML framework contributors who want to benchmark and optimize KleidiAI micro-kernels within ExecuTorch to accelerate model inference on Arm64 platforms that support the Scalable Matrix Extension (SME/SME2) instructions.
Upon completion of this Learning Path, you will be able to:
Before starting, you will need the following: