Before setting up your environment, it helps to understand how ExecuTorch processes a model and runs it on Arm-based hardware. ExecuTorch uses ahead-of-time (AOT) compilation to transform PyTorch models into optimized operator graphs that run efficiently on resource-constrained systems. The workflow supports hybrid execution across CPU and NPU cores, allowing you to profile, debug, and deploy TinyML workloads with low runtime overhead and high portability across Arm microcontrollers.
ExecuTorch works in three main steps:
Step 1: Export the model. Capture your trained PyTorch model with `torch.export`, producing a standardized operator graph.

Step 2: Compile with the AOT compiler. Quantize and compile the exported graph ahead of time, passing the `--delegate` option to move eligible operations to the Ethos-U accelerator. The output is an optimized `.pte` file (see the sketch after this list).

Step 3: Deploy and run. Load the `.pte` file with the ExecuTorch runtime on the target device and execute it.
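To make the first two steps concrete, here is a minimal sketch using ExecuTorch's generic Python export APIs. The toy model, input shape, and output file name are illustrative, and the sketch deliberately omits quantization and Ethos-U delegation, which the Arm AOT compiler flow layers on top of this path:

```python
import torch
from executorch.exir import to_edge

# Toy model standing in for your trained network (illustrative only)
class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

model = TinyModel().eval()
example_inputs = (torch.randn(1, 8),)

# Step 1: export the model to a standardized operator graph
exported_program = torch.export.export(model, example_inputs)

# Step 2: lower to ExecuTorch's edge dialect and compile ahead of time;
# in the delegated flow, a backend partitioner hands eligible operators
# to the Ethos-U accelerator at this stage
executorch_program = to_edge(exported_program).to_executorch()

# Serialize the optimized program as a .pte file for step 3
with open("tiny_model.pte", "wb") as f:
    f.write(executorch_program.buffer)
```

In the delegated flow, operators the Ethos-U backend supports run on the accelerator while the rest fall back to the CPU kernels, which is what enables the hybrid CPU/NPU execution described above.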
For more detail, see the ExecuTorch documentation.
The diagram below summarizes the ExecuTorch workflow from model export to deployment. It shows how a trained PyTorch model is transformed into an optimized, quantized format and deployed to a target system such as an Arm Fixed Virtual Platform (FVP).
The result is an optimized `.pte` file ready for deployment. This three-step workflow ensures your TinyML models are performance-tuned and hardware-aware before deployment, even without access to physical silicon.
The three-step ExecuTorch workflow from model export to deployment
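For a non-delegated program like the sketch above, you can sanity-check the `.pte` file on your development host before moving to an FVP. As a hedged sketch, recent ExecuTorch releases ship Python runtime bindings under `executorch.runtime`; the module path and the `tiny_model.pte` file name here are assumptions carried over from the earlier example:

```python
import torch
from executorch.runtime import Runtime

# Load the serialized program produced by the AOT step
runtime = Runtime.get()
program = runtime.load_program("tiny_model.pte")

# Load the entry point and run it on a sample input
method = program.load_method("forward")
(output,) = method.execute([torch.randn(1, 8)])
print(output)
```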
Now that you understand how ExecuTorch works, you’re ready to set up your environment and install the tools.