About this Learning Path

Who is this for?

This is an advanced topic for ML developers who want to reduce inference latency and memory bandwidth usage by exporting INT8-quantized models to the `.vgf` file format using the ExecuTorch Arm backend.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Explain when to use post-training quantization (PTQ) versus quantization-aware training (QAT)
  • Prepare and quantize a PyTorch model using TorchAO PT2E quantization APIs
  • Export the quantized model to TOSA and generate a model artifact with the ExecuTorch Arm backend
  • Validate the exported graph by visualizing it using Google's Model Explorer

Prerequisites

Before starting, you will need the following:

  • Basic PyTorch model training and evaluation experience
  • A development machine with Python 3.10 or later, with PyTorch and ExecuTorch installed