Introduction
This section introduces the software stack used throughout this Learning Path. You will use MNN (Mobile Neural Network), a lightweight inference engine, to run a prebuilt Omni multimodal model on an Armv9 Linux system using only the CPU.
By the end of this section, you'll understand why this combination is a practical starting point for reproducible multimodal inference on Armv9. Throughout, the example is a retail restocking workflow that combines local image and audio inputs.
MNN is a lightweight inference engine designed for deployment across mobile, embedded, and edge platforms. It's a good fit for this Learning Path for four reasons:

- It has a small footprint and minimal dependencies, which suits edge deployment.
- It runs efficiently on Arm CPUs, so no GPU or NPU is required.
- It supports Omni multimodal models through a single runtime.
- It builds natively on Armv9 Linux, which keeps the workflow reproducible.
For this Learning Path, MNN gives you a practical way to build a reproducible multimodal inference workflow on Armv9 while keeping the software stack compact and deployment-oriented.
An Omni model combines text, image, and audio understanding in a single inference pipeline, making it useful for building compact edge applications that need to reason over more than one input type.
In this Learning Path, you use the model to:

- Audit a retail shelf from a local image
- Convert spoken restock notes into structured tickets
- Combine image and audio inputs into a single-shot multimodal restock ticket
This single-model approach keeps the workflow easier to follow than maintaining separate models for vision and speech tasks.
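To make the single-model idea concrete, the sketch below shows one way the outputs of later sections could be collected into a common ticket structure. This is illustrative only: the `RestockTicket` fields and the JSON shape the model is assumed to emit are hypothetical, not part of the MNN API.

```python
import json
from dataclasses import dataclass


@dataclass
class RestockTicket:
    """A structured restock ticket assembled from model outputs.

    The field names here are assumptions for illustration; your prompts
    in later sections define what the model actually returns.
    """
    sku: str
    quantity: int
    aisle: str
    source: str  # which pipeline produced it: "vision", "audio", or "multimodal"


def ticket_from_model_output(raw: str, source: str) -> RestockTicket:
    """Parse a (hypothetical) JSON response from the Omni model into a ticket."""
    data = json.loads(raw)
    return RestockTicket(
        sku=data["sku"],
        quantity=int(data["quantity"]),
        aisle=data["aisle"],
        source=source,
    )


# Example model response (hypothetical shape, prompted for in later sections)
response = '{"sku": "OAT-500G", "quantity": 12, "aisle": "A3"}'
ticket = ticket_from_model_output(response, source="multimodal")
print(ticket)
```

Because every modality feeds the same structure, the vision, audio, and combined examples in later sections can share one downstream code path instead of three.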
To keep the workflow reproducible, this Learning Path uses a deliberately narrow scope:
- **CPU-only execution**: all inference runs on the Armv9 CPU.
- **Prebuilt model assets**: you use a prepared MNN Omni model package instead of exporting or converting models.
- **No heterogeneous scheduling**: this example does not use GPU, NPU, or split CPU-accelerator execution.
This scope keeps the focus on setup, validation, and multimodal application flow.
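As a preview of how this CPU-only scope translates into build configuration, a native MNN build on Armv9 typically uses CMake with GPU backends disabled. The flags below are a sketch, not the canonical build: option names can vary between MNN releases, so verify them against the version you check out (the next section covers the build in detail).

```shell
# Clone MNN and configure a CPU-only release build.
# Flag names below are assumptions; confirm them in the MNN release you use.
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir build && cd build
# MNN_BUILD_LLM enables the LLM/Omni runtime; leaving the GPU backends
# (Vulkan, OpenCL) off keeps execution CPU-only, matching this Learning Path.
cmake .. -DCMAKE_BUILD_TYPE=Release -DMNN_BUILD_LLM=ON \
         -DMNN_VULKAN=OFF -DMNN_OPENCL=OFF
make -j"$(nproc)"
```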
In this section, you learned:

- Why MNN is a practical inference engine for multimodal workloads on Armv9
- What an Omni model is and how it combines text, image, and audio understanding in one pipeline
- The scope of this Learning Path: CPU-only execution with prebuilt model assets and no heterogeneous scheduling
In the next section, you’ll build MNN natively on Armv9 and prepare the model files and local assets used in the remaining examples.