About this Learning Path

Who is this for?

This Learning Path is for developers and engineers who want to run multimodal models, spanning image, audio, and text, on Armv9 Linux systems using MNN as a portable, CPU-first inference runtime. It is aimed at readers who are comfortable building software from source and want a reproducible on-device workflow that avoids quantization and heterogeneous (GPU/NPU) scheduling.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Build MNN natively on an Armv9 Linux system for multimodal inference; a build sketch follows this list
  • Verify a CPU-only Omni model workflow with text, vision, and audio prompts
  • Create a reproducible multimodal application flow that combines image and audio inputs into an actionable restock ticket
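
For the first objective, the native build follows the standard clone-and-CMake flow. The commands below are a minimal sketch, assuming the upstream alibaba/MNN repository; the option names MNN_BUILD_LLM and MNN_LOW_MEMORY are assumptions here, so verify them against the CMakeLists.txt of the release you check out. The full procedure is covered later in this Learning Path.

```bash
# Minimal native-build sketch for MNN on an Armv9 Linux system.
# Assumed: upstream repo URL and the MNN_BUILD_LLM / MNN_LOW_MEMORY
# option names; confirm both in CMakeLists.txt for your checkout.
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release \
      -DMNN_BUILD_LLM=ON \
      -DMNN_LOW_MEMORY=ON ..
make -j"$(nproc)"
```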

Prerequisites

Before starting, you will need the following:

  • An Armv9 Linux device with at least 32 GB of available disk space, for example a Radxa Orion O6; a quick pre-flight check follows this list
  • Familiarity with the Linux command line, Git, and building C++ projects with CMake
  • Internet access to download source code, model assets, and sample data
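
Before starting, a quick check of the CPU architecture and free disk space can save time. The commands below are a sketch; the exact feature flags reported depend on your kernel and device, and some kernels do not expose them at all.

```bash
# Pre-flight sanity check (sketch; output varies by distribution/kernel).
uname -m                      # expect: aarch64
df -h .                       # confirm at least 32 GB free in your work dir
grep -m1 -o 'sve2\|i8mm' /proc/cpuinfo || echo "Armv9 feature flags not listed"
```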