About this Learning Path

Who is this for?

This Learning Path is for developers and engineers who want to run multimodal models, spanning image, audio, and text, on Armv9 Linux systems using MNN as a portable, CPU-first inference runtime. It is aimed at readers who are comfortable building software from source and want a reproducible on-device workflow that avoids quantization and heterogeneous (GPU/NPU) scheduling.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Build MNN natively on an Armv9 Linux system for multimodal inference; a build sketch follows this list
  • Verify a CPU-only Omni model workflow with text, vision, and audio prompts
  • Create a reproducible multimodal application flow that combines image and audio inputs into an actionable restock ticket
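
For the first objective, the native build follows the standard clone-and-CMake flow. The commands below are a minimal sketch, assuming the upstream alibaba/MNN repository; the option names MNN_BUILD_LLM and MNN_LOW_MEMORY are assumptions here, so verify them against the CMakeLists.txt of the release you check out. The full procedure is covered later in this Learning Path.

```bash
# Minimal native-build sketch for MNN on an Armv9 Linux system.
# Assumed: upstream repo URL and the MNN_BUILD_LLM / MNN_LOW_MEMORY
# option names; confirm both in CMakeLists.txt for your checkout.
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release \
      -DMNN_BUILD_LLM=ON \
      -DMNN_LOW_MEMORY=ON ..
make -j"$(nproc)"
```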

Prerequisites

Before starting, you will need the following:

  • An Armv9 Linux device with at least 32 GB of available disk space, for example a Radxa Orion O6; a quick pre-flight check follows this list
  • Familiarity with the Linux command line, Git, and building C++ projects with CMake
  • Internet access to download source code, model assets, and sample data
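
Before starting, a quick check of the CPU architecture and free disk space can save time. The commands below are a sketch; the exact feature flags reported depend on your kernel and device, and some kernels do not expose them at all.

```bash
# Pre-flight sanity check (sketch; output varies by distribution/kernel).
uname -m                      # expect: aarch64
df -h .                       # confirm at least 32 GB free in your work dir
grep -m1 -o 'sve2\|i8mm' /proc/cpuinfo || echo "Armv9 feature flags not listed"
```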