Vision LLM inference on Android with KleidiAI and MNN

About this Learning Path

Skill level: Introductory
Reading time: 30 min
Last updated: 15 May 2025

Authors: Shuheng Deng, Arm; Yiyang Fan, Arm
Arm IP: Cortex-A
Tags: ML, Android, Android Studio, KleidiAI

This Learning Path is for developers who want to run Vision Transformers (ViT) efficiently on Android.

Upon completion of this Learning Path, you will be able to:

Download a Vision Large Language Model (LLM) from Hugging Face.
Convert the model to the Mobile Neural Network (MNN) framework.
Install an Android demo application that uses the model to run inference.
Compare inference performance with and without KleidiAI Arm-optimized micro-kernels.

Before starting, you will need the following:

A development machine with Android Studio installed.
An Android smartphone whose CPU supports the Arm i8mm and dotprod instructions.
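Before installing the demo, you can check whether a phone meets the last requirement. The following is a minimal Kotlin sketch, assuming the device lists CPU feature flags in /proc/cpuinfo the way Linux on arm64 does, where dotprod is reported as `asimddp` and the Int8 matrix multiply extension as `i8mm`. The function name `supportsDotprodAndI8mm` is hypothetical and is not part of MNN or KleidiAI.

```kotlin
import java.io.File

// Hypothetical helper (not an MNN or KleidiAI API).
// Returns true only if every "Features" line in /proc/cpuinfo-style text
// lists both "asimddp" (dotprod) and "i8mm", and at least one such line exists.
fun supportsDotprodAndI8mm(cpuinfo: String): Boolean {
    val featureSets = cpuinfo.lineSequence()
        .filter { it.trimStart().startsWith("Features") }
        // Keep only the text after "Features\t:" and split it into flag names.
        .map { line -> line.substringAfter(':').trim().split(Regex("\\s+")).toSet() }
        .toList()
    return featureSets.isNotEmpty() &&
        featureSets.all { "asimddp" in it && "i8mm" in it }
}

fun main() {
    // Assumption: this runs on the Android device itself, for example inside the app.
    // If the file cannot be read, the check reports false.
    val file = File("/proc/cpuinfo")
    val cpuinfo = if (file.canRead()) file.readText() else ""
    println("dotprod and i8mm supported: ${supportsDotprodAndI8mm(cpuinfo)}")
}
```

If you prefer not to write code, running `adb shell cat /proc/cpuinfo` from your development machine prints the same Features lines, so you can look for both flags by hand.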