In this section, you’ll set up your development environment by installing dependencies and preparing the Qwen vision model.
Install the Android NDK (Native Development Kit) and git-lfs. This Learning Path was tested with NDK version 28.0.12916984 and CMake version 4.0.0-rc1.
For Ubuntu or Debian systems, install CMake and git-lfs with the following commands:
sudo apt update
sudo apt install cmake git-lfs -y
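After the installation completes, you can confirm the tools are available and initialize Git LFS. The reported versions will vary with your distribution:
cmake --version
git lfs install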
You can use Android Studio to obtain the NDK.
Click Tools > SDK Manager and navigate to the SDK Tools tab.
Select the NDK (Side by side) and CMake checkboxes, as shown below:
See Install NDK and CMake for other installation methods.
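If you prefer not to use Android Studio, a minimal command-line alternative is the sdkmanager tool from the Android command-line tools. The example below installs the NDK version this Learning Path was tested with; adjust the package version and the path to sdkmanager for your SDK installation:
sdkmanager --install "ndk;28.0.12916984"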
Ensure that Python and pip are installed by verifying the versions with these commands:
python --version
pip --version
You should see the versions printed, similar to:
Python 3.12.3
pip 24.0 from /usr/lib/python3/dist-packages/pip (python 3.12)
If Python 3.x is not the default version, try running python3 --version and pip3 --version instead.
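Optionally, create a Python virtual environment so that the packages installed in the following steps stay isolated from your system Python. The directory name venv below is only an example:
python3 -m venv venv
source venv/bin/activate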
You need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included with Android Studio, provides Android Debug Bridge (ADB) for transferring files.
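If the adb command is not found in your shell, add the SDK platform-tools directory to your PATH. The path below assumes the default Android Studio SDK location on Linux:
export PATH=$PATH:$HOME/Android/Sdk/platform-tools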
Connect your phone to your computer using a USB cable, and enable USB debugging on your phone. To do this, tap the Build Number in your Settings app 7 times, then enable USB debugging in Developer Options.
Verify the connection by running:
adb devices
If your device is connected, you see it listed with its device ID:
List of devices attached
<DEVICE ID> device
The following commands download the model from Hugging Face and clone a tool for exporting LLM models to the MNN framework.
cd $HOME
pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen2-VL-2B-Instruct --local-dir ./Qwen2-VL-2B-Instruct/
git clone https://github.com/wangzhaode/llm-export
cd llm-export && pip install .
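Before converting the model, you can optionally confirm that the download completed. The paths below assume the commands above were run from your home directory; the full-precision model is several gigabytes in size:
ls $HOME/Qwen2-VL-2B-Instruct/
du -sh $HOME/Qwen2-VL-2B-Instruct/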
Use the llm-export tool to quantize the model with these options:
llmexport --path ../Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 \
--quant_block 0 --dst_path Qwen2-VL-2B-Instruct-convert-4bit-per_channel --sym
The table below gives you an explanation of the different arguments:
| Parameter | Description | Explanation |
|---|---|---|
| --quant_bit | MNN quant bit, 4 or 8; the default is 4. | 4 represents q4 quantization. |
| --quant_block | MNN quant block; the default is 0. | 0 represents per-channel quantization; 128 represents per-block quantization with a block size of 128. |
| --sym | Symmetric quantization (without zeropoint); the default is False. | Enables symmetric quantization. |
To learn more about the parameters, see the transformers README.md.
Verify that the model was converted correctly by checking that the Qwen2-VL-2B-Instruct-convert-4bit-per_channel directory is at least 1 GB in size.
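One way to check the size is with du. The relative path below assumes you ran the conversion from inside the llm-export directory:
du -sh Qwen2-VL-2B-Instruct-convert-4bit-per_channel/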
Push the model onto the device:
adb shell mkdir /data/local/tmp/models/
adb push Qwen2-VL-2B-Instruct-convert-4bit-per_channel /data/local/tmp/models
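Optionally, list the files on the device to confirm that the transfer succeeded:
adb shell ls /data/local/tmp/models/Qwen2-VL-2B-Instruct-convert-4bit-per_channel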
With the model set up, you’re ready to use Android Studio to build and run an example application.