In this section, you’ll benchmark model performance with and without KleidiAI kernels. To run optimized inference, you’ll first need to compile the required library files. You’ll also need an example image to run command-line prompts.
You can use the image of the tiger provided below, which this Learning Path uses, or choose your own. Whichever image you select, rename it to example.png so that it matches the commands in the following sections.
Use ADB to load the image onto your phone:
adb push example.png /data/local/tmp/
Navigate to the MNN project that you cloned in the previous section.
Create a build directory and run the build script.
The first time that you do this, build the binaries with the -DMNN_KLEIDIAI flag set to FALSE.
cd $HOME/MNN/project/android
mkdir build_64 && cd build_64
../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=FALSE \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
-DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
-DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
If your NDK toolchain isn’t set up correctly, the script above might fail. Make a note of where the NDK was installed; this is a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running build_64.sh:
export ANDROID_NDK_HOME=<path-to>/ndk/28.0.12916984
export CMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_HOME/build/cmake/android.toolchain.cmake
export ANDROID_NDK=$ANDROID_NDK_HOME
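Before re-running build_64.sh, you can optionally sanity-check the paths. This is a minimal check, assuming you exported ANDROID_NDK_HOME as shown above:

```shell
# Confirm the CMake toolchain file exists at the expected location.
# Adjust the version directory in ANDROID_NDK_HOME to match your installed NDK.
if [ -f "$ANDROID_NDK_HOME/build/cmake/android.toolchain.cmake" ]; then
  echo "NDK toolchain found"
else
  echo "Toolchain file not found - check ANDROID_NDK_HOME"
fi
```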
Push the required files to your Android device, then enter a shell on the device using ADB:
adb push *so llm_demo tools/cv/*so /data/local/tmp/
adb shell
Run the following commands in the ADB shell. Navigate to the directory you pushed the files to, add executable permissions to the llm_demo binary, and set LD_LIBRARY_PATH so that it can locate the shared libraries. Then use the example image you transferred earlier to create a file containing the text of the prompt.
cd /data/local/tmp/
chmod +x llm_demo
export LD_LIBRARY_PATH=$PWD
echo "<img>./example.png</img>Describe the content of the image." > prompt
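The prompt file pairs an &lt;img&gt;...&lt;/img&gt; tag, which references the image path, with the question text that follows it. As an illustration (the alternative question below is just an example), you could ask something else about the same image:

```shell
# The <img>...</img> tag points at the image; the text after it is the question.
echo "<img>./example.png</img>What animal is shown, and what is it doing?" > prompt
```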
Finally, run an inference on the model with the following command:
./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
If the launch is successful, you should see the following output, with the performance benchmark at the end:
config path is models/Qwen-VL-2B-convert-4bit-per_channel/config.json
tokenizer_type = 3
prompt file is prompt
The image features a tiger standing in a grassy field, with its front paws raised and its eyes fixed on something or someone behind it. The tiger's stripes are clearly visible against the golden-brown background of the grass. The tiger appears to be alert and ready for action, possibly indicating a moment of tension or anticipation in the scene.
#################################
prompt tokens num = 243
decode tokens num = 70
vision time = 5.76 s
audio time = 0.00 s
prefill time = 1.26 s
decode time = 2.02 s
prefill speed = 192.28 tok/s
decode speed = 34.73 tok/s
##################################
The next step is to regenerate the binaries with KleidiAI activated. This is done by updating the -DMNN_KLEIDIAI flag to TRUE.
From the build_64 directory, run:
../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=TRUE \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
-DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
-DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
Next, remove the existing binaries from your Android device, then push the updated files:
adb shell "cd /data/local/tmp; rm -rf *so llm_demo tools/cv/*so"
adb push *so llm_demo tools/cv/*so /data/local/tmp/
adb shell
In the new ADB shell, run the following commands:
cd /data/local/tmp/
chmod +x llm_demo
export LD_LIBRARY_PATH=$PWD
./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
After running with KleidiAI enabled, you should see improved benchmarks. Example results:
#################################
prompt tokens num = 243
decode tokens num = 70
vision time = 2.91 s
audio time = 0.00 s
prefill time = 0.91 s
decode time = 1.56 s
prefill speed = 266.13 tok/s
decode speed = 44.96 tok/s
##################################
The table below shows the uplift on three relevant metrics after enabling the KleidiAI kernels:
Benchmark | Without KleidiAI | With KleidiAI |
---|---|---|
Vision Process Time | 5.76 s | 2.91 s |
Prefill Speed | 192.28 tok/s | 266.13 tok/s |
Decode Speed | 34.73 tok/s | 44.96 tok/s |
Prefill speed describes how fast the model processes the input prompt, while decode speed indicates how quickly the model generates new tokens after the input has been processed.
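You can quantify the uplift directly from the two runs. The figures below are the example results from this section; substitute your own measurements:

```shell
# Compare the two benchmark runs: speedup for vision time, percentage
# uplift for prefill and decode throughput.
awk 'BEGIN {
  printf "Vision time speedup:  %.2fx\n", 5.76 / 2.91
  printf "Prefill speed uplift: %.1f%%\n", (266.13 / 192.28 - 1) * 100
  printf "Decode speed uplift:  %.1f%%\n", (44.96 / 34.73 - 1) * 100
}'
```

With the example numbers, this reports roughly a 2x vision-processing speedup and throughput uplifts of about 38% for prefill and 29% for decode.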
These benchmarks clearly demonstrate the performance advantages of using Arm-optimized KleidiAI kernels for vision transformer (ViT) workloads.