What you've learned

You should now know how to:

  • Install the prerequisites for cross-compiling new inference engines for Android.
  • Run LLM inference on an Android device with the Gemma 2B model using Google AI Edge's MediaPipe framework.
  • Benchmark LLM inference speed with and without KleidiAI's use of the Arm i8mm (8-bit integer matrix multiply) architecture feature.
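
Before comparing benchmarks with and without KleidiAI, it is useful to confirm that the device's CPU actually advertises the i8mm feature. A minimal sketch of that check is below; the `features` string is an illustrative sample of a `/proc/cpuinfo` Features line, which on a real device you would capture with `adb shell cat /proc/cpuinfo`:

```shell
#!/bin/sh
# Sample Features line as it might appear in /proc/cpuinfo on an
# Armv8.6-A device (illustrative only; capture the real line via
#   adb shell cat /proc/cpuinfo
# on a connected device).
features="Features : fp asimd aes sha1 sha2 crc32 atomics asimddp i8mm"

# grep -qw matches i8mm as a whole word and exits 0 if found.
if echo "$features" | grep -qw i8mm; then
  echo "i8mm supported"
else
  echo "i8mm not supported"
fi
```

If the Features line does not list `i8mm`, the KleidiAI micro-kernels that rely on 8-bit integer matrix multiplication cannot be used, and the two benchmark runs will show similar results.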

Knowledge Check

In which type of benchmark are the KleidiAI performance improvements most noticeable?

What is MediaPipe?

Does Android NDK r21 include support for i8mm instructions?
