Who is this for?
This is an advanced topic for Android developers who want to efficiently run LLMs on-device.
What will you learn?
Upon completion of this learning path, you will be able to:
- Install the prerequisites for cross-compiling LLM inference engines for Android.
- Run LLM inference on an Android device with the Gemma 2B model using Google AI Edge's MediaPipe framework (see the sketch after this list).
- Benchmark LLM inference speed with and without KleidiAI optimizations that use the Arm i8mm (8-bit integer matrix multiplication) architecture feature.
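
For orientation, the snippet below is a minimal Kotlin sketch of what running a prompt through MediaPipe's LLM Inference API looks like. It assumes the `com.google.mediapipe:tasks-genai` dependency and a converted Gemma 2B model already pushed to the device; the model path and filename are illustrative, and `runGemmaPrompt` is a name chosen for this example:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Runs a single prompt through an on-device Gemma 2B model.
// The model path is an example: point it at wherever you pushed
// the converted .bin model on the device.
fun runGemmaPrompt(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-2b-it-cpu-int4.bin") // example path
        .setMaxTokens(512) // cap on combined prompt + response tokens
        .build()

    // Create the inference engine and generate a response synchronously.
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt)
}
```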
Prerequisites
Before starting, you will need the following:
- An x86_64 Linux machine running Ubuntu with approximately 500 MB of free disk space, or a Docker daemon that can build and run the provided x86_64 Dockerfile.
- An Android phone that supports the i8mm architecture feature (this learning path was tested on a Google Pixel 8 Pro); see the device check below.
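
To confirm a phone meets the i8mm prerequisite, you can look for the feature in the CPU's advertised flags. From your development machine, `adb shell cat /proc/cpuinfo` and checking the `Features` lines works; the Kotlin sketch below does the same check on-device (the function name is chosen for this example):

```kotlin
import java.io.File

// Returns true if any CPU core advertises the Arm i8mm feature.
// Splitting on whitespace avoids false positives from related
// feature names such as "svei8mm".
fun supportsI8mm(): Boolean =
    File("/proc/cpuinfo").readLines()
        .filter { it.startsWith("Features") }
        .any { line ->
            line.substringAfter(":").trim().split(Regex("\\s+")).contains("i8mm")
        }
```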