Cross-compile the Llama runner for Android using the steps below.
Set the ANDROID_NDK environment variable to point to your Android NDK installation:
export ANDROID_NDK=~/Library/Android/sdk/ndk/25.0.8775105
Confirm that $ANDROID_NDK/build/cmake/android.toolchain.cmake exists; CMake needs this toolchain file to cross-compile.
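The check above can be scripted. The following is a minimal sketch, not part of the original instructions; the function name `check_ndk_toolchain` is hypothetical, and it only tests that the toolchain file exists at the path given above.

```shell
# Hypothetical helper: succeeds if the NDK directory passed as $1
# contains the CMake toolchain file, and fails otherwise.
check_ndk_toolchain() {
  toolchain="$1/build/cmake/android.toolchain.cmake"
  if [ -f "$toolchain" ]; then
    echo "found: $toolchain"
  else
    echo "missing: $toolchain" >&2
    return 1
  fi
}

# Usage: check the NDK that ANDROID_NDK points to before running cmake.
check_ndk_toolchain "${ANDROID_NDK:-}" \
  || echo "Set ANDROID_NDK to a valid NDK directory before continuing."
```

Running the check before the configure step tends to give a clearer message than the error CMake reports when the toolchain file cannot be found.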
Use cmake to cross-compile ExecuTorch:
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-23 \
-DCMAKE_INSTALL_PREFIX=cmake-out-android \
-DCMAKE_BUILD_TYPE=Release \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
-DEXECUTORCH_ENABLE_LOGGING=1 \
-DPYTHON_EXECUTABLE=python \
-DEXECUTORCH_BUILD_XNNPACK=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
-Bcmake-out-android .
cmake --build cmake-out-android -j16 --target install --config Release
Use cmake to cross-compile the Llama runner:
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-23 \
-DCMAKE_INSTALL_PREFIX=cmake-out-android \
-DCMAKE_BUILD_TYPE=Release \
-DPYTHON_EXECUTABLE=python \
-DEXECUTORCH_BUILD_XNNPACK=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
-Bcmake-out-android/examples/models/llama2 \
examples/models/llama2
cmake --build cmake-out-android/examples/models/llama2 -j16 --config Release
For Llama 3, add the -DEXECUTORCH_USE_TIKTOKEN=ON option when building the Llama runner.
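If the same build script serves both model families, the extra option can be selected by a small helper. This is a sketch under assumptions: the function name `tiktoken_flag` and the `llama3` label are hypothetical, and only the -DEXECUTORCH_USE_TIKTOKEN=ON option itself comes from the instructions above.

```shell
# Hypothetical helper: prints the extra CMake option for the given model
# family. Only "llama3" needs the Tiktoken option; anything else prints nothing.
tiktoken_flag() {
  case "$1" in
    llama3) echo "-DEXECUTORCH_USE_TIKTOKEN=ON" ;;
    *)      echo "" ;;
  esac
}

# Usage: show the option that would be appended to the Llama runner
# configure command, for example:
#   cmake ... $(tiktoken_flag llama3) -Bcmake-out-android/examples/models/llama2 examples/models/llama2
echo "extra flag: $(tiktoken_flag llama3)"
```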
You should now have llama_main available for Android at cmake-out-android/examples/models/llama2/llama_main.
Connect your phone to your computer using a USB cable.
Enable USB debugging on your Android device; see "Configure on-device developer options" in the Android developer documentation for instructions.
Once you have enabled USB debugging and connected via USB, run:
adb devices
Your device should appear in the list, which confirms that it is connected.
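The device list can also be checked from a script. The sketch below assumes the usual `adb devices` output, where each attached device appears on its own line as a serial number followed by a state, and a usable device has the state `device` (as opposed to, for example, `unauthorized` or `offline`). The function name `count_ready_devices` is hypothetical.

```shell
# Hypothetical helper: reads `adb devices` output on stdin and prints the
# number of devices whose state column is exactly "device".
count_ready_devices() {
  awk '$2 == "device" { n++ } END { print n + 0 }'
}

# Usage: only query adb if it is installed on this machine.
if command -v adb >/dev/null 2>&1; then
  adb devices | count_ready_devices
fi
```

A count of 0 with the phone plugged in usually means that USB debugging is not enabled or that the authorization prompt on the device has not been accepted.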
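Create a working directory on the device, then push the model, the tokenizer, and the runner binary to it:
adb shell mkdir -p /data/local/tmp/llama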
adb push <model.pte> /data/local/tmp/llama/
adb push <tokenizer.bin> /data/local/tmp/llama/
adb push cmake-out-android/examples/models/llama2/llama_main /data/local/tmp/llama/
For Llama 3, you can pass the original tokenizer.model directly, without converting it to a .bin file.
Run the model on the phone by invoking the Llama runner through adb:
adb shell "cd /data/local/tmp/llama && ./llama_main --model_path <model.pte> --tokenizer_path <tokenizer.bin> --prompt \"Once upon a time\" --seq_len 120"
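The invocation above can be assembled by a small helper, which makes it easier to switch between a converted .bin tokenizer and the original Llama 3 tokenizer.model. This is a sketch only: the function name `llama_run_cmd` and the file names in the usage lines are hypothetical, it assumes the prompt contains no double quotes, and it uses only the llama_main options that appear in the command above.

```shell
# Hypothetical helper: prints the command line to run on the device.
# Arguments: model file, tokenizer file, prompt, sequence length (default 120).
# Assumes the prompt contains no double quotes.
llama_run_cmd() {
  model="$1"
  tokenizer="$2"
  prompt="$3"
  seq_len="${4:-120}"
  printf 'cd /data/local/tmp/llama && ./llama_main --model_path %s --tokenizer_path %s --prompt "%s" --seq_len %s\n' \
    "$model" "$tokenizer" "$prompt" "$seq_len"
}

# Dry run: print the command instead of executing it.
llama_run_cmd model.pte tokenizer.model "Once upon a time"

# To execute on a connected device:
#   adb shell "$(llama_run_cmd model.pte tokenizer.model "Once upon a time")"
```

Printing the command first allows the quoting to be inspected before anything runs on the device.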
You have successfully run a model on your Android smartphone.