You can use LlamaDemo, the Android demo application included in the ExecuTorch repository, to demonstrate local inference with ExecuTorch.
Open a terminal window and navigate to the root directory of the executorch
repository.
Set the following environment variables:
export ANDROID_NDK=~/Library/Android/sdk/ndk/25.0.8775105
export ANDROID_ABI=arm64-v8a
ANDROID_NDK must point to the root of the NDK (<path_to_android_ndk>), which on macOS is usually under ~/Library/Android/sdk/ndk/XX.Y.ZZZZZ and contains NOTICE and README.md. Confirm that <path_to_android_ndk>/build/cmake/android.toolchain.cmake exists so that CMake can cross-compile.
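The toolchain check described above can be scripted. The following is a minimal sketch; `check_ndk` is a hypothetical helper name and is not part of ExecuTorch or the NDK. It only tests for the toolchain file at the path given in the text.

```shell
# Hypothetical helper (not part of ExecuTorch or the NDK):
# verify that an NDK root contains the CMake toolchain file.
check_ndk() {
  # $1 is the NDK root, for example the value of ANDROID_NDK
  if [ -f "$1/build/cmake/android.toolchain.cmake" ]; then
    echo "found: $1/build/cmake/android.toolchain.cmake"
  else
    echo "missing: $1/build/cmake/android.toolchain.cmake" >&2
    return 1
  fi
}
```

After setting the environment variables, run `check_ndk "$ANDROID_NDK"`. A non-zero exit status means the path does not point to a usable NDK root.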
(Optional) If you need tiktoken as the tokenizer (for LLaMA 3), set EXECUTORCH_USE_TIKTOKEN=ON
so that CMake builds with it. If you run other models such as LLaMA 2, skip this step.
export EXECUTORCH_USE_TIKTOKEN=ON # Only for LLaMA3
Run the following commands to set up the required JNI library:
pushd extension/android
./gradlew build
popd
pushd examples/demo-apps/android/LlamaDemo
./gradlew :app:setup
popd
This runs the shell script setup.sh, which configures and builds the required core ExecuTorch, Llama 2, and Android libraries.
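Make sure the exported model and tokenizer are copied to the Android phone. The first command below checks whether the files are already on the device; if they are not, the remaining commands create the directory and copy them: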
adb shell "ls -la /data/local/tmp/llama/"
adb shell mkdir -p /data/local/tmp/llama
adb push <model.pte> /data/local/tmp/llama/
adb push <tokenizer.bin> /data/local/tmp/llama/
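The check-then-copy sequence above can be wrapped in a small function so that files already on the device are not pushed again. This is a sketch under stated assumptions: `push_if_missing` is a hypothetical helper name, adb is on the PATH, and exactly one device is connected. It checks the output of the remote command rather than its exit status, because older adb versions may not propagate the exit status.

```shell
# Hypothetical helper: push a file to the device only if it is not already there.
# Assumes adb is on PATH and a single device is connected.
push_if_missing() {
  pim_src="$1"        # local file, for example the exported .pte model
  pim_dest_dir="$2"   # target directory on the device
  pim_name=$(basename "$pim_src")
  # Ask the device whether the file exists; strip carriage returns
  # that some devices append to shell output.
  pim_state=$(adb shell "[ -f '$pim_dest_dir/$pim_name' ] && echo present" | tr -d '\r')
  if [ "$pim_state" = "present" ]; then
    echo "already on device: $pim_dest_dir/$pim_name"
  else
    adb shell mkdir -p "$pim_dest_dir"
    adb push "$pim_src" "$pim_dest_dir/"
  fi
}
```

For example, `push_if_missing <model.pte> /data/local/tmp/llama` followed by the same call for the tokenizer file.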
Alternatively, if the files are not on the device, use the device explorer in Android Studio to copy them.
Building with Android Studio is the recommended option.
Open Android Studio, select “Open an existing Android Studio project”, and open examples/demo-apps/android/LlamaDemo.
Run the app (^R). This builds and launches the app on the phone.
Without the Android Studio UI, you can run Gradle directly to build the app. You need to set the Android SDK path and then invoke Gradle.
export ANDROID_HOME=<path_to_android_sdk_home>
pushd examples/demo-apps/android/LlamaDemo
./gradlew :app:installDebug
popd
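Before invoking Gradle as above, it can help to confirm that ANDROID_HOME is set and points to a directory. The following is a minimal sketch; `require_android_home` is a hypothetical helper name, and the check does not validate the contents of the SDK.

```shell
# Hypothetical pre-flight check: confirm ANDROID_HOME is set and is a directory.
require_android_home() {
  if [ -z "${ANDROID_HOME:-}" ]; then
    echo "ANDROID_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$ANDROID_HOME" ]; then
    echo "ANDROID_HOME is not a directory: $ANDROID_HOME" >&2
    return 1
  fi
  echo "using Android SDK at $ANDROID_HOME"
}
```

One way to use it is `require_android_home && ./gradlew :app:installDebug` from the LlamaDemo directory, so that Gradle runs only when the check passes.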
You should now see a running app on your phone that looks like this: