In this section, you will use the ExecuTorch Android demo application to run the customer support chatbot with a full chat interface on your phone.
Set up the build environment:
Open a terminal and navigate to the root directory of the executorch repository
If you have not already done so, set the following environment variables:
export ANDROID_NDK=$ANDROID_HOME/ndk/29.0.14206865/
export ANDROID_ABI=arm64-v8a
export ANDROID_SDK=$ANDROID_HOME
<path_to_android_ndk> is the root for the NDK, which is usually under ~/Library/Android/sdk/ndk/XX.Y.ZZZZZ on macOS, and contains NOTICE and README.md. Make sure <path_to_android_ndk>/build/cmake/android.toolchain.cmake is available for CMake to cross-compile.
Run the following command to set up the required JNI library:
sh scripts/build_android_library.sh
Make sure the exported model and tokenizer are on your Android phone.
Check if the files are already on the phone:
adb shell "ls -la /data/local/tmp/llama/"
If they are not present, copy them:
adb shell mkdir -p /data/local/tmp/llama
adb push llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte /data/local/tmp/llama/
adb push $HOME/.llama/checkpoints/Llama3.2-1B-Instruct/tokenizer.model /data/local/tmp/llama/
Use Android Studio’s Device Explorer to browse the phone’s filesystem and upload the files if they are not already present.
Clone the executorch-examples repository, which contains the LlamaDemo app:
git clone https://github.com/meta-pytorch/executorch-examples.git
Build and launch the app:
executorch-examples/llm/android/LlamaDemo
pushd llm/android/LlamaDemo
./gradlew :app:installDebug
popd
Once the app is running, you can set a system prompt in the app’s settings to configure it as a customer support assistant. Set the system prompt to something like:
You are a helpful customer support assistant. You answer questions about products, help with troubleshooting, and escalate issues politely when needed. Keep responses concise and friendly.
This gives the Llama model its role and behavioral guidelines for every conversation, without changing the underlying model weights.
You have successfully:
You now have a fully functional on-device customer support chatbot running on an Arm Android phone using ExecuTorch and KleidiAI. All inference runs locally with no cloud dependency and no user data leaving the device. You can customize the system prompt to match your specific product or domain requirements.