Build Llama runner binary for Android

Cross-compile Llama runner to run on Android using the steps below.

1. Set Android NDK

Set the ANDROID_NDK environment variable to point to your Android NDK installation. The path below is an example; adjust it to match your NDK location and version.

    export ANDROID_NDK=~/Library/Android/sdk/ndk/25.0.8775105

Note

Confirm that $ANDROID_NDK/build/cmake/android.toolchain.cmake exists. CMake needs this toolchain file to cross-compile.
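The check described in the note can be scripted. The following is a minimal sketch, assuming a POSIX shell; check_ndk_toolchain is a hypothetical helper name, not something the NDK provides.

```shell
# Return 0 if the NDK toolchain file that CMake needs exists, 1 otherwise.
# check_ndk_toolchain is a hypothetical helper name used for illustration.
# Usage: check_ndk_toolchain [ndk_dir]   (defaults to $ANDROID_NDK)
check_ndk_toolchain() {
  ndk_dir="${1:-${ANDROID_NDK:-}}"
  if [ -z "$ndk_dir" ]; then
    echo "ANDROID_NDK is not set" >&2
    return 1
  fi
  toolchain="$ndk_dir/build/cmake/android.toolchain.cmake"
  if [ ! -f "$toolchain" ]; then
    echo "Toolchain file not found: $toolchain" >&2
    return 1
  fi
  echo "Found toolchain file: $toolchain"
}
```

Calling check_ndk_toolchain with no argument before running cmake reports a missing or mistyped NDK path early, rather than partway through the configure step.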

2. Build ExecuTorch and associated libraries for Android

Use cmake to cross-compile ExecuTorch:

    cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
        -DANDROID_ABI=arm64-v8a \
        -DANDROID_PLATFORM=android-23 \
        -DCMAKE_INSTALL_PREFIX=cmake-out-android \
        -DCMAKE_BUILD_TYPE=Release \
        -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
        -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
        -DEXECUTORCH_ENABLE_LOGGING=1 \
        -DPYTHON_EXECUTABLE=python \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
        -Bcmake-out-android .

    cmake --build cmake-out-android -j16 --target install --config Release

3. Build Llama runner for Android

Use cmake to cross-compile Llama runner:

    cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
        -DANDROID_ABI=arm64-v8a \
        -DANDROID_PLATFORM=android-23 \
        -DCMAKE_INSTALL_PREFIX=cmake-out-android \
        -DCMAKE_BUILD_TYPE=Release \
        -DPYTHON_EXECUTABLE=python \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
        -Bcmake-out-android/examples/models/llama2 \
        examples/models/llama2

    cmake --build cmake-out-android/examples/models/llama2 -j16 --config Release

Note

For Llama 3, add the -DEXECUTORCH_USE_TIKTOKEN=ON option when you configure the Llama runner build.
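If you build for both model families from one script, the Llama 3 option can be selected with a small helper. This is a sketch only: the function name llama_extra_cmake_flags and its "llama2"/"llama3" argument are hypothetical names chosen for illustration; the only option taken from this guide is -DEXECUTORCH_USE_TIKTOKEN=ON.

```shell
# Print the extra CMake option for the chosen model family.
# llama_extra_cmake_flags and the "llama2"/"llama3" labels are hypothetical
# names for this sketch; they are not part of ExecuTorch.
llama_extra_cmake_flags() {
  case "${1:-llama2}" in
    llama3) echo "-DEXECUTORCH_USE_TIKTOKEN=ON" ;;  # Llama 3 needs the Tiktoken option
    *)      echo "" ;;                              # no extra option otherwise
  esac
}
```

The output can then be placed in the configure command, for example as cmake $(llama_extra_cmake_flags llama3) followed by the remaining options shown above.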

The llama_main binary for Android is now available at cmake-out-android/examples/models/llama2/llama_main.

Run on Android via adb shell

1. Connect your Android phone

Connect your phone to your computer using a USB cable.

Enable USB debugging on your Android device by following the "Configure on-device developer options" guide in the Android developer documentation.

Once you have enabled USB debugging and connected via USB, run:

    adb devices

You should see your device listed to confirm it is connected.
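To check the connection from a script rather than by eye, you can count the entries that adb reports in the "device" state. The following is a minimal sketch, assuming the plain adb devices output format (a serial number followed by a state on each line); count_ready_devices is a hypothetical helper name.

```shell
# Count devices in the ready ("device") state from `adb devices` output on stdin.
# count_ready_devices is a hypothetical helper name used for illustration.
# Lines in other states (for example "unauthorized" or "offline") are not counted.
count_ready_devices() {
  awk 'NF == 2 && $2 == "device" { n++ } END { print n + 0 }'
}
```

Run it as adb devices | count_ready_devices. A result of 0 usually means the phone is not connected or USB debugging has not been authorized on the device yet.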

2. Copy the model, tokenizer, and Llama runner binary to the phone

    adb shell mkdir -p /data/local/tmp/llama
    adb push <model.pte> /data/local/tmp/llama/
    adb push <tokenizer.bin> /data/local/tmp/llama/
    adb push cmake-out-android/examples/models/llama2/llama_main /data/local/tmp/llama/

Note

For Llama 3, you can pass the original tokenizer.model file without converting it to a .bin file.

3. Run the model

Use the Llama runner to execute the model on the phone with the adb command:

    adb shell "cd /data/local/tmp/llama && ./llama_main --model_path <model.pte> --tokenizer_path <tokenizer.bin> --prompt \"Once upon a time\" --seq_len 120"

You have successfully run a model on your Android smartphone.
