In this section, you'll download the Phi-3 Mini model and run it on your WoA machine, either physical or virtual. You'll use a simple model runner that also reports performance metrics.
The Phi-3 Mini (3.8B) model is available in two versions: one with a short (4K) context length and one with a long (128K) context length.
This Learning Path uses the short context version, which is quantized to 4 bits.
The Phi-3 Mini model used here is in ONNX format.
Phi-3 ONNX models are hosted on Hugging Face, which uses Git for version control. Because the ONNX model files are large, you first need to install the Git Large File Storage (LFS) extension:
winget install -e --id GitHub.GitLFS
git lfs install
If you don't have winget, download the Git LFS installer manually.
If Git LFS is already installed, you'll see Git LFS initialized.
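You can confirm that the extension is available at any time by running git lfs version.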
You then need to install the Hugging Face CLI:
pip install huggingface-hub[cli]
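Then, navigate to your working directory and use the CLI to download the quantized model: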
cd C:\Users\%USERNAME%
cd repos\lp
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
This command downloads the model into a folder named cpu_and_mobile.
In the previous step, you built the ONNX Runtime Generate() API from source. Now, copy the resulting headers and dynamic-link libraries (DLLs) into the appropriate folders (lib and include).
Building from source is preferred because the examples are usually updated to work with the latest changes:
copy onnxruntime\build\Windows\Release\Release\onnxruntime.* onnxruntime-genai\examples\c\lib
cd onnxruntime-genai
copy build\Windows\Release\Release\onnxruntime-genai.* examples\c\lib
copy src\ort_genai.h examples\c\include\
copy src\ort_genai_c.h examples\c\include\
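These copies place the headers in examples\c\include and the libraries in examples\c\lib, which is where the example's build expects to find them. Both sets of binaries are needed because the onnxruntime-genai library depends on onnxruntime at run time.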
You can now build the model runner executable in the onnxruntime-genai folder using the commands below:
cd examples/c
cmake -A arm64 -S . -B build -DPHI3=ON
cd build
cmake --build . --config Release
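Here, -A arm64 selects the Arm64 target platform for the Visual Studio CMake generator, and -DPHI3=ON enables the build of the Phi-3 example.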
After a successful build, the phi3.exe binary is created in the Release folder:
dir Release\phi3.exe
Navigate back to the directory containing the downloaded model and execute the runner:
cd C:\Users\%USERNAME%
cd repos\lp
.\onnxruntime-genai\examples\c\build\Release\phi3.exe .\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\ cpu
The first argument is the path to the downloaded model folder, and the second selects the cpu execution provider. The runner loads the model and then prompts you to enter a text prompt, as shown:
-------------
Hello, Phi-3!
-------------
C++ API
Creating config...
Creating model...
Creating tokenizer...
Prompt: (Use quit() to exit) Or (To terminate current output generation, press Ctrl+C)
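The phi3 runner is written against the ONNX Runtime Generate() C++ API declared in ort_genai.h, one of the headers you copied earlier. As a rough illustration of what the runner does internally, a minimal generation loop looks something like the sketch below. The API surface changes between releases (for example, AppendTokenSequences versus the older SetInputSequences), and the max_length value is illustrative, so compare this against the sources in examples\c of your checkout rather than treating it as a drop-in replacement:

```cpp
// Minimal sketch of a generation loop using the ONNX Runtime Generate()
// C++ API from ort_genai.h. Illustrative only; API details vary by release.
#include <iostream>
#include <string>

#include "ort_genai.h"

int main() {
  // Path to the model folder downloaded earlier.
  auto model = OgaModel::Create("cpu_and_mobile\\cpu-int4-rtn-block-32-acc-level-4");
  auto tokenizer = OgaTokenizer::Create(*model);
  auto tokenizer_stream = OgaTokenizerStream::Create(*tokenizer);

  // Wrap the input in the Phi-3 chat template and tokenize it.
  const std::string prompt = "<|user|>\nHello, Phi-3!<|end|>\n<|assistant|>";
  auto sequences = OgaSequences::Create();
  tokenizer->Encode(prompt.c_str(), *sequences);

  // max_length bounds the total of prompt plus generated tokens
  // (1024 is an illustrative value).
  auto params = OgaGeneratorParams::Create(*model);
  params->SetSearchOption("max_length", 1024);

  // Generate token by token, decoding and printing each one as it arrives.
  auto generator = OgaGenerator::Create(*model, *params);
  generator->AppendTokenSequences(*sequences);
  while (!generator->IsDone()) {
    generator->GenerateNextToken();
    const size_t count = generator->GetSequenceCount(0);
    const int32_t token = generator->GetSequenceData(0)[count - 1];
    std::cout << tokenizer_stream->Decode(token) << std::flush;
  }
  std::cout << std::endl;
  return 0;
}
```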
After you enter your input prompt, the model's response is displayed as it is generated. On completion, performance metrics similar to those shown below are printed:
Prompt length: 64, New tokens: 931, Time to first: 1.79s, Prompt tokens per second: 35.74 tps, New tokens per second: 6.34 tps
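In this example run, the 64-token prompt was processed in 1.79 seconds before the first token appeared (64 / 1.79 ≈ 35.74 tps), and the 931 new tokens generated at 6.34 tps correspond to roughly 147 seconds of generation time. Your numbers will vary with hardware and prompt.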
You have successfully run the Phi-3 model on your Windows on Arm device.