In this Learning Path, you’ll learn how to build and deploy a large language model (LLM) on a Windows on Arm (WoA) machine using ONNX Runtime for inference.
Specifically, you’ll learn how to:

- Set up a Windows on Arm development machine with the required tools
- Build ONNX Runtime from source
- Run LLM inference with a Phi-3 model
The Phi-3 model is available in two context-length variants: the short-context (4K) version accepts shorter prompts and generates shorter outputs than the long-context (128K) version, and it also consumes less memory.
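If you want to see what the variants look like in practice, the sketch below fetches the short-context ONNX build of Phi-3 mini from Hugging Face using huggingface-cli. The repository name and folder pattern reflect Microsoft’s published ONNX builds at the time of writing and may change, so treat them as assumptions:

```console
# Requires the Hugging Face CLI: pip install huggingface-hub
# The repository and folder names below are illustrative and may differ over time
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
```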
Your first task is to prepare a development environment with the required software.
Start by installing the required tools:

- Visual Studio (Community edition) with the Desktop Development with C++ workload
- Python for Windows on Arm, version 3.10 or higher
- CMake for Windows on Arm
These instructions were tested on a 64-bit WoA machine with at least 16GB of RAM.
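If you’d like to confirm that your machine meets these requirements, you can run a quick check in PowerShell. This is a minimal sketch; on a WoA machine the architecture should be reported as ARM64:

```powershell
# Should print ARM64 on a Windows on Arm machine
echo $env:PROCESSOR_ARCHITECTURE

# Report installed physical memory (look for at least 16 GB)
systeminfo | findstr /C:"Total Physical Memory"
```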
Now, to install and configure Visual Studio, follow these steps:
Download the latest Visual Studio IDE.
Select the Community edition. This downloads an installer called `VisualStudioSetup.exe`.

Run `VisualStudioSetup.exe` from your Downloads folder.
Follow the prompts and accept the License Terms and Privacy Statement.
When prompted to select workloads, select Desktop Development with C++. This installs the Microsoft Visual C++ (MSVC) compiler toolchain.
Refer to Visual Studio for Windows on Arm for more details.
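To check that MSVC is available, you can open a Developer Command Prompt for VS 2022 from the Start menu and invoke the compiler with no arguments. The exact banner varies by Visual Studio version and the command prompt variant you open:

```console
# Prints the compiler banner, for example:
# Microsoft (R) C/C++ Optimizing Compiler Version 19.x for ARM64
cl
```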
Download and install Python for Windows on Arm.
You’ll need Python version 3.10 or higher. This Learning Path was tested with version 3.11.9.
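After installation, open a new terminal and confirm that the expected version is on your path:

```console
python --version
# Example output: Python 3.11.9
```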
CMake is an open-source tool that automates the build process and generates platform-specific build configurations.
Download and install CMake for Windows on Arm.
The instructions were tested with version 3.30.5.
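As with Python, you can verify the installation from a new terminal:

```console
cmake --version
# Example output: cmake version 3.30.5
```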
You’re now ready to build ONNX Runtime and run inference using the Phi-3 model.
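As a preview of what comes next, the build typically starts from a checkout of the official ONNX Runtime repository. This is only a sketch; the exact build flags depend on the ONNX Runtime release and are covered in the following steps:

```console
# Clone the ONNX Runtime sources
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime

# Example build invocation; exact flags vary by release
.\build.bat --config Release --parallel
```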