In this Learning Path, you’ll learn how to build and deploy a large language model (LLM) on a Windows on Arm (WoA) machine using ONNX Runtime for inference.
Specifically, you’ll learn how to:

- Set up a development environment with the required tools
- Build ONNX Runtime from source
- Run inference with the Phi-3 model

Phi-3 is available in a short-context (4K) version and a long-context (128K) version. The short-context version accepts shorter prompts and generates shorter outputs than the long-context version, and it also consumes less memory.
Your first task is to prepare a development environment with the required software.
Start by installing the required tools: Visual Studio, Python, and CMake.

These instructions were tested on a 64-bit WoA machine with at least 16 GB of RAM.
Now, to install and configure Visual Studio, follow these steps:
1. Download the latest Visual Studio IDE.
2. Select the Community edition. This downloads an installer called VisualStudioSetup.exe.
3. Run VisualStudioSetup.exe from your Downloads folder.
4. Follow the prompts and accept the License Terms and Privacy Statement.
5. When prompted to select workloads, select Desktop development with C++. This installs the Microsoft Visual C++ (MSVC) compiler.
Refer to Visual Studio for Windows on Arm for more details.
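As a quick sanity check, you can confirm that the compiler is available by opening a Developer Command Prompt for Visual Studio and running the compiler with no arguments. The exact banner text varies by Visual Studio release:

```console
cl
```

The output should include a line similar to `Microsoft (R) C/C++ Optimizing Compiler`, confirming that MSVC is on your path.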
Download and install Python for Windows on Arm.
You’ll need Python version 3.10 or higher. This Learning Path was tested with version 3.11.9.
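To verify the installation, open a new command prompt and print the interpreter version. Your version string will differ depending on the release you installed:

```console
python --version
```

You should see output similar to `Python 3.11.9`.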
CMake is an open-source tool that automates the build process and generates platform-specific build configurations.
Download and install CMake for Windows on Arm.

These instructions were tested with CMake version 3.30.5.
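As with Python, you can verify the CMake installation from a command prompt. The version you see depends on the release you installed:

```console
cmake --version
```

The first line of output should be similar to `cmake version 3.30.5`.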
You’re now ready to build ONNX Runtime and run inference using the Phi-3 model.