Overview

This Learning Path shows you how to use a 32-core Azure Dplsv6 instance, powered by an Arm Neoverse N2 CPU, to build a simple chatbot that can serve a small number of concurrent users.

This architecture is suitable for organizations that want to deploy the latest Generative AI technologies with RAG capabilities using their existing CPU compute capacity and deployment pipelines.

The demo uses ONNX Runtime, which Arm has integrated with KleidiAI. Further optimizations come from the compact Phi-4-mini model, quantized to INT4 to minimize memory usage.

Chat with the LLM below to see the performance for yourself, and then follow the Learning Path to build your own Generative AI service on Arm Neoverse.

Running the Demo

  1. Type and send a message to the chatbot.
  2. Receive the chatbot’s reply.
  3. View performance statistics demonstrating how well Azure Cobalt 100 instances run LLMs.
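The performance statistics reported in step 3, such as time to first token and generation throughput, can be derived from per-token arrival timestamps. The sketch below is illustrative only; the helper and metric names are assumptions, not the demo's actual code:

```python
def compute_metrics(token_timestamps, start_time):
    """Derive chat performance metrics from per-token arrival times.

    token_timestamps: monotonically increasing arrival times (seconds)
    start_time: when the request was sent (seconds)
    Note: this is a hypothetical helper, not part of the demo's codebase.
    """
    if not token_timestamps:
        return {}
    # Latency until the first generated token arrives.
    ttft = token_timestamps[0] - start_time
    n = len(token_timestamps)
    # Steady-state throughput over the generation window (excludes TTFT).
    if n > 1:
        tps = (n - 1) / (token_timestamps[-1] - token_timestamps[0])
    else:
        tps = float("inf")
    return {
        "time_to_first_token_s": ttft,
        "tokens_generated": n,
        "tokens_per_second": tps,
    }
```

For example, a reply of 20 tokens whose first token arrives 0.5 s after the request, with one token every 50 ms thereafter, yields a TTFT of 0.5 s and a throughput of 20 tokens per second.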

Phi-4-mini Chatbot Demo

Your use of this demo is subject to the Terms of Use.

Stats

Type a message to the chatbot to view metrics.