Try a text prompt

To begin, enter a text prompt, as shown in the example below:

(Image: chatbot output)
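Any short question works as a first prompt; the one below is purely illustrative:

```
Summarize the benefits of running LLMs on Arm CPUs.
```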

Observe performance metrics

As shown in the example above, the LLM chatbot performs inference at a speed of **57 tokens/second**, with a time to first token of approximately 0.2 seconds. This highlights the efficiency and responsiveness of the chatbot in processing queries and generating output.
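If you want to verify these numbers yourself, both metrics can be computed from simple timestamps. The sketch below is a minimal Python example; the `stream_tokens` iterable is a hypothetical stand-in for whatever streaming API your chatbot exposes:

```python
import time

def measure(stream_tokens):
    """Return (time to first token, decode tokens/second) for a token stream."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in stream_tokens:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token arrives
        count += 1
    end = time.perf_counter()
    if first_token_at is None:
        return None, 0.0  # no tokens were generated
    ttft = first_token_at - start
    tps = (count - 1) / (end - first_token_at) if count > 1 else 0.0
    return ttft, tps
```

Time to first token reflects prompt processing (prefill), while tokens/second reflects the decode phase, which is why the two are reported separately.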

Further interaction and custom applications

You can continue interacting with the chatbot by entering follow-up prompts and observing the performance metrics displayed in the terminal.

This setup shows how to build applications using the Phi-4-mini model. It also highlights the performance benefits of running Phi models on Arm CPUs.
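As a starting point for a custom application, the sketch below sends a chat request to a locally running, OpenAI-compatible endpoint (many local inference servers, including the one `llama.cpp` ships, expose one). The URL, port, and model name are assumptions; adjust them to match your setup:

```python
import requests

# Assumed local endpoint and model identifier: adjust to your own server.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "phi-4-mini",  # hypothetical model name; use your server's
    "messages": [
        {"role": "user",
         "content": "Summarize the benefits of running LLMs on Arm CPUs."},
    ],
    "max_tokens": 256,
}

response = requests.post(URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```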
