To begin, enter a text prompt as shown in the example below:
As shown in the example above, the LLM Chatbot performs inference at a speed of **57 tokens/second**, with a time to first token of approximately 0.2 seconds. This highlights the efficiency and responsiveness of the LLM Chatbot in processing queries and generating output.
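If you want to see how these two numbers relate to raw token timestamps, they are straightforward to compute: time to first token is the delay between submitting the prompt and receiving the first generated token, and throughput is the number of subsequent tokens divided by the remaining decode time. The following is a minimal Python sketch of that calculation; `fake_stream` is a hypothetical stand-in generator used only for illustration, and in practice you would iterate over the tokens streamed by your model.

```python
import time

def measure_generation(token_stream):
    """Measure time to first token (TTFT) and decode throughput
    for any iterable that yields generated tokens one at a time."""
    start = time.perf_counter()
    first_token_time = None
    count = 0
    for _ in token_stream:
        now = time.perf_counter()
        if first_token_time is None:
            first_token_time = now  # timestamp of the first token
        count += 1
    end = time.perf_counter()

    ttft = first_token_time - start
    # Throughput is typically reported over the decode phase,
    # i.e. the tokens generated after the first one.
    decode_time = end - first_token_time
    tokens_per_second = (count - 1) / decode_time if decode_time > 0 else float("nan")
    return ttft, tokens_per_second

# Stand-in generator that simulates a stream of ~57 tokens/second.
def fake_stream(n=100, delay=1 / 57):
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft, tps = measure_generation(fake_stream())
print(f"Time to first token: {ttft:.2f} s, throughput: {tps:.1f} tokens/s")
```

Running the sketch prints metrics in the same form as those reported in the terminal, which makes it easy to compare runs with different prompts or thread counts.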
You can continue interacting with the chatbot by entering follow-up prompts and observing the performance metrics displayed in the terminal.
This setup demonstrates how to build applications with the Phi-4-mini model and highlights the performance benefits of running Phi models on Arm CPUs.