To begin, skip the image prompt by pressing Enter, then enter the text prompt as shown in the example below:
Now exit the server.
Next, download a sample image using the following wget command:
wget https://cdn.pixabay.com/photo/2020/06/30/22/34/dog-5357794__340.jpg
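Before re-running the server, it is worth confirming that the download succeeded. The sketch below (an optional extra step, not part of the original walkthrough) checks that the file exists, is non-empty, and begins with the JPEG magic bytes `ff d8`:

```shell
img=dog-5357794__340.jpg

if [ -s "$img" ]; then
  # A valid JPEG file starts with the two bytes ff d8
  head -c 2 "$img" | od -An -tx1
else
  echo "image not found: $img"
fi
```

If the first line of output is `ff d8`, the image downloaded correctly and can be passed to the server in the next step.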
After downloading the image, run the server again and provide the image file name when prompted, followed by the text prompt, as demonstrated in the example below:
As shown in the example above, the LLM Chatbot performs inference at a speed of 44 tokens/second, with a time to first token of approximately 1 second. At this rate, responses stream back quickly enough for comfortable interactive use.
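The throughput figure is simply the number of generated tokens divided by the generation time. The sketch below reproduces that arithmetic with illustrative numbers (220 tokens in 5 seconds; these are assumed values, not measurements from the run above):

```shell
# Illustrative figures: 220 tokens generated over 5 seconds of decoding
tokens=220
gen_time_s=5

# Throughput = generated tokens / generation time
awk -v t="$tokens" -v s="$gen_time_s" \
    'BEGIN { printf "%.0f tokens/second\n", t / s }'
```

You can apply the same calculation to the token counts and timings printed in your own terminal to verify the reported metric.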
You can continue interacting with the chatbot by asking follow-up prompts and observing the performance metrics displayed in the terminal.
This setup shows how to build applications using the Phi-3.5 model for multimodal generation from text and image inputs. It also highlights the performance benefits of running Phi models on Arm CPUs.