Arm CPUs are widely used for ML and AI workloads. In this Learning Path, you will learn how to run generative AI inference, in the form of an LLM chatbot, on an Arm-based CPU. You will do this by deploying the Qwen2-0.5B-Instruct model on an Arm-based CPU using rtp-llm.

Note: This Learning Path has been tested on an Alibaba Cloud g8y.8xlarge instance and an AWS Graviton4 r8g.8xlarge instance.

rtp-llm is an open-source C/C++ project developed by Alibaba: a Large Language Model inference acceleration engine that enables efficient LLM inference on a variety of hardware.

Qwen is the name given to a series of Large Language Models developed by Alibaba Cloud that are capable of performing a variety of tasks.

Alibaba Cloud offers a wide range of Qwen models, each suited to different tasks and use cases.

Besides generating text, these models can also perform tasks such as:

  • Answering questions through information retrieval and analysis.
  • Processing images and producing written descriptions of visual content.
  • Processing audio content.
  • Providing multilingual support for over 27 additional languages, on top of the core languages of English and Chinese.

Qwen is open source and flexible, and it encourages contributions from the software development community.
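
To give a sense of how an instruction-tuned Qwen2 model behaves as a chatbot, the sketch below runs Qwen2-0.5B-Instruct with the Hugging Face transformers library. This is only an illustration of the model's chat interface, not the rtp-llm deployment you will set up in this Learning Path, and the example prompt and generation settings are arbitrary assumptions.

```python
# Minimal illustration of chatting with Qwen2-0.5B-Instruct using Hugging Face
# transformers. The rtp-llm deployment covered in this Learning Path replaces
# this inference backend; the prompt below is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-style prompt using the model's built-in chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is an Arm Neoverse CPU?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens (the assistant's reply).
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(reply)
```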
