Arm CPUs are widely used in ML and AI use cases. In this Learning Path, you will learn how to run a generative AI inference use case, an LLM chatbot, on an Arm-based CPU by deploying the Qwen2-0.5B-Instruct model with rtp-llm.
This Learning Path has been tested on an Alibaba Cloud g8y.8xlarge instance and an AWS Graviton4 r8g.8xlarge instance.
rtp-llm is an open-source Large Language Model inference acceleration engine developed by Alibaba. It is written in C/C++ and enables efficient LLM inference on a variety of hardware. Qwen is the name given to a series of Large Language Models developed by Alibaba Cloud that are capable of performing a variety of tasks.
Alibaba Cloud offers a wide range of Qwen models, each suited to different tasks and use cases. Besides generating text, these models can also perform tasks such as answering questions and summarizing text.
Qwen is open source, flexible, and encourages contribution from the software development community.
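
As a preview of the workflow, the sketch below shows one way you could fetch the Qwen2-0.5B-Instruct weights from Hugging Face using the `huggingface_hub` Python package. The repository ID refers to the public Qwen2-0.5B-Instruct model; the local directory name is an assumption used here for illustration, and the rest of this Learning Path walks through the full rtp-llm workflow step by step.

```python
# Minimal sketch: download the Qwen2-0.5B-Instruct model files from Hugging Face.
# The local directory name is a hypothetical choice for illustration only.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Qwen/Qwen2-0.5B-Instruct",   # the model used in this Learning Path
    local_dir="qwen2-0.5b-instruct",      # hypothetical local download path
)
print(f"Model files downloaded to: {local_dir}")
```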