Arm CPUs are widely used for ML and AI workloads. In this Learning Path, you will learn how to run generative AI inference, in the form of an LLM chatbot, on an Arm-based CPU. You will do this by deploying the Qwen2-0.5B-Instruct model on an Arm-based CPU using rtp-llm.

Note: This Learning Path has been tested on an Alibaba Cloud g8y.8xlarge instance and an AWS Graviton4 r8g.8xlarge instance.

rtp-llm is an open-source C/C++ project developed by Alibaba: a Large Language Model inference acceleration engine that enables efficient LLM inference on a variety of hardware.

Qwen is the name given to a series of Large Language Models developed by Alibaba Cloud that are capable of performing a variety of tasks.

Alibaba Cloud offers a wide range of Qwen models, each suited to different tasks and use cases.

Besides generating text, these models can also perform tasks such as:

  • Answering questions through information retrieval and analysis.
  • Processing images and producing written descriptions of visual content.
  • Processing audio content.
  • Providing multilingual support for over 27 additional languages, on top of the core languages of English and Chinese.

Qwen is open source and flexible, and it encourages contributions from the software development community.
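
To give a sense of how an instruction-tuned Qwen2 model behaves as a chatbot, the sketch below runs Qwen2-0.5B-Instruct with the Hugging Face transformers library. This is only an illustration of the model's chat interface, not the rtp-llm deployment you will set up in this Learning Path, and the example prompt and generation settings are arbitrary assumptions.

```python
# Minimal illustration of chatting with Qwen2-0.5B-Instruct using Hugging Face
# transformers. The rtp-llm deployment covered in this Learning Path replaces
# this inference backend; the prompt below is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-style prompt using the model's built-in chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is an Arm Neoverse CPU?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens (the assistant's reply).
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(reply)
```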
