Run a Large Language Model (LLM) chatbot with PyTorch using KleidiAI on Arm servers

About this Learning Path

Who is this for?

This is an introductory topic for software developers interested in running LLMs using PyTorch on Arm-based servers.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Download the Meta Llama 3.1 model from the Meta Hugging Face repository.
  • Quantize the model to 4 bits using the optimized INT4 KleidiAI kernels for PyTorch.
  • Run LLM inference using PyTorch on an Arm-based CPU.
  • Expose LLM inference as a browser application, with Streamlit as the frontend and the Torchchat framework in PyTorch as the backend LLM server.
  • Measure the performance of LLM inference running on an Arm-based CPU.
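To make the quantization objective concrete, here is a minimal sketch of what 4-bit (INT4) weight quantization means. This is a conceptual illustration only, not the KleidiAI kernels themselves: the optimized kernels use per-block scales and packed INT4 storage, while this sketch uses a single per-tensor scale for clarity.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-tensor INT4 quantization: map floats to integers in [-8, 7].

    Conceptual sketch only; production INT4 schemes (such as those used by
    the KleidiAI kernels) quantize per channel or per block and pack two
    4-bit values per byte.
    """
    scale = np.abs(weights).max() / 7.0  # 7 is the largest positive INT4 value
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.3, 0.05, -0.7], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
print(q)      # each value fits in 4 bits: integers in [-8, 7]
print(w_hat)  # approximation of the original weights
```

The payoff is storage: each weight occupies 4 bits instead of 32, at the cost of a small reconstruction error, which is why quantized models fit in less memory and run faster on CPU.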

Prerequisites

Before starting, you will need the following:

  • An Arm-based instance with at least 16 CPU cores from a cloud service provider, or an on-premises Arm server.
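Before proceeding, you can confirm that your instance meets these requirements. The sketch below uses standard Linux utilities; the expected values shown in the comments are assumptions based on the prerequisites above, and your output may differ.

```shell
# Check the CPU architecture: an Arm-based instance reports "aarch64".
uname -m

# Check the number of available CPU cores: this Learning Path expects 16 or more.
nproc
```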