About this Learning Path

Who is this for?

This Learning Path is for software developers, ML engineers, and anyone looking to deploy production-ready LLM chatbots with Retrieval Augmented Generation (RAG) capabilities, knowledge base integration, and performance optimization for the Arm architecture.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Set up llama-cpp-python optimized for Arm servers.
  • Implement RAG architecture using the Facebook AI Similarity Search (FAISS) vector database.
  • Optimize model performance through 4-bit quantization.
  • Build a web interface for document upload and chat.
  • Monitor and analyze inference performance metrics.
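The RAG objective above hinges on vector similarity search over document embeddings. The sketch below illustrates that core idea using NumPy in place of FAISS, which exposes the same pattern (inner-product search over normalized vectors) via `faiss.IndexFlatIP`. The character-hash `embed` function is a placeholder assumption for illustration only; a real pipeline would use a sentence-embedding model.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Placeholder embedding: hash characters into a fixed-size vector.
    # A real RAG pipeline would use a sentence-embedding model instead.
    vec = np.zeros(dim)
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# "Index" the knowledge base: one normalized vector per document chunk.
docs = [
    "Arm servers run llama.cpp efficiently with 4-bit quantization.",
    "FAISS performs fast nearest-neighbor search over embeddings.",
    "A web interface lets users upload documents and chat.",
]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    # Inner product on normalized vectors equals cosine similarity,
    # the same metric faiss.IndexFlatIP computes.
    scores = index @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

print(retrieve("nearest neighbor search", k=1))
```

In the Learning Path itself, the retrieved chunks are passed to the LLM as context for answer generation; swapping this NumPy stand-in for a FAISS index changes the storage and search backend without changing the overall retrieve-then-generate flow.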

Prerequisites

Before starting, you will need the following:

  • A Google Cloud Axion (or other Arm) compute instance with at least 16 cores, 8 GB of RAM, and 32 GB of disk space.
  • Basic understanding of Python and ML concepts.
  • Familiarity with REST APIs and web services.
  • Basic knowledge of vector databases.
  • Understanding of LLM fundamentals.