Install and configure LlamaIndex on a Google Cloud C4A virtual machine

Build RAG applications with LlamaIndex on a Google Cloud C4A virtual machine

Log an issue

Fork and edit

Discuss on Discord

Build RAG applications with LlamaIndex on a Google Cloud C4A virtual machine

Prepare the environment for a LlamaIndex RAG application

In this section, you’ll prepare a Google Cloud Axion Arm64 VM for running a browser-based RAG application using LlamaIndex.

You’ll install required system packages, including Python 3.11, as well as Ollama and LlamaIndex.

Update the virtual machine

Update all system packages:

    

        
        
sudo zypper refresh
sudo zypper update -y

This ensures your system is up to date before installing anything.

Install required packages

Install Python 3.11 and the build tools needed to compile Python packages with native extensions:

    

        
        
sudo zypper install -y \
git \
curl \
wget \
tar \
gzip \
gcc \
gcc-c++ \
make \
cmake \
sqlite3 \
python311 \
python311-pip \
python311-devel \
python311-setuptools \
python311-wheel

Verify Python is installed correctly:

    

        
        
python3.11 --version

The output is similar to:

    

        
        Python 3.11.10
pip 22.3.1 from /usr/lib/python3.11/site-packages/pip (python 3.11)

Create project directory

Create a project directory and a Python virtual environment. The virtual environment isolates the Python packages for this project from your system packages:

    

        
        
mkdir -p ~/llamaindex-rag/data
cd ~/llamaindex-rag

Create and activate the Python virtual environment:

    

        
        
python3.11 -m venv rag-env
source rag-env/bin/activate

Upgrade pip to the latest version:

    

        
        
pip install --upgrade pip setuptools wheel

Install Ollama

Use the official Linux installer to install Ollama:

    

        
        
curl -fsSL https://ollama.com/install.sh | sh

Verify the Ollama version:

    

        
        
ollama -v

The output is similar to:

    

        
        ollama version is 0.24.0

Check Ollama is running

When installed using the official script, Ollama registers itself as a systemd service and starts automatically. Verify it is running:

    

        
        
sudo systemctl status ollama

If the service is not running, start it:

    

        
        
sudo systemctl start ollama

Pull an LLM model

With Ollama running, pull the llama3.2:1b model. This is a lightweight 1-billion-parameter model suitable for local inference on a 16 GB VM:

    

        
        
ollama pull llama3.2:1b

Test that the model responds correctly:

    

        
        
ollama run llama3.2:1b "Explain RAG in one sentence."

The output is similar to:

    

        
        Retrieval-Augmented Generation (RAG) is a technique that combines a retrieval step, which fetches relevant documents from a knowledge base, with a generation step, where a large language model uses those documents to produce a grounded, context-aware response.

Install LlamaIndex packages

Install the LlamaIndex core library along with the integrations needed for Ollama, Hugging Face embeddings, and ChromaDB. You’ll also install FastAPI and Uvicorn here because the browser-based application you’ll build in the next section uses them as the web server:

    

        
        
pip install llama-index
pip install llama-index-llms-ollama
pip install llama-index-embeddings-huggingface
pip install llama-index-vector-stores-chroma
pip install chromadb
pip install sentence-transformers
pip install fastapi
pip install uvicorn

What you’ve accomplished and what’s next

You’ve now installed and configured LlamaIndex on a Google Cloud C4A Arm64 VM running SUSE Linux with Python 3.11. You configured Ollama for local LLM inference and prepared the environment for building browser-based RAG applications using LlamaIndex and ChromaDB.

Next, you’ll build the RAG engine, create the browser UI, and query custom documents using a local large language model.

Back

Build RAG applications with LlamaIndex on a Google Cloud C4A virtual machine

Introduction

Learn about LlamaIndex and Google Cloud C4A for RAG applications

Configure Google Cloud firewall rules for LlamaIndex

Create a Google Cloud C4A virtual machine for LlamaIndex

Install and configure LlamaIndex on a Google Cloud C4A virtual machine

Build and test a browser-based RAG application with LlamaIndex

Next Steps

Build RAG applications with LlamaIndex on a Google Cloud C4A virtual machine

Prepare the environment for a LlamaIndex RAG application

Update the virtual machine

Install required packages

Create project directory

Install Ollama

Check Ollama is running

Pull an LLM model

Install LlamaIndex packages

What you’ve accomplished and what’s next