Learn about LlamaIndex and Google Cloud C4A for RAG applications
Google Cloud C4A is a family of Arm-based virtual machines (VMs) built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these VMs offer strong performance for modern cloud workloads.
The C4A series provides a cost-effective alternative to x86 virtual machines while delivering the scalability and performance benefits of the Arm architecture in Google Cloud.
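Once you are connected to a VM, a quick check like the following can confirm that Python sees a 64-bit Arm machine. This is a minimal sketch using only the Python standard library. The helper name `is_arm64` is hypothetical and introduced here for illustration. The code assumes that Arm64 Linux systems report `aarch64` and that some other platforms report `arm64`.

```python
import os
import platform

# Names commonly reported for 64-bit Arm (assumption: Linux uses "aarch64",
# some other platforms use "arm64").
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: str) -> bool:
    """Return True if the machine string names a 64-bit Arm architecture."""
    return machine.lower() in ARM64_NAMES


if __name__ == "__main__":
    machine = platform.machine()
    print(f"Architecture: {machine} (Arm64: {is_arm64(machine)})")
    print(f"Logical CPUs: {os.cpu_count()}")
```

On an Arm-based VM the architecture line is expected to report an Arm64 name. The CPU count depends on the machine type you select.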
LlamaIndex is an open-source framework for building context-aware AI applications with large language models (LLMs). It’s widely used for Retrieval-Augmented Generation (RAG), document indexing, vector search, semantic retrieval, and integrating custom data sources with LLMs.
LlamaIndex provides a unified framework with components such as document ingestion, indexing pipelines, query engines, vector stores, and LLM integrations.
Running LlamaIndex on Google Cloud C4A Arm-based infrastructure lets AI and RAG workloads take advantage of multi-core Arm CPUs and optimized memory performance. This can improve performance per watt, reduce infrastructure costs, and help browser-based AI applications and local inference pipelines scale.
In this Learning Path, you’ll use these components to build a browser-based RAG application that answers questions from custom documents.
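To make the RAG flow concrete, the sketch below walks through the same stages in plain Python: ingest documents, index them as vectors, retrieve the best match for a question, and assemble a prompt for an LLM. It is a toy illustration only and does not use the LlamaIndex API. The names `ToyVectorStore`, `embed`, and `build_prompt` are hypothetical, and the bag-of-words "embedding" stands in for the learned embedding model that a real pipeline would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy 'embedding': a bag-of-words term-count vector (not a real model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store pairing each document with its vector."""

    def __init__(self):
        self.entries = []

    def add(self, text):
        # Ingestion and indexing: store the text alongside its vector.
        self.entries.append((text, embed(text)))

    def search(self, query, top_k=1):
        # Retrieval: rank stored documents by similarity to the query.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


def build_prompt(question, context):
    """Augmentation: combine retrieved context with the user's question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


documents = [
    "Google Cloud C4A virtual machines run on Axion processors.",
    "LlamaIndex helps connect custom documents to large language models.",
]

store = ToyVectorStore()
for doc in documents:
    store.add(doc)

question = "Which processors do C4A virtual machines use?"
context = store.search(question, top_k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real application, the prompt would be sent to an LLM to generate the answer. A framework such as LlamaIndex supplies production-grade versions of each stage, so you do not write them by hand.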
You’ve now learned about Arm-based Google Cloud C4A VMs and their performance advantages for AI and RAG workloads. You were also introduced to core LlamaIndex components including document ingestion, indexing pipelines, query engines, vector stores, and LLM integrations.
Next, you’ll create a firewall rule in Google Cloud Console to enable remote access to the browser-based LlamaIndex RAG application that you’ll create in this Learning Path.