About this Learning Path

Who is this for?

This is an advanced topic for developers who want to build a Retrieval-Augmented Generation (RAG) pipeline on the NVIDIA DGX Spark platform. You'll learn how Arm-based Grace CPUs handle document retrieval and orchestration, while Blackwell GPUs speed up large language model inference using the open-source llama.cpp REST server. This is a great fit if you're interested in combining Arm CPU management with GPU-accelerated AI workloads.
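The division of labor described above can be sketched roughly as follows. This is an illustrative sketch, not the Learning Path's actual implementation: the server URL, the in-memory document list, and the keyword-overlap retriever are all assumptions (a real pipeline would use an embedding index for retrieval), and the `/completion` endpoint assumes a running llama.cpp `llama-server` instance:

```python
# Sketch of the hybrid pattern: the CPU side retrieves relevant text and
# builds a prompt; generation is delegated to a llama.cpp REST server
# (llama-server) running on the GPU. Endpoint, documents, and retriever
# below are illustrative assumptions only.
import json
import urllib.request

LLAMA_SERVER_URL = "http://localhost:8080/completion"  # assumed llama-server address

# Toy in-memory "document store"; a real pipeline would use an embedding index.
DOCUMENTS = [
    "Grace CPUs are Arm-based processors in NVIDIA superchips.",
    "Blackwell GPUs accelerate large language model inference.",
    "llama.cpp provides a lightweight REST server for LLMs.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval, standing in for a vector search."""
    scored = [(sum(w in doc.lower() for w in query.lower().split()), doc)
              for doc in DOCUMENTS]
    scored.sort(reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved context and the user question into one prompt."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}\nAnswer:"

def generate(prompt: str) -> str:
    """Send the prompt to the llama.cpp completion endpoint (GPU side)."""
    req = urllib.request.Request(
        LLAMA_SERVER_URL,
        data=json.dumps({"prompt": prompt, "n_predict": 128}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

if __name__ == "__main__":
    question = "Which GPUs accelerate inference?"
    prompt = build_prompt(question, retrieve(question))
    print(prompt)  # the generation step would call generate(prompt)
```

The key design point is that retrieval and prompt assembly never touch the GPU: only the final `generate` call crosses over to the llama.cpp server, which is exactly the split the Learning Path builds out in full.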

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Describe how a RAG system combines document retrieval and language model generation
  • Deploy a hybrid CPU-GPU RAG pipeline on the GB10 platform using open-source tools
  • Use the llama.cpp REST server for GPU-accelerated inference with CPU-managed retrieval
  • Build a reproducible RAG application that demonstrates efficient hybrid computing

Prerequisites

Before starting, you will need the following:

  • An NVIDIA DGX Spark system with at least 15 GB of available disk space