Set up the base DGX Spark AI runtime

In this section, you’ll prepare the base runtime that you’ll use in the rest of the Learning Path.

You’ll install Docker, configure GPU-enabled containers, create a persistent workspace, and start the initial runtime service stack:

  • Ollama for local inference
  • Qdrant for vector memory
  • Open WebUI for browser-based model access

You’ll add the Hermes Agent in the next section. In this section, you’ll build the local infrastructure it depends on.

Verify the DGX Spark environment

Start by verifying that your DGX Spark system exposes the expected Arm CPU and NVIDIA GPU environment.

Check the CPU architecture:

    

        
        
uname -m

    

The expected output is:

    

        
        aarch64

        
    

Check the Linux distribution. DGX Spark runs Ubuntu 24.04:

    

        
        
lsb_release -a

    

Check that the NVIDIA GPU and CUDA driver stack are visible:

    

        
        
nvidia-smi

    

The output is similar to:

    

        
        nvidia-smi
Wed May 20 18:12:05 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0 Off |                  N/A |
| N/A   36C    P8              4W /  N/A  | Not Supported          |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            3565      G   /usr/lib/xorg/Xorg                      137MiB |
|    0   N/A  N/A            3776      G   /usr/bin/gnome-shell                    164MiB |
|    0   N/A  N/A            5115      G   .../8305/usr/lib/firefox/firefox        239MiB |
|    0   N/A  N/A           85940      G   ...m Performix/arm-performix-gui         54MiB |
+-----------------------------------------------------------------------------------------+

        
    

Confirm that the output shows the GPU name (NVIDIA GB10), the driver version, and the CUDA version. Make a note of the CUDA version, because you’ll use a matching container image when you verify GPU passthrough later in this section.
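If you prefer to capture the CUDA version in a script instead of reading it by eye, the banner line can be parsed with standard tools. The following is a minimal sketch: the helper name cuda_version_from_smi is hypothetical, and it assumes the banner keeps the "CUDA Version: X.Y" format shown in the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi output on stdin and print the CUDA version.
# Assumes the banner contains "CUDA Version: X.Y" as in the sample output.
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Only query the driver when nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "CUDA version: $(nvidia-smi | cuda_version_from_smi)"
fi
```

The printed value is the one to compare against the CUDA container image tag used in the GPU passthrough check.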

Install Docker

If you haven’t installed Docker before, see the Docker Engine install guide for detailed steps.

To install Docker with one command, run:

    

        
        
curl -fsSL https://get.docker.com -o get-docker.sh && sh get-docker.sh

    

Allow your user to run Docker commands without sudo, then apply the new group membership in the current shell:

    

        
        
sudo usermod -aG docker $USER
newgrp docker

    

Verify Docker is working:

    

        
        
docker run hello-world

    

You’ll see a message confirming that Docker is installed and working.

Install NVIDIA Container Toolkit

The NVIDIA Container Toolkit allows Docker to expose the GPU to containers using the --gpus flag. Without it, containers can’t access the GPU regardless of the driver version installed on the host.

Add the NVIDIA Container Toolkit GPG key:

    

        
        
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

    

Add the NVIDIA Container Toolkit repository:

    

        
        
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    

Install the toolkit:

    

        
        
sudo apt update
sudo apt install -y nvidia-container-toolkit

    

Register the NVIDIA runtime with Docker. This adds the nvidia runtime to Docker’s daemon configuration so containers can request GPU access with --gpus:

    

        
        
sudo nvidia-ctk runtime configure --runtime=docker

    

Restart Docker to apply the configuration change:

    

        
        
sudo systemctl restart docker
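
Before moving on, you can check that the runtime registration took effect. The sketch below assumes that nvidia-ctk wrote an "nvidia" entry into /etc/docker/daemon.json, which is Docker’s default daemon configuration path. The helper name has_nvidia_runtime is hypothetical, and the text match is a rough check rather than a full JSON parse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the Docker daemon config mentions an "nvidia" runtime.
# The default path is an assumption; pass another file as the first argument if needed.
has_nvidia_runtime() {
  local config="${1:-/etc/docker/daemon.json}"
  [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

if has_nvidia_runtime; then
  echo "nvidia runtime found in the Docker daemon configuration"
else
  echo "nvidia runtime not found; rerun the nvidia-ctk configure step" >&2
fi
```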

    

Verify GPU-enabled containers

Run a CUDA validation container:

    

        
        
docker run --rm --gpus all \
nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \
nvidia-smi

    

If you have not pulled this image before, Docker downloads it before running nvidia-smi. The download can take a few minutes depending on your network connection.

The output is similar to:

    

        
        Unable to find image 'nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04' locally
13.0.1-devel-ubuntu24.04: Pulling from nvidia/cuda
03f66a4525ea: Pull complete 
c03b8ec8dd33: Pull complete 
cae1e96ffa7d: Pull complete 
2cb956a72162: Pull complete 
817eab9d3c52: Pull complete 
cc43ec4c1381: Pull complete 
30fc8198a31e: Pull complete 
c88eadd06616: Pull complete 
c7ba38867e8d: Pull complete 
fd2e70db7702: Pull complete 
85eb6b47da08: Pull complete 
Digest: sha256:7d2f6a8c2071d911524f95061a0db363e24d27aa51ec831fcccf9e76eb72bc92
Status: Downloaded newer image for nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04

==========
== CUDA ==
==========

CUDA Version 13.0.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Sun May 24 10:13:04 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.159.03             Driver Version: 580.159.03     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0 Off |                  N/A |
| N/A   44C    P0             10W /  N/A  | Not Supported          |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

        
    

If the command prints GPU information from inside the container, GPU passthrough is working and Docker can run GPU-accelerated containers on DGX Spark.

Create the persistent workspace

Create the project directory:

    

        
        
mkdir -p ~/dgx-hermes-agent
cd ~/dgx-hermes-agent

    

Create the directory structure used by the runtime:

    

        
        
mkdir -p \
workspace/inbox \
workspace/memory \
workspace/logs \
workspace/processed \
workspace/config \
models \
compose \
qdrant

    

The workspace now looks like this:

    

        
        
dgx-hermes-agent/
|-- compose/
|-- models/
|-- qdrant/
|-- workspace/
|   |-- config/
|   |-- inbox/
|   |-- logs/
|   |-- memory/
|   `-- processed/

    

The workspace/ directory is shared across runtime services. Hermes will later monitor workspace/inbox/, write generated artifacts to workspace/memory/, and read runtime policies from workspace/config/.
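
Because later steps mount these directories into containers, it can help to confirm that none are missing before you start the stack. The following sketch lists any expected directory that doesn’t exist under the project root. The helper name missing_dirs is hypothetical; the directory names come from the mkdir command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each expected directory that is missing under the given root.
missing_dirs() {
  local root="$1"
  local dir
  for dir in workspace/inbox workspace/memory workspace/logs \
             workspace/processed workspace/config models compose qdrant; do
    [ -d "$root/$dir" ] || echo "$dir"
  done
}

missing="$(missing_dirs "$HOME/dgx-hermes-agent")"
if [ -z "$missing" ]; then
  echo "workspace layout is complete"
else
  printf 'missing directories:\n%s\n' "$missing" >&2
fi
```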

Create the runtime service stack

Create and edit the file ~/dgx-hermes-agent/compose/docker-compose.yml.

Add the following content:

    

        
        
services:

  ollama:
    image: ollama/ollama:latest
    container_name: ollama

    ports:
      - "11434:11434"

    dns:
      - 8.8.8.8
      - 1.1.1.1

    volumes:
      - ../models:/root/.ollama
      - ../workspace:/workspace

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

    environment:
      - NVIDIA_VISIBLE_DEVICES=all

    restart: unless-stopped

  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant

    ports:
      - "6333:6333"
      - "6334:6334"

    volumes:
      - ../qdrant:/qdrant/storage

    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui

    ports:
      - "3000:8080"

    environment:
      - OLLAMA_BASE_URL=http://ollama:11434

    volumes:
      - open-webui:/app/backend/data

    depends_on:
      - ollama

    restart: unless-stopped

volumes:
  open-webui:

    

This Compose stack creates the first three runtime services. You’ll add Hermes as a fourth service later. The explicit DNS settings in the Ollama service help the container reach the model registry reliably. You’ll verify this in the networking step.

Understand the role of each runtime service

The initial stack separates model execution, memory storage, and user interaction.

| Service    | Role                                          |
|------------|-----------------------------------------------|
| Ollama     | Runs local language and embedding models      |
| Qdrant     | Stores persistent vector memory               |
| Open WebUI | Provides a local browser interface to Ollama  |

The models/ directory persists Ollama models on the host. The qdrant/ directory persists vector database storage. The workspace/ directory is mounted into Ollama now and will also be mounted into Hermes later.

Ollama doesn’t orchestrate workspace files by itself. The workspace mount verification step confirms shared storage access. Hermes will become the service that reads workspace files and decides when to call Ollama.

Start the runtime stack

If Ollama is already installed as a host service, stop it to avoid port conflicts:

    

        
        
sudo systemctl stop ollama
sudo systemctl disable ollama

    

Start the container stack:

    

        
        
cd ~/dgx-hermes-agent/compose
docker compose up -d

    
Note: The first docker compose up -d run can take several minutes, depending on your network speed, because Docker needs to pull the service images.

Verify that the containers are running:

    

        
        
docker compose ps

    

The output is similar to:

    

        
        NAME         IMAGE                                COMMAND               SERVICE      CREATED         STATUS                            PORTS
ollama       ollama/ollama:latest                 "/bin/ollama serve"   ollama       5 seconds ago   Up 4 seconds                      0.0.0.0:11434->11434/tcp, [::]:11434->11434/tcp
open-webui   ghcr.io/open-webui/open-webui:main   "bash start.sh"       open-webui   4 seconds ago   Up 4 seconds (health: starting)   0.0.0.0:3000->8080/tcp, [::]:3000->8080/tcp
qdrant       qdrant/qdrant:latest                 "./entrypoint.sh"     qdrant       5 seconds ago   Up 4 seconds                      0.0.0.0:6333-6334->6333-6334/tcp, [::]:6333-6334->6333-6334/tcp
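
A container that reports "Up" might still be initializing, as the health: starting status for Open WebUI shows. If you script the setup, a small polling loop can wait until each published port answers. This is a sketch under stated assumptions: wait_for_url is a hypothetical helper, it relies on curl being installed on the host, and it treats any successful HTTP response as "ready".

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers or the attempt budget runs out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
wait_for_url() {
  local url="$1"
  local tries="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    if curl -fs -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example calls, using the ports published in the Compose file:
#   wait_for_url http://localhost:11434 && echo "Ollama is ready"
#   wait_for_url http://localhost:6333  && echo "Qdrant is ready"
#   wait_for_url http://localhost:3000  && echo "Open WebUI is ready"
```

The function returns a non-zero status when the service never answers, so it can gate later steps in a script.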

        
    

Validate the base DGX Spark AI runtime

After starting the runtime, verify that it works as expected.

Verify container networking

Open a shell in the Ollama container:

    

        
        
docker exec -it ollama bash

    

You might see a warning such as groups: cannot find name for group ID 992. It’s a harmless warning that appears when the container’s /etc/group file has no entry for your host user’s GID. The shell opens normally and all commands work as expected.

Verify DNS resolution:

    

        
        
getent hosts registry.ollama.ai

    

The output is similar to:

    

        
        root@367b013fd34c:/# getent hosts registry.ollama.ai
2606:4700:3036::6815:4be3 registry.ollama.ai
2606:4700:3034::ac43:b6e5 registry.ollama.ai

        
    

Exit the container shell:

    

        
        
exit

    

A successful lookup confirms that, with the DNS settings in the Compose file, the container can resolve the Ollama model registry.

Pull local models

Open a shell in the Ollama container:

    

        
        
docker exec -it ollama bash

    

Pull the language model:

    

        
        
ollama pull qwen2.5:7b

    

Pull the embedding model:

    

        
        
ollama pull nomic-embed-text

    

Exit the container:

    

        
        
exit

    

These model names are used throughout the examples, so make a note of them. Other suitable models also work with this architecture.

| Model            | Purpose                                   |
|------------------|-------------------------------------------|
| qwen2.5:7b       | Local chat, summarization, reasoning      |
| nomic-embed-text | Embedding generation for semantic memory  |

Because the models/ directory is mounted into the container as a volume, downloaded models are stored on the host at ~/dgx-hermes-agent/models/. The models persist even if the container is removed or recreated.
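
You can also confirm from the host that both models are registered, without opening a container shell. The sketch below queries Ollama’s /api/tags endpoint on the published port and extracts the model names with grep and sed. The helper name model_names is hypothetical, and the text-based parsing assumes each model entry carries a "name" field; a JSON tool such as jq would be more robust if you have one installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the /api/tags JSON on stdin and print one model name per line.
model_names() {
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Query the Ollama API published on the host; tolerate an unreachable service.
tags_json="$(curl -fs --max-time 5 http://localhost:11434/api/tags 2>/dev/null || true)"
if [ -n "$tags_json" ]; then
  printf '%s\n' "$tags_json" | model_names
else
  echo "Ollama API not reachable on localhost:11434" >&2
fi
```

Ollama typically reports a model pulled without an explicit tag with a :latest suffix, so nomic-embed-text might be listed as nomic-embed-text:latest.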

Verify local inference

Open a shell in the Ollama container:

    

        
        
docker exec -it ollama bash

    

Run the local model:

    

        
        
ollama run qwen2.5:7b

    

Enter a short prompt, such as:

    

        
        
Summarize the role of CPU orchestration for an AI agent in one sentence.

    

After the model responds, type /bye to exit the interactive model session, then type exit to leave the container shell.

You can also monitor GPU activity from another terminal while the model is running. If the nvtop command isn’t found, install it first with sudo apt install -y nvtop:

    

        
        
nvtop

    

During inference, you’ll see GPU utilization rise on the Blackwell GPU as the model processes the prompt and generates tokens. The change in GPU utilization shows that the model is running on the GPU rather than falling back to the CPU.

This step shows that local inference is available before Hermes begins calling Ollama programmatically.
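
To preview what a programmatic call looks like, you can send the same prompt to Ollama’s /api/generate endpoint from the host. This is a minimal sketch, not the way Hermes is implemented: the generate_payload helper is hypothetical, and it assumes the model name and prompt contain no double quotes or backslashes, because it does no JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body for Ollama's /api/generate endpoint.
# Assumes the model name and prompt contain no double quotes or backslashes.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

payload="$(generate_payload "qwen2.5:7b" \
  "Summarize the role of CPU orchestration for an AI agent in one sentence.")"

# Send the request to the port published in the Compose file.
if ! curl -fs --max-time 120 http://localhost:11434/api/generate -d "$payload"; then
  echo "Ollama API not reachable on localhost:11434" >&2
fi
```

With "stream" set to false, Ollama returns a single JSON object, and the generated text is in its response field.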

Verify Open WebUI

Open a browser and navigate to:

    

        
        
http://localhost:3000

    

On first launch, Open WebUI presents a setup screen asking for a name and email address to create a local admin account. The email can be a placeholder and no data leaves your system. Enter any values and continue to the main interface.

To verify that Ollama is reachable from the host, navigate to:

    

        
        
http://localhost:11434

    

If Ollama is running, the browser displays the message Ollama is running. The message confirms that the Ollama container is accessible on the expected port. Open WebUI connects to Ollama using the internal Docker network address http://ollama:11434, but from the host you use localhost:11434.

Use Open WebUI to confirm that the local model is listed and available for chat. Open WebUI is not used in the agent workflow in the sections that follow. Hermes calls Ollama directly through its API, so Open WebUI serves only as a convenient way to validate the inference stack before the agent takes over.

Verify Qdrant

Open the Qdrant dashboard:

    

        
        
http://localhost:6333/dashboard

    

Figure: Qdrant dashboard running locally before the workspace_memory collection is created.

Qdrant is running, but it doesn’t contain the workspace_memory collection yet. Hermes creates that collection later when you add persistent memory.
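
You can confirm the same state from the command line through Qdrant’s REST API, which lists collections at /collections. The sketch below uses a hypothetical has_collection helper that does a simple text match on the returned JSON, which is enough for a quick check but isn’t a full JSON parse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the named collection appears in the JSON
# returned by GET /collections (read from stdin).
has_collection() {
  grep -q "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

collections_json="$(curl -fs --max-time 5 http://localhost:6333/collections 2>/dev/null || true)"
if [ -z "$collections_json" ]; then
  echo "Qdrant API not reachable on localhost:6333" >&2
elif printf '%s\n' "$collections_json" | has_collection "workspace_memory"; then
  echo "workspace_memory exists"
else
  echo "workspace_memory not created yet (expected at this stage)"
fi
```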

Verify the shared workspace mount

Open another terminal on your DGX Spark system and create a test file on the host. Don’t run this command inside a container.

    

        
        
echo "Arm CPUs orchestrate persistent AI workflows." \
> ~/dgx-hermes-agent/workspace/inbox/test.txt

    

Verify that the shared mount is visible by opening a shell in the Ollama container:

    

        
        
docker exec -it ollama bash

    

Inside the container, run:

    

        
        
ls -l /workspace
cat /workspace/inbox/test.txt

    

The output is similar to:

    

        
        drwxrwxr-x 2 1001 1001 4096 May 20 18:16 config
drwxrwxr-x 2 1001 1001 4096 May 20 18:37 inbox
drwxrwxr-x 2 1001 1001 4096 May 20 18:16 logs
drwxrwxr-x 2 1001 1001 4096 May 20 18:16 memory
drwxrwxr-x 2 1001 1001 4096 May 20 18:16 processed

        
    

The file contains the following:

    

        
        Arm CPUs orchestrate persistent AI workflows.

        
    

Exit the container:

    

        
        
exit
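
The same check can be scripted from the host without an interactive shell. The following sketch streams the file out of the container and compares it byte for byte with the host copy. The helper name matches_host_file is hypothetical, and the script assumes the container is named ollama as in the Compose file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when stdin is byte-identical to the given host file.
matches_host_file() {
  cmp -s - "$1"
}

host_file="$HOME/dgx-hermes-agent/workspace/inbox/test.txt"

# Run the comparison only when Docker and the test file are both present.
if command -v docker >/dev/null 2>&1 && [ -f "$host_file" ]; then
  if docker exec ollama cat /workspace/inbox/test.txt | matches_host_file "$host_file"; then
    echo "shared mount OK"
  else
    echo "container and host content differ" >&2
  fi
fi
```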

    

What you’ve accomplished and what’s next

You’ve built the runtime foundation for the persistent local AI system. The DGX Spark environment now has Docker, Docker Compose, NVIDIA Container Toolkit, GPU-enabled containers, persistent workspace storage, and the initial Ollama, Qdrant, and Open WebUI services.

You’ve also verified shared workspace access and local inference, and pulled the two models (qwen2.5:7b and nomic-embed-text) that the later sections rely on.

Next, you’ll add Hermes Agent as the persistent orchestration runtime.
