Enable GPU acceleration for AI services that support NVIDIA GPUs. This significantly improves inference speed for local models.
Prerequisites
NVIDIA GPU
Verify you have a compatible NVIDIA GPU:
# Check GPU
lspci | grep -i nvidia
# Verify NVIDIA driver
nvidia-smi
Expected output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 NVIDIA RTX 4090 Off | 00000000:01:00.0 Off | N/A |
| 30% 45C P8 25W / 450W | 1MiB / 24564MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Install the NVIDIA Container Toolkit to enable GPU access in Docker containers.
Ubuntu/Debian
RHEL/CentOS/Fedora
# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker
sudo systemctl restart docker
# Add NVIDIA package repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install toolkit
sudo dnf install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker
sudo systemctl restart docker
Verify installation
Test GPU access from Docker:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If successful, you’ll see the nvidia-smi output inside the container.
Enabling GPU passthrough
Generate a stack with GPU support:
npx create-better-openclaw --gpu
# Or run without flags and select "Enable GPU passthrough" when prompted
The --gpu flag automatically adds GPU device reservations to services that support it.
GPU-enabled services
Required GPU
These services require a GPU to function:
| Service | Description | Memory Required |
|---|---|---|
| Stable Diffusion | AI image generation | ~4 GB VRAM |
Optional GPU
These services work without GPU but benefit from acceleration:
| Service | Description | GPU Benefit |
|---|---|---|
| Ollama | Local LLM inference | 5-10x faster inference |
| Whisper | Speech-to-text | 3-5x faster transcription |
| ComfyUI | Node-based AI workflows | Faster image generation |
Docker Compose configuration
When you enable --gpu, better-openclaw adds GPU device reservations to docker-compose.yml:
services:
ollama:
image: ollama/ollama:0.17.0
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
# ... rest of service config
stable-diffusion:
image: ghcr.io/stable-diffusion-webui/stable-diffusion-webui:latest-cuda
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
# ... rest of service config
Limiting GPU access
To restrict GPU access to specific devices:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['0'] # Only GPU 0
capabilities: [gpu]
Or limit by count:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1 # Only 1 GPU
capabilities: [gpu]
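An alternative, when Docker is configured with the nvidia runtime, is the NVIDIA_VISIBLE_DEVICES environment variable, which the NVIDIA Container Runtime reads directly. A sketch (the Compose device-reservation syntax above is the preferred, Compose-native form):

```yaml
services:
  ollama:
    image: ollama/ollama:0.17.0
    runtime: nvidia                 # requires the nvidia runtime registered in Docker
    environment:
      NVIDIA_VISIBLE_DEVICES: "0"   # same effect as device_ids: ['0']
```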
Verifying GPU usage
Check container GPU access
# Enter Ollama container
docker compose exec ollama bash
# Check GPU visibility
nvidia-smi
Monitor GPU usage
Watch real-time GPU metrics:
# Continuous monitoring
watch -n 1 nvidia-smi
# Or with better formatting
nvidia-smi dmon -s pucvmet
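For scripting, nvidia-smi's CSV query mode is easier to parse than the default table view. A sketch of computing per-GPU memory utilization (the sample line stands in for real output so the script runs on any machine; on a GPU host you would pipe nvidia-smi in directly):

```shell
#!/bin/sh
# On a GPU host, generate real data with:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
# A sample line stands in here so the sketch runs anywhere.
sample="0, 8200, 24564"

# Print per-GPU memory utilization as a percentage.
printf '%s\n' "$sample" | awk -F', *' '
  { printf "GPU %s: %d/%d MiB (%.1f%% used)\n", $1, $2, $3, 100 * $2 / $3 }'
# → GPU 0: 8200/24564 MiB (33.4% used)
```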
Ollama GPU usage
After a model is loaded, check where Ollama placed it:
docker compose exec ollama ollama run llama3.2 "Hello"
docker compose exec ollama ollama ps
# The PROCESSOR column shows placement, e.g. "100% GPU"
# If it reports CPU, the container does not have GPU access
VRAM allocation
Allocate appropriate VRAM based on model size:
| Model Size | Minimum VRAM | Recommended VRAM |
|---|---|---|
| 7B params | 6 GB | 8 GB |
| 13B params | 12 GB | 16 GB |
| 34B params | 24 GB | 32 GB |
| 70B params | 48 GB | 64 GB |
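The rough rule of thumb behind the table above: an fp16 model needs about 2 bytes per parameter and a 4-bit quantized model about 0.5 bytes, plus overhead for the KV cache and runtime. A back-of-envelope sketch (the 20% overhead factor is an assumption, not a measured value):

```shell
#!/bin/sh
# Rough VRAM estimate: params (billions) * bytes-per-param * 1.2 overhead.
# bytes-per-param: 2.0 for fp16, 0.5 for 4-bit quantized (assumed factors).
estimate_vram_gb() {
  params_b="$1"   # model size in billions of parameters
  bpp="$2"        # bytes per parameter
  awk -v p="$params_b" -v b="$bpp" 'BEGIN { printf "%.1f GB\n", p * b * 1.2 }'
}

estimate_vram_gb 7 2.0    # fp16 7B model  → 16.8 GB
estimate_vram_gb 70 0.5   # 4-bit 70B model → 42.0 GB
```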
Multi-GPU configuration
For systems with multiple GPUs, better-openclaw enables all GPUs by default (count: all). To distribute services across GPUs:
- Manually edit docker-compose.yml to assign specific device IDs
- Use Docker Compose profiles to start services independently
# GPU 0 for Ollama
ollama:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['0']
capabilities: [gpu]
# GPU 1 for Stable Diffusion
stable-diffusion:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['1']
capabilities: [gpu]
Compute mode
Set GPU compute mode for exclusive access (recommended for production):
# Set exclusive process mode (one context per GPU)
sudo nvidia-smi -c EXCLUSIVE_PROCESS
# Or default shared mode (multiple contexts)
sudo nvidia-smi -c DEFAULT
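Settings applied with nvidia-smi, including compute mode, do not survive a reboot. One way to reapply them at boot is a small systemd oneshot unit (a sketch; the unit name and file path are illustrative):

```ini
# /etc/systemd/system/nvidia-compute-mode.service (illustrative name and path)
[Unit]
Description=Set NVIDIA compute mode to exclusive process
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -c EXCLUSIVE_PROCESS

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now nvidia-compute-mode.service`.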
Troubleshooting
GPU not detected
Check NVIDIA driver:
nvidia-smi
# If this fails, reinstall NVIDIA drivers
Verify Container Toolkit:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Check Docker daemon config:
cat /etc/docker/daemon.json
Should contain:
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
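To check the daemon config from a script, a grep is usually enough. The sketch below writes a sample daemon.json to a temp directory so it runs anywhere; on a real host you would point `conf` at /etc/docker/daemon.json:

```shell
#!/bin/sh
# Sample config standing in for /etc/docker/daemon.json so the sketch runs anywhere.
tmpdir=$(mktemp -d)
cat > "$tmpdir/daemon.json" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

# On a real host: conf=/etc/docker/daemon.json
conf="$tmpdir/daemon.json"
if grep -q '"nvidia"' "$conf" 2>/dev/null; then
  echo "nvidia runtime configured"
else
  echo "nvidia runtime missing - run: sudo nvidia-ctk runtime configure --runtime=docker"
fi
rm -rf "$tmpdir"
```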
Out of memory errors
RuntimeError: CUDA out of memory
Solutions:
- Use smaller models:
# Instead of llama3:70b, use a smaller model such as llama3.2:3b
ollama pull llama3.2:3b
- Use quantized models:
# 4-bit quantized variants need roughly a quarter of the fp16 VRAM
ollama pull llama3.1:8b-instruct-q4_0
- Reduce batch size or context length in the service config
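For Ollama specifically, memory pressure can also be reduced through its environment variables in docker-compose.yml (a sketch; defaults and supported options vary by Ollama version):

```yaml
services:
  ollama:
    image: ollama/ollama:0.17.0
    environment:
      OLLAMA_NUM_PARALLEL: "1"        # one request at a time -> one KV cache
      OLLAMA_MAX_LOADED_MODELS: "1"   # evict a model before loading another
      OLLAMA_FLASH_ATTENTION: "1"     # required for KV cache quantization
      OLLAMA_KV_CACHE_TYPE: "q8_0"    # quantize the KV cache (newer versions)
```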
Driver/CUDA version mismatch
Error: CUDA driver version is insufficient for CUDA runtime version
Fix:
# Update NVIDIA driver
sudo ubuntu-drivers autoinstall
sudo reboot
# Or install specific version
sudo apt install nvidia-driver-535
Container can’t access GPU
failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
Fix:
# Reinstall NVIDIA Container Toolkit
sudo apt-get purge nvidia-container-toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Restart stack
docker compose down
docker compose up -d
Monitoring with Grafana
Add GPU metrics to your monitoring stack:
npx create-better-openclaw \
--services ollama,prometheus,grafana \
--gpu \
--monitoring \
--yes
Install NVIDIA DCGM Exporter for GPU metrics:
services:
dcgm-exporter:
image: nvcr.io/nvidia/k8s/dcgm-exporter:3.3.0-3.2.0-ubuntu22.04
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
environment:
DCGM_EXPORTER_LISTEN: ":9400"
ports:
- "9400:9400"
networks:
- openclaw-network
restart: unless-stopped
Add to prometheus.yml:
scrape_configs:
- job_name: "gpu-metrics"
static_configs:
- targets: ["dcgm-exporter:9400"]
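Once Prometheus is scraping the exporter, useful metrics to graph in Grafana include (names as exposed by dcgm-exporter's default configuration):

```promql
# GPU utilization per device, in percent
DCGM_FI_DEV_GPU_UTIL

# Framebuffer (VRAM) used, in MiB
DCGM_FI_DEV_FB_USED

# GPU temperature, in degrees Celsius
DCGM_FI_DEV_GPU_TEMP
```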
Cloud GPU providers
For cloud deployments with GPU support:
| Provider | GPU Options | Notes |
|---|---|---|
| AWS EC2 | P3, P4, G4, G5 instances | Use Ubuntu Deep Learning AMI |
| Google Cloud | A2, N1 with Tesla T4/V100 | Pre-installed NVIDIA drivers |
| Azure | NC, ND, NV series | Container-optimized VM images |
| Vast.ai | Various consumer/datacenter GPUs | Pre-configured with Docker + NVIDIA toolkit |
| RunPod | RTX 3090, A40, A100 | Docker and GPU support included |
When deploying to cloud VMs, the NVIDIA drivers and Container Toolkit are often pre-installed. Verify with nvidia-smi before installing.