Enable GPU acceleration for AI services that support NVIDIA GPUs. This significantly improves inference speed for local models.

Prerequisites

NVIDIA GPU

Verify you have a compatible NVIDIA GPU:
# Check GPU
lspci | grep -i nvidia

# Verify NVIDIA driver
nvidia-smi
Expected output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03   Driver Version: 535.129.03   CUDA Version: 12.2   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA RTX 4090     Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   45C    P8    25W / 450W |      1MiB / 24564MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
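Beyond the human-readable table, nvidia-smi supports machine-readable CSV queries, which are handy in provisioning scripts. A minimal parsing sketch — the hard-coded sample line stands in for real output so it runs without a GPU:

```shell
# On a GPU host you would capture real output with:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
# Sample line used here so the parsing runs anywhere:
sample="NVIDIA RTX 4090, 24564 MiB"

# Split on the CSV delimiter, then strip the "MiB" unit
gpu_name=$(echo "$sample" | awk -F', ' '{print $1}')
vram_mib=$(echo "$sample" | awk -F', ' '{print $2}' | awk '{print $1}')
echo "GPU: ${gpu_name}, VRAM: ${vram_mib} MiB"
```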

NVIDIA Container Toolkit

Install the NVIDIA Container Toolkit to enable GPU access in Docker containers.
# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker
sudo systemctl restart docker

Verify installation

Test GPU access from Docker:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If successful, you’ll see the nvidia-smi output inside the container.

Enabling GPU passthrough

Generate a stack with GPU support:
npx create-better-openclaw
# Select "Enable GPU passthrough" when prompted
Alternatively, pass the --gpu flag, which automatically adds GPU device reservations to the services that support them.

GPU-enabled services

Required GPU

These services require a GPU to function:
| Service | Description | Memory Required |
| --- | --- | --- |
| Stable Diffusion | AI image generation | ~4 GB VRAM |

Optional GPU

These services work without GPU but benefit from acceleration:
| Service | Description | GPU Benefit |
| --- | --- | --- |
| Ollama | Local LLM inference | 5-10x faster inference |
| Whisper | Speech-to-text | 3-5x faster transcription |
| ComfyUI | Node-based AI workflows | Faster image generation |

Docker Compose configuration

When you enable --gpu, better-openclaw adds GPU device reservations to docker-compose.yml:
docker-compose.yml
services:
  ollama:
    image: ollama/ollama:0.17.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    # ... rest of service config

  stable-diffusion:
    image: ghcr.io/stable-diffusion-webui/stable-diffusion-webui:latest-cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    # ... rest of service config

Limiting GPU access

To restrict GPU access to specific devices:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0']  # Only GPU 0
          capabilities: [gpu]
Or limit by count:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1  # Only 1 GPU
          capabilities: [gpu]

Verifying GPU usage

Check container GPU access

# Enter Ollama container
docker compose exec ollama bash

# Check GPU visibility
nvidia-smi

Monitor GPU usage

Watch real-time GPU metrics:
# Continuous monitoring
watch -n 1 nvidia-smi

# Or with better formatting
nvidia-smi dmon -s pucvmet
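For scripted monitoring (e.g. a cron-driven alert), the same query interface can feed a threshold check. A sketch with a hard-coded sample value in place of live nvidia-smi output; the threshold is an arbitrary example:

```shell
# On a GPU host the live value would come from:
#   used_mib=$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits)
used_mib=21000    # sample value so the check runs without a GPU
limit_mib=22000   # alert threshold (example)

# Warn when VRAM usage crosses the threshold
if [ "$used_mib" -gt "$limit_mib" ]; then
  echo "WARN: ${used_mib} MiB VRAM used exceeds ${limit_mib} MiB"
else
  echo "OK: ${used_mib} MiB VRAM used"
fi
```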

Ollama GPU usage

When running models, Ollama displays GPU information:
docker compose exec ollama ollama run llama3.2

# Output shows GPU memory allocation:
# Loading model... 100%
# Model loaded on GPU 0 (8.2 GB / 24 GB VRAM)

Performance optimization

VRAM allocation

Allocate appropriate VRAM based on model size:
| Model Size | Minimum VRAM | Recommended VRAM |
| --- | --- | --- |
| 7B params | 6 GB | 8 GB |
| 13B params | 12 GB | 16 GB |
| 34B params | 24 GB | 32 GB |
| 70B params | 48 GB | 64 GB |
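In provisioning scripts it can help to encode the table above as a lookup. The function name is illustrative; the values are copied from the recommended-VRAM column:

```shell
# Recommended VRAM (GB) per model size, per the table above.
recommended_vram_gb() {
  case "$1" in
    7b)  echo 8  ;;
    13b) echo 16 ;;
    34b) echo 32 ;;
    70b) echo 64 ;;
    *)   echo "unknown size: $1" >&2; return 1 ;;
  esac
}

recommended_vram_gb 13b
```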

Multi-GPU configuration

For systems with multiple GPUs, better-openclaw enables all GPUs by default (count: all). To distribute services across GPUs:
  1. Manually edit docker-compose.yml to assign specific device IDs
  2. Use Docker Compose profiles to start services independently
# GPU 0 for Ollama
ollama:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities: [gpu]

# GPU 1 for Stable Diffusion
stable-diffusion:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['1']
            capabilities: [gpu]

Compute mode

Set GPU compute mode for exclusive access (recommended for production):
# Set exclusive process mode (one context per GPU)
sudo nvidia-smi -c EXCLUSIVE_PROCESS

# Or default shared mode (multiple contexts)
sudo nvidia-smi -c DEFAULT
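To check the current mode from a script, compute_mode is also exposed through nvidia-smi's query interface. A sketch using a sample value in place of live output:

```shell
# On a GPU host the live value would come from:
#   mode=$(nvidia-smi --query-gpu=compute_mode --format=csv,noheader)
mode="Exclusive_Process"   # sample value so the check runs without a GPU

if [ "$mode" = "Exclusive_Process" ]; then
  echo "exclusive mode active"
else
  echo "shared (default) mode"
fi
```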

Troubleshooting

GPU not detected

Check NVIDIA driver:
nvidia-smi
# If this fails, reinstall NVIDIA drivers
Verify Container Toolkit:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Check Docker daemon config:
cat /etc/docker/daemon.json
Should contain:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
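This check can be scripted. A sketch that validates a daemon.json — written to a temp file here so it runs anywhere; on a real host, point it at /etc/docker/daemon.json:

```shell
# Sample config matching the expected shape above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

# A simple grep is enough to detect a missing runtime entry
if grep -q '"nvidia"' "$conf"; then
  status="configured"
else
  status="missing: run nvidia-ctk runtime configure --runtime=docker"
fi
echo "nvidia runtime: $status"
rm -f "$conf"
```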

Out of memory errors

RuntimeError: CUDA out of memory
Solutions:
  1. Use smaller models:
    # Instead of llama3:70b, use a smaller model such as llama3:8b
    ollama pull llama3:8b
    
  2. Enable model quantization:
    # Use a 4-bit quantized variant
    ollama pull llama3:8b-instruct-q4_0
    
  3. Reduce batch size or context length in service config
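The savings from quantization follow from weight precision: roughly 2 bytes per parameter at fp16 versus about 0.5 bytes at 4-bit. A back-of-envelope sketch (rule-of-thumb byte counts, not exact file sizes — actual model files vary):

```shell
# Estimate weight sizes for a model of params_b billion parameters.
params_b=7
sizes=$(awk -v p="$params_b" 'BEGIN {
  printf "fp16: ~%.1f GB, 4-bit: ~%.1f GB", p * 2, p * 0.5
}')
echo "$sizes"
```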

Driver/CUDA version mismatch

Error: CUDA driver version is insufficient for CUDA runtime version
Fix:
# Update NVIDIA driver
sudo ubuntu-drivers autoinstall
sudo reboot

# Or install specific version
sudo apt install nvidia-driver-535

Container can’t access GPU

failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
Fix:
# Reinstall NVIDIA Container Toolkit
sudo apt-get purge nvidia-container-toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Restart stack
docker compose down
docker compose up -d

Monitoring with Grafana

Add GPU metrics to your monitoring stack:
npx create-better-openclaw \
  --services ollama,prometheus,grafana \
  --gpu \
  --monitoring \
  --yes
Install NVIDIA DCGM Exporter for GPU metrics:
docker-compose.yml
services:
  dcgm-exporter:
    image: nvcr.io/nvidia/k8s/dcgm-exporter:3.3.0-3.2.0-ubuntu22.04
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      DCGM_EXPORTER_LISTEN: ":9400"
    ports:
      - "9400:9400"
    networks:
      - openclaw-network
    restart: unless-stopped
Add to prometheus.yml:
scrape_configs:
  - job_name: "gpu-metrics"
    static_configs:
      - targets: ["dcgm-exporter:9400"]
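The exporter serves metrics in the Prometheus exposition format; DCGM_FI_DEV_GPU_UTIL is one of the standard DCGM fields. A parsing sketch over a sample exposition line (on the host you would curl the exporter itself; the label values are illustrative):

```shell
# On the host:
#   curl -s http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
# Sample exposition line, parsed here without a live exporter:
line='DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node1"} 37'

# The metric value is the last whitespace-separated field
util=$(echo "$line" | awk '{print $NF}')
echo "GPU 0 utilization: ${util}%"
```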

Cloud GPU providers

For cloud deployments with GPU support:
| Provider | GPU Options | Notes |
| --- | --- | --- |
| AWS EC2 | P3, P4, G4, G5 instances | Use Ubuntu Deep Learning AMI |
| Google Cloud | A2, N1 with Tesla T4/V100 | Pre-installed NVIDIA drivers |
| Azure | NC, ND, NV series | Container-optimized VM images |
| Vast.ai | Various consumer/datacenter GPUs | Pre-configured with Docker + NVIDIA toolkit |
| RunPod | RTX 3090, A40, A100 | Docker and GPU support included |
When deploying to cloud VMs, the NVIDIA drivers and Container Toolkit are often pre-installed. Verify with nvidia-smi before installing.
