Enable GPU acceleration for AI services that support NVIDIA GPUs. This significantly improves inference speed for local models.
Prerequisites
NVIDIA GPU
Verify you have a compatible NVIDIA GPU:
# Check GPU
lspci | grep -i nvidia
# Verify NVIDIA driver
nvidia-smi
Expected output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 NVIDIA RTX 4090 Off | 00000000:01:00.0 Off | N/A |
| 30% 45C P8 25W / 450W | 1MiB / 24564MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Install the NVIDIA Container Toolkit to enable GPU access in Docker containers.
Ubuntu/Debian
RHEL/CentOS/Fedora
# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker
sudo systemctl restart docker
# Add NVIDIA package repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install toolkit
sudo dnf install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker
sudo systemctl restart docker
Verify installation
Test GPU access from Docker:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If successful, you’ll see the nvidia-smi output inside the container.
Enabling GPU passthrough
Generate a stack with GPU support:
npx create-better-openclaw --gpu
# Or run without flags and select "Enable GPU passthrough" when prompted
The --gpu flag automatically adds GPU device reservations to services that support it.
GPU-enabled services
Required GPU
These services require a GPU to function:
| Service | Description | Memory Required |
|---|---|---|
| Stable Diffusion | AI image generation | ~4 GB VRAM |
Optional GPU
These services work without GPU but benefit from acceleration:
| Service | Description | GPU Benefit |
|---|---|---|
| Ollama | Local LLM inference | 5-10x faster inference |
| Whisper | Speech-to-text | 3-5x faster transcription |
| ComfyUI | Node-based AI workflows | Faster image generation |
Docker Compose configuration
When you enable --gpu, better-openclaw adds GPU device reservations to docker-compose.yml:
services:
ollama:
image: ollama/ollama:0.17.0
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
# ... rest of service config
stable-diffusion:
image: ghcr.io/stable-diffusion-webui/stable-diffusion-webui:latest-cuda
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
# ... rest of service config
Limiting GPU access
To restrict GPU access to specific devices:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['0'] # Only GPU 0
capabilities: [gpu]
Or limit by count:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1 # Only 1 GPU
capabilities: [gpu]
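An alternative, when Docker is configured with the nvidia runtime, is the NVIDIA_VISIBLE_DEVICES environment variable, which the NVIDIA Container Runtime reads directly. A sketch (the Compose device-reservation syntax above is the preferred, Compose-native form):

```yaml
services:
  ollama:
    image: ollama/ollama:0.17.0
    runtime: nvidia                 # requires the nvidia runtime registered in Docker
    environment:
      NVIDIA_VISIBLE_DEVICES: "0"   # same effect as device_ids: ['0']
```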
Verifying GPU usage
Check container GPU access
# Enter Ollama container
docker compose exec ollama bash
# Check GPU visibility
nvidia-smi
Monitor GPU usage
Watch real-time GPU metrics:
# Continuous monitoring
watch -n 1 nvidia-smi
# Or with better formatting
nvidia-smi dmon -s pucvmet
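For scripting, nvidia-smi's CSV query mode is easier to parse than the default table view. A sketch of computing per-GPU memory utilization (the sample line stands in for real output so the script runs on any machine; on a GPU host you would pipe nvidia-smi in directly):

```shell
#!/bin/sh
# On a GPU host, generate real data with:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
# A sample line stands in here so the sketch runs anywhere.
sample="0, 8200, 24564"

# Print per-GPU memory utilization as a percentage.
printf '%s\n' "$sample" | awk -F', *' '
  { printf "GPU %s: %d/%d MiB (%.1f%% used)\n", $1, $2, $3, 100 * $2 / $3 }'
# → GPU 0: 8200/24564 MiB (33.4% used)
```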
Ollama GPU usage
After a model is loaded, check where Ollama placed it:
docker compose exec ollama ollama run llama3.2 "Hello"
docker compose exec ollama ollama ps
# The PROCESSOR column shows placement, e.g. "100% GPU"
# If it reports CPU, the container does not have GPU access
VRAM allocation
Allocate appropriate VRAM based on model size:
| Model Size | Minimum VRAM | Recommended VRAM |
|---|---|---|
| 7B params | 6 GB | 8 GB |
| 13B params | 12 GB | 16 GB |
| 34B params | 24 GB | 32 GB |
| 70B params | 48 GB | 64 GB |
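The rough rule of thumb behind the table above: an fp16 model needs about 2 bytes per parameter and a 4-bit quantized model about 0.5 bytes, plus overhead for the KV cache and runtime. A back-of-envelope sketch (the 20% overhead factor is an assumption, not a measured value):

```shell
#!/bin/sh
# Rough VRAM estimate: params (billions) * bytes-per-param * 1.2 overhead.
# bytes-per-param: 2.0 for fp16, 0.5 for 4-bit quantized (assumed factors).
estimate_vram_gb() {
  params_b="$1"   # model size in billions of parameters
  bpp="$2"        # bytes per parameter
  awk -v p="$params_b" -v b="$bpp" 'BEGIN { printf "%.1f GB\n", p * b * 1.2 }'
}

estimate_vram_gb 7 2.0    # fp16 7B model  → 16.8 GB
estimate_vram_gb 70 0.5   # 4-bit 70B model → 42.0 GB
```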
Multi-GPU configuration
For systems with multiple GPUs, better-openclaw enables all GPUs by default (count: all). To distribute services across GPUs:
- Manually edit docker-compose.yml to assign specific device IDs
- Use Docker Compose profiles to start services independently
# GPU 0 for Ollama
ollama:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['0']
capabilities: [gpu]
# GPU 1 for Stable Diffusion
stable-diffusion:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['1']
capabilities: [gpu]
Compute mode
Set GPU compute mode for exclusive access (recommended for production):
# Set exclusive process mode (one context per GPU)
sudo nvidia-smi -c EXCLUSIVE_PROCESS
# Or default shared mode (multiple contexts)
sudo nvidia-smi -c DEFAULT
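Settings applied with nvidia-smi, including compute mode, do not survive a reboot. One way to reapply them at boot is a small systemd oneshot unit (a sketch; the unit name and file path are illustrative):

```ini
# /etc/systemd/system/nvidia-compute-mode.service (illustrative name and path)
[Unit]
Description=Set NVIDIA compute mode to exclusive process
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -c EXCLUSIVE_PROCESS

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now nvidia-compute-mode.service`.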
Troubleshooting
GPU not detected
Check NVIDIA driver:
nvidia-smi
# If this fails, reinstall NVIDIA drivers
Verify Container Toolkit:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Check Docker daemon config:
cat /etc/docker/daemon.json
Should contain:
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
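To check the daemon config from a script, a grep is usually enough. The sketch below writes a sample daemon.json to a temp directory so it runs anywhere; on a real host you would point `conf` at /etc/docker/daemon.json:

```shell
#!/bin/sh
# Sample config standing in for /etc/docker/daemon.json so the sketch runs anywhere.
tmpdir=$(mktemp -d)
cat > "$tmpdir/daemon.json" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

# On a real host: conf=/etc/docker/daemon.json
conf="$tmpdir/daemon.json"
if grep -q '"nvidia"' "$conf" 2>/dev/null; then
  echo "nvidia runtime configured"
else
  echo "nvidia runtime missing - run: sudo nvidia-ctk runtime configure --runtime=docker"
fi
rm -rf "$tmpdir"
```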
Out of memory errors
RuntimeError: CUDA out of memory
Solutions:
- Use smaller models:
# Instead of llama3:70b, use a smaller model such as llama3.2:3b
ollama pull llama3.2:3b
- Use quantized models:
# 4-bit quantized variants need roughly a quarter of the fp16 VRAM
ollama pull llama3.1:8b-instruct-q4_0
- Reduce batch size or context length in the service config
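For Ollama specifically, memory pressure can also be reduced through its environment variables in docker-compose.yml (a sketch; defaults and supported options vary by Ollama version):

```yaml
services:
  ollama:
    image: ollama/ollama:0.17.0
    environment:
      OLLAMA_NUM_PARALLEL: "1"        # one request at a time -> one KV cache
      OLLAMA_MAX_LOADED_MODELS: "1"   # evict a model before loading another
      OLLAMA_FLASH_ATTENTION: "1"     # required for KV cache quantization
      OLLAMA_KV_CACHE_TYPE: "q8_0"    # quantize the KV cache (newer versions)
```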
Driver/CUDA version mismatch
Error: CUDA driver version is insufficient for CUDA runtime version
Fix:
# Update NVIDIA driver
sudo ubuntu-drivers autoinstall
sudo reboot
# Or install specific version
sudo apt install nvidia-driver-535
Container can’t access GPU
failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
Fix:
# Reinstall NVIDIA Container Toolkit
sudo apt-get purge nvidia-container-toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Restart stack
docker compose down
docker compose up -d
Monitoring with Grafana
Add GPU metrics to your monitoring stack:
npx create-better-openclaw \
--services ollama,prometheus,grafana \
--gpu \
--monitoring \
--yes
Install NVIDIA DCGM Exporter for GPU metrics:
services:
dcgm-exporter:
image: nvcr.io/nvidia/k8s/dcgm-exporter:3.3.0-3.2.0-ubuntu22.04
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
environment:
DCGM_EXPORTER_LISTEN: ":9400"
ports:
- "9400:9400"
networks:
- openclaw-network
restart: unless-stopped
Add to prometheus.yml:
scrape_configs:
- job_name: "gpu-metrics"
static_configs:
- targets: ["dcgm-exporter:9400"]
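Once Prometheus is scraping the exporter, useful metrics to graph in Grafana include (names as exposed by dcgm-exporter's default configuration):

```promql
# GPU utilization per device, in percent
DCGM_FI_DEV_GPU_UTIL

# Framebuffer (VRAM) used, in MiB
DCGM_FI_DEV_FB_USED

# GPU temperature, in degrees Celsius
DCGM_FI_DEV_GPU_TEMP
```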
Cloud GPU providers
For cloud deployments with GPU support:
| Provider | GPU Options | Notes |
|---|---|---|
| AWS EC2 | P3, P4, G4, G5 instances | Use Ubuntu Deep Learning AMI |
| Google Cloud | A2, N1 with Tesla T4/V100 | Pre-installed NVIDIA drivers |
| Azure | NC, ND, NV series | Container-optimized VM images |
| Vast.ai | Various consumer/datacenter GPUs | Pre-configured with Docker + NVIDIA toolkit |
| RunPod | RTX 3090, A40, A100 | Docker and GPU support included |
When deploying to cloud VMs, the NVIDIA drivers and Container Toolkit are often pre-installed. Verify with nvidia-smi before installing.