## Hardware Requirements

### Minimum Requirements

- GPU: NVIDIA GPU with CUDA support
- VRAM: At least 16 GB
- Architecture: x86_64 (aarch64 is not supported)
### Memory Usage by Service

When running all services, approximate VRAM usage is:

| Service | VRAM Required |
|---|---|
| Speech-to-text (STT) | 2.5 GB |
| Text-to-speech (TTS) | 5.3 GB |
| LLM (Llama 3.2 1B) | 6.1 GB |
| **Total** | ~14 GB |
The default `docker-compose.yml` uses Llama 3.2 1B, which fits in 16 GB of VRAM. If you use a larger model such as Mistral Small 3.2 24B, you'll need more VRAM.

### Operating System Support

Supported platforms:

- Linux (native)
- Windows with WSL 2
## Docker Setup

### Install NVIDIA Container Toolkit

The NVIDIA Container Toolkit allows Docker containers to access your GPU. Install it by following NVIDIA's installation guide.
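On Debian/Ubuntu systems, the installation typically boils down to the following commands (a sketch assuming NVIDIA's apt repository is already configured; check NVIDIA's official guide for the current repository setup):

```bash
# Install the Container Toolkit package
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker so it picks up the new runtime
sudo systemctl restart docker
```

You can then verify GPU access with `docker run --rm --gpus all ubuntu nvidia-smi`.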
### Configure GPU Access in Docker Compose

The `docker-compose.yml` file configures GPU access for the AI services.
## Multi-GPU Configuration

Running services on separate GPUs significantly improves latency. On unmute.sh, TTS latency drops from ~750 ms (single L40S GPU) to ~450 ms (multi-GPU setup).

### Single GPU Setup (Default)

By default, all services share the available GPU(s) using `count: all`.
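In `docker-compose.yml` that looks roughly like this (a sketch; the service name `tts` is illustrative, and the real file has one such block per GPU-backed service):

```yaml
services:
  tts:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # share every available GPU
              capabilities: [gpu]
```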
### Dedicated GPU per Service

If you have 3+ GPUs, assign one GPU to each service for optimal performance. Docker automatically distributes services across the available GPUs when each service requests `count: 1`; you don't need to specify device IDs manually.
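To pin one GPU per service, change `count: all` to `count: 1` in each service's reservation. A sketch (service name illustrative):

```yaml
services:
  stt:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1            # request exactly one GPU for this service
              capabilities: [gpu]
  # repeat the same reservation for the tts and llm services
```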
## Memory Optimization

If you're running out of GPU memory, adjust these settings in `docker-compose.yml`:

### LLM Memory Settings

- `--max-model-len`: Maximum context length for the LLM. Lower values use less memory but support shorter conversations.
- `--gpu-memory-utilization`: Fraction of GPU memory to allocate (0.0-1.0). Lower values leave more memory for other services.
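If the LLM runs under vLLM, these flags are passed on the service's command line. A sketch (the values shown are illustrative starting points, not recommendations):

```yaml
services:
  llm:
    command: >
      --model meta-llama/Llama-3.2-1B-Instruct
      --max-model-len 4096
      --gpu-memory-utilization 0.3
```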
### Switch to a Smaller Model

Use a smaller LLM model:

- `meta-llama/Llama-3.2-1B-Instruct` - ~6 GB VRAM
- `google/gemma-3-1b-it` - ~6 GB VRAM (note: slower on vLLM)
- `google/gemma-3-12b-it` - ~12 GB VRAM
- `mistralai/Mistral-Small-3.2-24B-Instruct-2506` - ~24 GB VRAM
## Dockerless Setup

For a dockerless deployment, ensure CUDA 12.1+ is installed.

### Install CUDA

Install CUDA 12.1 or later:

- Via conda: `conda install cuda -c nvidia/label/cuda-12.1.0`
- Or download it from NVIDIA's website
## Troubleshooting

### Docker can't access GPU

If `nvidia-smi` works on the host but Docker can't access the GPU:

- Verify the NVIDIA Container Toolkit is installed
- Restart Docker: `sudo systemctl restart docker`
- Check the Docker runtime: `docker info | grep -i runtime`
- Try the verification command again, e.g. `docker run --rm --gpus all ubuntu nvidia-smi`
### Out of memory errors

If services crash with OOM errors:

- Check VRAM usage: `nvidia-smi`
- Reduce `--gpu-memory-utilization` for the LLM
- Lower `--max-model-len` for shorter conversations
- Use a smaller LLM model
- Stop other GPU-intensive applications
### WSL GPU issues

For Windows WSL users:

- Ensure you're using WSL 2 (not WSL 1)
- Update to the latest NVIDIA driver for Windows
- Install NVIDIA CUDA on WSL following Microsoft's guide
- Don't install NVIDIA drivers inside WSL - use the Windows driver