Hardware Requirements
GPU Requirements
- Single GPU (Minimum)
- Multi-GPU (Recommended)
- Production (Swarm)
VRAM: 16GB minimum
Example GPUs:
- NVIDIA RTX 4090 (24GB)
- NVIDIA RTX 3090 (24GB)
- NVIDIA L40S (48GB)
- NVIDIA A100 (40GB/80GB)
- NVIDIA RTX A6000 (48GB)
Approximate VRAM usage per service:
- STT: 2.5GB VRAM
- TTS: 5.3GB VRAM
- LLM: 6.1GB+ VRAM (model dependent)
- Overhead: ~2GB for CUDA and buffers
With 16GB VRAM, use Llama 3.2 1B with --gpu-memory-utilization=0.4 and --max-model-len=1536.
Architecture Requirements
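The flags above map directly onto a vLLM launch. A minimal sketch, assuming the stock vllm/vllm-openai image run standalone; Unmute's own compose file may wire the LLM service up differently:

```shell
# Hypothetical standalone vLLM launch for a 16GB card.
# --gpu-memory-utilization caps vLLM's share of VRAM so STT/TTS still fit;
# --max-model-len shrinks the KV cache accordingly.
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --gpu-memory-utilization 0.4 \
  --max-model-len 1536
```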
x86_64 Only
Unmute is built for the x86_64 (AMD64) architecture.
Not Supported:
- ARM64 (aarch64) - No support planned
- Apple Silicon (M1/M2/M3) - No support planned
System Memory
Recommended RAM: 16GB+ system memory
While the models run on the GPU, the host still needs memory for:
- Docker containers and Python processes
- Model loading and initialization
- Audio buffering and WebSocket connections
Software Requirements
Operating System
- Linux (Recommended)
- Windows (WSL)
- macOS (Not Supported)
Supported:
- Ubuntu 20.04+
- Debian 11+
- Fedora 36+
- Arch Linux
- Any modern Linux distribution with Docker support
Why Linux is recommended:
- Best NVIDIA driver support
- Native Docker integration
- Used by Kyutai for development and production
Docker (Docker Compose)
Install Docker
Follow the official Docker installation guide for your platform.
Windows: Install Docker Desktop with the WSL 2 backend.
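On Ubuntu/Debian, Docker's convenience script is the quickest route (a sketch; see the official guide for the repository-based install):

```shell
# Install Docker Engine via the official convenience script
curl -fsSL https://get.docker.com | sudo sh
# Allow running docker without sudo (log out and back in afterwards)
sudo usermod -aG docker "$USER"
```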
Install Docker Compose
Docker Compose is included with Docker Desktop. On Linux, install the Compose plugin separately, then verify the installation.
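On Debian-based distributions, the Compose v2 plugin is available from Docker's apt repository (a sketch, assuming that repository is already configured):

```shell
# Install the Compose v2 plugin
sudo apt-get update
sudo apt-get install -y docker-compose-plugin
# Verify installation
docker compose version
```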
Install NVIDIA Container Toolkit
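The toolkit lets Docker containers access the GPU. A sketch following NVIDIA's documented steps for Debian-based systems:

```shell
# Add NVIDIA's repository key and package list
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit and wire it into Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Smoke test: the container should print your GPU
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```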
Dockerless Setup (Alternative)
Software Requirements for Dockerless Deployment
If you prefer to run the services without Docker, install the tools below, then access the frontend at http://localhost:3000.
Required Tools:
This is more complex and requires manual dependency management. Docker Compose is recommended.
- uv: Python package manager
- cargo: Rust toolchain (for STT/TTS servers)
- pnpm: Node package manager (for frontend)
- CUDA 12.1: For Rust processes (install via conda or from the NVIDIA website)
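The three user-space tools each ship an official installer; a sketch of a typical setup:

```shell
# uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# cargo via rustup (Rust toolchain for the STT/TTS servers)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# pnpm (Node package manager for the frontend)
npm install -g pnpm
```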
Configuration Requirements
Hugging Face Access
Model Access Token
Unmute downloads models from the Hugging Face Hub.
Required:
- Hugging Face account
- Accept licenses for models you’ll use:
- Llama 3.2 1B Instruct (default)
- Mistral Small 3.2 24B (recommended)
- Gemma 3 12B (alternative)
- Generate access token with read access
- Set environment variable:
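For example (the token value is a placeholder; Unmute's compose file may read a differently named variable, so check its .env template):

```shell
# Standard Hugging Face Hub token variable; replace with your own token
export HUGGING_FACE_HUB_TOKEN=hf_your_token_here
```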
Network Requirements
Ports
Docker Compose:
- Port 80: Traefik (HTTP traffic)
- Port 3000: Frontend
- Port 8000: Backend WebSocket
- Port 9090: Prometheus metrics
- Port 3001: Grafana dashboards
Bandwidth
Per User:
- Audio upstream: ~16 KB/s
- Audio downstream: ~16 KB/s
- Total: ~32 KB/s bidirectional
For 10 concurrent users: ~320 KB/s (~2.5 Mbps)
Browser Requirements
WebRTC & WebSocket Support
Recommended Browsers:
- Chrome 90+
- Firefox 88+
- Edge 90+
- Safari 14+ (requires HTTPS)
Required Browser Features:
- WebSocket support
- WebRTC (for optional WebRTC mode)
- Microphone access (requires HTTPS or localhost)
- Web Audio API
Modern browsers require HTTPS or localhost for microphone access. Use SSH port forwarding for remote access over HTTP.
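For example, to reach a remote GPU box's frontend and backend as if they were local (the host name is illustrative; the ports match the defaults listed above):

```shell
# Forward the frontend (3000) and backend WebSocket (8000) to localhost
ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@gpu-server
```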
Model Requirements
Default Models
- Speech-to-Text
- Text-to-Speech
- Language Model
Model: Kyutai STT 1B (English/French)
Specifications:
- Size: ~2GB download
- VRAM: 2.5GB
- Languages: English, French
- Architecture: Transformer (16 layers, 2048 d_model)
- Latency: 6-token delay (~200ms)
Configuration: stt.toml
Performance Targets
Latency
Single GPU (L40S):
- STT: ~200ms
- LLM: ~500ms (model dependent)
- TTS: ~750ms
- Total: ~1450ms
Multi-GPU:
- STT: ~200ms
- LLM: ~500ms
- TTS: ~450ms
- Total: ~1150ms
Throughput
Per Backend Instance:
- Max concurrent users: 4
- Limited by Python GIL
Scaling:
- Run multiple backend replicas
- Each replica handles 4 users
- Load balance with Traefik
- Example: 10 replicas = 40 users
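With Docker Compose, this scaling pattern is a one-liner; the service name backend here is an assumption, so substitute the name used in Unmute's compose file:

```shell
# Run 10 backend replicas (~40 concurrent users at 4 per replica)
docker compose up -d --scale backend=10
```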
Optional Components
NewsAPI (for News Character)
The “Dev (news)” character requires a NewsAPI key:
- Sign up at newsapi.org
- Get your free API key
- Add to environment:
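For example (the variable name is an assumption; check Unmute's .env template for the exact name it expects):

```shell
# Placeholder key; the exact variable name may differ in Unmute
export NEWSAPI_API_KEY=your_newsapi_key_here
```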
Redis (for Service Discovery)
Optional for Docker Swarm deployments:
- Used for service registration and health checks
- Required for multi-node setups
- Not needed for single-machine Docker Compose
Prometheus + Grafana (Monitoring)
Optional monitoring stack:
- Prometheus: Metrics collection
- Grafana: Visualization dashboards
- Pre-configured for Unmute metrics
- Included in Docker Swarm setup
Configuration lives in services/prometheus/ and services/grafana/.
Deployment Comparison
Choose the deployment method that matches your resources and use case:
| Method | GPUs | Machines | Difficulty | Kyutai Support | Best For |
|---|---|---|---|---|---|
| Docker Compose | 1+ | 1 | Very Easy | ✅ Full | Development, testing, single-user |
| Dockerless | 1-3 | 1-5 | Easy | ✅ Full | Custom setups, debugging |
| Docker Swarm | 1-100 | 1-100 | Medium | ❌ None | Production, scaling, unmute.sh |
Start with Docker Compose
We strongly recommend starting with Docker Compose:
- Fastest setup (5-10 minutes)
- Fully supported by Kyutai team
- Easy to troubleshoot
- Perfect for learning and development
Move to another deployment method when you need:
- Fine-grained control (Dockerless)
- Multi-machine scaling (Docker Swarm)
Ready to Start?
Quick Start Guide
Follow step-by-step instructions to get Unmute running
Join the Community
Star the repo, report issues, and contribute