Prerequisites
Before you begin, ensure you have:
Hardware Requirements
- GPU: CUDA-capable GPU with 16GB+ VRAM
- Architecture: x86_64 only (no aarch64 support)
- Single GPU is sufficient for basic setup
- Multi-GPU setup recommended for production (see below)
Operating System
- Linux: Any modern distribution
- Windows: WSL 2 (installation guide)
- macOS: Not supported (issue #74)
- Windows Native: Not supported (issue #84)
Software Requirements
- Docker Compose (install guide)
- NVIDIA Container Toolkit (install guide)
Step 1: Verify NVIDIA Container Toolkit
Confirm your GPU is accessible to Docker:
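A typical check runs nvidia-smi inside a throwaway CUDA container (the exact base image tag here is an assumption; any CUDA base image works):

```shell
# Run nvidia-smi inside a disposable container to confirm GPU passthrough.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```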
Expected output
You should see your GPU information displayed, including:
- GPU name and driver version
- Memory usage and total VRAM
- CUDA version
Step 2: Get Hugging Face Access
Unmute uses open-weight models from Hugging Face. You’ll need a token to download them.
Create Hugging Face Account
Sign up at huggingface.co if you don’t have an account.
Accept Model License
By default, Unmute uses Llama 3.2 1B Instruct. Visit the model page and accept the license terms.
Generate Access Token
Create a token with these settings:
- Type: Fine-grained
- Permission: Read access to contents of all public gated repos you can access
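With the token created, one common way to authenticate downloads is the standard Hugging Face CLI (a sketch; Unmute may instead expect the token via an environment variable such as HF_TOKEN):

```shell
# Cache the token locally so gated model downloads are authenticated.
huggingface-cli login
# Non-interactive alternative; hf_xxx is a placeholder for your token:
# huggingface-cli login --token hf_xxx
```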
Step 3: Clone Repository
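Assuming the repository lives at kyutai-labs/unmute on GitHub, this step is:

```shell
# Clone the repository and enter it.
git clone https://github.com/kyutai-labs/unmute.git
cd unmute
```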
Step 4: Configure Memory (Optional)
The default docker-compose.yml uses Llama 3.2 1B, which requires 16GB of VRAM. If you run into memory issues, adjust the memory-related settings in docker-compose.yml (see Common Issues below).
Step 5: Launch Unmute
Start all services with a single command:
First Run (10-15 minutes)
Docker will:
- Build container images
- Download models from Hugging Face (~8GB)
- Initialize services
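A minimal sketch of the launch command:

```shell
# Build images, download models, and start every service in the foreground.
docker compose up
# Once the first run succeeds, add -d to run detached:
# docker compose up -d
```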
Models are cached in ./volumes/hf-cache/, so subsequent starts are much faster (30-60 seconds).
Access Unmute
Open your browser to http://localhost:80.
Step 6: Start Talking
Select a Character
Choose from the available voices and personalities:
- Watercooler: Casual small talk
- Quiz show: Interactive trivia
- Gertrude: Life advice and sympathy
- More voices are available in voices.yaml
Multi-GPU Configuration
Running STT, TTS, and LLM on separate GPUs reduces TTS latency from ~750ms to ~450ms.
Edit docker-compose.yml to assign dedicated GPUs:
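A sketch of what the per-service GPU assignment might look like, using Compose device reservations (the service names stt, tts, and llm and the GPU indices are assumptions):

```yaml
services:
  stt:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]   # dedicate GPU 0 to speech-to-text
              capabilities: [gpu]
  tts:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]   # dedicate GPU 1 to text-to-speech
              capabilities: [gpu]
  llm:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["2"]   # dedicate GPU 2 to the LLM
              capabilities: [gpu]
```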
Remote Access via SSH
Docker Compose (Port 80)
Forward port 80 from the remote machine to local port 3333, then access Unmute at http://localhost:3333.
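The tunnel can be opened with a standard SSH local port forward (user and unmute-box are placeholders for your login and the remote host):

```shell
# Forward local port 3333 to port 80 on the remote machine.
ssh -L 3333:localhost:80 user@unmute-box
```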
Browsers require localhost or HTTPS for microphone access, so direct HTTP access to http://unmute-box:80 won’t work.
Why Port Forwarding is Required
Modern browsers block microphone access over HTTP unless:
- The origin is localhost or 127.0.0.1
- The connection uses HTTPS
SSH port forwarding presents the service as localhost to your browser.
Common Issues
Out of Memory Errors
Reduce memory usage in docker-compose.yml, or increase the TTS/STT batch sizes so the services share memory better (at the cost of higher latency).
Models Not Downloading
Check that your Hugging Face token is valid, and verify that you accepted the model license on Hugging Face.
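One quick way to check the token, assuming the Hugging Face CLI is installed:

```shell
# Prints your account name if the cached token is valid.
huggingface-cli whoami
```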
Port Already in Use
Change the port in docker-compose.yml:
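A sketch of the change (the service name frontend is an assumption):

```yaml
services:
  frontend:
    ports:
      - "8080:80"   # expose host port 8080 instead of 80
```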
WebSocket Connection Failed
Ensure all services are running, and check the backend logs:
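A sketch of both checks (the service name backend is an assumption):

```shell
# All services should report a running/Up state.
docker compose ps
# Stream logs from the backend service.
docker compose logs -f backend
```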
Next Steps
Customize Voices
Add custom voices and modify character personalities
Use External LLMs
Connect to OpenAI, Ollama, or other LLM providers
Production Deployment
Scale Unmute with Docker Swarm for production workloads
Development Guide
Contribute to Unmute or build custom frontends