
Hardware Requirements

GPU Requirements

Unmute requires a CUDA-capable NVIDIA GPU. CPU-only deployment is not supported.
VRAM: 16GB minimum
Example GPUs:
  • NVIDIA RTX 4090 (24GB)
  • NVIDIA RTX 3090 (24GB)
  • NVIDIA L40S (48GB)
  • NVIDIA A100 (40GB/80GB)
  • NVIDIA RTX A6000 (48GB)
Memory Breakdown:
  • STT: 2.5GB VRAM
  • TTS: 5.3GB VRAM
  • LLM: 6.1GB+ VRAM (model dependent)
  • Overhead: ~2GB for CUDA and buffers
With only 16GB of VRAM, use a small LLM such as Llama 3.2 1B with --gpu-memory-utilization=0.4 and --max-model-len=1536.
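The breakdown above sums to just under the 16GB minimum, which is why the low-memory settings matter. A quick sanity check of the figures (awk is used because shell arithmetic is integer-only):

```shell
# VRAM budget from the breakdown above, in GB
STT=2.5; TTS=5.3; LLM=6.1; OVERHEAD=2.0
TOTAL=$(awk -v a=$STT -v b=$TTS -v c=$LLM -v d=$OVERHEAD 'BEGIN { printf "%.1f", a+b+c+d }')
echo "Estimated VRAM needed: ${TOTAL} GB"   # 15.9 GB: fits a 16GB card with little headroom
```

A larger LLM raises the 6.1GB figure, so budget accordingly before choosing a model.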

Architecture Requirements

x86_64 Only

Unmute is built for the x86_64 (AMD64) architecture.
Not Supported:
  • ARM64 (aarch64) - No support planned
  • Apple Silicon (M1/M2/M3) - No support planned
This is due to dependencies on CUDA and compiled Rust binaries.

System Memory

Recommended RAM: 16GB+ system memory
While models run on the GPU, the host needs memory for:
  • Docker containers and Python processes
  • Model loading and initialization
  • Audio buffering and WebSocket connections

Software Requirements

Docker (Docker Compose)

Step 1: Install Docker

Follow the official Docker installation guide for your platform.
Linux (Ubuntu/Debian):
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
Windows: Install Docker Desktop with WSL 2 backend
Step 2: Install Docker Compose

Docker Compose is included with Docker Desktop. On Linux:
sudo apt-get install docker-compose-plugin
Verify installation:
docker compose version
Step 3: Install NVIDIA Container Toolkit

Required for GPU access from Docker containers:
# Ubuntu/Debian
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
See NVIDIA’s official guide for other distributions.
Step 4: Verify GPU Access

Test that Docker can access your GPU:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Expected output (example):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03   Driver Version: 535.129.03   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA L40S         Off  | 00000000:01:00.0 Off |                    0 |
| N/A   32C    P0    35W / 350W |      0MiB / 46068MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Dockerless Setup (Alternative)

If you prefer to run services without Docker:
This is more complex and requires manual dependency management. Docker Compose is recommended.
Required Tools:
  • uv: Python package manager
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • cargo: Rust toolchain (for STT/TTS servers)
    curl https://sh.rustup.rs -sSf | sh
    
  • pnpm: Node package manager (for frontend)
    curl -fsSL https://get.pnpm.io/install.sh | sh -
    
  • CUDA 12.1: For the Rust STT/TTS servers
Start Services:
# In separate terminals or tmux sessions
./dockerless/start_frontend.sh  # Port 3000
./dockerless/start_backend.sh   # Port 8000
./dockerless/start_llm.sh       # Needs 6.1GB VRAM
./dockerless/start_stt.sh       # Needs 2.5GB VRAM
./dockerless/start_tts.sh       # Needs 5.3GB VRAM
Access at http://localhost:3000

Configuration Requirements

Hugging Face Access

Model Access Token

Unmute downloads models from the Hugging Face Hub.
Required:
  1. Hugging Face account
  2. Accept the licenses for the models you’ll use
  3. Generate access token with read access
  4. Set environment variable:
    export HUGGING_FACE_HUB_TOKEN=hf_your_token_here
    
Security: Never use tokens with write access in production deployments.

Network Requirements

Ports

Docker Compose:
  • Port 80: Traefik (HTTP traffic)
Dockerless:
  • Port 3000: Frontend
  • Port 8000: Backend WebSocket
Optional:
  • Port 9090: Prometheus metrics
  • Port 3001: Grafana dashboards

Bandwidth

Per User:
  • Audio upstream: ~16 KB/s
  • Audio downstream: ~16 KB/s
  • Total: ~32 KB/s bidirectional
For 10 concurrent users:
  • ~320 KB/s (~2.5 Mbps)
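The per-user figures above scale linearly with user count. A back-of-envelope check (using 1 KB = 1000 bytes for the Mbps conversion):

```shell
# Bandwidth estimate from the per-user figures above
USERS=10
PER_USER_KBPS=32                       # KB/s, upstream + downstream combined
TOTAL_KBPS=$((USERS * PER_USER_KBPS))  # 320 KB/s
TOTAL_MBPS=$(awk -v kb=$TOTAL_KBPS 'BEGIN { printf "%.2f", kb * 8 / 1000 }')
echo "${TOTAL_KBPS} KB/s (~${TOTAL_MBPS} Mbps)"
```

Even a modest uplink handles dozens of concurrent users; GPU capacity, not bandwidth, is the usual bottleneck.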

Browser Requirements

WebRTC & WebSocket Support

Recommended Browsers:
  • Chrome 90+
  • Firefox 88+
  • Edge 90+
  • Safari 14+ (requires HTTPS)
Required Features:
  • WebSocket support
  • WebRTC (for optional WebRTC mode)
  • Microphone access (requires HTTPS or localhost)
  • Web Audio API
Modern browsers require HTTPS or localhost for microphone access. Use SSH port forwarding for remote access over HTTP.
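For a remote server without HTTPS, an SSH tunnel makes the app appear local, so the browser allows microphone access. A minimal sketch assuming the dockerless ports (3000 for the frontend, 8000 for the backend WebSocket); for Docker Compose, forward port 80 instead. The hostname is a placeholder:

```shell
# Forward the remote Unmute ports to this machine; the browser then sees
# localhost and permits microphone access without HTTPS.
# "user@gpu-server" is a placeholder for your own host.
ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@gpu-server
```

Keep the tunnel open and browse to http://localhost:3000 on your local machine.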

Model Requirements

Default Models

Model: Kyutai STT 1B (English/French)Specifications:
  • Size: ~2GB download
  • VRAM: 2.5GB
  • Languages: English, French
  • Architecture: Transformer (16 layers, 2048 d_model)
  • Latency: 6-token delay (~200ms)
Configuration (stt.toml):
lm_model_file = "hf://kyutai/stt-1b-en_fr-candle/model.safetensors"
text_tokenizer_file = "hf://kyutai/stt-1b-en_fr-candle/tokenizer_en_fr_audio_8000.model"
audio_tokenizer_file = "hf://kyutai/stt-1b-en_fr-candle/[email protected]"
batch_size = 1
temperature = 0.25

Performance Targets

Latency

Single GPU (L40S):
  • STT: ~200ms
  • LLM: ~500ms (model dependent)
  • TTS: ~750ms
  • Total: ~1450ms
Multi-GPU:
  • STT: ~200ms
  • LLM: ~500ms
  • TTS: ~450ms
  • Total: ~1150ms
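The totals are just the sum of the pipeline stages, since STT, LLM, and TTS run sequentially per utterance; giving TTS its own GPU is what shaves off the ~300ms:

```shell
# Per-stage latencies quoted above, in milliseconds
SINGLE_GPU_MS=$((200 + 500 + 750))   # STT + LLM + TTS sharing one L40S
MULTI_GPU_MS=$((200 + 500 + 450))    # TTS on a dedicated GPU
echo "single GPU: ${SINGLE_GPU_MS} ms, multi GPU: ${MULTI_GPU_MS} ms"
```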

Throughput

Per Backend Instance:
  • Max concurrent users: 4
  • Limited by Python GIL
Scaling Strategy:
  • Run multiple backend replicas
  • Each replica handles 4 users
  • Load balance with Traefik
  • Example: 10 replicas = 40 users
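The capacity arithmetic, plus the scale-out command it implies. The Compose service name "backend" is an assumption here; check your docker-compose.yml for the actual name:

```shell
# Capacity estimate for the replica strategy above
REPLICAS=10
USERS_PER_REPLICA=4            # per the GIL-bound limit noted above
CAPACITY=$((REPLICAS * USERS_PER_REPLICA))
echo "~${CAPACITY} concurrent users"

# To launch that many replicas (service name "backend" is an assumption):
#   docker compose up -d --scale backend=10
```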

Optional Components

The “Dev (news)” character requires a NewsAPI key:
  1. Sign up at newsapi.org
  2. Get your free API key
  3. Add to environment:
    export NEWSAPI_API_KEY=your_key_here
    
Without this, the news character won’t have current topics.
Optional for Docker Swarm deployments:
  • Used for service registration and health checks
  • Required for multi-node setups
  • Not needed for single-machine Docker Compose
Optional monitoring stack:
  • Prometheus: Metrics collection
  • Grafana: Visualization dashboards
  • Pre-configured for Unmute metrics
  • Included in Docker Swarm setup
See services/prometheus/ and services/grafana/

Deployment Comparison

Choose the deployment method that matches your resources and use case:
Method         | GPUs  | Machines | Difficulty | Kyutai Support | Best For
Docker Compose | 1+    | 1        | Very Easy  | ✅ Full        | Development, testing, single-user
Dockerless     | 1-3   | 1-5      | Easy       | ✅ Full        | Custom setups, debugging
Docker Swarm   | 1-100 | 1-100    | Medium     | ❌ None        | Production, scaling, unmute.sh

Start with Docker Compose

We strongly recommend starting with Docker Compose:
  • Fastest setup (5-10 minutes)
  • Fully supported by Kyutai team
  • Easy to troubleshoot
  • Perfect for learning and development
Switch to other methods only when you need:
  • Fine-grained control (Dockerless)
  • Multi-machine scaling (Docker Swarm)

Ready to Start?

Quick Start Guide

Follow step-by-step instructions to get Unmute running

Join the Community

Star the repo, report issues, and contribute
