Hardware Requirements
GPU Requirements
- Single GPU (Minimum)
- Multi-GPU (Recommended)
- Production (Swarm)
VRAM: 16GB minimum
Example GPUs:
- NVIDIA RTX 4090 (24GB)
- NVIDIA RTX 3090 (24GB)
- NVIDIA L40S (48GB)
- NVIDIA A100 (40GB/80GB)
- NVIDIA RTX A6000 (48GB)
Approximate VRAM usage per service:
- STT: 2.5GB VRAM
- TTS: 5.3GB VRAM
- LLM: 6.1GB+ VRAM (model dependent)
- Overhead: ~2GB for CUDA and buffers
With 16GB VRAM, use Llama 3.2 1B with --gpu-memory-utilization=0.4 and --max-model-len=1536.
Architecture Requirements
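The flags above map directly onto a vLLM launch. A minimal sketch, assuming the stock vllm/vllm-openai image run standalone; Unmute's own compose file may wire the LLM service up differently:

```shell
# Hypothetical standalone vLLM launch for a 16GB card.
# --gpu-memory-utilization caps vLLM's share of VRAM so STT/TTS still fit;
# --max-model-len shrinks the KV cache accordingly.
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --gpu-memory-utilization 0.4 \
  --max-model-len 1536
```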
x86_64 Only
Unmute is built for the x86_64 (AMD64) architecture.
Not Supported:
- ARM64 (aarch64) - No support planned
- Apple Silicon (M1/M2/M3) - No support planned
System Memory
Recommended RAM: 16GB+ system memory
While the models run on the GPU, the host still needs memory for:
- Docker containers and Python processes
- Model loading and initialization
- Audio buffering and WebSocket connections
Software Requirements
Operating System
- Linux (Recommended)
- Windows (WSL)
- macOS (Not Supported)
Supported:
- Ubuntu 20.04+
- Debian 11+
- Fedora 36+
- Arch Linux
- Any modern Linux distribution with Docker support
Why Linux is recommended:
- Best NVIDIA driver support
- Native Docker integration
- Used by Kyutai for development and production
Docker (Docker Compose)
Install Docker
Follow the official Docker installation guide for your platform.
Windows: Install Docker Desktop with the WSL 2 backend.
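On Ubuntu/Debian, Docker's convenience script is the quickest route (a sketch; see the official guide for the repository-based install):

```shell
# Install Docker Engine via the official convenience script
curl -fsSL https://get.docker.com | sudo sh
# Allow running docker without sudo (log out and back in afterwards)
sudo usermod -aG docker "$USER"
```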
Install Docker Compose
Docker Compose is included with Docker Desktop. On Linux, install the Compose plugin separately, then verify the installation.
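On Debian-based distributions, the Compose v2 plugin is available from Docker's apt repository (a sketch, assuming that repository is already configured):

```shell
# Install the Compose v2 plugin
sudo apt-get update
sudo apt-get install -y docker-compose-plugin
# Verify installation
docker compose version
```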
Install NVIDIA Container Toolkit
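The toolkit lets Docker containers access the GPU. A sketch following NVIDIA's documented steps for Debian-based systems:

```shell
# Add NVIDIA's repository key and package list
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit and wire it into Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Smoke test: the container should print your GPU
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```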
Dockerless Setup (Alternative)
Software Requirements for Dockerless Deployment
If you prefer to run the services without Docker, install the tools below, then access the frontend at http://localhost:3000.
Required Tools:
This is more complex and requires manual dependency management. Docker Compose is recommended.
- uv: Python package manager
- cargo: Rust toolchain (for STT/TTS servers)
- pnpm: Node package manager (for frontend)
- CUDA 12.1: For Rust processes (install via conda or from the NVIDIA website)
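The three user-space tools each ship an official installer; a sketch of a typical setup:

```shell
# uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# cargo via rustup (Rust toolchain for the STT/TTS servers)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# pnpm (Node package manager for the frontend)
npm install -g pnpm
```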
Configuration Requirements
Hugging Face Access
Model Access Token
Unmute downloads models from the Hugging Face Hub.
Required:
- Hugging Face account
- Accept licenses for models you’ll use:
- Llama 3.2 1B Instruct (default)
- Mistral Small 3.2 24B (recommended)
- Gemma 3 12B (alternative)
- Generate access token with read access
- Set environment variable:
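For example (the token value is a placeholder; Unmute's compose file may read a differently named variable, so check its .env template):

```shell
# Standard Hugging Face Hub token variable; replace with your own token
export HUGGING_FACE_HUB_TOKEN=hf_your_token_here
```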
Network Requirements
Ports
Docker Compose:
- Port 80: Traefik (HTTP traffic)
- Port 3000: Frontend
- Port 8000: Backend WebSocket
- Port 9090: Prometheus metrics
- Port 3001: Grafana dashboards
Bandwidth
Per User:
- Audio upstream: ~16 KB/s
- Audio downstream: ~16 KB/s
- Total: ~32 KB/s bidirectional
For 10 concurrent users: ~320 KB/s (~2.5 Mbps)
Browser Requirements
WebRTC & WebSocket Support
Recommended Browsers:
- Chrome 90+
- Firefox 88+
- Edge 90+
- Safari 14+ (requires HTTPS)
Required Browser Features:
- WebSocket support
- WebRTC (for optional WebRTC mode)
- Microphone access (requires HTTPS or localhost)
- Web Audio API
Modern browsers require HTTPS or localhost for microphone access. Use SSH port forwarding for remote access over HTTP.
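For example, to reach a remote GPU box's frontend and backend as if they were local (the host name is illustrative; the ports match the defaults listed above):

```shell
# Forward the frontend (3000) and backend WebSocket (8000) to localhost
ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@gpu-server
```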
Model Requirements
Default Models
- Speech-to-Text
- Text-to-Speech
- Language Model
Model: Kyutai STT 1B (English/French)
Specifications:
- Size: ~2GB download
- VRAM: 2.5GB
- Languages: English, French
- Architecture: Transformer (16 layers, 2048 d_model)
- Latency: 6-token delay (~200ms)
Configuration: stt.toml
Performance Targets
Latency
Single GPU (L40S):
- STT: ~200ms
- LLM: ~500ms (model dependent)
- TTS: ~750ms
- Total: ~1450ms
Multi-GPU:
- STT: ~200ms
- LLM: ~500ms
- TTS: ~450ms
- Total: ~1150ms
Throughput
Per Backend Instance:
- Max concurrent users: 4
- Limited by Python GIL
Scaling:
- Run multiple backend replicas
- Each replica handles 4 users
- Load balance with Traefik
- Example: 10 replicas = 40 users
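With Docker Compose, this scaling pattern is a one-liner; the service name backend here is an assumption, so substitute the name used in Unmute's compose file:

```shell
# Run 10 backend replicas (~40 concurrent users at 4 per replica)
docker compose up -d --scale backend=10
```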
Optional Components
NewsAPI (for News Character)
The “Dev (news)” character requires a NewsAPI key:
- Sign up at newsapi.org
- Get your free API key
- Add to environment:
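For example (the variable name is an assumption; check Unmute's .env template for the exact name it expects):

```shell
# Placeholder key; the exact variable name may differ in Unmute
export NEWSAPI_API_KEY=your_newsapi_key_here
```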
Redis (for Service Discovery)
Optional for Docker Swarm deployments:
- Used for service registration and health checks
- Required for multi-node setups
- Not needed for single-machine Docker Compose
Prometheus + Grafana (Monitoring)
Optional monitoring stack:
- Prometheus: Metrics collection
- Grafana: Visualization dashboards
- Pre-configured for Unmute metrics
- Included in Docker Swarm setup
Configuration lives in services/prometheus/ and services/grafana/.
Deployment Comparison
Choose the deployment method that matches your resources and use case:
| Method | GPUs | Machines | Difficulty | Kyutai Support | Best For |
|---|---|---|---|---|---|
| Docker Compose | 1+ | 1 | Very Easy | ✅ Full | Development, testing, single-user |
| Dockerless | 1-3 | 1-5 | Easy | ✅ Full | Custom setups, debugging |
| Docker Swarm | 1-100 | 1-100 | Medium | ❌ None | Production, scaling, unmute.sh |
Start with Docker Compose
We strongly recommend starting with Docker Compose:
- Fastest setup (5-10 minutes)
- Fully supported by Kyutai team
- Easy to troubleshoot
- Perfect for learning and development
Move to another deployment method when you need:
- Fine-grained control (Dockerless)
- Multi-machine scaling (Docker Swarm)
Ready to Start?
Quick Start Guide
Follow step-by-step instructions to get Unmute running
Join the Community
Star the repo, report issues, and contribute