
Prerequisites

Before you begin, ensure you have:
  • GPU: CUDA-capable GPU with 16GB+ VRAM
  • Architecture: x86_64 only (aarch64 is not supported)
  • GPU count: a single GPU is sufficient for a basic setup; multiple GPUs are recommended for production (see Multi-GPU Configuration below)

Step 1: Verify NVIDIA Container Toolkit

Confirm your GPU is accessible to Docker:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
You should see your GPU information displayed, including:
  • GPU name and driver version
  • Memory usage and total VRAM
  • CUDA version
If this fails, install the NVIDIA Container Toolkit before proceeding.

Step 2: Get Hugging Face Access

Unmute uses open-weight models from Hugging Face. You’ll need a token to download them.
1. Create Hugging Face Account

Sign up at huggingface.co if you don’t have an account
2. Accept Model License

By default, Unmute uses Llama 3.2 1B Instruct. Visit the model page and accept the license terms.
If you have more VRAM available, use Mistral Small 3.2 24B or Gemma 3 12B for better quality.
3. Generate Access Token

Create a token with these settings:
  • Type: Fine-grained
  • Permission: Read access to contents of all public gated repos you can access
Never use tokens with write access when deploying publicly. If compromised, attackers could modify your Hugging Face content.
4. Set Environment Variable

Add your token to your shell configuration:
echo 'export HUGGING_FACE_HUB_TOKEN=hf_your_token_here' >> ~/.bashrc
source ~/.bashrc
Verify it’s set:
echo $HUGGING_FACE_HUB_TOKEN
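As a quick sanity check, a small shell helper (hypothetical, not part of Unmute) can confirm the value at least looks like a Hugging Face token — they start with hf_:

```shell
# Hypothetical helper: succeeds (exit 0) if the value looks like an HF token
hf_token_looks_valid() {
  case "$1" in
    hf_*) return 0 ;;
    *)    return 1 ;;
  esac
}

# ${VAR:-} avoids an error if the variable is unset
if hf_token_looks_valid "${HUGGING_FACE_HUB_TOKEN:-}"; then
  echo "HUGGING_FACE_HUB_TOKEN format looks OK"
else
  echo "HUGGING_FACE_HUB_TOKEN is missing or does not start with hf_"
fi
```

This only checks the prefix, not whether the token is actually valid — successfully downloading a gated model is the real test.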

Step 3: Clone Repository

git clone https://github.com/kyutai-labs/unmute.git
cd unmute

Step 4: Configure Memory (Optional)

The default docker-compose.yml uses Llama 3.2 1B, which requires 16GB of VRAM. If you run into memory issues, adjust these settings:
llm:
  image: vllm/vllm-openai:v0.11.0
  command:
    [
      "--model=meta-llama/Llama-3.2-1B-Instruct",
      "--max-model-len=1536",
      "--dtype=bfloat16",
      "--gpu-memory-utilization=0.4",
    ]
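To see what the 0.4 buys you: vLLM pre-allocates roughly total VRAM × --gpu-memory-utilization, and the rest stays free for the STT and TTS services sharing the same GPU. A rough back-of-the-envelope sketch, assuming a 16GB card:

```shell
# Rough estimate: vLLM claims about total_vram × gpu-memory-utilization
total_vram_mb=16384   # 16 GB card
utilization_pct=40    # --gpu-memory-utilization=0.4, as a percentage
claimed_mb=$(( total_vram_mb * utilization_pct / 100 ))
leftover_mb=$(( total_vram_mb - claimed_mb ))
echo "vLLM claims ~${claimed_mb} MiB, leaving ~${leftover_mb} MiB for STT/TTS"
```

Lowering the utilization (or --max-model-len) shrinks vLLM's share at the cost of less room for KV cache.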

Step 5: Launch Unmute

Start all services with a single command:
docker compose up --build
1. First Run (10-15 minutes)

Docker will:
  • Build container images
  • Download models from Hugging Face (~8GB)
  • Initialize services
Models are cached in ./volumes/hf-cache/, so subsequent starts are much faster (30-60 seconds).
2. Wait for Services

Monitor the logs. Services are ready when you see:
unmute-backend-1  | INFO:     Application startup complete.
unmute-frontend-1 | ✓ Ready in 3.2s
unmute-llm-1      | INFO:     Uvicorn running on http://0.0.0.0:8000
unmute-stt-1      | Listening on 0.0.0.0:8080
unmute-tts-1      | Listening on 0.0.0.0:8080
3. Access Unmute

Open your browser to http://localhost:80
If port 80 is in use, edit docker-compose.yml and change "80:80" to "3000:80" under the traefik service, then access via http://localhost:3000
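For reference, the edited section of docker-compose.yml would look roughly like this (assuming the service is named traefik, as the tip above suggests):

```yaml
traefik:
  ports:
    - "3000:80"  # host port 3000 -> container port 80
```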

Step 6: Start Talking

1. Grant Microphone Access

Your browser will request microphone permission. Click “Allow”.
2. Select a Character

Choose from the available voices and personalities:
  • Watercooler: Casual small talk
  • Quiz show: Interactive trivia
  • Gertrude: Life advice and sympathy
  • More voices available in voices.yaml
3. Click Connect

The system establishes WebSocket connections and initializes the conversation.
4. Speak Naturally

Start talking! The bot will:
  • Transcribe your speech in real-time
  • Generate contextual responses
  • Speak back to you with the selected voice
Keyboard Shortcuts:
  • Press S to toggle subtitles for both user and bot
  • Press D for debug mode (requires enabling ALLOW_DEV_MODE in useKeyboardShortcuts.ts)

Multi-GPU Configuration

Running STT, TTS, and LLM on separate GPUs reduces TTS latency from ~750ms to ~450ms.
If you have 3+ GPUs, edit docker-compose.yml to assign dedicated GPUs:
docker-compose.yml
stt:
  # ...existing config...
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]

tts:
  # ...existing config...
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]

llm:
  # ...existing config...
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
By default, all services share available GPUs. This configuration ensures each service gets its own GPU.
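Note that count: 1 lets Docker assign any free GPU. To pin a service to a particular card, Compose also supports device_ids in place of count — a sketch (the GPU index here is an example):

```yaml
tts:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ["1"]   # pin this service to GPU 1 (example index)
            capabilities: [gpu]
```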

Remote Access via SSH

Forward port 80 from remote to local port 3333:
ssh -N -L 3333:localhost:80 unmute-box
Then access via http://localhost:3333
Modern browsers block microphone access over plain HTTP unless the origin is localhost or 127.0.0.1 (or the connection uses HTTPS), so direct access to http://unmute-box:80 won’t work. SSH port forwarding makes the remote server appear as localhost to your browser.

Common Issues

If you hit CUDA out-of-memory errors, reduce memory usage in docker-compose.yml:
llm:
  command:
    [
      "--model=meta-llama/Llama-3.2-1B-Instruct",
      "--max-model-len=1024",           # Lower from 1536
      "--gpu-memory-utilization=0.3",   # Lower from 0.4
    ]
Or increase batch sizes for TTS/STT to share memory better (higher latency).
If model downloads fail with authorization errors, check your Hugging Face token:
echo $HUGGING_FACE_HUB_TOKEN
Verify you accepted the model license on Hugging Face.
If port 80 is already taken, change the port in docker-compose.yml:
traefik:
  ports:
    - "8080:80"  # Use port 8080 instead
If something else isn’t working, ensure all services are running:
docker compose ps
Check backend logs:
docker compose logs backend

Next Steps

Customize Voices

Add custom voices and modify character personalities

Use External LLMs

Connect to OpenAI, Ollama, or other LLM providers

Production Deployment

Scale Unmute with Docker Swarm for production workloads

Development Guide

Contribute to Unmute or build custom frontends

Stop Unmute

To stop all services:
docker compose down
To also remove downloaded models and caches:
docker compose down -v
rm -rf volumes/
Need help? Open an issue on GitHub - the Kyutai team actively supports Docker Compose deployments.
