The Dockerless deployment allows you to run Unmute by manually starting each service without Docker. This approach is useful for development, debugging, or when you need more control over the environment.
This is more difficult to set up than Docker Compose due to various dependencies. Consider using Docker Compose unless you specifically need a Dockerless setup.

Requirements

Hardware

  • GPU: CUDA-compatible GPU with at least 16 GB VRAM
  • Architecture: x86_64
  • OS: Linux or Windows with WSL

VRAM Usage by Service

  • LLM: 6.1 GB
  • TTS: 5.3 GB
  • STT: 2.5 GB
  • Total: ~14 GB minimum

Software Dependencies

1. Install uv (Python package manager)

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Install cargo (Rust toolchain)

curl https://sh.rustup.rs -sSf | sh

3. Install pnpm (Node.js package manager)

curl -fsSL https://get.pnpm.io/install.sh | sh -

4. Install CUDA 12.1

Install CUDA 12.1 via conda or from the NVIDIA website. This is required for the Rust servers (TTS and STT).
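Before moving on, it can save time to confirm that all four tools ended up on your PATH. A quick check script (the `require` helper is just for illustration; it is not part of the repository):

```shell
#!/bin/bash
# Report whether each required tool is installed (illustrative helper)
require() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

for tool in uv cargo pnpm nvcc; do
  require "$tool"
done
```

If `nvcc` is missing but CUDA was installed via conda, make sure the conda environment is activated in the shell you use to start the STT and TTS servers.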

Starting Services

Each service must be started in a separate terminal session or tmux window. The repository includes helper scripts in the dockerless/ directory.

Service Startup Order

While services can be started in any order, here’s the recommended sequence:

1. Start Frontend

./dockerless/start_frontend.sh
This script:
  • Installs Node.js dependencies with pnpm install
  • Ensures the correct Node.js LTS version
  • Starts the Next.js development server on port 3000
Script contents:
start_frontend.sh
#!/bin/bash
set -ex
cd "$(dirname "$0")/.."

cd frontend
pnpm install
pnpm env use --global lts
pnpm dev

2. Start LLM Server

./dockerless/start_llm.sh
Launches vLLM with these settings:
  • Model: google/gemma-3-1b-it
  • Port: 8091
  • Max model length: 8192 tokens
  • GPU memory utilization: 30%
  • VRAM usage: ~6.1 GB
Script contents:
start_llm.sh
#!/bin/bash
set -ex
cd "$(dirname "$0")/.."

uv tool run [email protected] serve \
  --model=google/gemma-3-1b-it \
  --max-model-len=8192 \
  --dtype=bfloat16 \
  --gpu-memory-utilization=0.3 \
  --port=8091

3. Start STT Server

./dockerless/start_stt.sh
Compiles and runs the speech-to-text Rust server:
  • Port: 8090
  • VRAM usage: ~2.5 GB
Key steps:
  • Creates Python virtual environment for libpython dependency
  • Sets LD_LIBRARY_PATH for Python library linking
  • Installs moshi-server with CUDA features
  • Runs with STT-specific config
The first run will take several minutes to compile the Rust binary.
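The script contents are not reproduced here, but based on the key steps above they correspond roughly to the following sketch. The config file path is an assumption; consult dockerless/start_stt.sh in the repository for the authoritative version:

```shell
#!/bin/bash
set -ex
cd "$(dirname "$0")/.."

# Create a virtual environment so the Rust binary can link against libpython
uv venv
source .venv/bin/activate

# Point the linker at the Python shared library (same trick as the TTS script)
export LD_LIBRARY_PATH=$(python -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR"))')

# Build and install the Rust moshi-server with CUDA support (slow on first run)
cargo install --features cuda [email protected]

# Run with the STT-specific config; port 8090 is assumed to come from the config
moshi-server worker --config configs/config-stt.toml
```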

4. Start TTS Server

./dockerless/start_tts.sh
Compiles and runs the text-to-speech Rust server:
  • Port: 8089
  • VRAM usage: ~5.3 GB
Important environment setup:
export LD_LIBRARY_PATH=$(python -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR"))')
This must be set before running cargo install to ensure the Rust binary can find Python libraries.
If you see errors like no module named 'huggingface_hub', the LD_LIBRARY_PATH wasn’t set correctly before compilation. Recompile with cargo install --force.

5. Start Backend

./dockerless/start_backend.sh
Starts the FastAPI backend server:
  • Port: 8000
  • WebSocket per-message deflate disabled for better performance
  • Auto-reload enabled for development
Script contents:
start_backend.sh
#!/bin/bash
set -ex
cd "$(dirname "$0")/.."

uv run uvicorn unmute.main_websocket:app --reload --host 0.0.0.0 --port 8000 --ws-per-message-deflate=false

Accessing Unmute

Once all services are running, access the web interface at:
http://localhost:3000
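If the page doesn't load, you can probe each port from this guide to see which service failed to come up. The `port_open` helper below is illustrative and relies on bash's `/dev/tcp` feature:

```shell
#!/bin/bash
# Illustrative check: ports are the defaults used throughout this guide
port_open() {
  # bash-only: opening /dev/tcp/<host>/<port> fails if nothing is listening
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for p in 3000 8000 8089 8090 8091; do
  if port_open "$p"; then
    echo "port $p: listening"
  else
    echo "port $p: not listening"
  fi
done
```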

Environment Variables

The backend expects these default service URLs. If you’ve changed the ports, set these environment variables before starting the backend:
export KYUTAI_STT_URL=ws://localhost:8090
export KYUTAI_TTS_URL=ws://localhost:8089
export KYUTAI_LLM_URL=http://localhost:8091
export HUGGING_FACE_HUB_TOKEN=hf_...
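Assuming the backend falls back to the defaults above when the variables are unset (as the text implies), the lookup follows the usual environment-with-default pattern, sketched here in shell (the backend itself is Python):

```shell
# Use the exported value if present, otherwise fall back to the default
stt_url="${KYUTAI_STT_URL:-ws://localhost:8090}"
tts_url="${KYUTAI_TTS_URL:-ws://localhost:8089}"
llm_url="${KYUTAI_LLM_URL:-http://localhost:8091}"
echo "STT=$stt_url TTS=$tts_url LLM=$llm_url"
```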

Using tmux for Session Management

To manage multiple services easily, use tmux:
# Create a new tmux session
tmux new -s unmute

# Start first service
./dockerless/start_frontend.sh

# Create new pane (Ctrl+B then ")
# Start second service
./dockerless/start_llm.sh

# Continue creating panes and starting services
# Ctrl+B then arrow keys to navigate between panes
# Ctrl+B then d to detach from session
# tmux attach -t unmute to reattach
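The manual pane-by-pane flow above can also be scripted so one command brings everything up. This is a sketch, not part of the repository; the window names are arbitrary:

```shell
#!/bin/bash
# Launch every service in its own tmux window, then attach
tmux new-session -d -s unmute -n frontend './dockerless/start_frontend.sh'
tmux new-window  -t unmute    -n llm      './dockerless/start_llm.sh'
tmux new-window  -t unmute    -n stt      './dockerless/start_stt.sh'
tmux new-window  -t unmute    -n tts      './dockerless/start_tts.sh'
tmux new-window  -t unmute    -n backend  './dockerless/start_backend.sh'
tmux attach -t unmute
```

Use Ctrl+B then a window number (0-4) to jump between services.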

Customizing Services

Change LLM Model

Edit dockerless/start_llm.sh:
uv tool run [email protected] serve \
  --model=mistralai/Mistral-7B-Instruct-v0.3 \
  --max-model-len=4096 \
  --dtype=bfloat16 \
  --gpu-memory-utilization=0.5 \
  --port=8091

Adjust GPU Memory Usage

Modify --gpu-memory-utilization in the LLM script:
  • Lower values: reserve less VRAM, leaving room for the TTS and STT servers, but shrink the KV cache, which can limit conversation length
  • Higher values: a larger KV cache supports longer conversations, but more VRAM is reserved and less is left for the other services
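As a back-of-the-envelope check, the flag is a fraction of total GPU memory. For example, on a hypothetical 24 GB card:

```shell
# Rough VRAM budget for vLLM at a given utilization fraction
# (values in MiB; the 24 GiB card is an assumption, not a requirement)
total_mib=24576
util_pct=30            # --gpu-memory-utilization=0.3, as a percentage
budget=$(( total_mib * util_pct / 100 ))
echo "vLLM may use up to ${budget} MiB"   # 7372 MiB on this example card
```

Whatever remains must cover the TTS (~5.3 GB) and STT (~2.5 GB) servers plus anything else on the GPU.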

Use Multiple GPUs

Set CUDA_VISIBLE_DEVICES before starting each service:
# Terminal 1 - STT on GPU 0
CUDA_VISIBLE_DEVICES=0 ./dockerless/start_stt.sh

# Terminal 2 - TTS on GPU 1
CUDA_VISIBLE_DEVICES=1 ./dockerless/start_tts.sh

# Terminal 3 - LLM on GPU 2
CUDA_VISIBLE_DEVICES=2 ./dockerless/start_llm.sh

Troubleshooting

Compilation Errors

Issue: Sentencepiece build fails
Solution: Set the CXXFLAGS environment variable:
export CXXFLAGS="-include cstdint"
This force-includes cstdint, working around a Sentencepiece build failure on GCC 15.

Missing Python Dependencies

Issue: no module named 'huggingface_hub' when running TTS/STT
Solution:
  1. Activate the virtual environment: source .venv/bin/activate
  2. Set LD_LIBRARY_PATH before running cargo install
  3. Force rebuild: cargo install --force --features cuda [email protected]

Wrong moshi-server Binary

Issue: moshi-server: error: unrecognized arguments: worker
Solution: You're using the binary from the Python package instead of the Rust package. Update the Python package:
uv pip install moshi --upgrade  # Must be >=0.2.8

Port Already in Use

Change the port in the respective start script and update the environment variables in the backend start script.

Stopping Services

To stop a service:
  1. Switch to its terminal/tmux pane
  2. Press Ctrl+C
To stop all services at once with tmux:
tmux kill-session -t unmute

Development Workflow

The Dockerless setup is ideal for development because:
  • Frontend hot-reloading: Changes to frontend code reload automatically
  • Backend auto-reload: The --reload flag restarts the backend on code changes
  • Easy debugging: Direct access to logs in each terminal
  • Fast iteration: No Docker image rebuilding needed
