Docker Swarm is used for production deployments like unmute.sh. While Docker Compose runs on a single machine, Docker Swarm scales across multiple nodes. Think of it as “multi-node Docker Compose.”
This deployment method is documented to show how unmute.sh scales in production. Because multi-node setups are hard to debug, the Kyutai team cannot provide support for Swarm deployments. Use at your own risk.

When to Use Docker Swarm

Choose Docker Swarm when you need:
  • High availability: Multiple replicas ensure uptime during updates or crashes
  • Horizontal scaling: Distribute load across multiple GPUs/machines
  • Production features: HTTPS, monitoring, authentication, load balancing
  • Traffic handling: Support for many concurrent users
| Feature | Docker Compose | Docker Swarm |
| --- | --- | --- |
| Machines | 1 | 1 to ~100 |
| GPUs | 1+ | 1 to ~100 |
| Difficulty | Very easy | Medium |
| HTTPS | No | Yes |
| Monitoring | No | Yes (Prometheus, Grafana) |
| Load balancing | No | Yes |
| High availability | No | Yes |

Architecture

Main Application

Monitoring Stack

Setup Instructions

All commands should be executed from a client machine with access to your swarm nodes, not directly on the swarm nodes themselves.
1. Prepare GPU Nodes

Set up each GPU node in your swarm:
# Copy setup script to the node
scp setup_gpu_swarm_node.py llm-wrapper-gpu000:/root/

# Run setup on the node
ssh llm-wrapper-gpu000 python3 /root/setup_gpu_swarm_node.py
This script installs Docker, NVIDIA drivers, and configures the node for Swarm.
2. Initialize Swarm Manager

Designate one node as the manager (only needed once):
docker -H ssh://llm-wrapper-gpu000 swarm init
The manager node coordinates the swarm and must run certain services like Prometheus.
3. Add Worker Nodes

Get the join command from the manager:
docker -H ssh://llm-wrapper-gpu000 swarm join-token worker
This outputs a command like:
docker swarm join --token SWMTKN-1-xxxxx... llm-wrapper-gpu000:2377
Run this command on each worker node to join the swarm.
4. Configure Environment Variables

Set up required environment variables on your client machine:
# Required: Hugging Face token for model access
export HUGGING_FACE_HUB_TOKEN=hf_...

# Required: Google OAuth for monitoring access
export PROVIDERS_GOOGLE_CLIENT_SECRET=...

# Optional: For the "Dev (news)" character
export NEWSAPI_API_KEY=...
How to generate tokens: create a Hugging Face access token at huggingface.co/settings/tokens, and create an OAuth client secret in the Google Cloud Console under APIs & Services → Credentials.
5. Deploy the Stack

Run the deployment script:
# For production
./bake_deploy_prod.sh

# For staging
./bake_deploy_staging.sh
These scripts build images and deploy the stack defined in swarm-deploy.yml.

Production URLs

Once deployed, services are available at their configured production URLs. Monitoring services require Google authentication (configured via traefik-forward-auth).

Key Configuration Details

HTTPS with Let’s Encrypt

Traefik automatically obtains and renews SSL certificates:
swarm-deploy.yml
traefik:
  command:
    - "--entrypoints.web.address=:80"
    - "--entrypoints.websecure.address=:443"
    # Redirect all HTTP to HTTPS
    - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
    - "--entrypoints.web.http.redirections.entryPoint.scheme=https"
    # Let's Encrypt configuration
    - "--certificatesResolvers.letsencrypt_resolver.acme.httpChallenge.entryPoint=web"
    - "--certificatesResolvers.letsencrypt_resolver.acme.storage=/letsencrypt/acme.json"
    - "--certificatesResolvers.letsencrypt_resolver.acme.email=your-email@example.com"
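The resolver only takes effect on routers that reference it. On the service side, labels along these lines would request a certificate — a hedged sketch, where the hostname, router name, and port are placeholders rather than the repo's actual values:

```yaml
frontend:
  deploy:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.frontend.rule=Host(`unmute.example.com`)"
      - "traefik.http.routers.frontend.entrypoints=websecure"
      - "traefik.http.routers.frontend.tls.certresolver=letsencrypt_resolver"
      - "traefik.http.services.frontend.loadbalancer.server.port=3000"
```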

GPU Resource Allocation

Services reserve GPUs using generic resources:
swarm-deploy.yml
tts:
  deploy:
    replicas: 3
    resources:
      reservations:
        generic_resources:
          - discrete_resource_spec:
              kind: gpu
              value: 1  # Reserve 1 GPU per replica
With 3 TTS replicas, this requires 3 GPUs total.

Service Replicas

Swarm configuration on unmute.sh:
  • Frontend: 5 replicas (no GPU needed)
  • Backend: 16 replicas (no GPU needed)
  • LLM: 2 replicas (2 GPUs total)
  • TTS: 3 replicas (3 GPUs total)
  • STT: 1 replica (1 GPU)
  • Voice cloning: 2 replicas (no GPU)
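The GPU budget is just replicas × GPUs-per-replica summed over the GPU services. A quick sanity check of the list above (this helper is illustrative, not part of the repo):

```python
# Hypothetical helper (not in the repo): total GPUs a stack reserves,
# given per-service replica counts and per-replica GPU reservations.
gpu_services = {
    "llm": {"replicas": 2, "gpus_per_replica": 1},
    "tts": {"replicas": 3, "gpus_per_replica": 1},
    "stt": {"replicas": 1, "gpus_per_replica": 1},
}

def total_gpus(services: dict) -> int:
    """Sum GPU reservations across all services in the stack."""
    return sum(s["replicas"] * s["gpus_per_replica"] for s in services.values())

print(total_gpus(gpu_services))  # 6 GPUs across the swarm
```

With the unmute.sh configuration this comes to 6 GPUs; frontend, backend, and voice cloning replicas add none.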

Load Balancing

The backend uses manual load balancing for TTS/STT via tasks.<service_name>:
swarm-deploy.yml
backend:
  environment:
    # Returns all replica IPs for manual load balancing
    - KYUTAI_STT_URL=ws://tasks.stt:8080
    - KYUTAI_TTS_URL=ws://tasks.tts:8080
    # Single endpoint (Swarm handles load balancing)
    - KYUTAI_LLM_URL=http://llm:8000
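The point of `tasks.<service_name>` is that Swarm's DNS returns one address per replica instead of a single load-balanced virtual IP, so the backend can pick replicas itself. A minimal sketch of that idea — the IPs are hardcoded and the `round_robin` helper is illustrative, not the backend's actual code:

```python
import itertools
import socket

def resolve_replicas(hostname: str) -> list[str]:
    """Resolve a `tasks.<service>` name to all replica IPs. This only works
    inside the overlay network, where Swarm DNS returns one record per task."""
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, 8080)})

def round_robin(ips: list[str]):
    """Return a picker that cycles through replica IPs on each call."""
    cycle = itertools.cycle(ips)
    return lambda: next(cycle)

# In production this would start from resolve_replicas("tasks.tts");
# hardcoded here so the sketch runs anywhere:
pick = round_robin(["10.0.1.5", "10.0.1.6", "10.0.1.7"])
print([pick() for _ in range(4)])  # fourth pick wraps back to the first replica
```

For the LLM, the plain `llm` name is used instead, letting Swarm's built-in virtual-IP load balancing distribute requests.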

Scaling Operations

Adding More Resources

1. Add New Node to Swarm

# On manager, get join token
docker -H ssh://llm-wrapper-gpu000 swarm join-token worker

# Run the output command on the new node
2. Scale Service

docker -H ssh://llm-wrapper-gpu000 service scale llm-wrapper_llm=10
This increases the LLM service to 10 replicas.
Swarm does not automatically rebalance containers across nodes. To redistribute containers after adding nodes, force a service restart.

Restarting a Service

Force update to restart all replicas:
docker -H ssh://llm-wrapper-gpu000 service update --force llm-wrapper_tts
Useful when:
  • New voices are added to voices.yaml
  • Configuration changes need to propagate
  • Rebalancing containers across nodes

Updating a Single Service

Update just the frontend image without touching other services:
docker -H ssh://llm-wrapper-gpu000 service update \
  --image your-registry/unmute-frontend:latest \
  --with-registry-auth \
  llm-wrapper_frontend
Other useful updates:
# Add a volume
docker service update \
  --mount-add type=volume,source=new-volume,target=/data \
  llm-wrapper_frontend

# Change environment variable
docker service update \
  --env-add KYUTAI_LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3 \
  llm-wrapper_backend

Monitoring

The swarm deployment includes a complete monitoring stack:

Prometheus

Metrics collection configured via Docker socket:
swarm-deploy.yml
prometheus:
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
  user: root
  deploy:
    placement:
      constraints:
        - node.role == manager  # Must run on manager
Services expose metrics via the prometheus-port label:
swarm-deploy.yml
backend:
  deploy:
    labels:
      - "prometheus-port=80"
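One way Prometheus turns that label into scrape targets is Swarm service discovery plus relabeling. A hedged sketch of what the corresponding prometheus.yml could look like — the job name and exact relabel rules are assumptions, not the repo's actual config:

```yaml
scrape_configs:
  - job_name: "swarm-services"
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks
    relabel_configs:
      # Only scrape tasks whose service carries a prometheus-port label
      - source_labels: [__meta_dockerswarm_service_label_prometheus_port]
        regex: .+
        action: keep
      # Rewrite the scrape address to use that port
      - source_labels: [__address__, __meta_dockerswarm_service_label_prometheus_port]
        regex: "([^:]+)(?::\\d+)?;(\\d+)"
        replacement: "$1:$2"
        target_label: __address__
```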

Grafana

Dashboards are built into the Docker image:
swarm-deploy.yml
grafana:
  build:
    context: services/grafana
Changes to dashboards in the UI are lost on restart unless exported and added to the build context.
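To persist UI edits, export the dashboard JSON and commit it under services/grafana. A sketch using Grafana's HTTP API (`GET /api/dashboards/uid/:uid`); the helper names are mine, and re-importable JSON needs its instance-specific `id` cleared:

```python
import json
import urllib.request

def fetch_dashboard(base_url: str, uid: str, api_key: str) -> dict:
    """Fetch a dashboard via Grafana's HTTP API: GET /api/dashboards/uid/:uid."""
    req = urllib.request.Request(
        f"{base_url}/api/dashboards/uid/{uid}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def exportable(payload: dict) -> dict:
    """Strip the instance-specific id so the JSON imports cleanly elsewhere."""
    dashboard = dict(payload["dashboard"])
    dashboard["id"] = None
    return dashboard

# Example usage (hypothetical URL, uid, and token):
# dash = exportable(fetch_dashboard("https://grafana.example.com", "abc123", "glsa_..."))
# print(json.dumps(dash, indent=2))
```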

Cadvisor

Collects container metrics on every node:
swarm-deploy.yml
cadvisor:
  deploy:
    mode: global  # One container per node

Advanced Configuration

Changing Docker Data Directory

If you need more disk space, change Docker’s data location:
  1. Edit /etc/docker/daemon.json on each node:
    {
      "data-root": "/new/docker-data"
    }
    
  2. Restart Docker:
    service docker restart
    

Network Encryption

The overlay network is encrypted for security:
swarm-deploy.yml
networks:
  default:
    driver: overlay
    attachable: true
    driver_opts:
      encrypted: "true"
This is important when nodes communicate over the public internet.

Redis for State Management

Unlike single-machine deployments, Swarm uses Redis for shared state:
swarm-deploy.yml
redis:
  image: redis:latest

backend:
  environment:
    - KYUTAI_REDIS_URL=redis://redis:6379
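The reason for Redis: with many backend replicas behind the load balancer, a reconnecting client may land on a different replica than the one that served it, so session state must live outside any single process. A runnable sketch of the idea — a small in-memory stub stands in for Redis here, and the key names are made up:

```python
class FakeRedis:
    """Tiny stand-in for redis.Redis so this sketch runs without a server.
    In production you would connect with redis.Redis.from_url(KYUTAI_REDIS_URL)."""

    def __init__(self):
        self._data: dict[str, dict[str, str]] = {}

    def hset(self, key: str, field: str, value: str) -> None:
        self._data.setdefault(key, {})[field] = value

    def hget(self, key: str, field: str):
        return self._data.get(key, {}).get(field)

shared = FakeRedis()

# Backend replica A stores session state...
shared.hset("session:abc123", "voice", "dev-news")

# ...and replica B, handling the client's reconnect, can read it back.
print(shared.hget("session:abc123", "voice"))  # dev-news
```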

Troubleshooting

Service Won’t Start

Check service logs:
docker -H ssh://llm-wrapper-gpu000 service logs llm-wrapper_tts
Check service status:
docker -H ssh://llm-wrapper-gpu000 service ps llm-wrapper_tts

Not Enough Resources

If services can’t find GPUs or have insufficient CPU/memory:
# List available nodes and resources
docker -H ssh://llm-wrapper-gpu000 node ls

# Inspect node details
docker -H ssh://llm-wrapper-gpu000 node inspect llm-wrapper-gpu001
Adjust resource reservations in swarm-deploy.yml.

Network Issues

Use the debugger service to test connectivity. Note that `docker exec` needs a container ID on the node you connect to (`docker service ps -q` returns task IDs, not container IDs), so point `-H` at the node running the debugger task:
docker -H ssh://llm-wrapper-gpu000 exec \
  $(docker -H ssh://llm-wrapper-gpu000 ps -q --filter name=llm-wrapper_debugger) \
  ping tts

Differences from Docker Compose

| Feature | Docker Compose | Docker Swarm |
| --- | --- | --- |
| Configuration syntax | Very similar | Nearly identical |
| Service placement | Same machine | Across nodes |
| Scaling | Manual container creation | `docker service scale` |
| Updates | `docker compose up` | `docker service update` |
| Load balancing | None | Automatic |
| Secrets management | `.env` files | Docker secrets |
| Health checks | Limited | Full support |

Migration Path

To migrate from Docker Compose to Swarm:
  1. Start with one node: Initialize swarm on your existing machine
  2. Convert compose file: Most syntax is compatible, add deploy sections
  3. Test locally: Deploy to single-node swarm
  4. Add monitoring: Set up Prometheus, Grafana, Traefik
  5. Scale out: Add worker nodes and increase replicas
