Docker Swarm is used for production deployments like unmute.sh. While Docker Compose runs on a single machine, Docker Swarm scales across multiple nodes. Think of it as “multi-node Docker Compose.”
This deployment method is documented to show how unmute.sh scales in production. Because multi-node setups are hard to debug, the Kyutai team cannot provide support for Swarm deployments. Use at your own risk.

When to Use Docker Swarm

Choose Docker Swarm when you need:
  • High availability: Multiple replicas ensure uptime during updates or crashes
  • Horizontal scaling: Distribute load across multiple GPUs/machines
  • Production features: HTTPS, monitoring, authentication, load balancing
  • Traffic handling: Support for many concurrent users
| Feature | Docker Compose | Docker Swarm |
| --- | --- | --- |
| Machines | 1 | 1 to ~100 |
| GPUs | 1+ | 1 to ~100 |
| Difficulty | Very easy | Medium |
| HTTPS | No | Yes |
| Monitoring | No | Yes (Prometheus, Grafana) |
| Load balancing | No | Yes |
| High availability | No | Yes |

Architecture

Main Application

Monitoring Stack

Setup Instructions

All commands should be executed from a client machine with access to your swarm nodes, not directly on the swarm nodes themselves.
1. Prepare GPU Nodes

Set up each GPU node in your swarm:
# Copy setup script to the node
scp setup_gpu_swarm_node.py llm-wrapper-gpu000:/root/

# Run setup on the node
ssh llm-wrapper-gpu000 python3 /root/setup_gpu_swarm_node.py
This script installs Docker, NVIDIA drivers, and configures the node for Swarm.
2. Initialize Swarm Manager

Designate one node as the manager (only needed once):
docker -H ssh://llm-wrapper-gpu000 swarm init
The manager node coordinates the swarm and must run certain services like Prometheus.
3. Add Worker Nodes

Get the join command from the manager:
docker -H ssh://llm-wrapper-gpu000 swarm join-token worker
This outputs a command like:
docker swarm join --token SWMTKN-1-xxxxx... llm-wrapper-gpu000:2377
Run this command on each worker node to join the swarm.
4. Configure Environment Variables

Set up required environment variables on your client machine:
# Required: Hugging Face token for model access
export HUGGING_FACE_HUB_TOKEN=hf_...

# Required: Google OAuth for monitoring access
export PROVIDERS_GOOGLE_CLIENT_SECRET=...

# Optional: For the "Dev (news)" character
export NEWSAPI_API_KEY=...
How to generate tokens: create a Hugging Face access token at huggingface.co/settings/tokens, and create an OAuth client secret in the Google Cloud Console under APIs & Services → Credentials.
5. Deploy the Stack

Run the deployment script:
# For production
./bake_deploy_prod.sh

# For staging
./bake_deploy_staging.sh
These scripts build images and deploy the stack defined in swarm-deploy.yml.

Production URLs

Once deployed, services are available at their configured production URLs. Monitoring services require Google authentication (configured via traefik-forward-auth).

Key Configuration Details

HTTPS with Let’s Encrypt

Traefik automatically obtains and renews SSL certificates:
swarm-deploy.yml
traefik:
  command:
    - "--entrypoints.web.address=:80"
    - "--entrypoints.websecure.address=:443"
    # Redirect all HTTP to HTTPS
    - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
    - "--entrypoints.web.http.redirections.entryPoint.scheme=https"
    # Let's Encrypt configuration
    - "--certificatesResolvers.letsencrypt_resolver.acme.httpChallenge.entryPoint=web"
    - "--certificatesResolvers.letsencrypt_resolver.acme.storage=/letsencrypt/acme.json"
    - "--certificatesResolvers.letsencrypt_resolver.acme.email=your-email@example.com"
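The resolver only takes effect on routers that reference it. On the service side, labels along these lines would request a certificate — a hedged sketch, where the hostname, router name, and port are placeholders rather than the repo's actual values:

```yaml
frontend:
  deploy:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.frontend.rule=Host(`unmute.example.com`)"
      - "traefik.http.routers.frontend.entrypoints=websecure"
      - "traefik.http.routers.frontend.tls.certresolver=letsencrypt_resolver"
      - "traefik.http.services.frontend.loadbalancer.server.port=3000"
```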

GPU Resource Allocation

Services reserve GPUs using generic resources:
swarm-deploy.yml
tts:
  deploy:
    replicas: 3
    resources:
      reservations:
        generic_resources:
          - discrete_resource_spec:
              kind: gpu
              value: 1  # Reserve 1 GPU per replica
With 3 TTS replicas, this requires 3 GPUs total.

Service Replicas

Swarm configuration on unmute.sh:
  • Frontend: 5 replicas (no GPU needed)
  • Backend: 16 replicas (no GPU needed)
  • LLM: 2 replicas (2 GPUs total)
  • TTS: 3 replicas (3 GPUs total)
  • STT: 1 replica (1 GPU)
  • Voice cloning: 2 replicas (no GPU)
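The GPU budget is just replicas × GPUs-per-replica summed over the GPU services. A quick sanity check of the list above (this helper is illustrative, not part of the repo):

```python
# Hypothetical helper (not in the repo): total GPUs a stack reserves,
# given per-service replica counts and per-replica GPU reservations.
gpu_services = {
    "llm": {"replicas": 2, "gpus_per_replica": 1},
    "tts": {"replicas": 3, "gpus_per_replica": 1},
    "stt": {"replicas": 1, "gpus_per_replica": 1},
}

def total_gpus(services: dict) -> int:
    """Sum GPU reservations across all services in the stack."""
    return sum(s["replicas"] * s["gpus_per_replica"] for s in services.values())

print(total_gpus(gpu_services))  # 6 GPUs across the swarm
```

With the unmute.sh configuration this comes to 6 GPUs; frontend, backend, and voice cloning replicas add none.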

Load Balancing

The backend uses manual load balancing for TTS/STT via tasks.<service_name>:
swarm-deploy.yml
backend:
  environment:
    # Returns all replica IPs for manual load balancing
    - KYUTAI_STT_URL=ws://tasks.stt:8080
    - KYUTAI_TTS_URL=ws://tasks.tts:8080
    # Single endpoint (Swarm handles load balancing)
    - KYUTAI_LLM_URL=http://llm:8000
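The point of `tasks.<service_name>` is that Swarm's DNS returns one address per replica instead of a single load-balanced virtual IP, so the backend can pick replicas itself. A minimal sketch of that idea — the IPs are hardcoded and the `round_robin` helper is illustrative, not the backend's actual code:

```python
import itertools
import socket

def resolve_replicas(hostname: str) -> list[str]:
    """Resolve a `tasks.<service>` name to all replica IPs. This only works
    inside the overlay network, where Swarm DNS returns one record per task."""
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, 8080)})

def round_robin(ips: list[str]):
    """Return a picker that cycles through replica IPs on each call."""
    cycle = itertools.cycle(ips)
    return lambda: next(cycle)

# In production this would start from resolve_replicas("tasks.tts");
# hardcoded here so the sketch runs anywhere:
pick = round_robin(["10.0.1.5", "10.0.1.6", "10.0.1.7"])
print([pick() for _ in range(4)])  # fourth pick wraps back to the first replica
```

For the LLM, the plain `llm` name is used instead, letting Swarm's built-in virtual-IP load balancing distribute requests.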

Scaling Operations

Adding More Resources

1. Add New Node to Swarm

# On manager, get join token
docker -H ssh://llm-wrapper-gpu000 swarm join-token worker

# Run the output command on the new node
2. Scale Service

docker -H ssh://llm-wrapper-gpu000 service scale llm-wrapper_llm=10
This increases the LLM service to 10 replicas.
Swarm does not automatically rebalance containers across nodes. To redistribute containers after adding nodes, force a service restart.

Restarting a Service

Force update to restart all replicas:
docker -H ssh://llm-wrapper-gpu000 service update --force llm-wrapper_tts
Useful when:
  • New voices are added to voices.yaml
  • Configuration changes need to propagate
  • Rebalancing containers across nodes

Updating a Single Service

Update just the frontend image without touching other services:
docker -H ssh://llm-wrapper-gpu000 service update \
  --image your-registry/unmute-frontend:latest \
  --with-registry-auth \
  llm-wrapper_frontend
Other useful updates:
# Add a volume
docker service update \
  --mount-add type=volume,source=new-volume,target=/data \
  llm-wrapper_frontend

# Change environment variable
docker service update \
  --env-add KYUTAI_LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3 \
  llm-wrapper_backend

Monitoring

The swarm deployment includes a complete monitoring stack:

Prometheus

Metrics collection configured via Docker socket:
swarm-deploy.yml
prometheus:
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
  user: root
  deploy:
    placement:
      constraints:
        - node.role == manager  # Must run on manager
Services expose metrics via the prometheus-port label:
swarm-deploy.yml
backend:
  deploy:
    labels:
      - "prometheus-port=80"
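One way Prometheus turns that label into scrape targets is Swarm service discovery plus relabeling. A hedged sketch of what the corresponding prometheus.yml could look like — the job name and exact relabel rules are assumptions, not the repo's actual config:

```yaml
scrape_configs:
  - job_name: "swarm-services"
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks
    relabel_configs:
      # Only scrape tasks whose service carries a prometheus-port label
      - source_labels: [__meta_dockerswarm_service_label_prometheus_port]
        regex: .+
        action: keep
      # Rewrite the scrape address to use that port
      - source_labels: [__address__, __meta_dockerswarm_service_label_prometheus_port]
        regex: "([^:]+)(?::\\d+)?;(\\d+)"
        replacement: "$1:$2"
        target_label: __address__
```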

Grafana

Dashboards are built into the Docker image:
swarm-deploy.yml
grafana:
  build:
    context: services/grafana
Changes to dashboards in the UI are lost on restart unless exported and added to the build context.
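To persist UI edits, export the dashboard JSON and commit it under services/grafana. A sketch using Grafana's HTTP API (`GET /api/dashboards/uid/:uid`); the helper names are mine, and re-importable JSON needs its instance-specific `id` cleared:

```python
import json
import urllib.request

def fetch_dashboard(base_url: str, uid: str, api_key: str) -> dict:
    """Fetch a dashboard via Grafana's HTTP API: GET /api/dashboards/uid/:uid."""
    req = urllib.request.Request(
        f"{base_url}/api/dashboards/uid/{uid}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def exportable(payload: dict) -> dict:
    """Strip the instance-specific id so the JSON imports cleanly elsewhere."""
    dashboard = dict(payload["dashboard"])
    dashboard["id"] = None
    return dashboard

# Example usage (hypothetical URL, uid, and token):
# dash = exportable(fetch_dashboard("https://grafana.example.com", "abc123", "glsa_..."))
# print(json.dumps(dash, indent=2))
```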

Cadvisor

Collects container metrics on every node:
swarm-deploy.yml
cadvisor:
  deploy:
    mode: global  # One container per node

Advanced Configuration

Changing Docker Data Directory

If you need more disk space, change Docker’s data location:
  1. Edit /etc/docker/daemon.json on each node:
    {
      "data-root": "/new/docker-data"
    }
    
  2. Restart Docker:
    service docker restart
    

Network Encryption

The overlay network is encrypted for security:
swarm-deploy.yml
networks:
  default:
    driver: overlay
    attachable: true
    driver_opts:
      encrypted: "true"
This is important when nodes communicate over the public internet.

Redis for State Management

Unlike single-machine deployments, Swarm uses Redis for shared state:
swarm-deploy.yml
redis:
  image: redis:latest

backend:
  environment:
    - KYUTAI_REDIS_URL=redis://redis:6379
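The reason for Redis: with many backend replicas behind the load balancer, a reconnecting client may land on a different replica than the one that served it, so session state must live outside any single process. A runnable sketch of the idea — a small in-memory stub stands in for Redis here, and the key names are made up:

```python
class FakeRedis:
    """Tiny stand-in for redis.Redis so this sketch runs without a server.
    In production you would connect with redis.Redis.from_url(KYUTAI_REDIS_URL)."""

    def __init__(self):
        self._data: dict[str, dict[str, str]] = {}

    def hset(self, key: str, field: str, value: str) -> None:
        self._data.setdefault(key, {})[field] = value

    def hget(self, key: str, field: str):
        return self._data.get(key, {}).get(field)

shared = FakeRedis()

# Backend replica A stores session state...
shared.hset("session:abc123", "voice", "dev-news")

# ...and replica B, handling the client's reconnect, can read it back.
print(shared.hget("session:abc123", "voice"))  # dev-news
```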

Troubleshooting

Service Won’t Start

Check service logs:
docker -H ssh://llm-wrapper-gpu000 service logs llm-wrapper_tts
Check service status:
docker -H ssh://llm-wrapper-gpu000 service ps llm-wrapper_tts

Not Enough Resources

If services can’t find GPUs or have insufficient CPU/memory:
# List available nodes and resources
docker -H ssh://llm-wrapper-gpu000 node ls

# Inspect node details
docker -H ssh://llm-wrapper-gpu000 node inspect llm-wrapper-gpu001
Adjust resource reservations in swarm-deploy.yml.

Network Issues

Use the debugger service to test connectivity. Note that `docker exec` needs a container ID on the node you connect to (`docker service ps -q` returns task IDs, not container IDs), so point `-H` at the node running the debugger task:
docker -H ssh://llm-wrapper-gpu000 exec \
  $(docker -H ssh://llm-wrapper-gpu000 ps -q --filter name=llm-wrapper_debugger) \
  ping tts

Differences from Docker Compose

| Feature | Docker Compose | Docker Swarm |
| --- | --- | --- |
| Configuration syntax | Very similar | Nearly identical |
| Service placement | Same machine | Across nodes |
| Scaling | Manual container creation | `docker service scale` |
| Updates | `docker compose up` | `docker service update` |
| Load balancing | None | Automatic |
| Secrets management | `.env` files | Docker secrets |
| Health checks | Limited | Full support |

Migration Path

To migrate from Docker Compose to Swarm:
  1. Start with one node: Initialize swarm on your existing machine
  2. Convert compose file: Most syntax is compatible, add deploy sections
  3. Test locally: Deploy to single-node swarm
  4. Add monitoring: Set up Prometheus, Grafana, Traefik
  5. Scale out: Add worker nodes and increase replicas
