When to Use Docker Swarm
Choose Docker Swarm when you need:
- High availability: Multiple replicas ensure uptime during updates or crashes
- Horizontal scaling: Distribute load across multiple GPUs/machines
- Production features: HTTPS, monitoring, authentication, load balancing
- Traffic handling: Support for many concurrent users
| Feature | Docker Compose | Docker Swarm |
|---|---|---|
| Machines | 1 | 1 to ~100 |
| GPUs | 1+ | 1 to ~100 |
| Difficulty | Very easy | Medium |
| HTTPS | No | Yes |
| Monitoring | No | Yes (Prometheus, Grafana) |
| Load Balancing | No | Yes |
| High Availability | No | Yes |
Architecture
Main Application
Monitoring Stack
Setup Instructions
Prepare GPU Nodes
Set up each GPU node in your swarm with the node setup script. This script installs Docker and the NVIDIA drivers, and configures the node for Swarm.
Initialize Swarm Manager
Designate one node as the manager and initialize the swarm on it (only needed once). The manager node coordinates the swarm and must run certain services, such as Prometheus.
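Initializing the manager usually comes down to a single `docker swarm init` call. The following is a minimal sketch, assuming the standard Docker CLI; the function name and the example address are placeholders and are not taken from the Unmute repository:

```shell
# Initialize a new swarm on this node and make it the manager.
# $1: the address other nodes will use to reach the manager.
init_swarm_manager() {
  docker swarm init --advertise-addr "$1"
}

# Example (placeholder address): init_swarm_manager 10.0.0.1
```

Passing `--advertise-addr` explicitly avoids ambiguity on machines with several network interfaces.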
Add Worker Nodes
Get the join command from the manager. It outputs a docker swarm join command containing a token and the manager's address. Run this command on each worker node to join the swarm.
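The two halves of this step can be sketched as follows, assuming the standard Docker CLI. The function names are hypothetical, and the token and manager address must come from your own swarm:

```shell
# On the manager: print the full join command for worker nodes.
show_worker_join_command() {
  docker swarm join-token worker
}

# On each worker: join the swarm.
# $1: worker token printed by the manager, $2: manager IP or hostname.
# 2377 is Docker's default cluster management port.
join_swarm_as_worker() {
  docker swarm join --token "$1" "$2:2377"
}
```

In practice it is usually simplest to copy the command printed by the manager verbatim rather than assembling it by hand.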
Configure Environment Variables
Set up required environment variables on your client machine.
How to generate tokens:
- HUGGING_FACE_HUB_TOKEN: Create a token with access to gemma-3-12b-it
- PROVIDERS_GOOGLE_CLIENT_SECRET: Set up Google OAuth for monitoring authentication
- NEWSAPI_API_KEY: Get an API key (optional)
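Before deploying, it can help to fail early when a required variable is missing. This is a small sketch that uses the variable names from the list above; the function name is hypothetical, and NEWSAPI_API_KEY is skipped because it is optional:

```shell
# Return non-zero if any required variable is unset or empty.
check_required_env() {
  missing=0
  for name in HUGGING_FACE_HUB_TOKEN PROVIDERS_GOOGLE_CLIENT_SECRET; do
    # Indirect lookup that also works in plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_required_env && echo "environment looks complete"
```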
Production URLs
Once deployed, services are available at:
- Main app: https://unmute.sh
- Traefik dashboard: https://traefik.unmute.sh
- Grafana: https://grafana.unmute.sh
- Prometheus: https://prometheus.unmute.sh
- Portainer: https://portainer.unmute.sh
The monitoring services are protected by Google OAuth authentication (via traefik-forward-auth).
Key Configuration Details
HTTPS with Let’s Encrypt
Traefik automatically obtains and renews TLS certificates from Let's Encrypt; this is configured in swarm-deploy.yml.
GPU Resource Allocation
Services reserve GPUs using Swarm generic resources, declared in swarm-deploy.yml.
Service Replicas
Swarm configuration on unmute.sh:
- Frontend: 5 replicas (no GPU needed)
- Backend: 16 replicas (no GPU needed)
- LLM: 2 replicas (2 GPUs total)
- TTS: 3 replicas (3 GPUs total)
- STT: 1 replica (1 GPU)
- Voice cloning: 2 replicas (no GPU)
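Replica counts like these can also be changed on a running swarm. This is a minimal sketch, assuming the standard Docker CLI; the function name and the service name in the example are hypothetical, so check `docker service ls` for the real names:

```shell
# Set the replica count of one service.
# $1: service name as shown by `docker service ls`, $2: desired replicas.
scale_service() {
  docker service scale "$1=$2"
}

# Example (hypothetical service name): scale_service unmute_frontend 5
```

GPU-backed services can only scale up to the number of GPUs the nodes advertise; extra replicas stay pending until resources are free.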
Load Balancing
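The backend uses manual load balancing for TTS/STT via tasks.<service_name>, configured in swarm-deploy.yml. In Swarm, the DNS name tasks.<service_name> resolves to the IP addresses of the individual tasks rather than to the service's single virtual IP, so the backend can pick a specific replica itself.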
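To see what `tasks.<service_name>` resolves to, you can query Swarm's internal DNS from a container attached to the same overlay network. This sketch assumes `dig` is installed in that container; the function name and the service name in the example are hypothetical:

```shell
# List the IP address of every running task (replica) of a service.
# Must run inside a container attached to the same overlay network.
# $1: service name as used in the stack file.
list_task_ips() {
  dig +short "tasks.$1"
}

# Example (hypothetical service name): list_task_ips tts
```

Each line of output corresponds to one replica, which is the list a client-side load balancer would choose from.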
Scaling Operations
Adding More Resources
Swarm does not automatically rebalance containers across nodes. To redistribute containers after adding nodes, force a service restart.
Restarting a Service
Force an update to restart all replicas of a service. This is useful when:
- New voices are added to voices.yaml
- Configuration changes need to propagate
- Containers need rebalancing across nodes
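A forced update can be sketched as below, assuming the standard Docker CLI. The function name and the service name in the example are hypothetical:

```shell
# Restart every replica of a service, even if its definition is unchanged.
# Swarm schedules the replacement tasks afresh, so they can land on
# nodes that were added after the service was first deployed.
# $1: service name as shown by `docker service ls`.
restart_service() {
  docker service update --force "$1"
}

# Example (hypothetical service name): restart_service unmute_tts
```

The restart is rolling, so with more than one replica the service generally stays reachable while tasks are replaced.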
Updating a Single Service
Update just the frontend image without touching other services, for example with docker service update --image.
Monitoring
The swarm deployment includes a complete monitoring stack.
Prometheus
Metrics collection is configured via the Docker socket in swarm-deploy.yml. Services declare their metrics port with a prometheus-port label, also set in swarm-deploy.yml.
Grafana
Dashboards are built into the Grafana Docker image referenced in swarm-deploy.yml.
cAdvisor
cAdvisor collects container metrics on every node; it is defined in swarm-deploy.yml.
Advanced Configuration
Changing Docker Data Directory
If you need more disk space, change Docker's data location:
- Edit /etc/docker/daemon.json on each node.
- Restart Docker.
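The two steps above can be sketched as follows. `data-root` is the dockerd setting for the data directory; the target path and the function name are hypothetical, and the restart command assumes a systemd-based host:

```shell
# Hypothetical target directory; pick a mount with enough free space.
DATA_ROOT="/mnt/docker-data"

# Print a daemon.json that moves Docker's data directory to $1.
render_daemon_json() {
  printf '{\n  "data-root": "%s"\n}\n' "$1"
}

# Usage on each node. This overwrites any existing daemon.json, so merge
# by hand if the file already holds settings (for example GPU runtime
# configuration written by the node setup):
#   render_daemon_json "$DATA_ROOT" | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
```

Existing images and volumes are not moved automatically; they stay in the old directory unless you copy them over before restarting.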
Network Encryption
The overlay network is encrypted for security; this is set in swarm-deploy.yml.
Redis for State Management
Unlike single-machine deployments, the Swarm deployment uses Redis for shared state; the Redis service is defined in swarm-deploy.yml.
Troubleshooting
Service Won’t Start
Check the service logs with docker service logs <service_name>.
Not Enough Resources
If services can't find GPUs or have insufficient CPU/memory, check what each node provides and adjust the resource reservations in swarm-deploy.yml.
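Two checks are often enough to tell a scheduling problem from a crash: the task list with untruncated error messages, and the resources each node advertises. This is a sketch assuming the standard Docker CLI, run on a manager node; the function names and the example service name are hypothetical:

```shell
# Show why tasks of a service are pending or failing (full error text).
# $1: service name as shown by `docker service ls`.
show_task_errors() {
  docker service ps --no-trunc "$1"
}

# Print the CPU, memory, and generic resources each node advertises.
show_node_resources() {
  for node in $(docker node ls -q); do
    docker node inspect \
      --format '{{ .Description.Hostname }}: {{ json .Description.Resources }}' \
      "$node"
  done
}

# Example (hypothetical service name): show_task_errors unmute_llm
```

If a GPU node reports no generic resources, a service that reserves a GPU cannot be scheduled there regardless of the replica count.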
Network Issues
Use the debugger service to test connectivity.
Differences from Docker Compose
| Feature | Docker Compose | Docker Swarm |
|---|---|---|
| Configuration syntax | Very similar | Nearly identical |
| Service placement | Same machine | Across nodes |
| Scaling | Manual container creation | docker service scale |
| Updates | docker compose up | docker service update |
| Load balancing | None | Automatic |
| Secrets management | .env files | Docker secrets |
| Health checks | Limited | Full support |
Migration Path
To migrate from Docker Compose to Swarm:
- Start with one node: Initialize swarm on your existing machine
- Convert compose file: Most syntax is compatible; add deploy sections
- Test locally: Deploy to single-node swarm
- Add monitoring: Set up Prometheus, Grafana, Traefik
- Scale out: Add worker nodes and increase replicas
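The "test locally" step can be sketched as below, assuming the standard Docker CLI and a swarm already initialized on the machine. The function names and the stack name in the example are hypothetical:

```shell
# Deploy (or update) a stack from a compose-format file.
# $1: stack file, $2: stack name.
deploy_stack() {
  docker stack deploy --compose-file "$1" "$2"
}

# List the stack's services with their current and desired replica counts.
# $1: stack name.
list_stack_services() {
  docker stack services "$1"
}

# Example (hypothetical stack name):
#   deploy_stack swarm-deploy.yml unmute
#   list_stack_services unmute
```

Running the same deploy command again after editing the file updates only the services whose definitions changed.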
Next Steps
- Review the swarm-deploy.yml file for complete configuration
- Read the Docker Swarm documentation
- Understand Traefik with Swarm
- Explore simpler options: Docker Compose or Dockerless