
Overview

Docker Swarm is Docker’s native clustering and orchestration tool that turns a pool of Docker hosts into a single virtual host. Dokploy leverages Docker Swarm to provide multi-node deployments, service scaling, load balancing, and high availability.
Docker Swarm is included with Docker Engine and requires no additional installation.

What is Docker Swarm?

Docker Swarm provides:
  • Cluster Management: Manage multiple Docker hosts as a single cluster
  • Service Discovery: Built-in DNS-based service discovery
  • Load Balancing: Automatic load balancing across containers
  • Scaling: Horizontal scaling of services
  • Rolling Updates: Zero-downtime deployments
  • Security: TLS authentication and encrypted communication
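Several of these features can be seen in a single `docker service create` command (the service name `web` and replica count here are illustrative):

```shell
# Create a replicated service: Swarm schedules 3 nginx tasks across the
# cluster and the routing mesh load-balances port 80 on every node.
docker service create \
  --name web \
  --replicas 3 \
  --publish published=80,target=80 \
  nginx:latest
```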

Swarm Architecture

Manager nodes handle cluster management tasks:
  • Maintaining cluster state
  • Scheduling services
  • Serving Swarm API endpoints
  • Leader election using Raft consensus
Recommendation: Use 3 or 5 manager nodes for high availability. Raft requires a quorum of ⌊N/2⌋+1 managers to remain reachable for the cluster to accept changes.
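You can list the current managers and see which one holds the Raft leader role (node names will differ in your cluster):

```shell
# The MANAGER STATUS column shows "Leader" or "Reachable" for managers
docker node ls --filter role=manager
```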

Initializing Docker Swarm

Dokploy automatically initializes Docker Swarm during installation. You can verify the status:
Check Swarm Status
docker info | grep Swarm
# Output: Swarm: active

Manual Initialization

If you need to initialize Swarm manually:
Step 1: Initialize on Manager Node

docker swarm init --advertise-addr <MANAGER-IP>
This prints a ready-to-run docker swarm join command containing the join token for worker nodes.
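If you need the token again later, Swarm can reprint the join command for either role:

```shell
# Print the full join command (including token) for workers or managers
docker swarm join-token worker
docker swarm join-token manager
```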
Step 2: Add Worker Nodes

On each worker node, run:
docker swarm join --token <TOKEN> <MANAGER-IP>:2377
Step 3: Verify Cluster

docker node ls

Using the Cluster Router

Dokploy provides a Cluster API for managing Swarm:
Get Cluster Status
curl https://your-dokploy-instance.com/api/cluster.status \
  -H "Authorization: Bearer YOUR_API_KEY"
Initialize Cluster
curl -X POST https://your-dokploy-instance.com/api/cluster.init \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "advertiseAddr": "192.168.1.10"
  }'

Deploying Services to Swarm

Using Docker Compose with Swarm

Dokploy supports Docker Compose with Swarm-specific configuration:
docker-compose.yml
version: "3.8"

services:
  web:
    image: nginx:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
        preferences:
          - spread: node.labels.zone
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
    ports:
      - "80:80"
    networks:
      - webnet

networks:
  webnet:
    driver: overlay
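Dokploy handles deployment for you, but as a sketch, a Compose file like this can be deployed to Swarm manually with `docker stack deploy` (the stack name `myapp` is an example):

```shell
# Deploy (or update) the stack defined in docker-compose.yml
docker stack deploy -c docker-compose.yml myapp

# List the services created by the stack
docker stack services myapp
```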

Deploy Modes

Replicated mode (the default) runs a specified number of replicas across available nodes:
deploy:
  mode: replicated
  replicas: 5
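Swarm also supports global mode, which runs exactly one task on every eligible node (useful for per-node agents such as log shippers or monitoring exporters). A CLI sketch, with an illustrative image and service name:

```shell
# Global mode: one task per node; nodes that join later
# automatically receive a task as well
docker service create \
  --name node-agent \
  --mode global \
  nginx:alpine
```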

Service Scaling

Scale a replicated service up or down at any time:
docker service scale web=5

Placement Constraints

Control where services run using constraints:

Node Attributes

deploy:
  placement:
    constraints:
      - node.role == worker              # Only worker nodes
      - node.hostname == worker-1        # Specific node
      - node.labels.environment == prod  # Custom label
      - node.platform.os == linux        # Operating system

Node Labels

Add custom labels to nodes:
docker node update --label-add environment=production worker-1
docker node update --label-add region=us-east worker-1
docker node update --label-add disktype=ssd worker-2
Use labels in placement:
deploy:
  placement:
    constraints:
      - node.labels.disktype == ssd
      - node.labels.region == us-east
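To confirm which labels a node currently carries (node name is an example):

```shell
# Print the labels set on a node as JSON
docker node inspect --format '{{ json .Spec.Labels }}' worker-1
```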

Update Strategies

Configure zero-downtime rolling updates:
deploy:
  update_config:
    parallelism: 2          # Update 2 containers at a time
    delay: 10s              # Wait 10s between batches
    failure_action: rollback # Rollback on failure
    monitor: 60s            # Monitor for 60s after update
    max_failure_ratio: 0.3  # Rollback if >30% fail
    order: start-first      # Start new before stopping old
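The same settings can also be supplied ad hoc when triggering an update from the CLI (service name and image tag are examples):

```shell
# Roll out a new image two tasks at a time, rolling back on failure
docker service update \
  --update-parallelism 2 \
  --update-delay 10s \
  --update-failure-action rollback \
  --image nginx:1.25 \
  web
```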

Rollback Configuration

deploy:
  rollback_config:
    parallelism: 1
    delay: 5s
    failure_action: pause
    monitor: 30s
    max_failure_ratio: 0
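A rollback can also be triggered manually; Swarm reverts the service to its previous definition, honoring the rollback_config settings:

```shell
# Revert the service to its previous spec (image, env, replicas, etc.)
docker service rollback web
```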

Service Discovery and Load Balancing

Internal Load Balancing

Swarm provides DNS-based service discovery:
services:
  web:
    image: nginx
  api:
    image: myapi
    environment:
      - API_URL=http://web  # DNS resolves to all web replicas
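You can verify DNS-based discovery from inside any container attached to the same overlay network (the container ID is a placeholder; getent is present in most Linux base images):

```shell
# Resolve the service name: returns the service VIP,
# which load-balances across replicas
docker exec <CONTAINER-ID> getent hosts web

# Resolve the individual task IPs behind the service
docker exec <CONTAINER-ID> getent hosts tasks.web
```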

Ingress Load Balancing

Swarm’s routing mesh distributes external traffic:
  • Requests to any node on the published port reach a service replica
  • Automatic load balancing across healthy replicas
  • No additional load balancer required
services:
  web:
    image: nginx
    ports:
      - "80:80"  # Published on all nodes
    deploy:
      replicas: 3
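Because of the routing mesh, a request to any node on the published port reaches a healthy replica, even if that node is not running one of the tasks:

```shell
# Any node IP works, regardless of where the replicas are scheduled
curl http://<NODE-IP>/
```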

Overlay Networks

Create encrypted overlay networks for multi-node communication:
networks:
  app-network:
    driver: overlay
    attachable: true
    driver_opts:
      encrypted: "true"
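The equivalent network can be created directly from the CLI:

```shell
# Create an attachable, IPsec-encrypted overlay network
docker network create \
  --driver overlay \
  --opt encrypted \
  --attachable \
  app-network
```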

Network Isolation

Each service can use multiple networks:
services:
  web:
    networks:
      - frontend
      - backend
  db:
    networks:
      - backend  # Not accessible from frontend

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay

Secrets Management

Swarm provides secure secret storage:
Step 1: Create a Secret

echo "my-secret-password" | docker secret create db_password -
2

Use in Service

services:
  db:
    image: postgres
    secrets:
      - db_password
    environment:
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password

secrets:
  db_password:
    external: true
Secrets are:
  • Encrypted at rest and in transit
  • Only available to services that explicitly request them
  • Mounted as files in /run/secrets/
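Swarm secrets are immutable, so rotating one means creating a replacement and swapping it on the service (the secret and service names below are illustrative):

```shell
# Create the replacement secret
echo "new-secret-password" | docker secret create db_password_v2 -

# Swap secrets on the running service; Swarm performs a rolling update.
# The target name keeps the mount path /run/secrets/db_password stable.
docker service update \
  --secret-rm db_password \
  --secret-add source=db_password_v2,target=db_password \
  myapp_db
```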

Node Management

Drain a Node

Prepare a node for maintenance:
docker node update --availability drain worker-1
This stops scheduling new tasks and migrates existing tasks to other nodes.
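After maintenance, return the node to service:

```shell
# Make the node schedulable again. Existing tasks are not moved back
# automatically; a service update or scale event rebalances them.
docker node update --availability active worker-1
```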

Promote/Demote Nodes

# Promote worker to manager
docker node promote worker-1

# Demote manager to worker
docker node demote manager-2

Remove a Node

Step 1: Drain the Node

docker node update --availability drain worker-1
Step 2: Leave Swarm (on the node)

docker swarm leave
Step 3: Remove from Cluster

docker node rm worker-1

Monitoring Swarm

Service Status

# List services
docker service ls

# Inspect service
docker service ps web

# View logs
docker service logs web

Node Health

# List nodes
docker node ls

# Inspect node
docker node inspect worker-1

Using Dokploy API

curl https://your-dokploy-instance.com/api/swarm.services \
  -H "Authorization: Bearer YOUR_API_KEY"

Best Practices

Use an Odd Number of Managers

Raft consensus requires a quorum of ⌊N/2⌋+1 managers, so odd numbers give the best fault tolerance per node:
  • 3 managers: tolerates 1 failure
  • 5 managers: tolerates 2 failures
  • 7+ managers: not recommended (consensus overhead grows with little added tolerance)
Dedicate Managers to Orchestration

In production, drain manager nodes so they run no application workloads:
docker node update --availability drain manager-1
Define Health Checks

Health checks let Swarm detect and replace unhealthy containers automatically:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
Set Resource Limits

Limits and reservations prevent a single service from exhausting a node:
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 2G
    reservations:
      cpus: '0.5'
      memory: 512M
Encrypt Overlay Networks

Enable encryption for networks that carry sensitive data:
networks:
  secure-net:
    driver: overlay
    driver_opts:
      encrypted: "true"

Troubleshooting

Service Won't Start

  • Check service logs: docker service logs service-name
  • Verify the image exists and is accessible from all nodes
  • Check that resource constraints are satisfiable
  • Review placement constraints

Network Issues

  • Verify the overlay network is created: docker network ls
  • Check firewall rules (TCP 2377, TCP/UDP 7946, UDP 4789)
  • Ensure nodes can reach each other on those ports
  • Review the network driver: docker network inspect network-name

Lost Manager Quorum

  • Check manager quorum: docker node ls
  • Review manager logs: journalctl -u docker
  • Verify Raft consensus is healthy
  • Ensure sufficient manager nodes are available

Failed Rolling Updates

  • Check service logs for errors
  • Review the update configuration
  • Verify health checks are passing
  • Trigger a manual rollback if needed: docker service rollback service-name

Next Steps

Multi-Node Deployment

Learn to deploy across multiple servers

Networking

Configure advanced networking

Volumes & Storage

Manage persistent storage in Swarm

Monitoring

Monitor your Swarm cluster
