The deploy section configures how services are deployed across your cluster. This includes replica count, resource limits, placement, and update behavior.

Deployment mode

Services can run in replicated or global mode:
services:
  # Replicated: specific number of containers
  web:
    image: nginx
    deploy:
      mode: replicated
      replicas: 3

  # Global: one container per machine
  monitoring:
    image: prom/node-exporter
    deploy:
      mode: global
Modes:
  • replicated - Run a specific number of replicas (default)
  • global - Run one container on every machine in the cluster
References: pkg/api/service.go:19-21, pkg/client/compose/service.go:98-107

Replicas

Set the number of container replicas for replicated services:
services:
  api:
    image: myapi
    deploy:
      replicas: 5  # Run 5 containers
You can also use the top-level scale key:
services:
  worker:
    image: worker
    scale: 10  # Shorthand for deploy.replicas
If both scale and deploy.replicas are set, deploy.replicas takes precedence.
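For instance, in this sketch the service runs with 3 replicas, because deploy.replicas overrides scale:

```yaml
services:
  worker:
    image: worker
    scale: 10          # ignored when deploy.replicas is also set
    deploy:
      replicas: 3      # takes precedence: the service runs 3 replicas
```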
Default is 1 replica if not specified. References: pkg/api/service.go:70-71, pkg/client/compose/service.go:93-104

Placement constraints

Control which machines can run your service using the x-machines extension:
services:
  gpu-worker:
    image: ml-worker
    x-machines:
      - gpu-machine-1
      - gpu-machine-2
    deploy:
      replicas: 4  # Distributed across specified machines
You can specify machine names or IDs:
services:
  db:
    image: postgres:16
    x-machines: db-server  # Single machine (string)
    deploy:
      replicas: 1
Format:
  • String: x-machines: machine-name (single machine)
  • Array: x-machines: [machine-1, machine-2] (multiple machines)
If a service has replicas > 1 and uses x-machines, Uncloud spreads replicas across the specified machines.
References: pkg/api/service.go:65-66, pkg/client/compose/service.go:76-78, test/e2e/compose_deploy_test.go:390-489

Update configuration

Configure rolling update behavior:
services:
  web:
    image: nginx
    deploy:
      replicas: 4
      update_config:
        order: start-first
        monitor: 30s
References: pkg/api/service.go:441-452

Update order

Control the order of container replacement during updates:
services:
  # Stateless service: minimize downtime
  api:
    image: myapi
    deploy:
      update_config:
        order: start-first  # Start new before stopping old

  # Stateful service: prevent data corruption
  db:
    image: postgres:16
    volumes:
      - db-data:/var/lib/postgresql/data
    deploy:
      update_config:
        order: stop-first  # Stop old before starting new
Update orders:
  • start-first - Start new container before stopping old (default for stateless)
    • Minimizes downtime
    • Briefly runs both containers
  • stop-first - Stop old container before starting new (default for services with volumes)
    • Prevents data corruption
    • Causes brief downtime
References: pkg/api/service.go:23-28, pkg/client/compose/service.go:109-120

Monitor period

How long to monitor containers after starting to ensure they’re healthy:
services:
  app:
    image: myapp
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 10s
    deploy:
      update_config:
        monitor: 60s  # Wait up to 60s for health check
During an update, a new container is considered ready once either:
  1. It passes its health check, or
  2. The monitor period elapses and the container is still running
If monitor is not specified, Uncloud falls back to a default monitor period. References: pkg/api/service.go:447-451, pkg/client/compose/service.go:121-123

Resource limits

Set CPU and memory limits for containers:
services:
  app:
    image: myapp
    deploy:
      resources:
        limits:
          cpus: '1.5'        # 1.5 CPU cores
          memory: 1G         # 1 gigabyte RAM
        reservations:
          memory: 512M       # Reserve 512 megabytes
You can also use top-level keys:
services:
  app:
    image: myapp
    cpus: 0.5           # Shorthand for deploy.resources.limits.cpus
    mem_limit: 256M     # Shorthand for deploy.resources.limits.memory
    mem_reservation: 128M  # Shorthand for deploy.resources.reservations.memory
References: pkg/client/compose/service.go:175-228

CPU limits

Limit CPU usage:
services:
  cpu-intensive:
    image: myapp
    deploy:
      resources:
        limits:
          cpus: '2.0'  # Use up to 2 CPU cores
Format: String or number representing CPU cores (e.g., '0.5', '2.0')
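Quoted and unquoted values are both accepted; quoting keeps the value a string, which avoids YAML float normalization (e.g. 2.0 parsing as the number 2.0 rather than the literal text):

```yaml
services:
  app:
    image: myapp
    deploy:
      resources:
        limits:
          cpus: '0.5'   # string form, always unambiguous
          # cpus: 0.5   # numeric form is also valid YAML
```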

Memory limits

Limit memory usage:
services:
  memory-intensive:
    image: myapp
    deploy:
      resources:
        limits:
          memory: 2G         # 2 gigabytes maximum
        reservations:
          memory: 1G         # 1 gigabyte guaranteed
Formats: a number with a unit suffix, e.g. 100M, 512m, 1G; upper- and lowercase suffixes are both accepted.
If a container exceeds its memory limit, Docker kills and restarts it.

Device reservations

Reserve GPU and other devices:
services:
  ml-worker:
    image: tensorflow/tensorflow:latest-gpu
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1          # Number of GPUs
              capabilities: [gpu]
            
            - driver: nvidia
              device_ids: ['0', '2']  # Specific GPU IDs
              capabilities: [gpu]
References: pkg/client/compose/service.go:204-226

Ulimits

Set user limits for containers:
services:
  app:
    image: myapp
    ulimits:
      nofile:           # Max open file descriptors (long syntax: soft and hard set separately)
        soft: 20000
        hard: 40000
      nproc: 65535      # Max processes (short syntax sets soft and hard together)
References: pkg/client/compose/service.go:370-395

Restart policy

Uncloud automatically restarts containers with the unless-stopped policy. Custom restart policies are not currently supported.
All containers restart automatically unless you explicitly stop the service.
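Because the policy is fixed, a compose file for Uncloud can simply omit the restart key. A minimal sketch:

```yaml
services:
  app:
    image: myapp
    # No restart key needed: Uncloud applies unless-stopped to every container.
    # restart: always   # custom policies like this are not supported
```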
References: website/docs/8-compose-file-reference/1-support-matrix.md:50

Complete example

services:
  # High-availability web service
  web:
    image: nginx:alpine
    x-ports:
      - example.com:80/https
    deploy:
      mode: replicated
      replicas: 5
      update_config:
        order: start-first  # Minimize downtime
        monitor: 30s
      resources:
        limits:
          cpus: '0.25'
          memory: 128M
        reservations:
          memory: 64M

  # API service with placement constraints
  api:
    image: myapi:latest
    x-machines:
      - api-server-1
      - api-server-2
    deploy:
      mode: replicated
      replicas: 6  # 3 per machine
      update_config:
        order: start-first
        monitor: 60s
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          memory: 512M

  # Stateful database
  db:
    image: postgres:16
    volumes:
      - db-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=secret
    x-machines: db-server  # Pin to specific machine
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: stop-first  # Prevent data corruption
        monitor: 120s
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          memory: 2G

  # Background worker
  worker:
    image: worker:latest
    deploy:
      mode: replicated
      replicas: 10
      resources:
        limits:
          cpus: '0.5'
          memory: 512M

  # GPU machine learning worker
  ml-worker:
    image: tensorflow/tensorflow:latest-gpu
    x-machines:
      - gpu-machine
    deploy:
      mode: replicated
      replicas: 2
      resources:
        limits:
          cpus: '4.0'
          memory: 8G
        reservations:
          memory: 4G
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  # Monitoring on every machine
  node-exporter:
    image: prom/node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
    deploy:
      mode: global  # One per machine
      resources:
        limits:
          cpus: '0.1'
          memory: 64M

volumes:
  db-data:
