Scale your services horizontally by running multiple container replicas across your cluster. Uncloud automatically distributes containers across machines and load balances traffic between them.

Quick start

Scale a service to 5 replicas:
uc service scale web 5
Uncloud spreads the 5 containers across available machines in your cluster.

Scaling methods

You can scale services in two ways:

Using uc service scale

Scale an existing service:
# Scale to specific number of replicas
uc service scale web 10

# Scale down
uc service scale web 3

# Use service name or ID
uc service scale 9a8b7c6d5e4f3a2b 5
The command:
  1. Inspects the current service state
  2. Plans the changes (adding or removing containers)
  3. Asks for confirmation when scaling down
  4. Executes the deployment with rolling updates

Using Compose files

Set replicas in your compose.yaml:
services:
  web:
    image: nginx
    scale: 5  # Run 5 replicas
Or use the deploy.replicas syntax:
services:
  web:
    image: nginx
    deploy:
      replicas: 5
Deploy with:
uc deploy

How scaling works

Horizontal scaling

Uncloud distributes containers across available machines:
  • 1 machine, 5 replicas: All 5 containers run on the single machine
  • 3 machines, 5 replicas: Containers spread evenly (2, 2, 1 distribution)
  • 5 machines, 5 replicas: One container per machine

Round-robin distribution

Uncloud uses a round-robin approach to spread containers:
services:
  web:
    image: nginx
    scale: 6
With 3 machines (machine-1, machine-2, machine-3):
  1. Container 1 → machine-1
  2. Container 2 → machine-2
  3. Container 3 → machine-3
  4. Container 4 → machine-1
  5. Container 5 → machine-2
  6. Container 6 → machine-3
Result: 2 containers per machine.
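The round-robin placement above can be sketched in a few lines; this is a simplified model of the behavior described here, not Uncloud's actual scheduler:

```python
def place_containers(replicas: int, machines: list[str]) -> dict[str, list[int]]:
    """Assign container numbers to machines in round-robin order."""
    placement: dict[str, list[int]] = {m: [] for m in machines}
    for i in range(replicas):
        # Container i+1 goes to machine i mod len(machines)
        placement[machines[i % len(machines)]].append(i + 1)
    return placement

machines = ["machine-1", "machine-2", "machine-3"]
print(place_containers(6, machines))
# {'machine-1': [1, 4], 'machine-2': [2, 5], 'machine-3': [3, 6]}
```

With 5 replicas instead of 6, the same loop produces the 2, 2, 1 spread from the earlier example.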

Scaling up

When you increase replicas:
  1. Plan deployment: Uncloud calculates how many new containers to add and where to place them.
  2. Start new containers: New containers start on the selected machines.
  3. Health monitoring: Each new container is monitored for health before the rollout continues.
  4. Add to load balancer: Healthy containers are added to Caddy’s load balancing pool.

Scaling down

When you decrease replicas:
  1. Confirm scale down: Uncloud shows the plan and asks for confirmation (to prevent accidental data loss).
  2. Remove from load balancer: Containers are removed from the Caddy configuration.
  3. Stop containers: Containers are gracefully stopped (SIGTERM, then SIGKILL after the grace period).
  4. Remove containers: Stopped containers are removed.
Scaling to zero is not supported. Use uc rm <service> to remove a service entirely.

Load balancing

Uncloud automatically load balances traffic across replicas.

HTTP/HTTPS traffic

Caddy reverse proxy distributes requests to all healthy containers:
services:
  web:
    image: myapp:latest
    scale: 5
    x-ports:
      - app.example.com:8000/https
Caddy configuration (auto-generated):
https://app.example.com {
    reverse_proxy 10.210.0.3:8000 10.210.1.4:8000 10.210.2.5:8000 10.210.0.6:8000 10.210.1.7:8000 {
        import common_proxy
    }
    log
}
Features:
  • Round-robin load balancing
  • Automatic health checking (passive and active)
  • Failed requests retry on other upstreams
  • Unhealthy containers automatically removed

Internal service discovery

Services communicate via DNS names that resolve to all healthy container IPs:
services:
  api:
    image: api:latest
    scale: 3

  web:
    image: web:latest
    environment:
      # Resolves to all 3 API container IPs
      API_URL: http://api:8000
The api hostname resolves to all 3 container IPs. How requests spread across replicas then depends on how your HTTP client (curl, axios, fetch) and its resolver pick among the returned addresses.
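You can observe this multi-IP resolution with a standard resolver call; a sketch using Python's socket module (localhost is used as a stand-in here, since the api hostname only resolves inside the cluster network):

```python
import socket

def resolve_all(hostname: str, port: int) -> list[str]:
    """Return every IP address the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Deduplicate while preserving resolver order.
    return list(dict.fromkeys(info[4][0] for info in infos))

# Inside the cluster, resolve_all("api", 8000) would list one IP
# per healthy api replica.
print(resolve_all("localhost", 8000))
```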

Placement constraints

Control which machines can run your service containers.

Using x-machines extension

Restrict service to specific machines:
services:
  web:
    image: nginx
    scale: 6
    x-machines:
      - us-east-1
      - us-east-2
      - eu-west-1
The 6 replicas spread evenly across these 3 machines (2 per machine).

Pin to single machine

For stateful services that can’t run across machines:
services:
  database:
    image: postgres:16
    x-machines: db-server
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:
x-machines also accepts a single machine name (short syntax) instead of a list when the service should run on exactly one machine.

Geographic distribution

services:
  web:
    image: myapp:latest
    scale: 6
    x-machines:
      - us-east-1    # 2 replicas
      - us-east-2    # 2 replicas
      - eu-west-1    # 2 replicas
    x-ports:
      - app.example.com:8000/https
Users are served by the closest healthy container (via Caddy’s load balancing).

Scaling strategies

Auto-scaling (manual)

Uncloud doesn’t have built-in auto-scaling yet. Scale manually based on metrics:
# Monitor CPU/memory usage
uc ps

# Scale up when usage is high
uc service scale web 10

# Scale down when usage is low
uc service scale web 5
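If you script manual scaling, keep the scaling decision separate from the uc call so the policy is easy to test; a hedged sketch (the 70%/30% thresholds and step size are made-up examples, not Uncloud recommendations):

```python
def decide_replicas(current: int, cpu_percent: float,
                    min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Return the target replica count for a simple threshold policy."""
    if cpu_percent > 70 and current < max_replicas:
        return current + 1          # scale up one step at a time
    if cpu_percent < 30 and current > min_replicas:
        return current - 1          # scale down conservatively
    return current

target = decide_replicas(current=5, cpu_percent=85.0)
print(target)  # 6, then apply it with: uc service scale web <target>
```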

Scheduled scaling

Use cron jobs for time-based scaling:
# Scale up during business hours (9 AM)
0 9 * * * uc service scale web 10

# Scale down at night (6 PM)
0 18 * * * uc service scale web 3

Blue-green deployments

Deploy new version alongside old version:
services:
  # Old version (blue)
  web-blue:
    image: myapp:v1
    scale: 5
    x-ports:
      - app.example.com:8000/https

  # New version (green)
  web-green:
    image: myapp:v2
    scale: 5
    x-ports:
      - app-staging.example.com:8000/https
Test the green deployment, then switch traffic:
services:
  web-green:
    image: myapp:v2
    scale: 5
    x-ports:
      - app.example.com:8000/https  # Switch to production

Once traffic has moved to green, remove the old blue service (scaling to zero is not supported):
uc rm web-blue

Scaling with volumes

Shared volumes (read-only)

Multiple replicas can safely read from the same volume:
services:
  web:
    image: nginx
    scale: 5
    volumes:
      # Read-only shared config
      - /etc/nginx/conf:/etc/nginx/conf:ro

Shared volumes (read-write)

Be careful with concurrent writes:
services:
  app:
    image: myapp:latest
    scale: 3
    volumes:
      # Shared uploads directory
      - uploads:/app/uploads

volumes:
  uploads:
Ensure your application handles concurrent file access safely.
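One common way to make a shared read-write volume safe is to give every write a collision-free name and use an atomic rename, so replicas never clobber or half-read each other's files; a sketch (paths and naming scheme are illustrative):

```python
import os
import uuid

def save_upload(data: bytes, uploads_dir: str = "/app/uploads") -> str:
    """Write data under a unique name; safe across concurrent replicas."""
    name = f"{uuid.uuid4().hex}.bin"
    path = os.path.join(uploads_dir, name)
    # Write to a temp name first, then rename: rename is atomic on POSIX,
    # so other replicas never observe a partially written file.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
    return path
```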

Per-replica volumes

For data that shouldn’t be shared, use external volumes per machine:
services:
  worker:
    image: worker:latest
    scale: 3
    x-machines:
      - worker-1
      - worker-2
      - worker-3
    volumes:
      # Each machine has its own cache volume
      - cache:/tmp/cache

volumes:
  cache:

Global services

Run exactly one container on every machine:
services:
  monitoring-agent:
    image: prometheus/node-exporter
    deploy:
      mode: global
Global services automatically scale as you add/remove machines:
  • Add a machine → new container starts
  • Remove a machine → container is removed

Global service use cases

  • Monitoring agents: Collect metrics from each machine
  • Log collectors: Forward logs from each machine
  • Local caches: Provide caching on every machine
  • Network tools: DNS resolvers, proxies

Scaling best practices

Begin with fewer replicas and scale up based on actual load:
services:
  web:
    image: myapp:latest
    scale: 2  # Start with 2
Monitor performance and increase as needed.
Use multiple machines for high availability:
services:
  web:
    image: myapp:latest
    scale: 6
    x-machines:
      - machine-1
      - machine-2
      - machine-3
If one machine fails, others continue serving traffic.
Health checks ensure only working containers receive traffic:
services:
  web:
    image: myapp:latest
    scale: 5
    healthcheck:
      test: curl -f http://localhost/health
      interval: 10s
      retries: 3
Run enough replicas to handle machine failures:
  • 3 machines → at least 4-6 replicas
  • One machine fails → remaining replicas handle the load
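The rule of thumb above is N+1 capacity planning: size the fleet so the surviving replicas still cover peak load after one machine fails. A sketch of the arithmetic (per-replica capacity is whatever you measure for your own app, not an Uncloud metric):

```python
import math

def replicas_for_failover(peak_rps: float, rps_per_replica: float,
                          machines: int) -> int:
    """Replicas needed so losing one machine still covers peak load."""
    base = math.ceil(peak_rps / rps_per_replica)
    # With even spread, losing 1 of N machines removes ~1/N of the replicas,
    # so over-provision by a factor of N/(N-1).
    return math.ceil(base * machines / (machines - 1))

print(replicas_for_failover(peak_rps=900, rps_per_replica=300, machines=3))
# 5
```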
Check CPU and memory before scaling:
uc ps  # Shows resource usage
If containers are underutilized, you might not need more replicas.

Scaling limitations

Cannot scale to zero

Uncloud doesn’t support scaling to 0 replicas:
# This fails
uc service scale web 0

# Error: scaling to zero replicas is not supported
# Use 'uc rm web' instead
Reason: Uncloud derives service configuration from running containers. With zero containers, there’s no configuration to restore when scaling back up.

Global services cannot be scaled

Global services always run one replica per machine:
services:
  agent:
    image: monitoring-agent
    deploy:
      mode: global
    # Cannot set scale or replicas

Volumes and scaling

Shared volumes with multiple replicas:
  • Read-only: Safe to scale freely
  • Read-write: Ensure application handles concurrent access
  • Database volumes: Don’t scale beyond 1 replica unless using clustering

Real-world examples

Scale web application

services:
  web:
    build: .
    scale: 5
    x-ports:
      - app.example.com:8000/https
    environment:
      DATABASE_URL: postgres://db:5432/myapp
      REDIS_URL: redis://cache:6379
    healthcheck:
      test: curl -f http://localhost:8000/health
      interval: 10s

  db:
    image: postgres:16
    x-machines: db-server
    volumes:
      - db-data:/var/lib/postgresql/data

  cache:
    image: redis:alpine
    x-machines: cache-server

volumes:
  db-data:
Scale the web tier horizontally, keep database and cache on dedicated servers.

Geographic distribution

services:
  api:
    image: api:latest
    scale: 6
    x-machines:
      - us-east-1
      - us-east-2
      - us-west-1
      - eu-west-1
      - ap-south-1
      - ap-southeast-1
    x-ports:
      - api.example.com:8000/https
    healthcheck:
      test: curl -f http://localhost:8000/health
One replica per region for low latency worldwide.

Background workers

services:
  worker:
    image: worker:latest
    scale: 10
    environment:
      QUEUE_URL: redis://queue:6379
      CONCURRENCY: 4

  queue:
    image: redis:alpine
    x-machines: queue-server
Scale workers independently of web tier.

Monitoring scaled services

List all containers

uc ps
Shows all containers across all machines with their status.

Inspect service

uc inspect web
Shows:
  • Service ID and name
  • Number of replicas
  • Container locations
  • Health status
  • Endpoints

Check logs

# Logs from all replicas
uc logs web

# Follow logs in real-time
uc logs -f web

# Logs from specific container
uc logs <container-id>

View Caddy upstreams

uc caddy config
Shows the generated Caddyfile with all upstream container IPs.

Next steps

Rolling Updates

Update scaled services with zero downtime

Health Checks

Ensure only healthy containers receive traffic

Docker Compose

Define multi-service applications

Deploying Services

Learn about service deployment basics
