Scale your services horizontally by running multiple container replicas across your cluster. Uncloud automatically distributes containers across machines and load balances traffic between them.
Quick start
Scale a service to 5 replicas:
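Using the `uc service scale` command (covered in more detail below):

```shell
uc service scale web 5
```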
Uncloud spreads the 5 containers across available machines in your cluster.
Scaling methods
You can scale services in two ways:
Using uc service scale
Scale an existing service:
```shell
# Scale to a specific number of replicas
uc service scale web 10

# Scale down
uc service scale web 3

# Use the service name or ID
uc service scale 9a8b7c6d5e4f3a2b 5
```
The command:

1. Inspects the current service state
2. Plans the changes (adding or removing containers)
3. Asks for confirmation when scaling down
4. Executes the deployment with rolling updates
Using Compose files
Set replicas in your compose.yaml:
```yaml
services:
  web:
    image: nginx
    scale: 5  # Run 5 replicas
```
Or use the deploy.replicas syntax:
```yaml
services:
  web:
    image: nginx
    deploy:
      replicas: 5
```
Deploy with:
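Assuming the standard `uc` CLI, deployment is a single command:

```shell
uc deploy
```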
How scaling works
Horizontal scaling
Uncloud distributes containers across available machines:
- 1 machine, 5 replicas: All 5 containers run on the single machine
- 3 machines, 5 replicas: Containers spread evenly (2, 2, 1 distribution)
- 5 machines, 5 replicas: One container per machine
Round-robin distribution
Uncloud uses a round-robin approach to spread containers:
```yaml
services:
  web:
    image: nginx
    scale: 6
```
With 3 machines (machine-1, machine-2, machine-3):
Container 1 → machine-1
Container 2 → machine-2
Container 3 → machine-3
Container 4 → machine-1
Container 5 → machine-2
Container 6 → machine-3
Result: 2 containers per machine.
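The mapping above is simple modulo arithmetic. A minimal sketch in Python (the function is illustrative only, not Uncloud's actual scheduler):

```python
def distribute(replicas: int, machines: list) -> dict:
    """Assign replicas to machines round-robin: container i goes to
    machine i modulo the number of machines."""
    counts = {m: 0 for m in machines}
    for i in range(replicas):
        counts[machines[i % len(machines)]] += 1
    return counts

# 6 replicas across 3 machines -> 2 containers per machine
print(distribute(6, ["machine-1", "machine-2", "machine-3"]))
# -> {'machine-1': 2, 'machine-2': 2, 'machine-3': 2}
```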
Scaling up
When you increase replicas:

1. Plan deployment: Uncloud calculates how many new containers to add and where to place them.
2. Start new containers: New containers start on the selected machines.
3. Health monitoring: Each new container is monitored for health before continuing.
4. Add to load balancer: Healthy containers are added to Caddy's load balancing pool.
Scaling down
When you decrease replicas:

1. Confirm scale down: Uncloud shows the plan and asks for confirmation (to prevent accidental data loss).
2. Remove from load balancer: Containers are removed from the Caddy configuration.
3. Stop containers: Containers are gracefully stopped (SIGTERM, then SIGKILL after a grace period).
4. Remove containers: Stopped containers are removed.
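Because containers receive SIGTERM before SIGKILL, applications should trap it and drain in-flight work before exiting. A minimal sketch in Python (the draining flag and handler name are illustrative):

```python
import os
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    """Mark the process as draining so it stops accepting new work
    and finishes in-flight requests before the grace period ends."""
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the SIGTERM a container receives during scale-down.
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True once the handler has run
```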
Scaling to zero is not supported. Use uc rm <service> to remove a service entirely.
Load balancing
Uncloud automatically load balances traffic across replicas.
HTTP/HTTPS traffic
Caddy reverse proxy distributes requests to all healthy containers:
```yaml
services:
  web:
    image: myapp:latest
    scale: 5
    x-ports:
      - app.example.com:8000/https
```
Caddy configuration (auto-generated):
```
https://app.example.com {
    reverse_proxy 10.210.0.3:8000 10.210.1.4:8000 10.210.2.5:8000 10.210.0.6:8000 10.210.1.7:8000 {
        import common_proxy
    }
    log
}
```
Features:

- Round-robin load balancing
- Automatic health checking (passive and active)
- Failed requests are retried on other upstreams
- Unhealthy containers are automatically removed
Internal service discovery
Services communicate via DNS names that resolve to all healthy container IPs:
```yaml
services:
  api:
    image: api:latest
    scale: 3

  web:
    image: web:latest
    environment:
      # Resolves to all 3 API container IPs
      API_URL: http://api:8000
```
The api hostname returns all 3 container IPs. Your HTTP client (like curl, axios, fetch) handles load distribution.
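To see what the client sees, you can resolve a hostname to every address behind it. A minimal sketch in Python (inside the cluster you would resolve `api`; the example below uses `localhost` so it runs anywhere, and the naive random pick stands in for whatever distribution your HTTP client does):

```python
import random
import socket

def resolve_all(host, port):
    """Return every IP address the hostname resolves to
    (in Uncloud, one per healthy container)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# Inside the cluster, resolve_all("api", 8000) would return all
# container IPs; localhost is used here for demonstration.
ips = resolve_all("localhost", 8000)
target = random.choice(ips)  # naive client-side load distribution
print(target)
```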
Placement constraints
Control which machines can run your service containers.
Using x-machines extension
Restrict service to specific machines:
```yaml
services:
  web:
    image: nginx
    scale: 6
    x-machines:
      - us-east-1
      - us-east-2
      - eu-west-1
```
The 6 replicas spread evenly across these 3 machines (2 per machine).
Pin to single machine
For stateful services that can’t run across machines:
```yaml
services:
  database:
    image: postgres:16
    x-machines: db-server
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:
```
The string form of x-machines is a shorthand for pinning a service to a single machine.
Geographic distribution
```yaml
services:
  web:
    image: myapp:latest
    scale: 6
    x-machines:
      - us-east-1  # 2 replicas
      - us-east-2  # 2 replicas
      - eu-west-1  # 2 replicas
    x-ports:
      - app.example.com:8000/https
```
Users are served by the closest healthy container (via Caddy’s load balancing).
Scaling strategies
Manual scaling
Uncloud doesn't have built-in auto-scaling yet. Scale manually based on metrics:
```shell
# Monitor CPU/memory usage
uc ps

# Scale up when usage is high
uc service scale web 10

# Scale down when usage is low
uc service scale web 5
```
Scheduled scaling
Use cron jobs for time-based scaling:
```shell
# Scale up during business hours (9 AM)
0 9 * * * uc service scale web 10

# Scale down at night (6 PM)
0 18 * * * uc service scale web 3
```
Blue-green deployments
Deploy new version alongside old version:
```yaml
services:
  # Old version (blue)
  web-blue:
    image: myapp:v1
    scale: 5
    x-ports:
      - app.example.com:8000/https

  # New version (green)
  web-green:
    image: myapp:v2
    scale: 5
    x-ports:
      - app-staging.example.com:8000/https
```
Test the green deployment, then switch traffic by moving the production port to green:

```yaml
services:
  web-green:
    image: myapp:v2
    scale: 5
    x-ports:
      - app.example.com:8000/https  # Switch to production
```

Once green is serving production traffic, remove the old version with uc rm web-blue (scaling a service to zero is not supported).
Scaling with volumes
Shared volumes (read-only)
Multiple replicas can safely read from the same volume:
```yaml
services:
  web:
    image: nginx
    scale: 5
    volumes:
      # Read-only shared config
      - /etc/nginx/conf:/etc/nginx/conf:ro
```
Shared volumes (read-write)
Be careful with concurrent writes:
```yaml
services:
  app:
    image: myapp:latest
    scale: 3
    volumes:
      # Shared uploads directory
      - uploads:/app/uploads

volumes:
  uploads:
```
Ensure your application handles concurrent file access safely.
Per-replica volumes
For data that shouldn't be shared, rely on named volumes being local to each machine:

```yaml
services:
  worker:
    image: worker:latest
    scale: 3
    x-machines:
      - worker-1
      - worker-2
      - worker-3
    volumes:
      # Each machine gets its own local cache volume
      - cache:/tmp/cache

volumes:
  cache:
```
Global services
Run exactly one container on every machine:
```yaml
services:
  monitoring-agent:
    image: prometheus/node-exporter
    deploy:
      mode: global
```
Global services automatically scale as you add or remove machines:

- Add a machine → a new container starts
- Remove a machine → its container is removed
Global service use cases
- Monitoring agents: Collect metrics from each machine
- Log collectors: Forward logs from each machine
- Local caches: Provide caching on every machine
- Network tools: DNS resolvers, proxies
Scaling best practices
Start small
Begin with fewer replicas and scale up based on actual load:

```yaml
services:
  web:
    image: myapp:latest
    scale: 2  # Start with 2
```
Monitor performance and increase as needed.
Distribute across machines
Use multiple machines for high availability:

```yaml
services:
  web:
    image: myapp:latest
    scale: 6
    x-machines:
      - machine-1
      - machine-2
      - machine-3
```
If one machine fails, others continue serving traffic.
Plan for machine failures
Run enough replicas to handle machine failures:

- 3 machines → at least 4-6 replicas
- If one machine fails, the remaining replicas handle the load
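As a back-of-envelope check, you can compute the smallest replica count that still leaves enough capacity after machines fail, assuming the even round-robin spread described earlier. This is purely illustrative arithmetic, not an Uncloud feature:

```python
import math

def replicas_to_survive(machines: int, needed: int, failures: int = 1) -> int:
    """Smallest replica count that keeps at least `needed` replicas
    alive after `failures` machines (each holding the largest share
    of a round-robin spread) go down."""
    r = needed
    while True:
        # A failed machine takes down at most ceil(r / machines) replicas.
        worst_loss = failures * math.ceil(r / machines)
        if r - worst_loss >= needed:
            return r
        r += 1

# 3 machines, 3 replicas needed at peak -> run 5 to survive one failure
print(replicas_to_survive(machines=3, needed=3))  # -> 5
```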
Monitor before scaling
Check CPU and memory before scaling:

```shell
uc ps  # Shows resource usage
```
If containers are underutilized, you might not need more replicas.
Scaling limitations
Cannot scale to zero
Uncloud doesn’t support scaling to 0 replicas:
```shell
# This fails
uc service scale web 0
# Error: scaling to zero replicas is not supported
# Use 'uc rm web' instead
```
Reason: Uncloud derives service configuration from running containers. With zero containers, there’s no configuration to restore when scaling back up.
Global services cannot be scaled
Global services always run one replica per machine:
```yaml
services:
  agent:
    image: monitoring-agent
    deploy:
      mode: global
      # Cannot set scale or replicas
```
Volumes and scaling
Shared volumes with multiple replicas:
- Read-only: Safe to scale freely
- Read-write: Ensure the application handles concurrent access
- Database volumes: Don't scale beyond 1 replica unless using clustering
Real-world examples
Scale web application
```yaml
services:
  web:
    build: .
    scale: 5
    x-ports:
      - app.example.com:8000/https
    environment:
      DATABASE_URL: postgres://db:5432/myapp
      REDIS_URL: redis://cache:6379
    healthcheck:
      test: curl -f http://localhost:8000/health
      interval: 10s

  db:
    image: postgres:16
    x-machines: db-server
    volumes:
      - db-data:/var/lib/postgresql/data

  cache:
    image: redis:alpine
    x-machines: cache-server

volumes:
  db-data:
```
Scale the web tier horizontally, keep database and cache on dedicated servers.
Geographic distribution
```yaml
services:
  api:
    image: api:latest
    scale: 6
    x-machines:
      - us-east-1
      - us-east-2
      - us-west-1
      - eu-west-1
      - ap-south-1
      - ap-southeast-1
    x-ports:
      - api.example.com:8000/https
    healthcheck:
      test: curl -f http://localhost:8000/health
```
One replica per region for low latency worldwide.
Background workers
```yaml
services:
  worker:
    image: worker:latest
    scale: 10
    environment:
      QUEUE_URL: redis://queue:6379
      CONCURRENCY: 4

  queue:
    image: redis:alpine
    x-machines: queue-server
```
Scale workers independently of web tier.
Monitoring scaled services
List all containers
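Using the same `uc ps` command used elsewhere for monitoring:

```shell
uc ps
```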
Shows all containers across all machines with their status.
Inspect service
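The exact subcommand is assumed here; check `uc --help` for the precise form:

```shell
uc inspect web
```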
Shows:

- Service ID and name
- Number of replicas
- Container locations
- Health status
- Endpoints
Check logs
```shell
# Logs from all replicas
uc logs web

# Follow logs in real-time
uc logs -f web

# Logs from a specific container
uc logs <container-id>
```
View Caddy upstreams
Shows the generated Caddyfile with all upstream container IPs.
Next steps
Rolling Updates Update scaled services with zero downtime
Health Checks Ensure only healthy containers receive traffic
Docker Compose Define multi-service applications
Deploying Services Learn about service deployment basics