Rolling Updates

Uncloud updates your services without downtime by replacing containers one at a time. Each new container must pass health checks before the deployment continues, ensuring your service stays available throughout the update.

How rolling updates work

When you run uc deploy, Uncloud updates containers sequentially:

Start new container

Launch a container with the new configuration or image.

Wait for health

Monitor the container for the monitoring period (5 seconds by default), or until it becomes healthy.

Stop old container

Once the new container is healthy, stop and remove the old one.

Repeat

Move to the next container and repeat the process.

For a service with 3 replicas using the default start-first order:

Start new container 1, wait until healthy
Stop and remove old container 1
Start new container 2, wait until healthy
Stop and remove old container 2
Start new container 3, wait until healthy
Stop and remove old container 3

At every step, at least 3 containers are serving traffic.
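
The sequence above can be traced with a small simulation. This is a minimal sketch in Go, not Uncloud's implementation: the function name startFirstTrace is hypothetical, and it only counts running containers after each step, ignoring health checks and timing.

```go
package main

import "fmt"

// startFirstTrace returns the number of running containers after each step
// of a start-first rolling update across n replicas.
func startFirstTrace(n int) []int {
	running := n
	trace := make([]int, 0, 2*n)
	for i := 0; i < n; i++ {
		running++ // new container i is started and has become healthy
		trace = append(trace, running)
		running-- // old container i is stopped and removed
		trace = append(trace, running)
	}
	return trace
}

func main() {
	fmt.Println(startFirstTrace(3)) // prints [4 3 4 3 4 3]
}
```

The count alternates between 4 and 3 and never drops below the replica count.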

Update order

The update order controls whether the new container starts before or after stopping the old one.

start-first (default)

Start the new container before stopping the old one:

Pros: Zero downtime, service always available
Cons: Briefly runs both containers simultaneously
Best for: Stateless services (web apps, APIs, workers)

services:
  web:
    image: myapp:v2
    deploy:
      update_config:
        order: start-first

stop-first

Stop the old container before starting the new one:

Pros: Only one container runs at a time, prevents data corruption
Cons: Brief downtime during the transition
Best for: Stateful services (databases, services with volumes)

services:
  db:
    image: postgres:16
    volumes:
      - db-data:/var/lib/postgresql/data
    deploy:
      update_config:
        order: stop-first

volumes:
  db-data:
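
To compare the two orders, the following Go sketch tracks how many containers are running while replicas are replaced one at a time. The helper runningRange is hypothetical and models only the container count, not health checks or timing.

```go
package main

import "fmt"

// runningRange returns the lowest and highest number of running containers
// observed while n replicas are replaced one at a time using the given order.
func runningRange(n int, order string) (lowest, highest int) {
	running := n
	lowest, highest = n, n
	record := func() {
		if running < lowest {
			lowest = running
		}
		if running > highest {
			highest = running
		}
	}
	for i := 0; i < n; i++ {
		if order == "stop-first" {
			running-- // old container is stopped first
			record()
			running++ // new container is started afterwards
			record()
		} else {
			running++ // new container is started first
			record()
			running-- // old container is stopped afterwards
			record()
		}
	}
	return lowest, highest
}

func main() {
	for _, order := range []string{"start-first", "stop-first"} {
		lo, hi := runningRange(1, order)
		fmt.Printf("%s: lowest=%d highest=%d\n", order, lo, hi)
	}
}
```

For a single replica, start-first briefly runs 2 containers and never drops to 0, while stop-first never exceeds 1 container but passes through 0, which is the brief downtime described above.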

Automatic order selection

Uncloud automatically chooses the update order based on your service configuration:

Scenario	Order	Reason
Default	`start-first`	Minimize downtime
Host port conflicts	`stop-first`	Old container must free the port
Single replica + volume	`stop-first`	Prevent concurrent writes to same volume
Multi-replica + volume	`start-first`	Concurrent access already happening

The deployment plan shows which order will be used for each container.
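
The table can be read as a small decision function. The Go sketch below is an illustration, not Uncloud's code: the service struct, its field names, and resolveOrder are hypothetical, and checking the host port rule before the volume rule is an assumption the table does not state.

```go
package main

import "fmt"

const (
	StartFirst = "start-first"
	StopFirst  = "stop-first"
)

// service is a hypothetical summary of the settings the table depends on.
type service struct {
	Order     string // explicit update_config.order, empty if not set
	Replicas  int
	HasVolume bool
	HostPort  bool // publishes a port in host mode
}

// resolveOrder returns the explicit order if one is set, otherwise applies
// the rules from the table.
func resolveOrder(s service) string {
	if s.Order != "" {
		return s.Order
	}
	if s.HostPort {
		return StopFirst // old container must free the port
	}
	if s.HasVolume && s.Replicas == 1 {
		return StopFirst // prevent concurrent writes to the same volume
	}
	return StartFirst // default: minimize downtime
}

func main() {
	fmt.Println(resolveOrder(service{Replicas: 1, HasVolume: true})) // prints stop-first
	fmt.Println(resolveOrder(service{Replicas: 3, HasVolume: true})) // prints start-first
}
```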

Health monitoring

After starting each new container, Uncloud monitors it for failures.

Default monitoring period

By default, Uncloud waits 5 seconds and checks that the container:

Stays running (doesn’t crash)
Doesn’t restart repeatedly
Becomes healthy if it has a health check

If the container fails during this period, Uncloud rolls back and stops the deployment.

Change monitoring period

Adjust the monitoring period for your service:

services:
  app:
    image: myapp:latest
    deploy:
      update_config:
        # Wait 10 seconds before considering the container healthy
        monitor: 10s

Use a longer period if your app takes time to initialize.

Skip monitoring

Set to 0s to skip monitoring entirely:

services:
  app:
    image: myapp:latest
    deploy:
      update_config:
        monitor: 0s  # Skip health monitoring

Or use the --skip-health flag:

uc deploy --skip-health

With monitoring skipped, Uncloud won’t detect containers that crash on startup. Skip it only for emergency deployments when you’re confident the new version works.

Global default

Change the default monitoring period for all services:

export UNCLOUD_HEALTH_MONITOR_PERIOD=10s
uc deploy

Per-service monitor settings override the global default.
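
The precedence described in this section can be sketched in Go as follows. The function resolveMonitor is hypothetical and takes both settings as plain strings (empty meaning "not set"); how Uncloud itself parses and validates these values is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

const defaultMonitorPeriod = 5 * time.Second

// resolveMonitor picks the monitoring period: the per-service value if set,
// otherwise the global value, otherwise the 5 second default.
// A result of 0 means monitoring is skipped.
func resolveMonitor(serviceValue, globalValue string) (time.Duration, error) {
	switch {
	case serviceValue != "":
		return time.ParseDuration(serviceValue)
	case globalValue != "":
		return time.ParseDuration(globalValue)
	default:
		return defaultMonitorPeriod, nil
	}
}

func main() {
	global := os.Getenv("UNCLOUD_HEALTH_MONITOR_PERIOD")
	// No per-service "monitor" value in this example, so the global one applies.
	period, err := resolveMonitor("", global)
	if err != nil {
		fmt.Println("invalid duration:", err)
		return
	}
	if period == 0 {
		fmt.Println("monitoring skipped")
		return
	}
	fmt.Println("monitoring for", period)
}
```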

Health checks

Configure health checks to make deployments safer and faster.

Why use health checks

Faster deployments: Container marked healthy as soon as the check passes
Safer rollouts: Detect broken deployments before they affect traffic
Automatic recovery: Unhealthy containers removed from load balancing

Configure in Compose file

services:
  app:
    image: myapp:latest
    healthcheck:
      # Command to check health
      test: curl -f http://localhost:8000/health || exit 1
      # Check every 5 seconds
      interval: 5s
      # Timeout for each check
      timeout: 3s
      # Number of consecutive failures before unhealthy
      retries: 3
      # Startup grace period: failed checks in the first 10s don't count toward retries
      start_period: 10s
      # Check every 1s during start_period
      start_interval: 1s

Configure in Dockerfile

FROM node:20-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --production
COPY . .

HEALTHCHECK --interval=10s --timeout=3s --start-period=30s \
  CMD node healthcheck.js || exit 1

CMD ["node", "server.js"]

Health check behavior during deployment

Container starts
Health check runs according to interval (or start_interval during start_period)
If container becomes healthy before monitoring period ends, deployment succeeds early
If container is unhealthy after monitoring period, Uncloud rolls back
Transient unhealthy states during monitoring are tolerated
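
The rules above can be expressed as a decision over the statuses observed during the monitoring period. This Go sketch is one reading of those rules, not Uncloud's implementation: the status names and the decide function are hypothetical, and treating a container that is still "starting" when the period ends as a failure is an assumption.

```go
package main

import "fmt"

type status string

const (
	running   status = "running"   // container without a health check
	starting  status = "starting"  // health check has not passed yet
	healthy   status = "healthy"   // health check passed
	unhealthy status = "unhealthy" // health check is failing
	exited    status = "exited"    // container crashed
)

// decide takes the statuses observed during the monitoring period, in order,
// and returns "success" or "rollback".
func decide(observed []status) string {
	if len(observed) == 0 {
		return "success" // monitoring skipped
	}
	for _, s := range observed {
		switch s {
		case healthy:
			return "success" // succeed early, before the period ends
		case exited:
			return "rollback" // container did not stay running
		}
		// starting and unhealthy are tolerated while the period lasts
	}
	if observed[len(observed)-1] == running {
		return "success" // no health check and still running
	}
	return "rollback" // never became healthy within the period
}

func main() {
	fmt.Println(decide([]status{starting, unhealthy, healthy})) // prints success
	fmt.Println(decide([]status{starting, unhealthy}))          // prints rollback
}
```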

Health check formats

Three formats are supported:

Shell command:

healthcheck:
  test: curl -f http://localhost/health || exit 1

Exec array:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/health"]

Disable inherited health check:

healthcheck:
  disable: true

Example health check endpoints

Node.js (Express):

app.get('/health', (req, res) => {
  // Check database connection
  if (!db.isConnected()) {
    return res.status(503).send('Database unavailable');
  }
  res.status(200).send('OK');
});

Python (FastAPI):

@app.get("/health")
async def health():
    # Check dependencies
    if not await redis.ping():
        raise HTTPException(status_code=503, detail="Redis unavailable")
    return {"status": "healthy"}

Go:

http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
    // Check database connection
    if err := db.Ping(); err != nil {
        w.WriteHeader(http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("OK"))
})

Rollback on failure

If a new container fails health monitoring, Uncloud automatically rolls back:

Stop new container

The failed container is stopped but kept for inspection.

Restart old container (stop-first only)

For stop-first order, the old container is restarted.

Halt deployment

The deployment stops. Remaining containers keep their current version.
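
As a compact summary of the steps above, this Go sketch lists the rollback actions for each update order. The function rollbackSteps is hypothetical and only returns descriptions; it does not perform any container operations.

```go
package main

import "fmt"

// rollbackSteps lists the actions taken after a new container fails
// health monitoring, depending on the update order in use.
func rollbackSteps(order string) []string {
	steps := []string{"stop new container (kept for inspection)"}
	if order == "stop-first" {
		// With stop-first the old container was already stopped,
		// so it has to be brought back.
		steps = append(steps, "restart old container")
	}
	return append(steps, "halt deployment")
}

func main() {
	for _, order := range []string{"start-first", "stop-first"} {
		fmt.Println(order)
		for i, step := range rollbackSteps(order) {
			fmt.Printf("  %d. %s\n", i+1, step)
		}
	}
}
```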

Inspect failed containers

After a failed deployment:

# View all containers including stopped ones
uc ps -a

# Check logs of failed container
uc logs <container-id>

# Inspect container state
uc inspect <container-id>

Retry after failure

Fix the issue and run uc deploy again. Uncloud skips successfully updated containers and only redeploys the remaining ones.
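
Conceptually, a retry only needs to touch containers that do not yet match the desired state. The Go sketch below illustrates the idea by comparing image tags only; the container struct and needsUpdate are hypothetical, and comparing just the image is a simplification of whatever comparison Uncloud actually performs.

```go
package main

import "fmt"

// container is a hypothetical, minimal view of a running container.
type container struct {
	ID    string
	Image string
}

// needsUpdate returns the IDs of containers whose image differs
// from the desired one.
func needsUpdate(containers []container, desiredImage string) []string {
	var ids []string
	for _, c := range containers {
		if c.Image != desiredImage {
			ids = append(ids, c.ID)
		}
	}
	return ids
}

func main() {
	current := []container{
		{ID: "a1b2c3", Image: "myapp:v2"}, // updated before the failure
		{ID: "d4e5f6", Image: "myapp:v1"},
		{ID: "g7h8i9", Image: "myapp:v1"},
	}
	fmt.Println(needsUpdate(current, "myapp:v2")) // prints [d4e5f6 g7h8i9]
}
```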

Update strategies for different scenarios

Stateless web application

services:
  web:
    image: myapp:v2
    scale: 5
    x-ports:
      - app.example.com:8000/https
    healthcheck:
      test: curl -f http://localhost:8000/health
      interval: 5s
      retries: 3
    # start-first is automatic (default)

Result: Zero downtime, all 5 replicas updated one by one.

Stateful database

services:
  postgres:
    image: postgres:16
    volumes:
      - db-data:/var/lib/postgresql/data
    deploy:
      update_config:
        order: stop-first  # Prevent data corruption

volumes:
  db-data:

Result: Brief downtime while old container stops and new one starts.

Single replica with volume

services:
  app:
    image: myapp:v2
    scale: 1
    volumes:
      - app-data:/data
    # stop-first is automatic (single replica + volume)

volumes:
  app-data:

Result: Automatic stop-first to prevent concurrent writes to volume.

Multi-replica with volume (safe concurrent access)

services:
  app:
    image: myapp:v2
    scale: 3
    volumes:
      - shared-data:/data
    deploy:
      update_config:
        order: start-first  # Explicit; matches the automatic choice for multi-replica + volume

volumes:
  shared-data:

Result: Zero downtime if your app handles concurrent access safely (e.g., SQLite with WAL mode).

Host port binding

services:
  metrics:
    image: prom/prometheus:latest
    x-ports:
      - 9090:9090/tcp@host
    # stop-first is automatic (port conflict)

Result: Automatic stop-first because old container must free port 9090 before new one can bind.
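
The underlying constraint is that two processes cannot listen on the same host address and port at once. The Go sketch below demonstrates this with plain TCP listeners on localhost; it is independent of Uncloud, and the function name bindTwice is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// bindTwice opens a TCP listener on a free localhost port, then tries to
// open a second listener on the same address while the first is still open.
func bindTwice() (firstErr, secondErr error) {
	first, err := net.Listen("tcp", "127.0.0.1:0") // port 0: the OS picks a free port
	if err != nil {
		return err, nil
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		second.Close()
	}
	return nil, err
}

func main() {
	firstErr, secondErr := bindTwice()
	if firstErr != nil {
		fmt.Println("could not bind at all:", firstErr)
		return
	}
	fmt.Println("second bind failed:", secondErr != nil)
	fmt.Println("error:", secondErr)
}
```

The second bind fails until the first listener is closed, which mirrors why the old container has to stop before the new one can take over the host port.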

ServiceSpec and UpdateConfig reference

Based on the source code, here are the key structures:

ServiceSpec fields

type ServiceSpec struct {
    Name             string
    Mode             string           // "replicated" or "global"
    Replicas         uint             // For replicated mode
    Container        ContainerSpec
    Ports            []PortSpec
    Volumes          []VolumeSpec
    UpdateConfig     UpdateConfig
    StopGracePeriod  *time.Duration   // Default: 10 seconds
    // ... other fields
}

UpdateConfig fields

type UpdateConfig struct {
    // Order: "start-first" or "stop-first"
    // Empty means automatic selection based on service characteristics
    Order string `json:",omitempty"`

    // MonitorPeriod: How long to wait after starting a container
    // before checking it's still running
    // nil = use default (5s), 0 = skip monitoring
    MonitorPeriod *time.Duration `json:",omitempty"`
}
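
The pointer type on MonitorPeriod is what lets "not set" and "set to zero" be told apart. The Go sketch below copies the struct as shown above and encodes it with the standard encoding/json package; the helpers periodPtr and encode are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// UpdateConfig mirrors the struct shown above.
type UpdateConfig struct {
	Order         string         `json:",omitempty"`
	MonitorPeriod *time.Duration `json:",omitempty"`
}

// periodPtr returns a pointer to d, so that a zero duration can be set explicitly.
func periodPtr(d time.Duration) *time.Duration { return &d }

// encode returns the JSON form of c, or the error text if encoding fails.
func encode(c UpdateConfig) string {
	b, err := json.Marshal(c)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Nothing set: both fields are omitted, so defaults apply.
	fmt.Println(encode(UpdateConfig{})) // prints {}

	// Explicit zero: the field is kept, which means "skip monitoring".
	fmt.Println(encode(UpdateConfig{MonitorPeriod: periodPtr(0)})) // prints {"MonitorPeriod":0}

	// time.Duration is encoded as an integer number of nanoseconds.
	fmt.Println(encode(UpdateConfig{Order: "stop-first", MonitorPeriod: periodPtr(10 * time.Second)}))
	// prints {"Order":"stop-first","MonitorPeriod":10000000000}
}
```

With omitempty, a nil pointer is dropped from the output but a pointer to a zero value is not, so the two cases survive a round trip through JSON.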

Stop grace period

Time to wait after SIGTERM before sending SIGKILL:

services:
  app:
    image: myapp:latest
    stop_grace_period: 30s  # Wait 30s for graceful shutdown

Default is 10 seconds if not specified.

Advanced scenarios

Canary deployments

Deploy to a subset of machines first:

services:
  web-canary:
    image: myapp:v2
    scale: 1
    x-machines:
      - canary-server

  web:
    image: myapp:v1
    scale: 5
    x-machines:
      - server-1
      - server-2
      - server-3
      - server-4
      - server-5

Monitor the canary, then update the main service if successful.

Forced recreation

Force container recreation even if nothing changed:

uc deploy --recreate

Useful for:

Picking up external volume changes
Resetting container state
Testing deployment process

Emergency rollback

Quickly roll back to previous version:

# Deploy a Compose file that pins the previous image tag
uc deploy -f compose.v1.yaml --skip-health

The --skip-health flag speeds up the rollback but skips health monitoring, so a container that crashes on startup will not be detected.

Monitoring deployments

Watch deployment progress

uc deploy shows real-time progress:

Deployment plan
- Deploy service [name=web]
  - Replace container [id=a1b2c3, machine=server-1, order=start-first]
  - Replace container [id=d4e5f6, machine=server-2, order=start-first]
  - Replace container [id=g7h8i9, machine=server-3, order=start-first]

Do you want to continue? (y/N): y

Deploying services ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:45

Check service health after deployment

# List all services
uc ls

# Check specific service
uc inspect web

# View container logs
uc logs web

# Monitor in real-time
uc logs -f web

Verify all containers are healthy

uc ps

Look for the “healthy” status in the output.

Best practices

Always configure health checks

Health checks make deployments safer and faster. Every service should have one:

healthcheck:
  test: curl -f http://localhost/health || exit 1
  interval: 10s
  timeout: 3s
  retries: 3
  start_period: 30s

Use appropriate update order

Let Uncloud choose automatically unless you have specific requirements:

Stateless services: start-first (automatic)
Stateful services: stop-first (automatic for a single replica with a volume)
Host ports: stop-first (automatic)

Test deployments in staging

Always test deployments in a staging environment before production:

# Deploy to staging
uc deploy -f compose.staging.yaml

# Verify everything works

# Deploy to production
uc deploy -f compose.prod.yaml

Monitor during rollout

Watch logs and metrics during deployment:

# Terminal 1: Deploy
uc deploy

# Terminal 2: Watch logs
uc logs -f web

Plan for rollback

Keep previous versions available for quick rollback:

# Tag images with version numbers
services:
  web:
    image: myapp:v1.2.3

Next steps

Health Checks

Deep dive into container health checks

Scaling

Scale services horizontally

Docker Compose

Learn about Compose file features

Deploying Services

Deploy with uc run
