Health checks monitor whether your containers are running correctly. Uncloud uses them to detect failures during deployments and automatically remove unhealthy containers from load balancing.

Why use health checks

  • Faster deployments: Containers are marked healthy as soon as checks pass, with no need to wait for the full monitoring period
  • Safer rollouts: Detect broken deployments before they affect users
  • Automatic recovery: Unhealthy containers are removed from Caddy load balancing
  • Better monitoring: Track container health status with uc ps

How health checks work

A health check is a command that runs inside your container periodically:
  1. Container starts
  2. Health check runs every interval seconds
  3. If the command exits 0, the check passes and the container is healthy
  4. If the command exits non-zero or exceeds timeout, the check counts as a failure
  5. After retries consecutive failures, the container is marked unhealthy

Health states

  • starting: Container just started, health not yet known
  • healthy: Health check passed
  • unhealthy: Health check failed after retries consecutive failures
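The transitions between these states can be sketched as a small state machine. This is a simplified illustration, not Docker's or Uncloud's implementation: the HealthTracker class is hypothetical, and it ignores start_period and timeouts.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Hypothetical tracker that mirrors the health states described above."""

    retries: int = 3
    status: str = "starting"
    failing_streak: int = 0

    def record(self, exit_code: int) -> str:
        """Record one health check result and return the resulting status."""
        if exit_code == 0:
            # Any passing check resets the streak and marks the container healthy.
            self.failing_streak = 0
            self.status = "healthy"
        else:
            # A single failure is not enough; only a full streak flips the status.
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status


tracker = HealthTracker(retries=3)
history = [tracker.record(code) for code in [1, 1, 0, 1, 1, 1]]
print(history)
# ['starting', 'starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
```

Note that two failures followed by a success leave the container healthy, because the failure streak resets on every passing check.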

Configuring health checks

You can configure health checks in your Compose file or Dockerfile.

In Compose file

services:
  web:
    image: myapp:latest
    healthcheck:
      # Command to check health
      test: curl -f http://localhost:8000/health || exit 1
      
      # How often to run the check
      interval: 10s
      
      # How long to wait for check to complete
      timeout: 5s
      
      # How many consecutive failures before unhealthy
      retries: 3
      
      # Grace period after the container starts; failed checks during it don't count toward retries
      start_period: 30s
      
      # How often to run the check during start_period
      start_interval: 5s

In Dockerfile

FROM node:20-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

EXPOSE 8000

HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=3 \
  CMD node healthcheck.js || exit 1

CMD ["node", "server.js"]
Compose file settings override Dockerfile settings.

Health check options

test

Command to run inside the container. It accepts three formats.
String (shell form):
healthcheck:
  test: curl -f http://localhost:8000/health || exit 1
Array (exec form):
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
Array with shell:
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"]

interval

Time between health checks (default: 30s):
healthcheck:
  interval: 10s  # Check every 10 seconds

timeout

Maximum time to wait for check to complete (default: 30s):
healthcheck:
  timeout: 5s  # Kill check if it takes longer than 5 seconds

retries

Consecutive failures needed to mark unhealthy (default: 3):
healthcheck:
  retries: 5  # Need 5 failures in a row before unhealthy

start_period

Grace period after container starts (default: 0s):
healthcheck:
  start_period: 30s  # Don't count failures in first 30 seconds
During this period:
  • Failed checks don’t count toward retries
  • First successful check marks container healthy immediately
  • Useful for slow-starting applications

start_interval

Check frequency during start_period (default: 5s; requires Docker Engine 25.0 or later):
healthcheck:
  start_period: 60s
  start_interval: 2s  # Check every 2s during startup
  interval: 30s       # Then check every 30s after startup
Use a shorter interval during startup to detect the healthy state faster.
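To see how start_interval and interval combine, the sketch below lists the times at which checks would run. It is a simplified model under stated assumptions: check_schedule is a hypothetical helper, every check is assumed to finish instantly, and no check passes during start_period, so the shorter interval applies for the whole grace period.

```python
def check_schedule(start_period, start_interval, interval, horizon):
    """Return check times in seconds since container start, up to horizon.

    Simplified model (assumption): checks finish instantly and the container
    stays in the starting state for the whole start_period.
    """
    times = []
    t = 0
    while True:
        # Use the short interval inside the grace period, the normal one after it.
        step = start_interval if t < start_period else interval
        t += step
        if t > horizon:
            break
        times.append(t)
    return times


print(check_schedule(start_period=10, start_interval=2, interval=30, horizon=100))
# [2, 4, 6, 8, 10, 40, 70, 100]
```

In this model a container that becomes ready after 7 seconds is probed at the 8-second mark instead of waiting for the first 30-second check.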

disable

Disable health check inherited from image:
healthcheck:
  disable: true
Or using test:
healthcheck:
  test: ["NONE"]

Health check commands

HTTP endpoints

The most common approach for web services.
Using curl:
healthcheck:
  test: curl -f http://localhost:8000/health || exit 1
  interval: 10s
  timeout: 3s
Using wget:
healthcheck:
  test: wget --no-verbose --tries=1 --spider http://localhost:8000/health || exit 1
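Some images ship neither curl nor wget. If the image already contains a Python interpreter, a small standard-library script can play the same role. The sketch below is one possible version; the http_check name and the healthcheck.py file name are illustrative assumptions, not part of Uncloud.

```python
import http.client
import urllib.request


def http_check(url: str, timeout: float = 3.0) -> int:
    """Return 0 if the URL answers with a 2xx status, 1 otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (4xx and 5xx responses), refused
        # connections, and timeouts; ValueError covers malformed URLs.
        return 1
```

To use it as a health check, save it as healthcheck.py (a hypothetical name), end the file with a call such as sys.exit(http_check("http://localhost:8000/health")), and set test to ["CMD", "python", "healthcheck.py"].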

TCP port checks

Check if a port is listening:
healthcheck:
  test: nc -z localhost 5432 || exit 1
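If nc is not available in the image but Python is, the same port probe can be written with the standard socket module. This is a minimal sketch; tcp_check is a hypothetical helper name.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> int:
    """Return 0 if a TCP connection to host:port succeeds, 1 otherwise."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return 1
```

Like nc -z, this only proves that something accepts connections on the port. It does not prove that the service behind it can answer queries.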

Custom scripts

Run a custom health check script:
healthcheck:
  test: /app/healthcheck.sh
  interval: 15s
Example script:
#!/bin/sh
set -e

# Check database connection
psql -h localhost -U postgres -c "SELECT 1" > /dev/null

# Check Redis connection
redis-cli ping > /dev/null

# All checks passed
exit 0

Application-specific checks

Run application command:
healthcheck:
  test: ["CMD", "node", "healthcheck.js"]

Health check during deployments

Uncloud monitors health during rolling updates:

Default behavior (no health check)

Without a health check, Uncloud:
  1. Starts new container
  2. Waits 5 seconds (or update_config.monitor period)
  3. Checks container is still running
  4. If running, considers it healthy
  5. Moves to next container

With health check

With a health check, Uncloud:
  1. Starts new container
  2. Container state is starting
  3. Health checks run according to configuration
  4. If healthy before monitoring period ends: Success, move to next container
  5. If still starting after monitoring period: Wait for final health state
  6. If unhealthy after monitoring period: Roll back, stop deployment
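The decision in steps 4 to 6 can be summarized as a small function. This is only a sketch of the rules listed above, not Uncloud's actual code: next_action is a hypothetical name, and the handling of a container that turns unhealthy before the monitoring period ends is an assumption, since the steps above do not cover that case.

```python
def next_action(status: str, monitor_elapsed: bool) -> str:
    """Map a container's health status to a rollout decision.

    status: "starting", "healthy", or "unhealthy".
    monitor_elapsed: True once the monitoring period has passed.
    """
    if status == "healthy":
        # Step 4: success, move on to the next container.
        return "continue"
    if status == "unhealthy" and monitor_elapsed:
        # Step 6: roll back and stop the deployment.
        return "rollback"
    # Step 5: still starting, so keep waiting for a final health state.
    # Assumption: an early unhealthy status also waits until the period ends.
    return "wait"


print(next_action("healthy", monitor_elapsed=False))   # continue
print(next_action("starting", monitor_elapsed=True))   # wait
print(next_action("unhealthy", monitor_elapsed=True))  # rollback
```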

Monitoring period

The default monitoring period is 5 seconds. To change it, set update_config.monitor:
services:
  web:
    image: myapp:latest
    healthcheck:
      test: curl -f http://localhost:8000/health
      interval: 5s
      start_period: 20s
    deploy:
      update_config:
        monitor: 15s  # Wait up to 15s for container to become healthy

Health check examples

Node.js (Express)

Application code:
const express = require('express');
// db and redis are assumed to be your application's already-initialized clients
const app = express();

// Health check endpoint
app.get('/health', async (req, res) => {
  try {
    // Check database
    await db.ping();
    
    // Check Redis
    await redis.ping();
    
    // All dependencies healthy
    res.status(200).json({ status: 'healthy' });
  } catch (error) {
    // Something is broken
    res.status(503).json({ 
      status: 'unhealthy',
      error: error.message 
    });
  }
});

app.listen(8000);
Compose file:
services:
  api:
    build: .
    healthcheck:
      test: curl -f http://localhost:8000/health || exit 1
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

Python (FastAPI)

Application code:
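from fastapi import FastAPI, HTTPException
import asyncpg
from redis import asyncio as aioredis  # the standalone aioredis package is deprecated

app = FastAPI()

# db_pool (an asyncpg pool) and redis (an aioredis client) are assumed to be
# created during application startup; that setup is omitted here.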

@app.get("/health")
async def health():
    try:
        # Check PostgreSQL
        await db_pool.fetchval("SELECT 1")
        
        # Check Redis
        await redis.ping()
        
        return {"status": "healthy"}
    except Exception as e:
        raise HTTPException(
            status_code=503,
            detail=f"Unhealthy: {str(e)}"
        )
Compose file:
services:
  api:
    build: .
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 40s

Go

Application code:
package main

import (
    "database/sql"
    "encoding/json"
    "net/http"
)

func healthHandler(db *sql.DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
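        // Headers must be set before WriteHeader is called
        w.Header().Set("Content-Type", "application/json")

        // Check database connection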
        if err := db.Ping(); err != nil {
            w.WriteHeader(http.StatusServiceUnavailable)
            json.NewEncoder(w).Encode(map[string]string{
                "status": "unhealthy",
                "error":  err.Error(),
            })
            return
        }

        w.WriteHeader(http.StatusOK)
        json.NewEncoder(w).Encode(map[string]string{
            "status": "healthy",
        })
    }
}

func main() {
    // ... setup db ...
    http.HandleFunc("/health", healthHandler(db))
    http.ListenAndServe(":8000", nil)
}
Compose file:
services:
  api:
    build: .
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8000/health"]
      interval: 10s
      timeout: 3s
      retries: 3

PostgreSQL

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

Redis

services:
  cache:
    image: redis:alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3

Nginx

services:
  web:
    image: nginx:alpine
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 10s
      timeout: 3s
      retries: 3

Health check best practices

Health checks should verify that all critical dependencies are working:
app.get('/health', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    queue: await checkQueue(),
  };
  
  const healthy = Object.values(checks).every(v => v);
  res.status(healthy ? 200 : 503).json(checks);
});
Health checks should complete quickly (under 1 second):
healthcheck:
  test: curl -f http://localhost/health
  timeout: 3s  # Kill if it takes longer
Avoid:
  • Complex queries
  • External API calls
  • Heavy computations
Set start_period based on how long your app takes to initialize:
# Fast-starting app
healthcheck:
  start_period: 10s

# Slow-starting app (database migrations, etc.)
healthcheck:
  start_period: 60s
Only check dependencies you control.
Good:
  • Your database
  • Your Redis cache
  • Your message queue
Bad:
  • Third-party APIs (Stripe, AWS, etc.)
  • External services you don’t control
Include details in health check response:
{
  "status": "healthy",
  "checks": {
    "database": "ok",
    "redis": "ok",
    "disk_space": "ok"
  },
  "timestamp": "2025-03-04T12:34:56Z",
  "version": "1.2.3"
}
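One way to combine these practices is to run all dependency checks concurrently, cap each one with a timeout, and report per-check details. The sketch below uses only the Python standard library. The run_checks helper and the check names are illustrative assumptions; each check is any async callable that raises on failure.

```python
import asyncio
from datetime import datetime, timezone


async def run_checks(checks, timeout=1.0):
    """Run all checks concurrently and return (http_status, response_body)."""

    async def run_one(name, check):
        try:
            # Cap each check so one slow dependency cannot stall the endpoint.
            await asyncio.wait_for(check(), timeout)
            return name, "ok"
        except Exception as exc:
            return name, f"failed: {type(exc).__name__}"

    pairs = await asyncio.gather(*(run_one(n, c) for n, c in checks.items()))
    results = dict(pairs)
    healthy = all(value == "ok" for value in results.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "checks": results,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    return (200 if healthy else 503), body
```

A web handler can return the body with the given status code. Because the checks run concurrently, the endpoint takes roughly as long as the slowest check, bounded by timeout.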

Health status in Caddy

Uncloud automatically removes unhealthy containers from Caddy load balancing:

Automatic removal

When a container becomes unhealthy:
  1. Uncloud detects the health status change
  2. Regenerates Caddy configuration without that container’s IP
  3. Reloads Caddy gracefully
  4. Traffic no longer routed to unhealthy container

Automatic recovery

When an unhealthy container recovers:
  1. Container becomes healthy again
  2. Uncloud adds it back to Caddy configuration
  3. Caddy reloads
  4. Container starts receiving traffic again
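The removal and recovery steps above amount to rebuilding the upstream list from the current health statuses. The sketch below illustrates the idea only. The caddy_upstreams function and the IP addresses are hypothetical, and it assumes that only containers reported as unhealthy are dropped; how Uncloud treats starting containers or containers without a health check is not covered here.

```python
def caddy_upstreams(containers):
    """Return the container IPs that should receive traffic.

    containers: iterable of (ip, health) pairs.
    Assumption: only containers reported as "unhealthy" are excluded.
    """
    return [ip for ip, health in containers if health != "unhealthy"]


# One container fails its health checks and is dropped from the upstreams.
print(caddy_upstreams([("10.210.0.2", "healthy"), ("10.210.0.3", "unhealthy")]))
# ['10.210.0.2']

# After it recovers, the regenerated list includes it again.
print(caddy_upstreams([("10.210.0.2", "healthy"), ("10.210.0.3", "healthy")]))
# ['10.210.0.2', '10.210.0.3']
```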

View current upstreams

uc caddy config
Shows the current Caddy configuration with only healthy container IPs.

Monitoring container health

Check health status

# List all containers with health status
uc ps
Output shows:
  • Container ID
  • State (running)
  • Health status (healthy/unhealthy/starting)
  • Uptime

Inspect specific service

uc inspect web
Shows detailed health information for all containers in the service.

View health check logs

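Docker records the output of recent health check runs in the container's inspect data (State.Health.Log) rather than in the container's stdout/stderr logs. The application's own logs still show how it handled each health request:
uc logs web
Look for requests to your health endpoint and any errors logged around them.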

Troubleshooting

Container always unhealthy

Check health check command:
# Get container ID
uc ps

# Run health check command manually
uc exec <container-id> curl -f http://localhost:8000/health
Common issues:
  • Wrong port number
  • Health endpoint not implemented
  • Application not listening on localhost
  • Dependencies not available

Container stuck in starting state

Check start_period:
healthcheck:
  start_period: 60s  # Increase if app takes long to start
Check start_interval:
healthcheck:
  start_interval: 5s  # Check more frequently during startup

Health checks too aggressive

Increase interval:
healthcheck:
  interval: 30s  # Check less frequently
  retries: 5     # Allow more failures
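Relaxing these settings also delays failure detection, so it helps to estimate the cost. The sketch below is a rough worst-case estimate, assuming each check starts interval seconds after the previous one completes and that the failure begins right after a passing check. The function name is a hypothetical helper.

```python
def worst_case_detection_seconds(interval, retries, check_duration=0.0):
    """Estimate the time from a failure until the container is marked unhealthy.

    Assumptions: the failure starts right after a passing check completes,
    each following check starts `interval` seconds after the previous one
    finishes, and each failing check takes `check_duration` seconds.
    """
    return retries * (interval + check_duration)


print(worst_case_detection_seconds(interval=10, retries=3))                    # 30.0
print(worst_case_detection_seconds(interval=30, retries=5))                    # 150.0
print(worst_case_detection_seconds(interval=30, retries=5, check_duration=3))  # 165
```

With interval: 30s and retries: 5, a broken container can keep receiving traffic for about two and a half minutes, so pick values that balance tolerance for transient failures against detection time.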

Container healthy but not receiving traffic

Check Caddy configuration:
uc caddy config
Verify the container IP is listed in upstreams.
Check network connectivity:
uc exec <container-id> ping <other-container-ip>

Next steps

  • Rolling Updates: Use health checks during deployments
  • Scaling: Scale services with health-aware load balancing
  • Docker Compose: Configure health checks in Compose files
  • Deploying Services: Deploy services with health checks
