Health Checks - Uncloud

Health checks monitor whether your containers are running correctly. Uncloud uses them to detect failures during deployments and automatically remove unhealthy containers from load balancing.

Why use health checks

Faster deployments: Containers are marked healthy as soon as checks pass, so there's no need to wait for the full monitoring period
Safer rollouts: Detect broken deployments before they affect users
Automatic recovery: Unhealthy containers removed from Caddy load balancing
Better monitoring: Track container health status with uc ps

How health checks work

A health check is a command that runs inside your container periodically:

Container starts
Health check runs every interval
If command exits 0, the check passes and the container is healthy
If command exits non-zero (conventionally 1) or runs longer than timeout, the check fails
After retries consecutive failures, container marked unhealthy

Health states

starting: Container just started, health not yet known
healthy: Health check passed
unhealthy: Health check failed after retries consecutive failures
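The rules above form a small state machine. The sketch below models them in plain Python to show how `retries` interacts with passing and failing checks. It is a simplified illustration (it ignores `start_period` and timing), not Docker's or Uncloud's code; the function name is hypothetical.

```python
from typing import Iterable, List


def health_states(exit_codes: Iterable[int], retries: int = 3) -> List[str]:
    """Return the health state after each check, ignoring start_period.

    Exit code 0 is a pass; any non-zero code (a timeout can be modeled
    as non-zero too) is a failure.
    """
    state = "starting"
    failing_streak = 0
    states = []
    for code in exit_codes:
        if code == 0:
            # One passing check resets the streak and marks the container healthy
            failing_streak = 0
            state = "healthy"
        else:
            failing_streak += 1
            # Only `retries` consecutive failures flip the state to unhealthy
            if failing_streak >= retries:
                state = "unhealthy"
        states.append(state)
    return states


print(health_states([1, 0, 1, 1, 1], retries=3))
# ['starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
```

A single failed check never makes a healthy container unhealthy; it only starts the failure streak.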

Configuring health checks

You can configure health checks in your Compose file or Dockerfile.

In Compose file

services:
  web:
    image: myapp:latest
    healthcheck:
      # Command to check health
      test: curl -f http://localhost:8000/health || exit 1
      
      # How often to run the check
      interval: 10s
      
      # How long to wait for check to complete
      timeout: 5s
      
      # How many consecutive failures before unhealthy
      retries: 3
      
      # Grace period after start: failed checks don't count toward retries
      start_period: 30s
      
      # During start_period, check more frequently
      start_interval: 5s
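Values such as `10s` and `30s` are Compose duration strings: a number followed by a unit (`us`, `ms`, `s`, `m`, `h`), and units can be combined, as in `1m30s`. When reasoning about timing it can help to convert them to seconds. A minimal sketch of such a converter (the `parse_duration` helper is hypothetical and accepts integer values only):

```python
import re

# Seconds per Compose duration unit; "ms" must be tried before "m"
_UNITS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+)(us|ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a Compose-style duration such as '1m30s' to seconds."""
    text = text.strip()
    pos = 0
    total = 0.0
    for match in _PART.finditer(text):
        # Reject anything between or before the value/unit pairs
        if match.start() != pos:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


config = {"interval": "10s", "timeout": "5s", "start_period": "30s", "start_interval": "5s"}
print({key: parse_duration(value) for key, value in config.items()})
# {'interval': 10.0, 'timeout': 5.0, 'start_period': 30.0, 'start_interval': 5.0}
```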

In Dockerfile

FROM node:20-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

EXPOSE 8000

HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=3 \
  CMD node healthcheck.js || exit 1

CMD ["node", "server.js"]

Compose file settings override Dockerfile settings.

Health check options

test

Command to run inside the container. It accepts three formats.

String (shell form):

healthcheck:
  test: curl -f http://localhost:8000/health || exit 1

Array (exec form):

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]

Array with shell:

healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"]
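The three forms differ in whether a shell is involved. A string is treated like `CMD-SHELL`, so shell syntax such as `|| exit 1` works; `CMD` executes the program directly, so shell operators would be passed to it as literal arguments. The sketch below maps each form (plus the `["NONE"]` value covered under disable) to the argument list that ends up being executed. It assumes the default Linux shell `/bin/sh -c`; an image's `SHELL` instruction can change that. The `probe_argv` helper is hypothetical.

```python
from typing import List, Optional, Union


def probe_argv(test: Union[str, List[str]]) -> Optional[List[str]]:
    """Return the argv for a Compose `test` value, or None if checks are disabled."""
    if isinstance(test, str):
        # Plain string: same as CMD-SHELL, run through the shell
        return ["/bin/sh", "-c", test]
    kind, *rest = test
    if kind == "NONE":
        return None
    if kind == "CMD":
        # Exec form: no shell, arguments passed as-is
        return list(rest)
    if kind == "CMD-SHELL":
        return ["/bin/sh", "-c", " ".join(rest)]
    raise ValueError(f"unknown test type: {kind!r}")


print(probe_argv(["CMD", "curl", "-f", "http://localhost:8000/health"]))
# ['curl', '-f', 'http://localhost:8000/health']
```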

interval

Time between health checks (default: 30s):

healthcheck:
  interval: 10s  # Check every 10 seconds

timeout

Maximum time to wait for check to complete (default: 30s):

healthcheck:
  timeout: 5s  # Kill check if it takes longer than 5 seconds

retries

Consecutive failures needed to mark unhealthy (default: 3):

healthcheck:
  retries: 5  # Need 5 failures in a row before unhealthy
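Together, `interval`, `timeout` and `retries` bound how long a broken container can stay in rotation. As a rough worked example, assuming the next check starts `interval` after the previous one finishes and ignoring `start_period`: if each failing check errors out immediately, detection takes up to `retries × interval`; if each check hangs until `timeout`, it takes up to `retries × (interval + timeout)`. The function name below is hypothetical.

```python
from typing import Tuple


def worst_case_detection(interval: float, timeout: float, retries: int) -> Tuple[float, float]:
    """Approximate upper bounds, in seconds, from a failure to the unhealthy state.

    Returns (checks fail fast, checks hang until timeout).
    """
    fail_fast = retries * interval
    hang = retries * (interval + timeout)
    return fail_fast, hang


print(worst_case_detection(interval=10, timeout=5, retries=3))   # (30, 45)
print(worst_case_detection(interval=30, timeout=30, retries=3))  # (90, 180), the defaults
```

Raising `retries` tolerates flaky checks but lengthens this window proportionally.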

start_period

Grace period after container starts (default: 0s):

healthcheck:
  start_period: 30s  # Don't count failures in first 30 seconds

During this period:

Failed checks don’t count toward retries
First successful check marks container healthy immediately
Useful for slow-starting applications
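The grace-period rules can be expressed as a replay over timestamped check results. In this sketch (a simplified model of the rules listed above, not Docker's code; the function name is hypothetical), each probe is a `(seconds_since_start, exit_code)` pair. Failures are ignored only while the container is still `starting` and inside `start_period`, and a pass always counts.

```python
from typing import Iterable, Tuple


def health_after_probes(
    probes: Iterable[Tuple[float, int]], retries: int = 3, start_period: float = 0.0
) -> str:
    """Replay (seconds_since_start, exit_code) probes and return the final state."""
    state = "starting"
    failing_streak = 0
    for at, exit_code in probes:
        if exit_code == 0:
            # A passing check marks the container healthy, even inside start_period
            failing_streak = 0
            state = "healthy"
        elif state == "starting" and at < start_period:
            # Failure during the grace period: not counted toward retries
            continue
        else:
            failing_streak += 1
            if failing_streak >= retries:
                state = "unhealthy"
    return state


slow_start = [(5, 1), (10, 1), (15, 1), (20, 1)]
print(health_after_probes(slow_start, retries=3, start_period=30))  # starting
print(health_after_probes(slow_start, retries=3, start_period=0))   # unhealthy
```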

start_interval

Check frequency during start_period (default: 5s):

healthcheck:
  start_period: 60s
  start_interval: 2s  # Check every 2s during startup
  interval: 30s       # Then check every 30s after startup

Use shorter interval during startup to detect healthy state faster.
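To see what `start_interval` buys, the sketch below lists approximate probe times for the example above (`start_period: 60s`, `start_interval: 2s`, `interval: 30s`). It is a simplification that ignores how long each check takes and assumes the faster cadence applies for the whole `start_period`; the function name is hypothetical.

```python
from typing import List


def probe_times(start_period: float, start_interval: float, interval: float, until: float) -> List[float]:
    """Approximate seconds-since-start at which checks run, up to `until`."""
    times = []
    t = 0.0
    while True:
        # Use the faster cadence while still inside the start period
        step = start_interval if t < start_period else interval
        t += step
        if t > until:
            break
        times.append(t)
    return times


schedule = probe_times(start_period=60, start_interval=2, interval=30, until=120)
print(len([t for t in schedule if t <= 60]))  # 30 checks during startup
print(schedule[-3:])                          # [60.0, 90.0, 120.0]
```

With only `interval: 30s`, the first check would not run until about 30 seconds after start, so a healthy container could not be detected sooner than that.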

disable

Disable health check inherited from image:

healthcheck:
  disable: true

Or using test:

healthcheck:
  test: ["NONE"]

Health check commands

HTTP endpoints

Most common approach for web services.

Using curl:

healthcheck:
  test: curl -f http://localhost:8000/health || exit 1
  interval: 10s
  timeout: 3s

Using wget:

healthcheck:
  test: wget --no-verbose --tries=1 --spider http://localhost:8000/health || exit 1
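Some slim images ship without curl or wget. If the image already has Python, a small script can perform the same check using only the standard library. This is a sketch: it treats any status below 400 as success, like `curl -f`, but `urlopen` follows redirects, which curl does not do without `-L`. The function names and the default URL are illustrative.

```python
import http.client
import urllib.request


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True for a response status below 400, like `curl -f`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        # HTTPError (4xx/5xx), URLError (refused, DNS) and timeouts are OSError
        # subclasses; HTTPException covers malformed responses
        return False


def main(url: str = "http://localhost:8000/health") -> int:
    """Exit status for the health check: 0 when healthy, 1 otherwise."""
    return 0 if http_ok(url) else 1
```

A script entry point would call `sys.exit(main())`, and the Compose file would run it with the exec form, for example `test: ["CMD", "python", "/app/healthcheck.py"]` (the path is hypothetical).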

TCP port checks

Check if a port is listening:

healthcheck:
  test: nc -z localhost 5432 || exit 1
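A port check only proves that something accepts TCP connections, not that the service is ready to handle requests; for PostgreSQL, `pg_isready` (used later on this page) is more precise. For images without `nc`, a Python equivalent of `nc -z` is a sketch like this (the function name is hypothetical):

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like `nc -z`."""
    try:
        # create_connection resolves the host and applies the timeout to connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout
        return False
```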

Custom scripts

Run a custom health check script:

healthcheck:
  test: /app/healthcheck.sh
  interval: 15s

Example script:

#!/bin/sh
set -e

# Check database connection
psql -h localhost -U postgres -c "SELECT 1" > /dev/null

# Check Redis connection
redis-cli ping > /dev/null

# All checks passed
exit 0
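The script relies on `set -e`: the first failing command ends it with a non-zero status, which fails the check. The same pattern in Python is sketched below, with each check a callable returning `True` or `False`. Printing the failing check's name to stderr is a design choice that makes failures easier to diagnose from the check output. `run_checks` and the example checks are hypothetical.

```python
import sys
from typing import Callable, Dict


def run_checks(checks: Dict[str, Callable[[], bool]]) -> int:
    """Run checks in order; return 0 if all pass, 1 at the first failure."""
    for name, check in checks.items():
        try:
            ok = check()
        except Exception as exc:
            # A check that raises counts as a failure, like a failing command under set -e
            print(f"{name}: error: {exc}", file=sys.stderr)
            return 1
        if not ok:
            print(f"{name}: failed", file=sys.stderr)
            return 1
    return 0
```

A script entry point would pass its own checks, for example `sys.exit(run_checks({"database": check_db, "redis": check_redis}))`, where `check_db` and `check_redis` are functions you write.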

Application-specific checks

Run application command:

healthcheck:
  test: ["CMD", "node", "healthcheck.js"]

Health check during deployments

Uncloud monitors health during rolling updates:

Default behavior (no health check)

Without a health check, Uncloud:

Starts new container
Waits 5 seconds (or update_config.monitor period)
Checks container is still running
If running, considers it healthy
Moves to next container

With health check

With a health check, Uncloud:

Starts new container
Container state is starting
Health checks run according to configuration
If healthy before monitoring period ends: Success, move to next container
If still starting after monitoring period: Wait for final health state
If unhealthy after monitoring period: Roll back, stop deployment
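The two behaviors above can be summarized as one decision per new container. This is a simplified model of the listed steps, not Uncloud's implementation; the function name and return values are hypothetical. One branch is an assumption: the page only says a running container without a health check passes, so treating a stopped one as a failed rollout is inferred.

```python
from typing import Optional


def rollout_decision(health: Optional[str], running: bool, monitor_elapsed: bool) -> str:
    """Return "next", "wait" or "rollback" for a newly started container.

    health is None when the service defines no health check.
    """
    if health is None:
        # No health check: only "still running after the monitoring period" is checked
        if not monitor_elapsed:
            return "wait"
        # Assumption: a container that stopped is treated as a failed rollout
        return "next" if running else "rollback"
    if health == "healthy":
        # Healthy before the monitoring period ends: no need to wait it out
        return "next"
    if not monitor_elapsed:
        return "wait"
    # After the monitoring period: unhealthy rolls back, starting keeps waiting
    return "rollback" if health == "unhealthy" else "wait"
```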

Monitoring period

Default is 5 seconds. Change it with update_config.monitor:

services:
  web:
    image: myapp:latest
    healthcheck:
      test: curl -f http://localhost:8000/health
      interval: 5s
      start_period: 20s
    deploy:
      update_config:
        monitor: 15s  # Wait up to 15s for container to become healthy

Health check examples

Node.js (Express)

Application code:

const express = require('express');
const app = express();

// db and redis are your app's database and Redis clients, created elsewhere

// Health check endpoint
app.get('/health', async (req, res) => {
  try {
    // Check database
    await db.ping();
    
    // Check Redis
    await redis.ping();
    
    // All dependencies healthy
    res.status(200).json({ status: 'healthy' });
  } catch (error) {
    // Something is broken
    res.status(503).json({ 
      status: 'unhealthy',
      error: error.message 
    });
  }
});

app.listen(8000);

Compose file:

services:
  api:
    build: .
    healthcheck:
      test: curl -f http://localhost:8000/health || exit 1
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

Python (FastAPI)

Application code:

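from contextlib import asynccontextmanager

import asyncpg
import redis.asyncio as aioredis  # the standalone aioredis package is deprecated
from fastapi import FastAPI, HTTPException


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Example connection URLs; point these at your own services
    app.state.db_pool = await asyncpg.create_pool("postgresql://postgres:secret@db:5432/postgres")
    app.state.redis = aioredis.from_url("redis://cache:6379")
    yield
    await app.state.db_pool.close()
    await app.state.redis.aclose()


app = FastAPI(lifespan=lifespan)

@app.get("/health")
async def health():
    try:
        # Check PostgreSQL
        await app.state.db_pool.fetchval("SELECT 1")
        
        # Check Redis
        await app.state.redis.ping()
        
        return {"status": "healthy"}
    except Exception as e:
        raise HTTPException(
            status_code=503,
            detail=f"Unhealthy: {e}"
        )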

Compose file:

services:
  api:
    build: .
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 40s

Go

Application code:

package main

import (
    "database/sql"
    "encoding/json"
    "log"
    "net/http"
)

func healthHandler(db *sql.DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")

        // Check database connection; stop waiting if the request is cancelled
        if err := db.PingContext(r.Context()); err != nil {
            w.WriteHeader(http.StatusServiceUnavailable)
            json.NewEncoder(w).Encode(map[string]string{
                "status": "unhealthy",
                "error":  err.Error(),
            })
            return
        }

        w.WriteHeader(http.StatusOK)
        json.NewEncoder(w).Encode(map[string]string{
            "status": "healthy",
        })
    }
}

func main() {
    var db *sql.DB
    // ... setup db ...
    http.HandleFunc("/health", healthHandler(db))
    log.Fatal(http.ListenAndServe(":8000", nil))
}

Compose file:

services:
  api:
    build: .
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8000/health"]
      interval: 10s
      timeout: 3s
      retries: 3

PostgreSQL

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

Redis

services:
  cache:
    image: redis:alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3

Nginx

services:
  web:
    image: nginx:alpine
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 10s
      timeout: 3s
      retries: 3

Health check best practices

Check dependencies

Health checks should verify that all critical dependencies are working:

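// checkDatabase, checkRedis and checkQueue stand for your own async checks
// that resolve to true (healthy) or false
app.get('/health', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    queue: await checkQueue(),
  };
  
  const healthy = Object.values(checks).every(Boolean);
  res.status(healthy ? 200 : 503).json(checks);
});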

Keep checks fast

Health checks should complete quickly (under 1 second):

healthcheck:
  test: curl -f http://localhost/health
  timeout: 3s  # Kill if it takes longer

Avoid:

Complex queries
External API calls
Heavy computations
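When a health endpoint checks several dependencies, running the checks concurrently and bounding each one with a timeout keeps the whole response close to the slowest allowed check rather than the sum of all of them. A minimal asyncio sketch, assuming each check is an async function that raises on failure; the names and the 0.5 s default are illustrative.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def run_dependency_checks(
    checks: Dict[str, Callable[[], Awaitable[None]]], per_check_timeout: float = 0.5
) -> Dict[str, bool]:
    """Run all checks concurrently; a check passes if it finishes in time without raising."""

    async def one(check: Callable[[], Awaitable[None]]) -> bool:
        try:
            await asyncio.wait_for(check(), timeout=per_check_timeout)
            return True
        except Exception:
            # Timeouts (asyncio.TimeoutError) and dependency errors both count as failures
            return False

    names = list(checks)
    results = await asyncio.gather(*(one(checks[name]) for name in names))
    return dict(zip(names, results))
```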

Use appropriate start_period

Set start_period based on how long your app takes to initialize:

# Fast-starting app
healthcheck:
  start_period: 10s

# Slow-starting app (database migrations, etc.)
healthcheck:
  start_period: 60s

Don't check external services

Only check dependencies you control.

Good:

Your database
Your Redis cache
Your message queue

Bad:

Third-party APIs (Stripe, AWS, etc.)
External services you don’t control

Return useful information

Include details in health check response:

{
  "status": "healthy",
  "checks": {
    "database": "ok",
    "redis": "ok",
    "disk_space": "ok"
  },
  "timestamp": "2025-03-04T12:34:56Z",
  "version": "1.2.3"
}
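A response like the one above can be built from a map of check results, with the HTTP status derived from the same data so that the body and the status code never disagree. A sketch under those assumptions; the `health_report` name and the `"failing"` label are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


def health_report(
    checks: Dict[str, bool], version: str, now: Optional[datetime] = None
) -> Tuple[int, dict]:
    """Return (HTTP status code, JSON-serializable body) for a health endpoint."""
    if now is None:
        now = datetime.now(timezone.utc)
    healthy = all(checks.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "checks": {name: "ok" if ok else "failing" for name, ok in checks.items()},
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "version": version,
    }
    # 503 makes curl -f (and similar checks) fail when any dependency is down
    return (200 if healthy else 503), body
```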

Health status in Caddy

Uncloud automatically removes unhealthy containers from Caddy load balancing:

Automatic removal

When a container becomes unhealthy:

Uncloud detects the health status change
Regenerates Caddy configuration without that container’s IP
Reloads Caddy gracefully
Traffic no longer routed to unhealthy container

Automatic recovery

When an unhealthy container recovers:

Container becomes healthy again
Uncloud adds it back to Caddy configuration
Caddy reloads
Container starts receiving traffic again
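Both flows amount to recomputing the upstream list from the current health states whenever they change. The sketch below is a simplified model, not Uncloud's code. It includes only healthy containers, matching the description of `uc caddy config`. Two things are assumptions: that containers without a health check are also included, and that `starting` containers are left out. The container IPs and port are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Container:
    ip: str
    health: Optional[str]  # None when the service defines no health check


def upstreams(containers: List[Container], port: int) -> List[str]:
    """Return host:port upstreams for containers that should receive traffic."""
    return [
        f"{c.ip}:{port}"
        for c in containers
        # Assumption: no health check means the container is routable
        if c.health is None or c.health == "healthy"
    ]


service = [
    Container("10.210.0.2", "healthy"),
    Container("10.210.0.3", "unhealthy"),
    Container("10.210.0.4", "starting"),
    Container("10.210.0.5", None),
]
print(upstreams(service, 8000))  # ['10.210.0.2:8000', '10.210.0.5:8000']
```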

View current upstreams

uc caddy config

Shows the current Caddy configuration with only healthy container IPs.

Monitoring container health

Check health status

# List all containers with health status
uc ps

Output shows:

Container ID
State (running)
Health status (healthy/unhealthy/starting)
Uptime

Inspect specific service

uc inspect web

Shows detailed health information for all containers in the service.

View health check logs

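Docker does not write health check output to the container logs. It keeps the output of the most recent checks in the container's health status, which docker inspect shows on the machine running the container. Service logs still help if your application logs requests to its health endpoint:

uc logs web

Look for requests to the health endpoint and any errors they report.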

Troubleshooting

Container always unhealthy

Check health check command:

# Get container ID
uc ps

# Run health check command manually
uc exec <container-id> curl -f http://localhost:8000/health

Common issues:

Wrong port number
Health endpoint not implemented
Application not listening on localhost
Dependencies not available

Container stuck in starting state

Check start_period:

healthcheck:
  start_period: 60s  # Increase if app takes long to start

Check start_interval:

healthcheck:
  start_interval: 5s  # Check more frequently during startup

Health checks too aggressive

Increase interval and retries:

healthcheck:
  interval: 30s  # Check less frequently
  retries: 5     # Allow more failures

Container healthy but not receiving traffic

Check Caddy configuration:

uc caddy config

Verify the container IP is listed in upstreams.

Check network connectivity:

uc exec <container-id> ping <other-container-ip>

Next steps

Rolling Updates

Use health checks during deployments

Scaling

Scale services with health-aware load balancing

Docker Compose

Configure health checks in Compose files

Deploying Services

Deploy services with health checks
