Liveness & Readiness Probes
Example request:

curl --request GET \
  --url https://api.example.com/health/liveness

Response schema (a 200 response carries only "status"; a readiness 503 response also includes "checks"):

{
  "status": "<string>",
  "checks": {
    "database": {
      "status": "<string>",
      "message": "<string>",
      "error": "<string>"
    },
    "redis": {
      "status": "<string>",
      "message": "<string>",
      "error": "<string>"
    }
  }
}

Overview

Aurora provides dedicated liveness and readiness probe endpoints for Kubernetes and container orchestration platforms. These endpoints enable fine-grained health monitoring and automated recovery strategies.

Liveness Probe

Endpoint

GET /health/liveness

Description

The liveness probe checks if the Flask application process is running and responsive. This is a lightweight check that doesn’t verify external dependencies. Kubernetes uses this to determine if the container should be restarted.

Response

status (string, required)
Always returns "alive" if the application is running.

Status Codes

200: Application is alive and responsive.

Example Response

{
  "status": "alive"
}
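Because the probe only confirms the process is serving requests, the handler can be a dependency-free route. A minimal Flask sketch; the route and response shape match this page, but everything else is an illustrative assumption, not Aurora's actual source:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health/liveness")
def liveness():
    # No external dependencies are touched: if this handler executes,
    # the process is up and able to serve requests.
    return jsonify({"status": "alive"}), 200
```

Run with `flask --app app run --port 5080` (assuming the module is named `app.py`).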

Usage

Kubernetes Configuration

apiVersion: v1
kind: Pod
metadata:
  name: aurora-server
spec:
  containers:
  - name: aurora-server
    image: aurora:latest
    livenessProbe:
      httpGet:
        path: /health/liveness
        port: 5080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

cURL

curl http://localhost:5080/health/liveness
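For scripted checks outside Kubernetes, the same endpoint can be polled from Python (the base URL is an assumption; adjust it for your deployment):

```python
import requests

def check_liveness(base_url="http://localhost:5080"):
    """Return True if the liveness endpoint reports the process alive."""
    try:
        response = requests.get(f"{base_url}/health/liveness", timeout=5)
        # Treat anything other than a 200 with {"status": "alive"} as a failure.
        return response.status_code == 200 and response.json().get("status") == "alive"
    except (requests.RequestException, ValueError):
        # Connection errors, timeouts, and non-JSON bodies all count as not alive.
        return False
```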

Readiness Probe

Endpoint

GET /health/readiness

Description

The readiness probe checks if the application is ready to accept traffic by verifying critical dependencies (database and Redis). Kubernetes uses this to determine if the pod should receive traffic from the service load balancer.

Response

Ready State

status (string, required)
Returns "ready" when critical services are available.

Not Ready State

status (string, required)
Returns "not_ready" when critical services are unavailable.

checks (object, required)
Health status of critical services.

Status Codes

200: Application is ready to accept traffic (both database and Redis are healthy).
503: Application is not ready (database or Redis is unhealthy).

Example Responses

Ready

{
  "status": "ready"
}

Not Ready

{
  "status": "not_ready",
  "checks": {
    "database": {
      "status": "unhealthy",
      "error": "Database connection failed"
    },
    "redis": {
      "status": "healthy",
      "message": "Redis connection successful"
    }
  }
}
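A readiness handler aggregates its dependency checks and maps the result onto the 200/503 contract above. A hedged Flask sketch: `check_database` and `check_redis` are hypothetical stand-ins for real connectivity checks (e.g., a `SELECT 1` and a Redis `PING`):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical stand-ins: real checks would ping the database and Redis
# and return (healthy, detail) accordingly.
def check_database():
    return True, "Database connection successful"

def check_redis():
    return True, "Redis connection successful"

@app.route("/health/readiness")
def readiness():
    checks, all_healthy = {}, True
    for name, check in (("database", check_database), ("redis", check_redis)):
        healthy, detail = check()
        checks[name] = (
            {"status": "healthy", "message": detail}
            if healthy
            else {"status": "unhealthy", "error": detail}
        )
        all_healthy = all_healthy and healthy
    if all_healthy:
        return jsonify({"status": "ready"}), 200
    return jsonify({"status": "not_ready", "checks": checks}), 503
```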

Usage

Kubernetes Configuration

apiVersion: v1
kind: Pod
metadata:
  name: aurora-server
spec:
  containers:
  - name: aurora-server
    image: aurora:latest
    readinessProbe:
      httpGet:
        path: /health/readiness
        port: 5080
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 3
      successThreshold: 1
      failureThreshold: 3

cURL

curl http://localhost:5080/health/readiness

JavaScript

const checkReadiness = async () => {
  try {
    const response = await fetch('http://localhost:5080/health/readiness', {
      // Abort the request if the service hangs instead of answering.
      signal: AbortSignal.timeout(3000),
    });
    
    if (response.ok) {
      console.log('Service is ready');
      return true;
    } else {
      const data = await response.json();
      console.error('Service not ready:', data.checks);
      return false;
    }
  } catch (error) {
    console.error('Readiness check failed:', error);
    return false;
  }
};

// Poll until ready
const waitForReady = async (maxAttempts = 10, delayMs = 2000) => {
  for (let i = 0; i < maxAttempts; i++) {
    if (await checkReadiness()) {
      return true;
    }
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  throw new Error('Service did not become ready in time');
};

Python

import requests
import time

def check_readiness():
    try:
        # Use a timeout so a hung service fails fast instead of blocking forever.
        response = requests.get('http://localhost:5080/health/readiness', timeout=3)
        
        if response.status_code == 200:
            print('Service is ready')
            return True
        else:
            data = response.json()
            print(f"Service not ready: {data.get('checks')}")
            return False
    except requests.RequestException as e:
        print(f"Readiness check failed: {e}")
        return False

def wait_for_ready(max_attempts=10, delay_seconds=2):
    """Poll until service is ready."""
    for i in range(max_attempts):
        if check_readiness():
            return True
        time.sleep(delay_seconds)
    raise RuntimeError('Service did not become ready in time')

Probe Comparison

| Aspect         | Liveness Probe                        | Readiness Probe                         |
| -------------- | ------------------------------------- | --------------------------------------- |
| Purpose        | Detect if application is hung/crashed | Detect if application can serve traffic |
| Checks         | Application process only              | Database + Redis                        |
| Failure Action | Restart container                     | Remove from load balancer               |
| Response Time  | Very fast (~1ms)                      | Moderate (~50-100ms)                    |
| Use When       | Container orchestration               | Load balancing                          |

Best Practices

Liveness Probe Configuration

  • Initial Delay: Set initialDelaySeconds to allow application startup (30-60 seconds)
  • Period: Check frequently (10-30 seconds)
  • Timeout: Keep short (5 seconds)
  • Threshold: Allow 2-3 failures before restart to avoid flapping

Readiness Probe Configuration

  • Initial Delay: Short delay (10-15 seconds) as dependencies should start first
  • Period: Check frequently (5-10 seconds) for fast recovery
  • Timeout: Moderate timeout (3-5 seconds)
  • Threshold: Single success to start receiving traffic, 2-3 failures to stop
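These thresholds determine how long an unhealthy pod keeps receiving traffic. A rough worst-case bound for the readiness values used on this page (periodSeconds: 5, timeoutSeconds: 3, failureThreshold: 3); this is an approximation, as exact timing depends on probe scheduling:

```python
period_seconds = 5
timeout_seconds = 3
failure_threshold = 3

# Worst case: the dependency fails just after a successful probe, and the
# final failing probe runs for its full timeout before being counted.
worst_case_detection = period_seconds * failure_threshold + timeout_seconds
print(worst_case_detection)  # 18 seconds before the pod stops receiving traffic
```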

Combined Configuration Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aurora-server
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: aurora-server
        image: aurora:latest
        ports:
        - containerPort: 5080
        
        livenessProbe:
          httpGet:
            path: /health/liveness
            port: 5080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        
        readinessProbe:
          httpGet:
            path: /health/readiness
            port: 5080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        
        startupProbe:
          httpGet:
            path: /health/liveness
            port: 5080
          initialDelaySeconds: 0
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 12  # 60 seconds total

Monitoring Recommendations

Metrics to Track

  1. Liveness Failures: Alert on repeated liveness failures indicating application crashes
  2. Readiness Failures: Track dependency health issues (database, Redis)
  3. Recovery Time: Monitor time from not-ready to ready state
  4. Probe Response Time: Detect performance degradation

Alert Examples

# Prometheus AlertManager rules
groups:
- name: aurora-health
  rules:
  - alert: AuroraLivenessFailure
    expr: probe_success{job="aurora-liveness"} == 0
    for: 1m
    annotations:
      summary: "Aurora liveness probe failing"
      description: "Application may be hung or crashed"
  
  - alert: AuroraNotReady
    expr: probe_success{job="aurora-readiness"} == 0
    for: 3m
    annotations:
      summary: "Aurora not ready to serve traffic"
      description: "Check database and Redis connectivity"
