Liveness & Readiness Probes
Example request:

curl --request GET \
  --url https://api.example.com/health/liveness

Response schema (a 200 response carries only "status"; a readiness 503 response also includes "checks"):

{
  "status": "<string>",
  "checks": {
    "database": {
      "status": "<string>",
      "message": "<string>",
      "error": "<string>"
    },
    "redis": {
      "status": "<string>",
      "message": "<string>",
      "error": "<string>"
    }
  }
}

Overview

Aurora provides dedicated liveness and readiness probe endpoints for Kubernetes and container orchestration platforms. These endpoints enable fine-grained health monitoring and automated recovery strategies.

Liveness Probe

Endpoint

GET /health/liveness

Description

The liveness probe checks if the Flask application process is running and responsive. This is a lightweight check that doesn’t verify external dependencies. Kubernetes uses this to determine if the container should be restarted.

Response

status (string, required)
Always returns "alive" if the application is running.

Status Codes

200: Application is alive and responsive.

Example Response

{
  "status": "alive"
}
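Because the probe only confirms the process is serving requests, the handler can be a dependency-free route. A minimal Flask sketch; the route and response shape match this page, but everything else is an illustrative assumption, not Aurora's actual source:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health/liveness")
def liveness():
    # No external dependencies are touched: if this handler executes,
    # the process is up and able to serve requests.
    return jsonify({"status": "alive"}), 200
```

Run with `flask --app app run --port 5080` (assuming the module is named `app.py`).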

Usage

Kubernetes Configuration

apiVersion: v1
kind: Pod
metadata:
  name: aurora-server
spec:
  containers:
  - name: aurora-server
    image: aurora:latest
    livenessProbe:
      httpGet:
        path: /health/liveness
        port: 5080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

cURL

curl http://localhost:5080/health/liveness
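For scripted checks outside Kubernetes, the same endpoint can be polled from Python (the base URL is an assumption; adjust it for your deployment):

```python
import requests

def check_liveness(base_url="http://localhost:5080"):
    """Return True if the liveness endpoint reports the process alive."""
    try:
        response = requests.get(f"{base_url}/health/liveness", timeout=5)
        # Treat anything other than a 200 with {"status": "alive"} as a failure.
        return response.status_code == 200 and response.json().get("status") == "alive"
    except (requests.RequestException, ValueError):
        # Connection errors, timeouts, and non-JSON bodies all count as not alive.
        return False
```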

Readiness Probe

Endpoint

GET /health/readiness

Description

The readiness probe checks if the application is ready to accept traffic by verifying critical dependencies (database and Redis). Kubernetes uses this to determine if the pod should receive traffic from the service load balancer.

Response

Ready State

status (string, required)
Returns "ready" when critical services are available.

Not Ready State

status (string, required)
Returns "not_ready" when critical services are unavailable.

checks (object, required)
Health status of critical services.

Status Codes

200: Application is ready to accept traffic (both database and Redis are healthy).
503: Application is not ready (database or Redis is unhealthy).

Example Responses

Ready

{
  "status": "ready"
}

Not Ready

{
  "status": "not_ready",
  "checks": {
    "database": {
      "status": "unhealthy",
      "error": "Database connection failed"
    },
    "redis": {
      "status": "healthy",
      "message": "Redis connection successful"
    }
  }
}
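A readiness handler aggregates its dependency checks and maps the result onto the 200/503 contract above. A hedged Flask sketch: `check_database` and `check_redis` are hypothetical stand-ins for real connectivity checks (e.g., a `SELECT 1` and a Redis `PING`):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical stand-ins: real checks would ping the database and Redis
# and return (healthy, detail) accordingly.
def check_database():
    return True, "Database connection successful"

def check_redis():
    return True, "Redis connection successful"

@app.route("/health/readiness")
def readiness():
    checks, all_healthy = {}, True
    for name, check in (("database", check_database), ("redis", check_redis)):
        healthy, detail = check()
        checks[name] = (
            {"status": "healthy", "message": detail}
            if healthy
            else {"status": "unhealthy", "error": detail}
        )
        all_healthy = all_healthy and healthy
    if all_healthy:
        return jsonify({"status": "ready"}), 200
    return jsonify({"status": "not_ready", "checks": checks}), 503
```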

Usage

Kubernetes Configuration

apiVersion: v1
kind: Pod
metadata:
  name: aurora-server
spec:
  containers:
  - name: aurora-server
    image: aurora:latest
    readinessProbe:
      httpGet:
        path: /health/readiness
        port: 5080
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 3
      successThreshold: 1
      failureThreshold: 3

cURL

curl http://localhost:5080/health/readiness

JavaScript

const checkReadiness = async () => {
  try {
    const response = await fetch('http://localhost:5080/health/readiness', {
      // Abort the request if the service hangs instead of answering.
      signal: AbortSignal.timeout(3000),
    });
    
    if (response.ok) {
      console.log('Service is ready');
      return true;
    } else {
      const data = await response.json();
      console.error('Service not ready:', data.checks);
      return false;
    }
  } catch (error) {
    console.error('Readiness check failed:', error);
    return false;
  }
};

// Poll until ready
const waitForReady = async (maxAttempts = 10, delayMs = 2000) => {
  for (let i = 0; i < maxAttempts; i++) {
    if (await checkReadiness()) {
      return true;
    }
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  throw new Error('Service did not become ready in time');
};

Python

import requests
import time

def check_readiness():
    try:
        # Use a timeout so a hung service fails fast instead of blocking forever.
        response = requests.get('http://localhost:5080/health/readiness', timeout=3)
        
        if response.status_code == 200:
            print('Service is ready')
            return True
        else:
            data = response.json()
            print(f"Service not ready: {data.get('checks')}")
            return False
    except requests.RequestException as e:
        print(f"Readiness check failed: {e}")
        return False

def wait_for_ready(max_attempts=10, delay_seconds=2):
    """Poll until service is ready."""
    for i in range(max_attempts):
        if check_readiness():
            return True
        time.sleep(delay_seconds)
    raise RuntimeError('Service did not become ready in time')

Probe Comparison

| Aspect         | Liveness Probe                        | Readiness Probe                         |
| -------------- | ------------------------------------- | --------------------------------------- |
| Purpose        | Detect if application is hung/crashed | Detect if application can serve traffic |
| Checks         | Application process only              | Database + Redis                        |
| Failure Action | Restart container                     | Remove from load balancer               |
| Response Time  | Very fast (~1ms)                      | Moderate (~50-100ms)                    |
| Use When       | Container orchestration               | Load balancing                          |

Best Practices

Liveness Probe Configuration

  • Initial Delay: Set initialDelaySeconds to allow application startup (30-60 seconds)
  • Period: Check frequently (10-30 seconds)
  • Timeout: Keep short (5 seconds)
  • Threshold: Allow 2-3 failures before restart to avoid flapping

Readiness Probe Configuration

  • Initial Delay: Short delay (10-15 seconds) as dependencies should start first
  • Period: Check frequently (5-10 seconds) for fast recovery
  • Timeout: Moderate timeout (3-5 seconds)
  • Threshold: Single success to start receiving traffic, 2-3 failures to stop
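These thresholds determine how long an unhealthy pod keeps receiving traffic. A rough worst-case bound for the readiness values used on this page (periodSeconds: 5, timeoutSeconds: 3, failureThreshold: 3); this is an approximation, as exact timing depends on probe scheduling:

```python
period_seconds = 5
timeout_seconds = 3
failure_threshold = 3

# Worst case: the dependency fails just after a successful probe, and the
# final failing probe runs for its full timeout before being counted.
worst_case_detection = period_seconds * failure_threshold + timeout_seconds
print(worst_case_detection)  # 18 seconds before the pod stops receiving traffic
```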

Combined Configuration Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aurora-server
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: aurora-server
        image: aurora:latest
        ports:
        - containerPort: 5080
        
        livenessProbe:
          httpGet:
            path: /health/liveness
            port: 5080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        
        readinessProbe:
          httpGet:
            path: /health/readiness
            port: 5080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        
        startupProbe:
          httpGet:
            path: /health/liveness
            port: 5080
          initialDelaySeconds: 0
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 12  # 60 seconds total

Monitoring Recommendations

Metrics to Track

  1. Liveness Failures: Alert on repeated liveness failures indicating application crashes
  2. Readiness Failures: Track dependency health issues (database, Redis)
  3. Recovery Time: Monitor time from not-ready to ready state
  4. Probe Response Time: Detect performance degradation

Alert Examples

# Prometheus AlertManager rules
groups:
- name: aurora-health
  rules:
  - alert: AuroraLivenessFailure
    expr: probe_success{job="aurora-liveness"} == 0
    for: 1m
    annotations:
      summary: "Aurora liveness probe failing"
      description: "Application may be hung or crashed"
  
  - alert: AuroraNotReady
    expr: probe_success{job="aurora-readiness"} == 0
    for: 3m
    annotations:
      summary: "Aurora not ready to serve traffic"
      description: "Check database and Redis connectivity"
