The /api/health endpoint provides system health status, useful for monitoring, load balancers, and orchestration platforms.

Endpoint

GET /api/health

Response

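status (string)
Overall system status. Possible values:
  • healthy - All components operational
  • degraded - At least one component has issues (in the implementation shown below, this is set when the Neo4j test query fails)
components (object)
Status of individual components, as a map from component name (storage, query_engine, query_parser, neo4j) to a boolean.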

Examples

curl http://localhost:8000/api/health

Healthy Response

{
  "status": "healthy",
  "components": {
    "storage": true,
    "query_engine": true,
    "query_parser": true,
    "neo4j": true
  }
}

Degraded Response

{
  "status": "degraded",
  "components": {
    "storage": true,
    "query_engine": true,
    "query_parser": true,
    "neo4j": false
  }
}
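
A client usually wants to know which components are failing, not only the overall status. The following is a minimal sketch of how the response body could be inspected; failing_components is a hypothetical helper name and is not part of the EKG API.

```python
from typing import Any


def failing_components(health: dict[str, Any]) -> list[str]:
    """Return the names of components whose flag is not True, sorted by name."""
    components = health.get("components", {})
    return sorted(name for name, ok in components.items() if ok is not True)


# The degraded response from the example above.
degraded = {
    "status": "degraded",
    "components": {
        "storage": True,
        "query_engine": True,
        "query_parser": True,
        "neo4j": False,
    },
}
print(failing_components(degraded))  # ['neo4j']
```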

Use Cases

Kubernetes Liveness Probe

Configure Kubernetes to monitor application health:
apiVersion: v1
kind: Pod
metadata:
  name: ekg-app
spec:
  containers:
  - name: app
    image: ekg:latest
    livenessProbe:
      httpGet:
        path: /api/health
        port: 8000
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
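
Kubernetes treats any HTTP status from 200 to 399 as a successful httpGet probe, and the handler shown under Implementation returns the status dictionary without setting a status code. A degraded response therefore still passes the probe above. If a readiness-style signal is wanted, one option is to map the body to a status code. The sketch below shows only that mapping as a pure function; readiness_status_code is a hypothetical name, not part of the EKG codebase.

```python
from typing import Any


def readiness_status_code(health: dict[str, Any]) -> int:
    """Return 200 when the reported status is healthy, otherwise 503."""
    return 200 if health.get("status") == "healthy" else 503


print(readiness_status_code({"status": "healthy", "components": {}}))   # 200
print(readiness_status_code({"status": "degraded", "components": {}}))  # 503
```

In a FastAPI handler, the result could be applied by assigning it to response.status_code on an injected Response parameter. Keeping the liveness probe on the unmodified endpoint is a reasonable choice, since restarting the pod is unlikely to fix a Neo4j outage.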

Docker Healthcheck

Add a health check to the Dockerfile (curl must be installed in the image):
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
  CMD curl -f http://localhost:8000/api/health || exit 1
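
curl -f fails only on HTTP error codes or connection failures, so a 200 response with "status": "degraded" still counts as healthy for Docker. If the container should be marked unhealthy in that case, a small script can inspect the body instead. This is a sketch using only the standard library; the function names are hypothetical, and the URL is the one from the curl example.

```python
import http.client
import json
import urllib.request

HEALTH_URL = "http://localhost:8000/api/health"  # same URL as the curl example


def exit_code_for(body: bytes) -> int:
    """Map a health response body to an exit code (0 = healthy, 1 = unhealthy)."""
    try:
        health = json.loads(body)
    except ValueError:  # invalid JSON or undecodable bytes
        return 1
    if not isinstance(health, dict):
        return 1
    return 0 if health.get("status") == "healthy" else 1


def main() -> int:
    """Fetch the health endpoint and return the exit code for Docker."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as response:
            return exit_code_for(response.read())
    except (OSError, http.client.HTTPException):
        # Unreachable host, timeout, HTTP error status, or malformed response.
        return 1


# As a HEALTHCHECK command, the script would end with:
#     import sys; sys.exit(main())
print(exit_code_for(b'{"status": "healthy", "components": {}}'))   # 0
print(exit_code_for(b'{"status": "degraded", "components": {}}'))  # 1
```

Saved under a name such as healthcheck.py (an assumed filename), it could replace the curl command in the HEALTHCHECK instruction.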

Load Balancer Health Check

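Configure the load balancer to route traffic only to instances that pass the health check. The health_check directive is available in NGINX Plus only and, by default, treats any 2xx or 3xx response as passing:
upstream ekg_backend {
  zone ekg_backend 64k;  # shared memory zone, required for active health checks
  server ekg-1.internal:8000;
  server ekg-2.internal:8000;
  server ekg-3.internal:8000;
}

server {
  location / {
    proxy_pass http://ekg_backend;

    # Active health check (NGINX Plus)
    health_check uri=/api/health interval=10s fails=3 passes=2;
  }
}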

Monitoring Script

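Periodic health monitoring (send_alert is a placeholder for your own alerting integration):
import requests
import time
from datetime import datetime

def send_alert(message, details=None):
    """Placeholder: replace with email, Slack, PagerDuty, or similar."""
    print(f"ALERT: {message} {details or ''}")

def check_health():
    """Fetch /api/health; return the parsed body, or None if the check failed."""
    try:
        response = requests.get(
            "http://localhost:8000/api/health",
            timeout=5
        )
        response.raise_for_status()  # treat HTTP error codes as failures
        health = response.json()

        if health['status'] != 'healthy':
            send_alert(
                f"EKG health degraded at {datetime.now()}",
                health['components']
            )

        return health

    except (requests.exceptions.RequestException, ValueError) as e:
        # Covers connection errors, timeouts, HTTP errors, and invalid JSON
        send_alert(f"EKG health check failed: {e}")
        return None

while True:
    health = check_health()
    if health:
        print(f"[{datetime.now()}] Status: {health['status']}")
    time.sleep(60)  # Check every minute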
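
The loop above sends an alert on every pass while the system stays degraded, which at a 60-second interval can get noisy. One common way to reduce this is to alert only when the status changes. The sketch below illustrates the idea; StatusTracker is a hypothetical class, and it assumes the system is healthy until the first observation says otherwise.

```python
from typing import Optional


class StatusTracker:
    """Remember the last observed status and report only changes."""

    def __init__(self) -> None:
        # Assumption: treat the system as healthy before the first check.
        self.last = "healthy"

    def observe(self, status: str) -> Optional[str]:
        """Return a message when the status changes, otherwise None."""
        previous, self.last = self.last, status
        if previous == status:
            return None
        return f"EKG status changed: {previous} -> {status}"


tracker = StatusTracker()
for status in ["healthy", "degraded", "degraded", "healthy"]:
    message = tracker.observe(status)
    if message:
        print(message)
# EKG status changed: healthy -> degraded
# EKG status changed: degraded -> healthy
```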

Prometheus Integration

Export health metrics to Prometheus (add this to the module that defines app and health_check):
from prometheus_client import CONTENT_TYPE_LATEST, Gauge, generate_latest
from fastapi import Response

# Define metrics
health_status = Gauge('ekg_health_status', 'Overall health status (1=healthy, 0=degraded)')
component_status = Gauge('ekg_component_status', 'Component status', ['component'])

@app.get("/metrics")
async def metrics():
    # Update metrics from health check
    health = await health_check()

    health_status.set(1 if health['status'] == 'healthy' else 0)

    for component, status in health['components'].items():
        component_status.labels(component=component).set(1 if status else 0)

    # Use the content type Prometheus expects for the text exposition format
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)

Implementation

From chat/app.py:196-219:
@app.get("/api/health")
async def health_check():
    """Health check endpoint."""
    global storage, query_engine, query_parser
    
    status = {
        "status": "healthy",
        "components": {
            "storage": storage is not None,
            "query_engine": query_engine is not None,
            "query_parser": query_parser is not None
        }
    }
    
    # Test Neo4j connection
    try:
        if storage:
            storage.execute_cypher("RETURN 1")
            status["components"]["neo4j"] = True
    except Exception:
        status["components"]["neo4j"] = False
        status["status"] = "degraded"
    
    return status
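
Note that in the handler above, status changes to degraded only when the Neo4j test query raises. If storage is None, the query is skipped, the neo4j key is never added, and the overall status remains healthy even though storage is reported as false. Consumers that need a stricter signal can derive it from the components map. The following sketch shows one such derivation; derive_status is a hypothetical helper and does not describe the project's behavior.

```python
def derive_status(components: dict[str, bool]) -> str:
    """Return "healthy" only if every expected component reports True."""
    expected = ("storage", "query_engine", "query_parser", "neo4j")
    all_ok = all(components.get(name) is True for name in expected)
    return "healthy" if all_ok else "degraded"


# Components as the handler would report them when nothing was initialized:
uninitialized = {"storage": False, "query_engine": False, "query_parser": False}
print(derive_status(uninitialized))  # degraded
```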

Component Checks

Storage Check

Verifies GraphStorage object is initialized:
"storage": storage is not None

Query Engine Check

Verifies QueryEngine object is initialized:
"query_engine": query_engine is not None

Query Parser Check

Verifies QueryParser object is initialized:
"query_parser": query_parser is not None

Neo4j Check

Executes a test query against Neo4j:
try:
    storage.execute_cypher("RETURN 1")
    status["components"]["neo4j"] = True
except Exception:
    status["components"]["neo4j"] = False
    status["status"] = "degraded"

Status Interpretation

status: healthy
System is fully operational. All queries should work.
neo4j: false
Database connection lost. Queries will fail. Check:
  • Neo4j container is running
  • Network connectivity
  • NEO4J_URI configuration
query_parser: false
Natural language queries won’t work. Check:
  • GEMINI_API_KEY is valid
  • Internet connectivity for Gemini API
  • Application startup logs
storage: false or query_engine: false
Core components failed to initialize. Check:
  • Application startup logs
  • Neo4j connectivity
  • Configuration files
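
The checklists above can be attached to alerts automatically. The sketch below maps each failing component to its checks. The pairing of components to checklists is an assumption inferred from the descriptions in this section, and troubleshooting_hints is a hypothetical helper name.

```python
from typing import Any

# Assumed mapping from component name to the checks listed in this section.
CHECKS: dict[str, list[str]] = {
    "neo4j": [
        "Neo4j container is running",
        "Network connectivity",
        "NEO4J_URI configuration",
    ],
    "query_parser": [
        "GEMINI_API_KEY is valid",
        "Internet connectivity for Gemini API",
        "Application startup logs",
    ],
    "storage": [
        "Application startup logs",
        "Neo4j connectivity",
        "Configuration files",
    ],
    "query_engine": [
        "Application startup logs",
        "Neo4j connectivity",
        "Configuration files",
    ],
}


def troubleshooting_hints(health: dict[str, Any]) -> dict[str, list[str]]:
    """Return the checks to run for every component reported as failing."""
    components = health.get("components", {})
    return {
        name: CHECKS[name]
        for name, ok in components.items()
        if not ok and name in CHECKS
    }


sample = {"status": "degraded", "components": {"storage": True, "neo4j": False}}
for component, checks in troubleshooting_hints(sample).items():
    print(component, "->", "; ".join(checks))
```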

Response Times

Expected response times:
  • Healthy: < 100ms
  • Neo4j slow: 500ms - 5s
  • Timeout: > 5s (connection issues)
Set health check timeouts to at least 5 seconds to avoid false positives during Neo4j slowness.
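
To compare observed latency against these ranges, a monitor can time each check and classify the result. The sketch below is one way to do that. The function names are hypothetical, and because the ranges above do not cover 100 ms to 500 ms, that band is returned as "unclassified" rather than guessed.

```python
import time
from typing import Any, Callable


def classify_latency(seconds: float) -> str:
    """Classify a health check duration using the ranges listed above."""
    if seconds < 0.1:
        return "healthy"
    if seconds > 5.0:
        return "timeout"
    if seconds >= 0.5:
        return "neo4j_slow"
    return "unclassified"  # 100 ms to 500 ms is not covered by the ranges


def timed(call: Callable[[], Any]) -> tuple[Any, float]:
    """Run call() and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


print(classify_latency(0.05))  # healthy
print(classify_latency(1.2))   # neo4j_slow
print(classify_latency(6.0))   # timeout
```

In a monitor, timed could wrap the request itself, for example timed(lambda: requests.get(url, timeout=5)).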

Best Practices

1. Monitor continuously

Check health at regular intervals (30-60 seconds).

2. Alert on degraded

Trigger alerts when status becomes degraded:
if health['status'] == 'degraded':
    send_alert("EKG health degraded", health['components'])

3. Correlate with metrics

Compare health status with:
  • Request latency
  • Error rates
  • Neo4j query times

4. Graceful degradation

Handle degraded state gracefully:
  • Return cached results
  • Show user-friendly error messages
  • Retry with exponential backoff
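
As an illustration of the last point, here is a minimal sketch of retrying with exponential backoff. The helper name, attempt count, and base delay are assumptions chosen for the example, not values taken from the EKG codebase. The sleep function is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")


# Example: an operation that fails twice before succeeding.
calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Neo4j unavailable")
    return "ok"

delays: list[float] = []
print(retry_with_backoff(flaky, sleep=delays.append), delays)  # ok [0.5, 1.0]
```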

