The /api/health endpoint provides system health status, useful for monitoring, load balancers, and orchestration platforms.
Endpoint
GET /api/health
Response
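status (string): Overall system status. Possible values:
healthy - All components operational
degraded - At least one check failed; in the implementation shown below, this is set when the Neo4j test query fails
components (object): Status of individual components.
storage (boolean): GraphStorage is initialized.
query_engine (boolean): QueryEngine is initialized.
query_parser (boolean): QueryParser is initialized.
neo4j (boolean): Neo4j database connection is active.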
Examples
curl http://localhost:8000/api/health
Healthy Response
{
  "status": "healthy",
  "components": {
    "storage": true,
    "query_engine": true,
    "query_parser": true,
    "neo4j": true
  }
}
Degraded Response
{
  "status": "degraded",
  "components": {
    "storage": true,
    "query_engine": true,
    "query_parser": true,
    "neo4j": false
  }
}
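A client can reduce a response like the ones above to the list of components that need attention. The following is a minimal sketch; `failing_components` is a hypothetical helper for illustration, not part of the EKG codebase.

```python
import json


def failing_components(payload: str) -> list[str]:
    """Return the names of components reported as false, sorted by name."""
    health = json.loads(payload)
    components = health.get("components", {})
    return sorted(name for name, ok in components.items() if not ok)


# Same payload as the degraded example above
degraded = (
    '{"status": "degraded", "components": {"storage": true, '
    '"query_engine": true, "query_parser": true, "neo4j": false}}'
)
print(failing_components(degraded))  # ['neo4j']
```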
Use Cases
Kubernetes Liveness Probe
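Configure Kubernetes to monitor application health. An httpGet probe passes on any status from 200 to 399, so with the handler shown under Implementation it detects an unresponsive process rather than a degraded status: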
apiVersion: v1
kind: Pod
metadata:
  name: ekg-app
spec:
  containers:
    - name: app
      image: ekg:latest
      livenessProbe:
        httpGet:
          path: /api/health
          port: 8000
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
Docker Healthcheck
Add a health check to the Dockerfile (curl must be installed in the image):
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
    CMD curl -f http://localhost:8000/api/health || exit 1
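`curl -f` treats any response below HTTP 400 as success, and it needs curl in the image. Where either is a problem, a small Python script can perform the same check and also inspect the JSON body. This is a sketch using only the standard library; the file name `healthcheck.py` and the function names `exit_code_for` and `main` are assumptions for illustration.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8000/api/health"  # same URL as the curl example


def exit_code_for(body: bytes) -> int:
    """Map a health response body to a process exit code (0 = healthy)."""
    try:
        health = json.loads(body)
    except ValueError:
        return 1  # body is not valid JSON: treat as unhealthy
    if isinstance(health, dict) and health.get("status") == "healthy":
        return 0
    return 1


def main(url: str = HEALTH_URL, timeout: float = 5.0) -> int:
    """Fetch the endpoint and return 0 if it reports healthy, 1 otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return exit_code_for(response.read())
    except OSError:
        return 1  # connection refused, timeout, or HTTP error status
```

To use it from a HEALTHCHECK, end the file with `raise SystemExit(main())` and replace the curl command with `python healthcheck.py`. Whether a degraded container should be marked unhealthy is a deployment decision; returning 0 for any parseable response keeps the behavior closer to the curl version.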
Load Balancer Health Check
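Configure the load balancer to route traffic only to healthy instances. The health_check directive requires NGINX Plus. With the handler shown under Implementation, the endpoint answers 200 in both states, so the match block inspects the response body:
upstream ekg_backend {
    zone ekg_backend 64k;  # shared memory zone, required for health_check
    server ekg-1.internal:8000;
    server ekg-2.internal:8000;
    server ekg-3.internal:8000;
}

# Pass only when the JSON body reports "healthy"
match ekg_healthy {
    status 200;
    body ~ '"status"\s*:\s*"healthy"';
}

server {
    location / {
        proxy_pass http://ekg_backend;
        # Health check
        health_check uri=/api/health interval=10s fails=3 passes=2 match=ekg_healthy;
    }
}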
Monitoring Script
Periodic health monitoring:
import time
from datetime import datetime

import requests

HEALTH_URL = "http://localhost:8000/api/health"


def send_alert(message, details=None):
    """Placeholder alert hook: replace with email, chat, or paging integration."""
    print(f"ALERT: {message} {details or ''}")


def check_health():
    """Fetch the health payload; alert and return None if the request fails."""
    try:
        response = requests.get(HEALTH_URL, timeout=5)
        response.raise_for_status()
        health = response.json()
    except (requests.exceptions.RequestException, ValueError) as e:
        send_alert(f"EKG health check failed: {e}")
        return None

    if health['status'] != 'healthy':
        send_alert(f"EKG health degraded at {datetime.now()}", health['components'])
    return health


while True:
    health = check_health()
    if health:
        print(f"[{datetime.now()}] Status: {health['status']}")
    time.sleep(60)  # Check every minute
Prometheus Integration
Export health metrics to Prometheus:
from fastapi import Response
from prometheus_client import CONTENT_TYPE_LATEST, Gauge, generate_latest

# Define metrics
health_status = Gauge('ekg_health_status', 'Overall health status (1=healthy, 0=degraded)')
component_status = Gauge('ekg_component_status', 'Component status', ['component'])


# `app` and `health_check` are the FastAPI app and handler from chat/app.py
@app.get("/metrics")
async def metrics():
    # Update metrics from health check
    health = await health_check()
    health_status.set(1 if health['status'] == 'healthy' else 0)
    for component, status in health['components'].items():
        component_status.labels(component=component).set(1 if status else 0)
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
Implementation
From chat/app.py:196-219:
@app.get("/api/health")
async def health_check():
    """Health check endpoint."""
    global storage, query_engine, query_parser

    status = {
        "status": "healthy",
        "components": {
            "storage": storage is not None,
            "query_engine": query_engine is not None,
            "query_parser": query_parser is not None
        }
    }

    # Test Neo4j connection
    try:
        if storage:
            storage.execute_cypher("RETURN 1")
            status["components"]["neo4j"] = True
    except Exception:
        status["components"]["neo4j"] = False
        status["status"] = "degraded"

    return status
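As written, the handler returns its dictionary with FastAPI's default 200 status even when the status is degraded, so probes that look only at the status code cannot tell the two states apart. If that distinction is wanted, one option is to derive the HTTP status from the payload. The sketch below is an assumption about how that could look, not code from chat/app.py; `http_status_for` is a hypothetical helper.

```python
def http_status_for(health: dict) -> int:
    """Return 200 for a fully healthy payload and 503 (Service Unavailable) otherwise."""
    all_ok = all(health.get("components", {}).values())
    return 200 if health.get("status") == "healthy" and all_ok else 503


print(http_status_for({"status": "healthy", "components": {"neo4j": True}}))    # 200
print(http_status_for({"status": "degraded", "components": {"neo4j": False}}))  # 503
```

In a FastAPI handler the result could be passed as `status_code` to `fastapi.responses.JSONResponse`. A liveness probe pointed at such an endpoint would restart the container whenever Neo4j is unreachable, so this mapping suits readiness and load balancer checks better than liveness checks.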
Component Checks
Storage Check
Verifies that the GraphStorage object is initialized:
"storage": storage is not None
Query Engine Check
Verifies that the QueryEngine object is initialized:
"query_engine": query_engine is not None
Query Parser Check
Verifies that the QueryParser object is initialized:
"query_parser": query_parser is not None
Neo4j Check
Executes a test query against Neo4j:
try:
    storage.execute_cypher("RETURN 1")
    status["components"]["neo4j"] = True
except Exception:
    status["components"]["neo4j"] = False
    status["status"] = "degraded"
Status Interpretation
All components true
System is fully operational. All queries should work.
neo4j false
Database connection lost. Queries will fail. Check:
Neo4j container is running
Network connectivity
NEO4J_URI configuration
query_parser false
Natural language queries won’t work. Check:
GEMINI_API_KEY is valid
Internet connectivity for Gemini API
Application startup logs
storage or query_engine false
Core components failed to initialize. Check:
Application startup logs
Neo4j connectivity
Configuration files
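The checklists above can be attached to a health payload programmatically, for example to include them in an alert. This is a minimal sketch: the `CHECKS` table copies the items listed above, and `troubleshooting_steps` is a hypothetical helper name.

```python
# Checks copied from the lists above, keyed by component name.
CORE_CHECKS = ["Application startup logs", "Neo4j connectivity", "Configuration files"]
CHECKS = {
    "neo4j": ["Neo4j container is running", "Network connectivity", "NEO4J_URI configuration"],
    "query_parser": [
        "GEMINI_API_KEY is valid",
        "Internet connectivity for Gemini API",
        "Application startup logs",
    ],
    "storage": CORE_CHECKS,
    "query_engine": CORE_CHECKS,
}


def troubleshooting_steps(health: dict) -> list[str]:
    """Collect the checks for every failing component, without duplicates."""
    steps: list[str] = []
    for name, ok in health.get("components", {}).items():
        if ok:
            continue
        for step in CHECKS.get(name, []):
            if step not in steps:
                steps.append(step)
    return steps
```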
Response Times
Expected response times:
Healthy: < 100 ms
Neo4j slow: 500 ms to 5 s
Timeout: > 5 s (connection issues)
Set health check timeouts to at least 5 seconds to avoid false positives during Neo4j slowness.
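A monitoring client can label each measured response time with the bands above. The sketch below is illustrative: `classify_latency` is a hypothetical helper, and the "elevated" label for the 100 ms to 500 ms range is an assumption, since the bands listed above leave that range unnamed.

```python
def classify_latency(seconds: float) -> str:
    """Label a health check response time using the bands listed above."""
    if seconds < 0.1:
        return "healthy"
    if seconds < 0.5:
        return "elevated"  # assumption: this range is not named in the list above
    if seconds <= 5.0:
        return "neo4j_slow"
    return "timeout"
```

The elapsed time can be measured by calling `time.perf_counter()` before and after the request and passing the difference to the function.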
Best Practices
Monitor continuously
Check health at regular intervals (30-60 seconds).
Alert on degraded
Trigger alerts when status becomes degraded:
if health['status'] == 'degraded':
    send_alert(health['components'])
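Alerting on every poll while the system stays degraded can flood the alert channel. One way to alert only when the status changes to degraded is sketched below; `TransitionAlerter` is a hypothetical class, and the alert function is passed in as a callable.

```python
from typing import Callable, Optional


class TransitionAlerter:
    """Call `send_alert` only when the status changes to 'degraded'."""

    def __init__(self, send_alert: Callable[[dict], None]) -> None:
        self._send_alert = send_alert
        self._last_status: Optional[str] = None

    def observe(self, health: dict) -> bool:
        """Record one health payload; return True if an alert was sent."""
        status = health.get("status")
        became_degraded = status == "degraded" and self._last_status != "degraded"
        self._last_status = status
        if became_degraded:
            self._send_alert(health.get("components", {}))
        return became_degraded
```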
Correlate with metrics
Compare health status with:
Request latency
Error rates
Neo4j query times
Graceful degradation
Handle degraded state gracefully:
Return cached results
Show user-friendly error messages
Retry with exponential backoff
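Retrying with exponential backoff, the last item above, can be sketched as follows. The names `backoff_delays` and `call_with_backoff` and all default values are assumptions chosen for illustration; the `sleep` parameter exists so the wait can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, factor: float = 2.0,
                   cap: float = 30.0) -> list[float]:
    """Delay before each retry: base, base*factor, ... capped at `cap` seconds."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def call_with_backoff(operation: Callable[[], T], retries: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on any exception with exponentially growing waits."""
    for delay in backoff_delays(retries):
        try:
            return operation()
        except Exception:
            sleep(delay)  # wait before the next attempt
    return operation()  # final attempt: let any exception propagate
```

With the defaults, a failing operation is attempted five times in total, waiting 0.5, 1, 2, and 4 seconds between attempts. In production code it is usually better to catch only the exception types that indicate a transient failure.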