Overview
Each service in the distributed notification system exposes a /health endpoint for monitoring service availability and dependencies. Health checks are essential for:
Service orchestration and load balancing
Automatic recovery and restarts
Deployment verification
Dependency monitoring
Health Check Endpoints
All services expose health checks on the /health path:
Service            Port   Health Endpoint
API Gateway        8000   http://localhost:8000/health
User Service       8001   http://localhost:8001/health
Template Service   8002   http://localhost:8002/health
Email Service      8003   http://localhost:8003/health
Push Service       8004   http://localhost:8004/health
Service Health Implementations
API Gateway Health Check
The API Gateway uses NestJS Terminus to check RabbitMQ connectivity:
api-gateway/src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import { HealthCheckService, HealthCheck, MicroserviceHealthIndicator } from '@nestjs/terminus';
import { Transport } from '@nestjs/microservices';

@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private microservice: MicroserviceHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      () => this.microservice.pingCheck('rabbitmq', {
        transport: Transport.RMQ,
        options: {
          urls: [process.env.RABBITMQ_URL || 'amqp://guest:guest@localhost:5672'],
        },
      }),
    ]);
  }
}
Health Module:
api-gateway/src/health/health.module.ts
import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { HealthController } from './health.controller';

@Module({
  imports: [TerminusModule],
  controllers: [HealthController],
})
export class HealthModule {}
Response (Healthy):
{
  "status": "ok",
  "info": {
    "rabbitmq": {
      "status": "up"
    }
  },
  "error": {},
  "details": {
    "rabbitmq": {
      "status": "up"
    }
  }
}
Response (Unhealthy):
{
  "status": "error",
  "info": {},
  "error": {
    "rabbitmq": {
      "status": "down",
      "message": "Connection refused"
    }
  },
  "details": {
    "rabbitmq": {
      "status": "down",
      "message": "Connection refused"
    }
  }
}
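The two responses above follow one pattern: every indicator appears under details, and additionally under info when it is up or under error when it is down. The sketch below is a simplified TypeScript model of that shape, not the Terminus implementation; IndicatorResult and buildHealthResult are hypothetical names used only for illustration.

```typescript
// Simplified model of the response shape shown above.
// IndicatorResult and buildHealthResult are hypothetical names.
type IndicatorState = { status: "up" | "down"; message?: string };
type IndicatorResult = { [name: string]: IndicatorState };

interface HealthResult {
  status: "ok" | "error";
  info: IndicatorResult;
  error: IndicatorResult;
  details: IndicatorResult;
}

function buildHealthResult(indicators: IndicatorResult): HealthResult {
  const info: IndicatorResult = {};
  const error: IndicatorResult = {};
  for (const name of Object.keys(indicators)) {
    const state = indicators[name];
    // Healthy indicators go under "info", failing ones under "error".
    if (state.status === "up") {
      info[name] = state;
    } else {
      error[name] = state;
    }
  }
  const failed = Object.keys(error).length > 0;
  // "details" always lists every indicator, regardless of outcome.
  return { status: failed ? "error" : "ok", info, error, details: indicators };
}

const unhealthy = buildHealthResult({
  rabbitmq: { status: "down", message: "Connection refused" },
});
console.log(JSON.stringify(unhealthy, null, 2));
```

With Terminus, a failed check is returned with HTTP status 503, so probes and scripts can detect the failure from the status code without parsing the body.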
Push Service Health Check
The Push Service checks connectivity to its dependencies:
push-service/src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import { HealthCheckService, HealthCheck, HttpHealthIndicator } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private http: HttpHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  async checkHealth() {
    // Note: the fallback URLs below do not match the ports in the endpoint
    // table (8000-8002), so set the environment variables explicitly.
    return this.health.check([
      async () => this.http.pingCheck('api-gateway', process.env.API_GATEWAY_URL || 'http://localhost:3000'),
      async () => this.http.pingCheck('user-service', process.env.USER_SERVICE_URL || 'http://localhost:8081'),
      async () => this.http.pingCheck('template-service', process.env.TEMPLATE_SERVICE_URL || 'http://localhost:8082'),
    ]);
  }
}
The Push Service validates that all upstream dependencies are reachable before marking itself as healthy.
Email Service Health Check
The Email Service uses a lightweight HTTP listener for health checks:
email-service/EmailService/Program.cs
using System.Net;
using System.Text;

// Simple Health Endpoint (No ASP.NET Core Required)
_ = Task.Run(async () =>
{
    var port = Environment.GetEnvironmentVariable("PORT") ?? "8080";
    var listener = new HttpListener();
    listener.Prefixes.Add($"http://+:{port}/health/");
    listener.Start();
    Log.Information($"Health endpoint listening on http://+:{port}/health");

    while (true)
    {
        var context = await listener.GetContextAsync();
        var response = context.Response;
        var buffer = Encoding.UTF8.GetBytes("{\"status\":\"healthy\"}");
        response.ContentType = "application/json";
        response.ContentLength64 = buffer.Length;
        await response.OutputStream.WriteAsync(buffer);
        response.Close();
    }
});
Response:
{"status":"healthy"}
The Email Service uses a background task to expose a health endpoint without running a full ASP.NET Core web server.
Docker Health Check Configuration
RabbitMQ Health Check
rabbitmq:
  image: rabbitmq:3.11-management
  healthcheck:
    test: ["CMD", "rabbitmq-diagnostics", "ping"]
    interval: 5s
    timeout: 10s
    retries: 5
    start_period: 10s
Parameters:
test: Command to check if RabbitMQ is responsive
interval: Check every 5 seconds
timeout: Fail if check takes longer than 10 seconds
retries: Mark unhealthy after 5 consecutive failures
start_period: Grace period of 10 seconds after container start; checks still run, but failures in this window do not count toward retries
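As a rough worked example of how these parameters combine, the sketch below estimates how long a failing RabbitMQ container takes to be marked unhealthy. It assumes each check either fails immediately (best case) or runs into the full timeout (worst case), and that Docker waits one interval after a check finishes before starting the next. It ignores start_period and scheduling jitter, and the helper name is hypothetical.

```typescript
interface DockerHealthcheck {
  intervalSeconds: number;
  timeoutSeconds: number;
  retries: number;
}

// Estimates the time between the last passing check and the container being
// marked unhealthy. Hypothetical helper; ignores start_period.
function unhealthyDetectionWindow(hc: DockerHealthcheck): { bestCase: number; worstCase: number } {
  return {
    // Every check fails instantly: only the waits between checks add up.
    bestCase: hc.retries * hc.intervalSeconds,
    // Every check hangs until the timeout before it is counted as failed.
    worstCase: hc.retries * (hc.intervalSeconds + hc.timeoutSeconds),
  };
}

const rabbitmqWindow = unhealthyDetectionWindow({ intervalSeconds: 5, timeoutSeconds: 10, retries: 5 });
console.log(rabbitmqWindow); // bestCase is 25 seconds, worstCase is 75 seconds
```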
Service Dependency Management
Services use Docker Compose depends_on with health conditions:
email-service:
  depends_on:
    rabbitmq:
      condition: service_healthy
    redis:
      condition: service_started
    mailhog:
      condition: service_started
The Email Service won’t start until RabbitMQ passes its health check, preventing startup errors.
Service Dependency Checks
Dependency Graph
API Gateway
├── RabbitMQ (required)
├── User Service (required)
└── Template Service (required)
Email Service
├── RabbitMQ (required)
├── Redis (required)
├── User Service (optional)
└── Template Service (optional)
Push Service
├── RabbitMQ (required)
├── User Service (required)
└── Template Service (required)
User Service
├── PostgreSQL (required)
└── Redis (required)
Template Service
└── PostgreSQL (required)
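The graph above also implies a safe start-up order: infrastructure first, then the services that depend on it. The sketch below derives one such order with a depth-first topological sort. The graph literal transcribes the tree above, treating required and optional dependencies alike; the function name startOrder is hypothetical.

```typescript
// Dependency graph transcribed from the tree above (service -> dependencies).
const dependencies: { [service: string]: string[] } = {
  "API Gateway": ["RabbitMQ", "User Service", "Template Service"],
  "Email Service": ["RabbitMQ", "Redis", "User Service", "Template Service"],
  "Push Service": ["RabbitMQ", "User Service", "Template Service"],
  "User Service": ["PostgreSQL", "Redis"],
  "Template Service": ["PostgreSQL"],
};

// Depth-first topological sort: a node is emitted only after all of its
// dependencies. Throws if the graph contains a cycle.
function startOrder(graph: { [service: string]: string[] }): string[] {
  const order: string[] = [];
  const state: { [node: string]: "visiting" | "done" } = {};

  function visit(node: string): void {
    if (state[node] === "done") return;
    if (state[node] === "visiting") {
      throw new Error("Dependency cycle at " + node);
    }
    state[node] = "visiting";
    // Nodes without an entry (RabbitMQ, Redis, PostgreSQL) have no dependencies.
    for (const dep of graph[node] || []) {
      visit(dep);
    }
    state[node] = "done";
    order.push(node);
  }

  for (const service of Object.keys(graph)) {
    visit(service);
  }
  return order;
}

const order = startOrder(dependencies);
console.log(order.join(" -> "));
```

Any order in which each service follows its dependencies is valid; this one is simply the order the traversal happens to produce.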
Checking Dependencies Manually
Check RabbitMQ:
curl http://localhost:15673/api/healthchecks/node
Check Redis:
redis-cli -h localhost -p 6379 ping
# Expected: PONG
Check PostgreSQL:
psql -h localhost -p 5432 -U postgres -c "SELECT 1;"
Using Health Endpoints for Orchestration
Kubernetes Liveness and Readiness Probes
apiVersion: v1
kind: Pod
metadata:
  name: api-gateway
spec:
  containers:
    - name: api-gateway
      image: api-gateway:latest
      livenessProbe:
        httpGet:
          path: /health
          port: 8000
        initialDelaySeconds: 30
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /health
          port: 8000
        initialDelaySeconds: 5
        periodSeconds: 5
Liveness Probe: Restarts the container if the service is unresponsive
Readiness Probe: Removes the pod from load balancing if not ready
Load Balancer Health Checks
Configure your load balancer to use health endpoints:
AWS Application Load Balancer:
Health check path: /health
Health check interval: 30 seconds
Healthy threshold: 2 consecutive successes
Unhealthy threshold: 2 consecutive failures
Timeout: 5 seconds
CI/CD Deployment Verification
Verify service health after deployment:
#!/bin/bash
SERVICES=("8000" "8001" "8002" "8003" "8004")
for port in "${SERVICES[@]}"; do
  echo "Checking service on port $port..."
  response=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:$port/health")
  if [ "$response" = "200" ]; then
    echo "✓ Port $port: Healthy"
  else
    echo "✗ Port $port: Unhealthy (HTTP $response)"
    exit 1
  fi
done
echo "All services healthy!"
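The same verification can be sketched in TypeScript with a few retries, since a service may need a moment after deployment before /health answers. This is a minimal sketch that assumes Node.js 18 or later (for the global fetch); the retry count and delay are illustrative defaults rather than values from this project, and the function names are hypothetical.

```typescript
// Ports from the endpoint table in this document.
const PORTS = [8000, 8001, 8002, 8003, 8004];

function healthUrl(port: number): string {
  return `http://localhost:${port}/health`;
}

// Resolves to the HTTP status code, or 0 if the request itself failed.
async function probe(url: string): Promise<number> {
  try {
    const res = await fetch(url);
    return res.status;
  } catch (err) {
    return 0;
  }
}

// Polls each endpoint until it returns 200 or the attempts run out.
// attempts and delayMs are illustrative defaults.
async function verifyDeployment(attempts = 5, delayMs = 2000): Promise<number[]> {
  const failing: number[] = [];
  for (const port of PORTS) {
    let status = 0;
    for (let i = 0; i < attempts; i++) {
      status = await probe(healthUrl(port));
      if (status === 200) break;
      if (i < attempts - 1) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
    if (status !== 200) failing.push(port);
  }
  return failing; // an empty array means every service is healthy
}
```

A CI step could call verifyDeployment() and exit with a non-zero code when the returned array is not empty.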
Health Check Best Practices
Keep Checks Lightweight: Health checks should complete quickly (< 1 second)
Check Critical Dependencies: Only verify essential services, not all dependencies
Use Appropriate Timeouts: Allow enough time for transient network issues
Implement Graceful Degradation: Return partial availability status when possible
Log Health Check Failures: Track why services are marked unhealthy
Avoid External Calls: Don’t rely on third-party APIs in health checks
Version Your Endpoints: Include service version in health responses for debugging
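Graceful degradation can be sketched by separating required from optional dependencies, as the dependency graph in this document does for the Email Service. In the hypothetical helper below, a failed required dependency yields error, while a failed optional one yields degraded. The degraded status name and the function itself are assumptions; they are not part of the services' current responses.

```typescript
type OverallStatus = "ok" | "degraded" | "error";

interface DependencyCheck {
  name: string;
  required: boolean;
  up: boolean;
}

// Hypothetical helper: required failures are fatal, optional ones only degrade.
function overallStatus(checks: DependencyCheck[]): OverallStatus {
  let status: OverallStatus = "ok";
  for (const check of checks) {
    if (check.up) continue;
    if (check.required) return "error";
    status = "degraded";
  }
  return status;
}

// Email Service dependencies, with the optional Template Service down.
const emailServiceChecks: DependencyCheck[] = [
  { name: "RabbitMQ", required: true, up: true },
  { name: "Redis", required: true, up: true },
  { name: "User Service", required: false, up: true },
  { name: "Template Service", required: false, up: false },
];
console.log(overallStatus(emailServiceChecks)); // prints: degraded
```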
Advanced Health Check Patterns
Detailed Health Response
{
  "status": "ok",
  "version": "1.0.0",
  "uptime": 3600,
  "timestamp": "2026-03-03T12:34:56Z",
  "checks": {
    "database": {
      "status": "up",
      "response_time_ms": 5
    },
    "rabbitmq": {
      "status": "up",
      "queue_depth": 42
    },
    "redis": {
      "status": "up",
      "memory_mb": 128
    }
  }
}
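A response like this could be assembled from per-dependency results. The sketch below shows one hypothetical way to do it: the overall status is derived from the individual checks, and uptime is computed from a recorded start time. The function name, its parameters, and the choice to report error when any check is down are assumptions made for illustration.

```typescript
interface CheckResult {
  status: "up" | "down";
  // Extra metrics such as response_time_ms or queue_depth.
  [metric: string]: string | number;
}

interface DetailedHealth {
  status: "ok" | "error";
  version: string;
  uptime: number; // seconds since the service started
  timestamp: string; // ISO 8601, UTC
  checks: { [dependency: string]: CheckResult };
}

function buildDetailedHealth(
  version: string,
  startedAt: Date,
  now: Date,
  checks: { [dependency: string]: CheckResult },
): DetailedHealth {
  const anyDown = Object.keys(checks).some((name) => checks[name].status === "down");
  return {
    status: anyDown ? "error" : "ok",
    version,
    uptime: Math.floor((now.getTime() - startedAt.getTime()) / 1000),
    timestamp: now.toISOString(),
    checks,
  };
}

// A service that started exactly one hour before the report is generated.
const report = buildDetailedHealth(
  "1.0.0",
  new Date("2026-03-03T11:34:56Z"),
  new Date("2026-03-03T12:34:56Z"),
  {
    database: { status: "up", response_time_ms: 5 },
    rabbitmq: { status: "up", queue_depth: 42 },
    redis: { status: "up", memory_mb: 128 },
  },
);
console.log(report.uptime); // 3600
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the builder easy to test.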
Startup Probes for Slow-Starting Services
startupProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 30  # Allow up to 5 minutes to start
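The five-minute figure in the comment comes from multiplying the failure threshold by the probe period. A small sketch of that arithmetic, using a hypothetical helper name and treating the result as an approximate upper bound:

```typescript
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate maximum time a container is given to start before it is restarted.
function startupBudgetSeconds(p: ProbeTiming): number {
  return p.initialDelaySeconds + p.periodSeconds * p.failureThreshold;
}

const startupBudget = startupBudgetSeconds({
  initialDelaySeconds: 0,
  periodSeconds: 10,
  failureThreshold: 30,
});
console.log(startupBudget); // 300 seconds, i.e. 5 minutes
```

While a startup probe is still running, Kubernetes holds back the liveness and readiness probes, so a slow start does not trigger a restart.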
Next Steps
Monitoring: Set up monitoring and observability for the system
Troubleshooting: Diagnose and resolve common system issues