Overview
Health checks are implemented across all API services to monitor the health of the application and its dependencies. The system uses ASP.NET Core Health Checks with custom UI responses for detailed diagnostics.
Health Check Endpoints
All services expose a health check endpoint and write the response as JSON using HealthChecks.UI.Client.UIResponseWriter.
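As a rough illustration, wiring an endpoint to UIResponseWriter in a minimal-hosting Program.cs might look like the sketch below. The /health path is an assumption, since the exact route is not shown here, and the snippet assumes a Web SDK project with implicit usings plus the AspNetCore.HealthChecks.UI.Client package.

```csharp
using HealthChecks.UI.Client;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Register the health check services; individual checks are chained onto this call.
builder.Services.AddHealthChecks();

var app = builder.Build();

// "/health" is an assumed path; substitute the route your services actually expose.
app.MapHealthChecks("/health", new HealthCheckOptions
{
    // Writes per-check status, duration, and error details as JSON.
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

app.Run();
```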
Example Response
- Healthy: All checks passed
- Degraded: Some checks are in a degraded state
- Unhealthy: One or more checks failed
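By default, ASP.NET Core answers with HTTP 200 for Healthy and Degraded and HTTP 503 for Unhealthy. The sketch below spells that mapping out through HealthCheckOptions.ResultStatusCodes so it can be adjusted, for example to return a non-200 code for Degraded. The /health path is an assumption.

```csharp
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHealthChecks();
var app = builder.Build();

// "/health" is an assumed path.
app.MapHealthChecks("/health", new HealthCheckOptions
{
    // These values match the framework defaults; change them only if
    // your load balancer or orchestrator needs different codes.
    ResultStatusCodes =
    {
        [HealthStatus.Healthy] = StatusCodes.Status200OK,
        [HealthStatus.Degraded] = StatusCodes.Status200OK,
        [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable
    }
});

app.Run();
```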
Service Health Check Configuration
Catalog API
File: Services/Catalog/Catalog.API/Program.cs:28-29
- PostgreSQL Database: Verifies connection to CatalogDb
- NuGet Package: AspNetCore.HealthChecks.NpgSql
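A minimal sketch of what such a PostgreSQL registration could look like, not necessarily the exact code at the referenced lines. The connection string name "Database" and the /health path are assumptions.

```csharp
var builder = WebApplication.CreateBuilder(args);

// AddNpgSql comes from the AspNetCore.HealthChecks.NpgSql package.
// "Database" is an assumed connection string name.
builder.Services.AddHealthChecks()
    .AddNpgSql(builder.Configuration.GetConnectionString("Database")!);

var app = builder.Build();

// "/health" is an assumed path.
app.MapHealthChecks("/health");

app.Run();
```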
Basket API
File: Services/Basket/Basket.API/Program.cs:58-60
- PostgreSQL Database: Verifies connection to BasketDb
- Redis Cache: Verifies connection to distributed cache
- NuGet Package: AspNetCore.HealthChecks.NpgSql
- NuGet Package: AspNetCore.HealthChecks.Redis
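Chaining both checks onto a single AddHealthChecks call might look like the sketch below. The connection string names "Database" and "Redis" and the /health path are assumptions, not values taken from the repository.

```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    // PostgreSQL check from AspNetCore.HealthChecks.NpgSql ("Database" is an assumed name).
    .AddNpgSql(builder.Configuration.GetConnectionString("Database")!)
    // Redis check from AspNetCore.HealthChecks.Redis ("Redis" is an assumed name).
    .AddRedis(builder.Configuration.GetConnectionString("Redis")!);

var app = builder.Build();

// "/health" is an assumed path. The endpoint reports Unhealthy if either check fails.
app.MapHealthChecks("/health");

app.Run();
```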
Ordering API
File: Services/Ordering/Ordering.API/DependencyInjection.cs:15-16
- SQL Server Database: Verifies connection to OrderDb
- NuGet Package: AspNetCore.HealthChecks.SqlServer
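Because this registration lives in a dependency injection helper rather than Program.cs, a sketch of it could take the form of an extension method. The method name AddApiServices and the connection string name "Database" are hypothetical.

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

public static class DependencyInjection
{
    // Hypothetical extension method name; called from Program.cs during startup.
    public static IServiceCollection AddApiServices(
        this IServiceCollection services, IConfiguration configuration)
    {
        // AddSqlServer comes from the AspNetCore.HealthChecks.SqlServer package.
        // "Database" is an assumed connection string name.
        services.AddHealthChecks()
            .AddSqlServer(configuration.GetConnectionString("Database")!);

        return services;
    }
}
```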
Discount gRPC
Status: No health checks currently implemented
Note: Consider adding SQLite health checks for production deployments.
YARP API Gateway
Status: No health checks currently implemented
Recommendation: Add downstream service health checks.
Shopping Web
Status: No health checks currently implemented
Recommendation: Add gateway health check.
Required NuGet Packages
Install the appropriate health check packages for each service.
Testing Health Checks
Using curl
Using Docker
Expected Healthy Response
Example Unhealthy Response
Docker Health Checks
Add health check configuration to docker-compose.override.yml:
Example Configuration
Health Check Options
| Option | Description | Default |
|---|---|---|
| test | Command to run | None |
| interval | Time between checks | 30s |
| timeout | Max time for check | 30s |
| retries | Consecutive failures before unhealthy | 3 |
| start_period | Initial grace period | 0s |
Kubernetes Liveness and Readiness Probes
Deployment Example
Probe Types
Liveness Probe:
- Determines if the container should be restarted
- Unhealthy containers are killed and restarted
Readiness Probe:
- Determines if the container is ready to accept traffic
- Unhealthy containers are removed from service load balancing
Advanced Health Check Configuration
Custom Health Checks
Create custom health checks for business logic:
Health Check Tags
Organize health checks with tags:
Separate Endpoints
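The three ideas in this section (a custom check, tags, and separate endpoints) fit together naturally, so the sketch below shows them in one place. The OrderBacklogHealthCheck class, its threshold, the "ready" tag, and the placeholder GetPendingOrderCount method are all hypothetical; only the /health/live and /health/ready paths follow the naming used elsewhere in this document.

```csharp
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    // Tagged "ready" so that only the readiness endpoint runs it.
    .AddCheck<OrderBacklogHealthCheck>("order-backlog", tags: new[] { "ready" });

var app = builder.Build();

// Liveness: runs no checks; a 200 response only shows the process can serve requests.
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false
});

// Readiness: runs only the checks tagged "ready".
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("ready")
});

app.Run();

// Hypothetical business-logic check: degrades when too many orders are pending.
public sealed class OrderBacklogHealthCheck : IHealthCheck
{
    private const int DegradedThreshold = 100; // illustrative value

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        int pending = GetPendingOrderCount();

        HealthCheckResult result = pending < DegradedThreshold
            ? HealthCheckResult.Healthy($"{pending} orders pending")
            : HealthCheckResult.Degraded($"{pending} orders pending");

        return Task.FromResult(result);
    }

    // Placeholder; a real check would query the order store here.
    private static int GetPendingOrderCount() => 0;
}
```

Keeping the liveness endpoint free of dependency checks is a deliberate choice in this sketch: a database outage then removes the service from load balancing through the readiness endpoint without also triggering container restarts.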
Monitoring and Alerting
Health Checks UI
Install the Health Checks UI for dashboard monitoring. The dashboard is available at http://localhost:6004/health-ui
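One possible way to host the dashboard, assuming the AspNetCore.HealthChecks.UI and AspNetCore.HealthChecks.UI.InMemory.Storage packages are installed. The monitored endpoint name and URL are hypothetical placeholders; the /health-ui path matches the dashboard URL above.

```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddHealthChecksUI(setup =>
    {
        // Hypothetical name and URL; point this at a real service's health endpoint,
        // which must use UIResponseWriter so the dashboard can parse the response.
        setup.AddHealthCheckEndpoint("Catalog API", "http://localhost:6000/health");
    })
    // Keeps polling history in memory; it is lost when the process restarts.
    .AddInMemoryStorage();

var app = builder.Build();

// Serve the dashboard at /health-ui instead of the default /healthchecks-ui.
app.MapHealthChecksUI(options => options.UIPath = "/health-ui");

app.Run();
```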
Integration with Monitoring Tools
Prometheus Metrics:
Troubleshooting
Health Check Returns 503
Cause: One or more dependency checks are failing.
Solution:
- Check the JSON response for specific failed checks
- Verify database connection strings
- Ensure dependent services are running
- Check network connectivity between services
Health Check Timeout
Cause: Health check is taking too long to respond.
Solution:
- Increase timeout in Docker/Kubernetes configuration
- Optimize database queries used in health checks
- Check for network latency issues
Container Keeps Restarting
Cause: Liveness probe is failing repeatedly.
Solution:
- Increase initialDelaySeconds to allow app startup time
- Increase failureThreshold to tolerate transient failures
- Check application logs for startup errors
Service Not Receiving Traffic
Cause: Readiness probe is failing.
Solution:
- Verify all dependencies are available
- Check that migrations have completed
- Ensure cache/message broker connections are established
Best Practices
Keep Checks Lightweight
Health checks should complete quickly (< 1 second). Avoid expensive operations.
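One way to enforce such a budget is a per-check timeout at registration time. The sketch below applies a one-second limit to a PostgreSQL check; the connection string name "Database", the check name "postgres", and the /health path are assumptions.

```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    .AddNpgSql(
        builder.Configuration.GetConnectionString("Database")!, // assumed name
        name: "postgres",                                       // assumed check name
        // The check is cancelled after 1 second and reported with its
        // failure status (Unhealthy unless configured otherwise).
        timeout: TimeSpan.FromSeconds(1));

var app = builder.Build();

// "/health" is an assumed path.
app.MapHealthChecks("/health");

app.Run();
```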
Check All Dependencies
Include checks for databases, caches, message brokers, and downstream services.
Use Separate Endpoints
Implement distinct /health/live, /health/ready, and /health/startup endpoints.
Include Detailed Responses
Use UIResponseWriter for detailed JSON responses with timing and error information.
