Overview
This guide covers common issues you may encounter when running the distributed notification system and their solutions. Use correlation IDs to trace issues across services.
Common Issues and Solutions
Service won't start - Port already in use
Cause: Another process is using the required port.
Solution:
- Find the process using the port:
- Kill the process:
- Or change the port in your environment variables:
- Restart the service:
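The first step above, finding out whether something already owns the port, can be sketched in Python. This is a minimal illustration and not the guide's own command; the function name `port_in_use` is hypothetical, and it only tells you that a listener exists, not which process it is.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when the TCP handshake succeeds,
        # which means a listener already owns the port.
        return probe.connect_ex((host, port)) == 0
```

Call it with the port your service is configured to use, for example `port_in_use(3000)` (the value 3000 is only an example); a `True` result means you need to stop the other process or pick a different port.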
Service container exits immediately
Cause: Application crash, missing environment variables, or dependency not ready.
Solution:
- Check container logs:
- Look for error messages about:
  - Missing environment variables
  - Database connection failures
  - RabbitMQ connection errors
- Verify environment file exists:
- Check dependency health:
- Restart with dependencies:
RabbitMQ connection refused
Cause: RabbitMQ is not running or not ready.
Solution:
- Check RabbitMQ status:
- Check RabbitMQ health:
- Restart RabbitMQ:
- Wait for RabbitMQ to be healthy:
- Check connection URL in environment:
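Waiting for RabbitMQ to come up can be approximated by polling its port until a TCP connection succeeds. The sketch below is an assumption-laden illustration: `wait_for_port` is a hypothetical helper, and a successful connect only shows the broker is accepting connections, not that it passes its own health check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed handshake means something is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5672)` assumes RabbitMQ's default AMQP port is published on the host; use whatever host and port your connection URL actually contains.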
Messages stuck in queue
Symptom: RabbitMQ queue depth keeps increasing, messages not being processed.
Cause: Consumer service down, processing errors, or message format issues.
Solution:
- Check consumer service status:
- Inspect RabbitMQ Management UI:
  - Open http://localhost:15673
  - Check queue “email.queue” or “push.queue”
  - View consumer count (should be > 0)
- Check service logs for errors:
- Inspect a message in the queue:
  - In RabbitMQ UI: Queues → email.queue → Get Messages
  - Verify message format matches expected schema
- Restart consumer service:
- If messages are malformed, purge the queue:
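The consumer-count and queue-depth checks above can also be scripted against the RabbitMQ management HTTP API, which serves per-queue statistics at `/api/queues/{vhost}/{queue}`. The following is a minimal sketch: the helper names are hypothetical, the default `/` vhost is assumed, and the credentials you pass must be whatever your deployment actually uses.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_queue(base_url: str, queue: str, user: str, password: str, vhost: str = "/") -> dict:
    """Fetch one queue's statistics from the RabbitMQ management HTTP API."""
    url = "{}/api/queues/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(vhost, safe=""),  # "/" must be sent as %2F
        urllib.parse.quote(queue, safe=""),
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def diagnose_queue(stats: dict) -> str:
    """Summarise the 'messages' and 'consumers' fields of a queue's stats."""
    messages = stats.get("messages", 0)
    consumers = stats.get("consumers", 0)
    if consumers == 0:
        return f"no consumers attached; {messages} message(s) waiting"
    if messages > 0:
        return f"{consumers} consumer(s) attached, {messages} message(s) pending"
    return "queue is empty and consumers are attached"
```

A call such as `diagnose_queue(fetch_queue("http://localhost:15673", "email.queue", "guest", "guest"))` assumes RabbitMQ's default `guest` account is still enabled. A "no consumers attached" result points at the consumer service being down rather than at the messages themselves.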
Database connection failed
Cause: PostgreSQL not running, wrong credentials, or network issue.
Solution:
- Check PostgreSQL status:
- Test database connection:
- Verify environment variables. Expected values:
  - DB_HOST=postgres (not localhost)
  - DB_PORT=5432
  - DB_USERNAME=postgres
  - DB_PASSWORD=<your_password>
  - DB_DATABASE=notification_db
- Check database exists:
- Recreate database if needed:
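The environment-variable check above lends itself to a small script. This sketch only uses the variable names listed in this section; the function name `check_db_env` and the exact wording of the messages are illustrative assumptions.

```python
import os

EXPECTED_KEYS = ("DB_HOST", "DB_PORT", "DB_USERNAME", "DB_PASSWORD", "DB_DATABASE")


def check_db_env(env) -> list:
    """Return a list of human-readable problems found in the DB settings."""
    problems = []
    for key in EXPECTED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is not set")
    # Inside Docker, the database is reached by its service name.
    if env.get("DB_HOST") in ("localhost", "127.0.0.1"):
        problems.append("DB_HOST points at localhost; use the service name (postgres)")
    port = env.get("DB_PORT")
    if port and not port.isdigit():
        problems.append("DB_PORT is not a number")
    return problems


if __name__ == "__main__":
    for problem in check_db_env(os.environ):
        print(problem)
```

Run it inside the failing service's container so that it sees the same environment as the application; an empty result means the variables are at least present and plausibly shaped, not that the credentials are correct.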
Redis connection timeout
Cause: Redis not running or network issue.
Solution:
- Check Redis status:
- Test Redis connection:
- Verify Redis configuration. Expected:
  - REDIS_HOST=redis (not localhost)
  - REDIS_PORT=6379
- Restart Redis:
- Clear Redis data if corrupted:
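Testing the Redis connection amounts to sending `PING` and expecting `+PONG` back. The sketch below does that over a raw socket so it needs no client library; `redis_ping` is a hypothetical helper, and it assumes the server does not require authentication (an auth-protected server answers with an error, which this function reports as `False`).

```python
import socket


def redis_ping(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Send an inline PING and report whether the server answered +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
    return reply.startswith(b"+PONG")
```

From another container on the same Docker network you would call `redis_ping("redis")`; from the host machine, `redis_ping("localhost")` works only if the Redis port is published.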
Email not sending
Symptom: No emails appear in MailHog, or SMTP errors in logs.
Cause: SMTP configuration issue, MailHog not running, or message format error.
Solution:
- Check MailHog status:
- Open MailHog UI:
  - Navigate to http://localhost:8025
  - Check if any emails are captured
- Check Email Service logs:
- Verify SMTP configuration. Expected:
  - SMTP_HOST=mailhog
  - SMTP_PORT=1025
  - [email protected]
- Test SMTP connection:
- Check if message reached email.queue:
  - Open RabbitMQ UI: http://localhost:15673
  - Check “email.queue” message count
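The SMTP connection test above can be sketched with Python's standard `smtplib`. This is an illustration only: the helper names and the `example.com` addresses are hypothetical, and the defaults assume MailHog's SMTP port 1025 is published on the host.

```python
import smtplib
from email.message import EmailMessage


def build_test_email(sender: str, recipient: str) -> EmailMessage:
    """Build a small plain-text message for a delivery test."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "SMTP connectivity test"
    message.set_content("If this shows up in MailHog, SMTP delivery works.")
    return message


def send_test_email(host: str = "localhost", port: int = 1025,
                    sender: str = "test@example.com",
                    recipient: str = "inbox@example.com") -> None:
    """Send the test message; raises an exception if SMTP is unreachable."""
    message = build_test_email(sender, recipient)
    with smtplib.SMTP(host, port, timeout=5) as smtp:
        smtp.send_message(message)
```

If `send_test_email()` succeeds and the message appears at http://localhost:8025, MailHog itself is working and the problem lies upstream, in the Email Service or in the queue. From inside the Docker network, pass `host="mailhog"` instead.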
Push notification not delivered
Symptom: Push notifications not sent, or invalid token errors.
Cause: Invalid device token, FCM/provider configuration issue, or user preferences.
Solution:
- Check Push Service logs:
- Verify user has push token:
- Check user preferences:
- Verify FCM credentials (production):
  - Check FCM_SERVER_KEY or FCM_SERVICE_ACCOUNT environment variable
  - Ensure Firebase project is properly configured
- Check message in push.queue:
  - Open RabbitMQ UI: http://localhost:15673
  - Inspect “push.queue” for messages
- Look for errors in failed.queue:
  - Check if messages moved to dead letter queue
  - Inspect error details
API returns 500 Internal Server Error
Cause: Unhandled exception in service code.
Solution:
- Check service logs with correlation ID:
- Enable debug logging:
- Check for dependency failures:
- Verify request payload format:
High memory usage
Symptom: Service consuming excessive memory, or Docker killing containers.
Cause: Memory leak, large queue backlog, or insufficient resources.
Solution:
- Check container memory usage:
- Identify the problematic service:
- Check queue depths:
  - Large queues consume memory in RabbitMQ
  - Open http://localhost:15673 and check queue sizes
- Restart the service:
- Increase memory limits in docker-compose.yml:
- Scale consumers if queue is large:
Debugging with Correlation IDs
Correlation IDs allow you to trace a notification request across all services.
Tracing a Request
- Get the correlation ID from the response header:
- Search logs across all services:
- Search all logs at once:
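Searching every service's logs for one correlation ID can be sketched as a simple filter. The code assumes only that each log line containing the request also contains the ID as plain text; the function name and the sample service names and log lines in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def trace_request(correlation_id: str,
                  logs_by_service: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Return (service, line) pairs for every log line mentioning the ID."""
    hits = []
    for service, lines in logs_by_service.items():
        for line in lines:
            if correlation_id in line:
                hits.append((service, line.rstrip("\n")))
    return hits
```

For example, given `{"gateway": ["abc-123 request accepted"], "email": ["abc-123 sent", "zzz-999 sent"]}`, calling `trace_request("abc-123", ...)` returns the gateway line and the first email line. A service that is missing from the result is usually where the request stopped.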
Example Correlation Trace
Failed Message Queue Inspection
Messages that fail after all retry attempts are moved to the failed.queue for inspection.
Accessing Failed Messages
- Open RabbitMQ Management UI:
- Navigate to the failed queue:
  - Click “Queues” tab
  - Click “failed.queue”
- Get messages from the queue:
  - Scroll to “Get messages” section
  - Set “Messages” to 10
  - Click “Get Message(s)”
Analyzing Failed Messages
Each failed message includes:
- Original message payload
- Error details (in headers)
- Retry count
- Timestamp
Reprocessing Failed Messages
Option 1: Manually requeue via RabbitMQ UI
- In RabbitMQ UI, go to failed.queue
- Click “Move messages”
- Select destination queue (e.g., email.queue)
- Click “Move messages”
Service-Specific Debugging
API Gateway
Email Service (C#)
Database Services
Getting Additional Help
- Health Checks: Verify service health and dependencies
- Monitoring: Set up comprehensive system monitoring