
Overview

This guide covers common issues you may encounter when running the distributed notification system and their solutions. Use correlation IDs to trace issues across services.

Common Issues and Solutions

Symptom:
Error: listen EADDRINUSE: address already in use :::8000
Cause: Another process is using the required port.
Solution:
  1. Find the process using the port:
lsof -i :8000
# or on Linux
netstat -tulpn | grep :8000
  2. Kill the process:
kill -9 <PID>
  3. Or change the port in your environment variables:
export PORT=8080
  4. Restart the service:
docker-compose restart api-gateway
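If the affected service is a Node.js process, the same condition can also be reported at startup. The sketch below is a hypothetical helper (explainListenError is not part of the project); it relies only on the standard code and port fields that Node.js sets on the error emitted when listen() fails.

```javascript
// Hypothetical helper: turn a Node.js listen() error into a troubleshooting hint.
function explainListenError(err) {
  if (err && err.code === 'EADDRINUSE') {
    return `Port ${err.port} is already in use. Run "lsof -i :${err.port}" to find the owner, or set PORT to a free port.`;
  }
  if (err && err.code === 'EACCES') {
    return `Not allowed to bind port ${err.port}. Ports below 1024 usually need elevated privileges.`;
  }
  return `Unexpected listen error: ${err && err.message}`;
}

// Usage with any net/http server:
//   server.on('error', (err) => { console.error(explainListenError(err)); process.exit(1); });
console.log(explainListenError({ code: 'EADDRINUSE', port: 8000 }));
```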
Symptom:
docker-compose ps
# Shows service as "Exit 1" or "Exit 137"
Cause: Application crash, missing environment variables, or dependency not ready.
Solution:
  1. Check container logs:
docker-compose logs api-gateway
  2. Look for error messages about:
    • Missing environment variables
    • Database connection failures
    • RabbitMQ connection errors
  3. Verify the environment file exists:
ls -la api-gateway/.env
  4. Check dependency health:
docker-compose ps
# Ensure rabbitmq, postgres, redis are "Up" and "healthy"
  5. Restart with dependencies:
docker-compose down
docker-compose up -d rabbitmq postgres redis
# Wait 30 seconds for services to be healthy
docker-compose up -d
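The two exit codes in the symptom point in different directions. By convention, a code above 128 means the process was terminated by signal (code - 128), so 137 is 128 + 9, i.e. SIGKILL, which is what the kernel sends when a container exceeds its memory limit. The following is a small sketch of that arithmetic; describeExitCode is a hypothetical helper name.

```javascript
const os = require('os');

// Interpret the exit code shown by `docker-compose ps`.
// Codes above 128 conventionally mean "terminated by signal (code - 128)".
function describeExitCode(code) {
  if (code === 0) return 'exited cleanly';
  if (code > 128) {
    const signalNumber = code - 128;
    const name = Object.keys(os.constants.signals)
      .find((sig) => os.constants.signals[sig] === signalNumber);
    return `killed by ${name || 'signal ' + signalNumber}`;
  }
  return `application error (exit status ${code})`;
}

console.log(describeExitCode(1));
console.log(describeExitCode(137));
```

An "Exit 1" is usually explained by the container logs, while "Exit 137" is worth cross-checking against memory usage.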
Symptom:
Error: connect ECONNREFUSED 127.0.0.1:5672
[error] Failed to connect to RabbitMQ
Cause: RabbitMQ is not running or not ready.
Solution:
  1. Check RabbitMQ status:
docker-compose ps rabbitmq
  2. Check RabbitMQ health (the management API requires credentials):
curl -u <user>:<password> http://localhost:15673/api/healthchecks/node
  3. Restart RabbitMQ:
docker-compose restart rabbitmq
  4. Wait for RabbitMQ to be healthy:
docker-compose logs -f rabbitmq
# Wait for "Server startup complete"
  5. Check connection URL in environment:
# Should be: amqp://rabbitmq:5672 (inside Docker)
# NOT: amqp://localhost:5672
docker-compose exec api-gateway env | grep RABBITMQ
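The localhost mistake in the last step can be caught programmatically. The sketch below parses a broker URL with the standard URL class and flags hosts that resolve to the container itself; checkBrokerUrl is a hypothetical name, and "rabbitmq" is the Compose service name used in this guide.

```javascript
// Flag broker URLs that point at the container itself instead of the
// Compose service name ("rabbitmq" in this guide).
function checkBrokerUrl(rawUrl) {
  const { protocol, hostname } = new URL(rawUrl);
  const problems = [];
  if (protocol !== 'amqp:' && protocol !== 'amqps:') {
    problems.push(`unexpected scheme "${protocol}"`);
  }
  if (['localhost', '127.0.0.1', '[::1]'].includes(hostname)) {
    problems.push(`host "${hostname}" resolves to the container itself; use "rabbitmq"`);
  }
  return problems;
}

console.log(checkBrokerUrl('amqp://localhost:5672'));
console.log(checkBrokerUrl('amqp://rabbitmq:5672'));
```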
Symptom: RabbitMQ queue depth keeps increasing, messages not being processed.
Cause: Consumer service down, processing errors, or message format issues.
Solution:
  1. Check consumer service status:
docker-compose ps email-service push-service
  2. Inspect RabbitMQ Management UI:
    • Open http://localhost:15673
    • Check queue “email.queue” or “push.queue”
    • View consumer count (should be > 0)
  3. Check service logs for errors:
docker-compose logs -f email-service
  4. Inspect a message in the queue:
    • In RabbitMQ UI: Queues → email.queue → Get Messages
    • Verify message format matches expected schema
  5. Restart consumer service:
docker-compose restart email-service
  6. If messages are malformed, purge the queue:
# WARNING: This deletes all messages!
docker-compose exec rabbitmq rabbitmqctl purge_queue email.queue
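The consumer-count and queue-depth checks above can also be scripted against the RabbitMQ management HTTP API, whose queue objects expose name, messages, and consumers fields. This is a rough sketch: diagnoseQueue and fetchQueue are hypothetical names, the threshold of 1000 is an arbitrary assumption, and the guest:guest credentials are a placeholder for whatever your broker uses.

```javascript
// Classify a queue object as returned by GET /api/queues/<vhost>/<name>.
// backlogThreshold is an arbitrary assumption; tune it for your workload.
function diagnoseQueue(queue, backlogThreshold = 1000) {
  if (queue.consumers === 0 && queue.messages > 0) {
    return `${queue.name}: no consumers attached, ${queue.messages} messages waiting`;
  }
  if (queue.messages > backlogThreshold) {
    return `${queue.name}: backlog of ${queue.messages} with ${queue.consumers} consumer(s)`;
  }
  return `${queue.name}: healthy`;
}

// Optional fetch wrapper (Node 18+). Base URL and credentials are placeholders.
async function fetchQueue(name, baseUrl = 'http://localhost:15673', auth = 'guest:guest') {
  const url = `${baseUrl}/api/queues/%2F/${encodeURIComponent(name)}`; // %2F is the default vhost "/"
  const res = await fetch(url, {
    headers: { Authorization: 'Basic ' + Buffer.from(auth).toString('base64') },
  });
  if (!res.ok) throw new Error(`management API returned ${res.status}`);
  return res.json();
}

// Usage: fetchQueue('email.queue').then((q) => console.log(diagnoseQueue(q)));
console.log(diagnoseQueue({ name: 'email.queue', messages: 42, consumers: 0 }));
```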
Symptom:
Error: connect ECONNREFUSED 127.0.0.1:5432
[error] Unable to connect to the database
Cause: PostgreSQL not running, wrong credentials, or network issue.
Solution:
  1. Check PostgreSQL status:
docker-compose ps postgres
  2. Test database connection:
docker-compose exec postgres psql -U postgres -c "SELECT 1;"
  3. Verify environment variables:
docker-compose exec user-service env | grep DB_
Expected values:
  • DB_HOST=postgres (not localhost)
  • DB_PORT=5432
  • DB_USERNAME=postgres
  • DB_PASSWORD=<your_password>
  • DB_DATABASE=notification_db
  4. Check database exists:
docker-compose exec postgres psql -U postgres -c "\l"
  5. Recreate database if needed:
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE notification_db;"
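The expected DB_* values listed in this section lend themselves to an automated check, for example at service startup. The following is a minimal sketch under that assumption; checkDbEnv is a hypothetical helper, and it only verifies what this guide states (all five variables set, host not localhost, port 5432).

```javascript
// Compare DB_* variables against the expected values from this guide.
// Pass process.env, or an object built from `env | grep DB_` output.
function checkDbEnv(env) {
  const problems = [];
  for (const key of ['DB_HOST', 'DB_PORT', 'DB_USERNAME', 'DB_PASSWORD', 'DB_DATABASE']) {
    if (!env[key]) problems.push(`${key} is not set`);
  }
  if (['localhost', '127.0.0.1'].includes(env.DB_HOST)) {
    problems.push('DB_HOST points at the container itself; use "postgres"');
  }
  if (env.DB_PORT && env.DB_PORT !== '5432') {
    problems.push(`DB_PORT is ${env.DB_PORT}, expected 5432`);
  }
  return problems;
}

console.log(checkDbEnv(process.env));
```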
Symptom:
[error] Redis connection timeout
Error: connect ETIMEDOUT
Cause: Redis not running or network issue.
Solution:
  1. Check Redis status:
docker-compose ps redis
  2. Test Redis connection:
docker-compose exec redis redis-cli ping
# Expected: PONG
  3. Verify Redis configuration:
docker-compose exec api-gateway env | grep REDIS
Expected:
  • REDIS_HOST=redis (not localhost)
  • REDIS_PORT=6379
  4. Restart Redis:
docker-compose restart redis
  5. Clear Redis data if corrupted:
# WARNING: This deletes every key in every Redis database!
docker-compose exec redis redis-cli FLUSHALL
Symptom: No emails appear in MailHog, or SMTP errors in logs.
Cause: SMTP configuration issue, MailHog not running, or message format error.
Solution:
  1. Check MailHog status:
docker-compose ps mailhog
  2. Open MailHog UI:
    • Navigate to http://localhost:8025
    • Check if any emails are captured
  3. Check Email Service logs:
docker-compose logs -f email-service
  4. Verify SMTP configuration:
docker-compose exec email-service env | grep SMTP
Expected: host and port settings that point at MailHog (mailhog, port 1025, inside Docker)
  5. Test SMTP connection:
docker-compose exec email-service nc -zv mailhog 1025
# Expected: Connection succeeded
  6. Check if message reached email.queue:
    • Open RabbitMQ UI: http://localhost:15673
    • Check “email.queue” message count
Symptom: Push notifications not sent, or invalid token errors.
Cause: Invalid device token, FCM/provider configuration issue, or user preferences.
Solution:
  1. Check Push Service logs:
docker-compose logs -f push-service
  2. Verify user has push token:
curl http://localhost:8001/api/v1/users/{user_id}
# Check "push_token" field is not null
  3. Check user preferences:
# User must have "push": true in preferences
curl http://localhost:8001/api/v1/users/{user_id}
  4. Verify FCM credentials (production):
    • Check FCM_SERVER_KEY or FCM_SERVICE_ACCOUNT environment variable
    • Ensure Firebase project is properly configured
  5. Check message in push.queue:
    • Open RabbitMQ UI: http://localhost:15673
    • Inspect “push.queue” for messages
  6. Look for errors in failed.queue:
    • Check if messages moved to dead letter queue
    • Inspect error details
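The two user-record checks above (a non-null push_token and "push": true in preferences) can be applied to the User Service response in one pass. The sketch below assumes the record has the shape { push_token, preferences: { push } }; the real response may nest these fields differently, and pushBlockers is a hypothetical helper name.

```javascript
// List the reasons a user record would block push delivery.
// Assumed shape: { push_token: string|null, preferences: { push: boolean } }.
function pushBlockers(user) {
  const blockers = [];
  if (!user.push_token) blockers.push('push_token is missing or null');
  if (!user.preferences || user.preferences.push !== true) {
    blockers.push('preferences.push is not true');
  }
  return blockers;
}

console.log(pushBlockers({ push_token: null, preferences: { push: true } }));
```

An empty result means the user record is not the cause, so the next places to look are the FCM credentials and the queues.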
Symptom:
{
  "success": false,
  "error": "Internal server error",
  "message": "An unexpected error occurred"
}
Cause: Unhandled exception in service code.
Solution:
  1. Check service logs with correlation ID:
# Get correlation ID from response header: X-Correlation-Id
docker-compose logs api-gateway | grep <correlation-id>
  2. Enable debug logging:
# Set LOG_LEVEL=debug in service .env file
docker-compose restart api-gateway
  3. Check for dependency failures:
curl http://localhost:8000/health
  4. Verify request payload format:
# Ensure all required fields are present
curl -X POST http://localhost:8000/api/v1/notifications \
  -H "Content-Type: application/json" \
  -d '{
    "notification_type": "email",
    "user_id": "uuid-here",
    "template_code": "welcome",
    "variables": {},
    "request_id": "unique-id"
  }'
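A payload can be checked for missing fields before it is sent. The sketch below treats the five fields from the example request above as required; that is an assumption drawn from the example, not a confirmed schema, and missingFields is a hypothetical helper name.

```javascript
// Field names taken from the example request in this guide; treating all
// five as required is an assumption.
const REQUIRED_FIELDS = ['notification_type', 'user_id', 'template_code', 'variables', 'request_id'];

// Return the names of required fields that are absent or null.
function missingFields(payload) {
  return REQUIRED_FIELDS.filter((field) => payload[field] === undefined || payload[field] === null);
}

console.log(missingFields({ notification_type: 'email', user_id: 'uuid-here', variables: {} }));
```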
Symptom: Service consuming excessive memory, or Docker killing containers.
Cause: Memory leak, large queue backlog, or insufficient resources.
Solution:
  1. Check container memory usage:
docker stats
  2. Identify the problematic service:
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}"
  3. Check queue depths:
    • Large queues consume memory in RabbitMQ
    • Open http://localhost:15673 and check queue sizes
  4. Restart the service:
docker-compose restart <service-name>
  5. Increase memory limits:
# docker-compose.yml
services:
  api-gateway:
    deploy:
      resources:
        limits:
          memory: 1G
  6. Scale consumers if queue is large:
docker-compose up -d --scale email-service=3
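The MemUsage column printed by docker stats has the form "used / limit" (for example "512MiB / 1GiB"), which is easier to compare across services as a percentage. The following is a small sketch of that conversion; toBytes and memUsagePercent are hypothetical helper names, and only the binary units listed in UNITS are handled.

```javascript
const UNITS = { B: 1, KiB: 1024, MiB: 1024 ** 2, GiB: 1024 ** 3, TiB: 1024 ** 4 };

// Convert a size such as "512MiB" or "1.5GiB" to bytes.
function toBytes(text) {
  const match = /^([\d.]+)\s*([KMGT]iB|B)$/.exec(text.trim());
  if (!match) throw new Error(`unrecognised size: ${text}`);
  return parseFloat(match[1]) * UNITS[match[2]];
}

// Convert a MemUsage value ("used / limit") to a percentage of the limit.
function memUsagePercent(memUsage) {
  const [used, limit] = memUsage.split('/').map(toBytes);
  return (used / limit) * 100;
}

console.log(memUsagePercent('512MiB / 1GiB'));
```

A service that sits close to 100% of its limit is the likely candidate for the restarts or limit increase described above.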

Debugging with Correlation IDs

Correlation IDs allow you to trace a notification request across all services.

Tracing a Request

  1. Get the correlation ID from the response header:
curl -v http://localhost:8000/api/v1/notifications \
  -H "Content-Type: application/json" \
  -d '{...}'
# Look for: X-Correlation-Id: abc123-def456-...
  2. Search logs across all services:
CORRELATION_ID="abc123-def456"

# API Gateway
docker-compose logs api-gateway | grep "$CORRELATION_ID"

# User Service
docker-compose logs user-service | grep "$CORRELATION_ID"

# Template Service
docker-compose logs template-service | grep "$CORRELATION_ID"

# Email Service
docker-compose logs email-service | grep "$CORRELATION_ID"

# Push Service
docker-compose logs push-service | grep "$CORRELATION_ID"
  3. Search all logs at once:
# -f keeps following new output; press Ctrl+C to stop
docker-compose logs -f | grep "$CORRELATION_ID"
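When the combined log output is saved to a file, the matching lines can be grouped per service to show which hop a request reached. The sketch below assumes the usual docker-compose log prefix of the form "service_1  | message" (or "service-1  | message" in newer releases); traceByService is a hypothetical helper name.

```javascript
// Group log lines that mention a correlation ID by the service that wrote them.
// Assumes each line starts with "<service>_<n>  | " or "<service>-<n>  | ".
function traceByService(logText, correlationId) {
  const trace = {};
  for (const line of logText.split('\n')) {
    if (!line.includes(correlationId)) continue;
    const match = /^(\S+?)(?:[-_]\d+)?\s+\|\s(.*)$/.exec(line);
    const service = match ? match[1] : 'unknown';
    const message = match ? match[2] : line;
    (trace[service] = trace[service] || []).push(message);
  }
  return trace;
}

// Usage, assuming the logs were saved with `docker-compose logs > all.log`:
//   const fs = require('fs');
//   console.log(traceByService(fs.readFileSync('all.log', 'utf8'), 'abc123-def456'));
```

A service that is missing from the result is the first place where the request did not arrive.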

Example Correlation Trace

[12:34:56 INF] abc123 - API Gateway received notification request
[12:34:56 INF] abc123 - Fetching user uuid-456 from User Service
[12:34:57 INF] abc123 - User Service returned user data
[12:34:57 INF] abc123 - Fetching template "welcome" from Template Service
[12:34:57 INF] abc123 - Template Service returned template
[12:34:57 INF] abc123 - Publishing to email.queue
[12:34:58 INF] abc123 - Email Service processing message
[12:34:59 INF] abc123 - Email sent successfully to <recipient address>
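A trace like the one above also shows where time is spent. The sketch below parses lines in the "[HH:MM:SS LVL] id - message" format of this example and reports each step's offset in seconds from the first entry. stepOffsets is a hypothetical helper, the line format is taken from the example rather than from a confirmed logging configuration, and traces that cross midnight are not handled.

```javascript
// Parse "[HH:MM:SS LVL] <id> - <message>" lines and report, for each step,
// the number of seconds elapsed since the first line.
function stepOffsets(lines) {
  const entries = lines.map((line) => {
    const m = /^\[(\d{2}):(\d{2}):(\d{2}) \w+\] (\S+) - (.*)$/.exec(line);
    if (!m) throw new Error(`unrecognised trace line: ${line}`);
    const seconds = Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]);
    return { seconds, message: m[5] };
  });
  const start = entries[0].seconds;
  return entries.map((e) => ({ offset: e.seconds - start, message: e.message }));
}

console.log(stepOffsets([
  '[12:34:56 INF] abc123 - API Gateway received notification request',
  '[12:34:58 INF] abc123 - Email Service processing message',
]));
```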

Failed Message Queue Inspection

Messages that fail after all retry attempts are moved to the failed.queue for inspection.

Accessing Failed Messages

  1. Open RabbitMQ Management UI:
http://localhost:15673
  2. Navigate to the failed queue:
    • Click “Queues” tab
    • Click “failed.queue”
  3. Get messages from the queue:
    • Scroll to “Get messages” section
    • Set “Messages” to 10
    • Click “Get Message(s)”

Analyzing Failed Messages

Each failed message includes:
  • Original message payload
  • Error details (in headers)
  • Retry count
  • Timestamp

Reprocessing Failed Messages

Option 1: Manually requeue via RabbitMQ UI
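  1. In RabbitMQ UI, go to failed.queue
  2. Expand the “Move messages” section (it is only available when the rabbitmq_shovel and rabbitmq_shovel_management plugins are enabled)
  3. Enter the destination queue (e.g., email.queue)
  4. Click “Move messages”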
Option 2: Command-line requeue
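# Move up to 100 messages from failed.queue to email.queue. Assumes jq on the host, RabbitMQ 3.7+ (ackmode= syntax) and text payloads; headers are not copied.
docker-compose exec -T rabbitmq rabbitmqadmin --format=raw_json get queue=failed.queue ackmode=ack_requeue_false count=100 \
  | jq -r '.[].payload | @base64' \
  | while read -r b64; do
      docker-compose exec -T rabbitmq rabbitmqadmin publish routing_key=email.queue payload="$(printf '%s' "$b64" | base64 -d)" < /dev/null
    done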
Option 3: Write a retry script
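const amqp = require('amqplib');

// Drain failed.queue and republish every message to email.queue.
// Assumes all failed messages originally came from email.queue.
async function retryFailedMessages() {
  // Host-side AMQP port as used in this guide; adjust to your port mapping
  const connection = await amqp.connect('amqp://localhost:5673');
  // Confirm channel: lets us wait until the broker has accepted each republish
  const channel = await connection.createConfirmChannel();

  try {
    let msg;
    // channel.get() resolves to false once the queue is empty
    while ((msg = await channel.get('failed.queue', { noAck: false }))) {
      channel.sendToQueue('email.queue', msg.content, {
        persistent: true,
        headers: msg.properties.headers,
      });
      // Ack the failed copy only after the broker confirms the new one
      await channel.waitForConfirms();
      channel.ack(msg);
    }
  } finally {
    await connection.close();
  }
}

retryFailedMessages().catch((err) => {
  console.error('Retry failed:', err);
  process.exit(1);
});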

Service-Specific Debugging

API Gateway

# Enable verbose logging
export LOG_LEVEL=debug

# Check all environment variables
docker-compose exec api-gateway env

# Test RabbitMQ connection
curl http://localhost:8000/health

Email Service (C#)

# View detailed logs
docker-compose logs -f email-service

# Check SMTP connectivity
docker-compose exec email-service nc -zv mailhog 1025

# Inspect service environment
docker-compose exec email-service env | grep -E "SMTP|RABBITMQ|REDIS"

Database Services

# Connect to User Service database
docker-compose exec postgres psql -U postgres -d notification_db

-- The remaining commands run inside the psql session
-- List all tables
\dt

-- Query users
SELECT * FROM users LIMIT 10;

-- Check notification status
SELECT * FROM notifications WHERE status = 'failed';

Getting Additional Help

Related pages:
  • Health Checks: verify service health and dependencies
  • Monitoring: set up comprehensive system monitoring

When reporting issues, always include:
  • Service logs with correlation IDs
  • Health check responses
  • Environment variables (redact secrets)
  • Docker Compose status output
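Redacting secrets by hand is easy to get wrong, so it can help to filter the env output before pasting it into a report. The sketch below masks the value of any variable whose name looks sensitive and strips credentials embedded in URLs. redactEnv is a hypothetical helper and the name pattern is an assumption, so review the output before sharing it.

```javascript
// Variable names matching this pattern are treated as secrets (an assumption;
// extend it to match your own naming).
const SECRET_NAME_PATTERN = /PASSWORD|SECRET|TOKEN|KEY|CREDENTIAL/i;

// Mask secret values in NAME=value text such as the output of `env`.
function redactEnv(envText) {
  return envText
    .split('\n')
    .map((line) => {
      const eq = line.indexOf('=');
      if (eq === -1) return line;
      const name = line.slice(0, eq);
      if (SECRET_NAME_PATTERN.test(name)) return `${name}=<redacted>`;
      // Also hide user:password pairs embedded in connection URLs
      return line.replace(/:\/\/[^@\/\s]+@/, '://<redacted>@');
    })
    .join('\n');
}

// Usage: docker-compose exec -T api-gateway env | node redact.js
// where redact.js (a hypothetical file) reads stdin and prints redactEnv(input).
console.log(redactEnv('DB_HOST=postgres\nDB_PASSWORD=hunter2'));
```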
