CloudWatch Log Groups
Each Lambda function writes to its own CloudWatch log group, which Lambda creates automatically, and the application emits structured log entries to it.
API Lambda
Log group: /aws/lambda/alliance-risk-api
- HTTP request/response logs
- Authentication events (Cognito token verification)
- Database connection status
- Error stack traces
Worker Lambda
Log group: /aws/lambda/alliance-risk-worker
- Job processing lifecycle (PENDING → PROCESSING → COMPLETED/FAILED)
- Bedrock model invocations
- SQL execution logs (migrations via the run-sql action)
- Job retry attempts and failures
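The job lifecycle above can be sketched as a small transition table that also produces a log line for each transition. This is a minimal sketch, not the worker's actual code. JobStatus, canTransition and transitionLog are hypothetical names, and the JSON log shape is an assumption. The source does not say whether a retried job re-enters PENDING, so COMPLETED and FAILED are treated as terminal here.

```typescript
// Hypothetical job status type mirroring the lifecycle listed above.
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

// Allowed transitions: PENDING -> PROCESSING -> COMPLETED | FAILED.
// Assumption: COMPLETED and FAILED are terminal (retry handling is not modeled).
const ALLOWED_TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
  PENDING: ["PROCESSING"],
  PROCESSING: ["COMPLETED", "FAILED"],
  COMPLETED: [],
  FAILED: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Builds a single-line JSON log entry for a transition (field names are assumptions)
// and rejects transitions the lifecycle does not allow.
function transitionLog(jobId: string, from: JobStatus, to: JobStatus): string {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid job transition ${from} -> ${to} for job ${jobId}`);
  }
  return JSON.stringify({ level: "log", context: "JobWorker", jobId, from, to });
}
```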
Accessing CloudWatch Logs
AWS Console
- Navigate to CloudWatch → Log Groups
- Select /aws/lambda/alliance-risk-api or /aws/lambda/alliance-risk-worker
- Click Search all log streams
- Use filter patterns (see below)
AWS CLI
Log Filter Patterns
Error Detection
Authentication Issues
Performance Monitoring
Key Metrics to Monitor
Lambda Metrics (CloudWatch)
| Metric | Threshold | Action |
|---|---|---|
| Invocations | Baseline | Track request volume trends |
| Errors | >1% of invocations | Investigate error logs |
| Duration | >25 s (API), >14 min (Worker) | Optimize or increase timeout |
| Throttles | >0 | Increase concurrency limit |
| ConcurrentExecutions | Near account limit | Request limit increase |
| IteratorAge | N/A (not using streams) | - |
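The table's Errors, Duration and Throttles thresholds can be expressed as a simple check over one metrics window. This sketch assumes the values have already been fetched from the AWS/Lambda namespace. LambdaWindow and lambdaAlerts are hypothetical names. Comparing the window's maximum duration, rather than a percentile, is also an assumption.

```typescript
type FunctionKind = "api" | "worker";

// One evaluation window of Lambda metrics (values fetched elsewhere).
interface LambdaWindow {
  invocations: number;
  errors: number;
  maxDurationMs: number;
  throttles: number;
}

// Duration thresholds from the table: >25 s for the API, >14 min for the worker.
const DURATION_LIMIT_MS: Record<FunctionKind, number> = {
  api: 25_000,
  worker: 14 * 60_000,
};

function lambdaAlerts(kind: FunctionKind, m: LambdaWindow): string[] {
  const alerts: string[] = [];
  // Errors: alert when more than 1% of invocations fail.
  if (m.invocations > 0 && m.errors / m.invocations > 0.01) {
    alerts.push("Errors > 1% of invocations: investigate error logs");
  }
  // Duration: alert when the slowest invocation exceeds the per-function limit.
  if (m.maxDurationMs > DURATION_LIMIT_MS[kind]) {
    alerts.push("Duration over limit: optimize or increase timeout");
  }
  // Throttles: any throttle is worth acting on.
  if (m.throttles > 0) {
    alerts.push("Throttles > 0: increase concurrency limit");
  }
  return alerts;
}
```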
API Lambda Alerts
Worker Lambda Alerts
Log Analysis Examples
Find Failed Jobs
Track Bedrock Usage
Identify Prisma Connection Issues
Structured Logging
The application uses the NestJS Logger with structured output.
Log Levels
| Level | Usage | CloudWatch Filter |
|---|---|---|
| error | Unhandled exceptions, critical failures | [level=error] |
| warn | HTTP 4xx, retryable failures | [level=warn] |
| log | HTTP requests, job lifecycle | [level=log] |
| debug | Detailed diagnostics (dev only) | [level=debug] |
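The following shows one way to produce a structured entry that carries the level as its own field. This is a sketch, not the app's NestJS Logger configuration. formatEntry, shouldEmit and the JSON field names are assumptions. In a NestJS app, a custom LoggerService implementation could delegate to functions like these.

```typescript
// Levels ordered from most to least severe, matching the table above.
const LEVELS = ["error", "warn", "log", "debug"] as const;
type Level = (typeof LEVELS)[number];

// True if `level` is at least as severe as `minLevel`.
// Production would use minLevel "log" so debug output stays dev-only.
function shouldEmit(level: Level, minLevel: Level): boolean {
  return LEVELS.indexOf(level) <= LEVELS.indexOf(minLevel);
}

// One JSON object per line. Core fields are written last so extra fields cannot override them.
function formatEntry(
  level: Level,
  context: string,
  message: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    ...fields,
    timestamp: new Date().toISOString(),
    level,
    context,
    message,
  });
}
```

If entries are emitted as JSON like this, CloudWatch's JSON filter syntax, for example { $.level = "error" }, matches the level field directly. The bracketed patterns in the table use the space-delimited syntax instead.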
Sample Log Entries
Successful Request:
Setting Up CloudWatch Alarms
Create Error Rate Alarm (AWS CLI)
Create Duration Alarm
Lambda Cold Starts
Identifying Cold Starts
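Look for INIT_START messages in the logs. Lambda writes one each time it initializes a new execution environment, so each INIT_START marks a cold start.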
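Lambda's REPORT line also reveals cold starts. It includes an Init Duration field only for the first invocation in a new execution environment. The following is a minimal parser that assumes the standard tab-separated REPORT format. parseColdStart and isInitStart are hypothetical helper names.

```typescript
// Cold start details extracted from Lambda's REPORT line.
interface ColdStart {
  requestId: string;
  initDurationMs: number;
}

// INIT_START is logged when Lambda initializes a new execution environment.
function isInitStart(line: string): boolean {
  return line.startsWith("INIT_START");
}

// Returns cold start details if this REPORT line includes "Init Duration",
// which Lambda only adds for the first invocation in a new environment.
function parseColdStart(line: string): ColdStart | null {
  if (!line.startsWith("REPORT")) return null;
  const init = /Init Duration: ([\d.]+) ms/.exec(line);
  const req = /RequestId: (\S+)/.exec(line);
  if (!init || !req) return null; // warm invocation or unexpected format
  return { requestId: req[1], initDurationMs: parseFloat(init[1]) };
}
```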
Cold Start Duration
Cold starts add 2-5 seconds to the first invocation on a new instance:
- NestJS module initialization (~1-2 s)
- Prisma client initialization (~500 ms-1 s)
- Database connection pool setup (~500 ms)
Mitigation Strategies
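- Provisioned Concurrency (incurs significant cost): keeps a configured number of execution environments initialized ahead of traffic.
- Keep-Warm Schedule (EventBridge):
  - Invoke the Lambda every 5 minutes with a warmup event
  - Filter the warmup event out in code: if (event.source === 'warmup') return;
  - Each ping keeps only one instance warm, so concurrent requests can still hit cold starts
- Accept Cold Starts (recommended for MVP):
  - Cold starts are infrequent with moderate traffic
  - The first request after a period of inactivity will be slower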
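The keep-warm filter can be wrapped around the handler so that warmup pings return before any request handling runs. This sketch assumes the EventBridge target is configured with a constant JSON input such as {"source":"warmup"}. A default scheduled event carries source "aws.events" instead. withWarmup and handleRequest are hypothetical names.

```typescript
// Minimal event shape: only `source` matters for the warmup check.
interface WarmupEvent {
  source?: string;
  [key: string]: unknown;
}

// Assumption: the EventBridge rule sends a constant input {"source":"warmup"}.
function isWarmup(event: WarmupEvent): boolean {
  return event.source === "warmup";
}

// Hypothetical wrapper: `handleRequest` stands in for the real application handler.
function withWarmup<R>(
  handleRequest: (event: WarmupEvent) => Promise<R>,
): (event: WarmupEvent) => Promise<R | { warmed: true }> {
  return async (event) => {
    if (isWarmup(event)) {
      // Module-scope initialization already ran during the cold start,
      // which is all the ping is meant to achieve, so return immediately.
      return { warmed: true as const };
    }
    return handleRequest(event);
  };
}
```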
Database Monitoring
RDS CloudWatch Metrics
| Metric | Threshold | Action |
|---|---|---|
| CPUUtilization | >80% sustained | Upgrade instance type |
| FreeableMemory | <100 MB | Upgrade instance type |
| DatabaseConnections | >80% of max | Investigate connection leaks |
| ReadLatency | >100 ms | Add indexes, optimize queries |
| WriteLatency | >100 ms | Check disk I/O, upgrade storage |
Connection Pool Monitoring
Prisma uses a connection pool per Lambda instance. Check the logs for:
- Multiple “Database connected” messages per invocation (connection leak)
- Lambda timeouts with no error (event loop not draining)
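A common way to avoid the first symptom is to create the client once at module scope and reuse it across invocations on the same instance. In this sketch, DbClient is a hypothetical stand-in for PrismaClient, whose real connect method is $connect(). getClient is likewise a hypothetical name.

```typescript
// Hypothetical stand-in for PrismaClient (whose real method is $connect()).
interface DbClient {
  connect(): Promise<void>;
}

// Module scope survives across invocations on the same Lambda instance.
let clientPromise: Promise<DbClient> | null = null;

function getClient(factory: () => DbClient): Promise<DbClient> {
  if (!clientPromise) {
    // Cache the promise, not the client, so callers that arrive during
    // initialization share one connection attempt.
    clientPromise = (async () => {
      const client = factory();
      await client.connect();
      console.log("Database connected");
      return client;
    })().catch((err: unknown) => {
      // Allow a later invocation to retry if connecting failed.
      clientPromise = null;
      throw err;
    });
  }
  return clientPromise;
}
```

Each instance holds its own pool, so peak database connections grow roughly as concurrent Lambda instances multiplied by the per-instance pool size. Prisma lets you cap the pool size with the connection_limit parameter on the connection URL.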
Bedrock Monitoring
Throttling Detection
Bedrock enforces model-specific rate limits. Throttled requests fail with ThrottlingException, so check the worker logs for that error name.
Circuit Breaker Status
The app wraps Bedrock calls in a circuit breaker:
- CLOSED: normal operation
- OPEN: entered after 3 consecutive failures; all requests fail fast
- HALF_OPEN: testing recovery; 2 successes return the breaker to CLOSED
While the breaker is open, the logs show "Circuit breaker is open".
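The three states can be sketched as a small class. The thresholds of 3 failures and 2 successes come from the list above. Everything else is an assumption: the 30-second open timeout before probing, the choice to reopen on any failure in HALF_OPEN, and the injectable clock used for testing. CircuitBreaker and its methods are hypothetical names, not the app's implementation.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private halfOpenSuccesses = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 3,
    private readonly successThreshold = 2,
    private readonly openTimeoutMs = 30_000, // assumption: not stated in the source
    private readonly now: () => number = Date.now,
  ) {}

  getState(): BreakerState {
    // After the open timeout elapses, allow probe requests.
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.openTimeoutMs) {
      this.state = "HALF_OPEN";
      this.halfOpenSuccesses = 0;
    }
    return this.state;
  }

  // Fail fast while OPEN; otherwise the caller may invoke Bedrock.
  assertCanCall(): void {
    if (this.getState() === "OPEN") {
      throw new Error("Circuit breaker is open");
    }
  }

  recordSuccess(): void {
    if (this.getState() === "HALF_OPEN") {
      this.halfOpenSuccesses++;
      if (this.halfOpenSuccesses >= this.successThreshold) this.close();
    } else {
      this.failures = 0; // a success resets the consecutive-failure count
    }
  }

  recordFailure(): void {
    const state = this.getState();
    this.failures++;
    // Assumption: any failure while probing reopens the breaker.
    if (state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }

  private close(): void {
    this.state = "CLOSED";
    this.failures = 0;
    this.halfOpenSuccesses = 0;
  }
}
```

A caller would invoke assertCanCall() before each Bedrock request, then recordSuccess() or recordFailure() depending on the outcome.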
Retry Behavior
Bedrock calls are retried up to 3 times with exponential backoff:
- Base delay: 200 ms
- Max delay: 5 s
- Retried errors: ThrottlingException, ServiceUnavailableException
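The retry policy above can be sketched as a wrapper around any async call. The sketch makes three assumptions. Retries are decided by the error's name, where AWS SDK for JavaScript v3 exceptions expose their type. The delay doubles per attempt with no jitter. withRetry, backoffDelayMs and isRetryable are hypothetical names. With a 200 ms base, the three retry delays are 200, 400 and 800 ms, so the 5 s cap only matters if maxRetries is raised.

```typescript
// Error types the policy retries, matched against error.name.
const RETRYABLE = new Set(["ThrottlingException", "ServiceUnavailableException"]);

function isRetryable(err: unknown): boolean {
  return err instanceof Error && RETRYABLE.has(err.name);
}

// Delay before retry number `attempt` (0-based): 200, 400, 800, ... capped at 5000 ms.
function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 5_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-retryable errors or once all retries are used.
      if (!isRetryable(err) || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Adding random jitter to each delay is a common variant that spreads out retries from concurrent workers. The AWS SDK also applies its own retry strategy by default, so if both layers are active the attempts compound.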
Next Steps
- Troubleshooting Guide — Common errors and solutions
- Infrastructure Setup — CloudFormation stack details