Overview
The Terraform deployment includes comprehensive monitoring with CloudWatch logs, metrics, alarms, and a dashboard. This section covers accessing logs, understanding alerts, and monitoring system health.
Monitoring Components
CloudWatch Logs
Application logs from ECS containers and Lambda functions
CloudWatch Metrics
CPU, memory, request counts, and response times
CloudWatch Alarms
10 automated alerts for critical issues
CloudWatch Dashboard
Visual overview of system health and performance
CloudWatch Log Groups
Terraform creates two log groups with 14-day retention:
ECS Application Logs
Log group: /ecs/llmstxt-api
Contains:
- FastAPI application logs
- Crawling progress and results
- Error messages and stack traces
- HTTP request/response logs
Lambda Function Logs
Log group: /aws/lambda/llmstxt-auto-update
Contains:
- Lambda execution logs
- Cron trigger events
- Recrawl endpoint responses
- Errors and timeouts
View Logs
Via AWS Console
Open CloudWatch Console
Navigate to CloudWatch Console
Access Log Groups
- Click Logs → Log groups in the left sidebar
- Select a log group:
  - /ecs/llmstxt-api for application logs
  - /aws/lambda/llmstxt-auto-update for Lambda logs
View Log Streams
Each ECS task and each Lambda execution environment writes to its own log stream:
- ECS: ecs/llmstxt-api/[task-id]
- Lambda: [date]/[execution-id] (one stream per execution environment, which can serve many invocations)
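With the awslogs log driver, ECS names each stream prefix/container-name/task-id, which is why the ECS streams look like ecs/llmstxt-api/[task-id]. A small sketch that turns a task ARN (as printed by aws ecs list-tasks) into its stream name; the helper names are hypothetical, and the prefix ecs and container name llmstxt-api are read off the stream pattern above rather than from the Terraform code.

```python
def task_id_from_arn(task_arn: str) -> str:
    """Return the task ID: the last path segment of an ECS task ARN."""
    # Works for both arn:...:task/<cluster>/<id> and the older arn:...:task/<id>.
    return task_arn.rsplit("/", 1)[-1]


def ecs_stream_name(task_arn: str, prefix: str = "ecs",
                    container: str = "llmstxt-api") -> str:
    """Build the awslogs stream name <prefix>/<container>/<task-id>."""
    return f"{prefix}/{container}/{task_id_from_arn(task_arn)}"


arn = "arn:aws:ecs:us-east-1:123456789012:task/llmstxt-cluster/0123abcd"  # placeholder
print(ecs_stream_name(arn))  # ecs/llmstxt-api/0123abcd
```

The resulting name can be passed to aws logs tail /ecs/llmstxt-api --log-stream-names <stream> to follow a single task.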
Via AWS CLI
Tail ECS Logs (Live)
Press Ctrl+C to stop tailing; --follow keeps the stream open for new logs.
Tail Lambda Logs
Filter for Errors
Get Recent Logs
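The four log tasks above map onto aws logs tail (AWS CLI v2). A sketch that builds each command as an argument list and prints it; the helper name and the 1h/30m windows are illustrative choices, not values from this deployment. Passing any of the lists to subprocess.run executes it, which requires the AWS CLI v2 and credentials.

```python
from __future__ import annotations

import shlex

ECS_LOG_GROUP = "/ecs/llmstxt-api"
LAMBDA_LOG_GROUP = "/aws/lambda/llmstxt-auto-update"


def logs_tail_cmd(log_group: str, follow: bool = False,
                  since: str | None = None,
                  filter_pattern: str | None = None) -> list[str]:
    """Build an `aws logs tail` command (AWS CLI v2) as an argument list."""
    cmd = ["aws", "logs", "tail", log_group]
    if follow:
        cmd.append("--follow")  # keep streaming new events until Ctrl+C
    if since:
        cmd += ["--since", since]  # relative window such as "10m" or "1h"
    if filter_pattern:
        cmd += ["--filter-pattern", filter_pattern]  # CloudWatch filter syntax
    return cmd


tail_ecs = logs_tail_cmd(ECS_LOG_GROUP, follow=True)                       # Tail ECS Logs (Live)
tail_lambda = logs_tail_cmd(LAMBDA_LOG_GROUP, follow=True)                 # Tail Lambda Logs
ecs_errors = logs_tail_cmd(ECS_LOG_GROUP, since="1h", filter_pattern="ERROR")  # Filter for Errors
recent = logs_tail_cmd(ECS_LOG_GROUP, since="30m")                         # Get Recent Logs

for cmd in (tail_ecs, tail_lambda, ecs_errors, recent):
    print(shlex.join(cmd))
```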
CloudWatch Metrics
Key Metrics
The deployment tracks these critical metrics:
ECS Service Metrics
- CPUUtilization: Percentage of allocated CPU used
- MemoryUtilization: Percentage of allocated memory used
- RunningTaskCount: Number of active containers
- DesiredTaskCount: Target number of containers
Application Load Balancer Metrics
- RequestCount: Total HTTP requests
- HTTPCode_Target_2XX_Count: Successful responses
- HTTPCode_Target_5XX_Count: Server errors
- TargetResponseTime: Average response time in seconds
- UnHealthyHostCount: Number of failing targets
- HealthyHostCount: Number of healthy targets
Lambda Function Metrics
- Invocations: Number of executions
- Errors: Failed executions
- Duration: Execution time in milliseconds
- Throttles: Rate-limited invocations
View Metrics in Console
Browse by Namespace
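- AWS/ECS: ECS service metrics (CPUUtilization, MemoryUtilization)
- ECS/ContainerInsights: task counts such as RunningTaskCount and DesiredTaskCount (published only when Container Insights is enabled on the cluster)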
- AWS/ApplicationELB: Load balancer metrics
- AWS/Lambda: Lambda function metrics
Select Metrics
- Choose namespace
- Select dimension (e.g., Service/Cluster, LoadBalancer, Function)
- Check metrics to graph
View Metrics via CLI
ECS CPU Utilization
ALB Request Count
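Both queries use aws cloudwatch get-metric-statistics with the namespaces and dimensions listed above. A sketch that builds the command for a recent window; the cluster, service, and load balancer identifiers are placeholders (the ALB's LoadBalancer dimension is the part of its ARN after loadbalancer/), and the 3-hour window is an arbitrary choice.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def metric_stats_cmd(namespace: str, metric: str, dimensions: dict[str, str],
                     statistics: list[str], hours: int = 3, period: int = 300,
                     now: datetime | None = None) -> list[str]:
    """Build an `aws cloudwatch get-metric-statistics` call for the last `hours`."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return [
        "aws", "cloudwatch", "get-metric-statistics",
        "--namespace", namespace,
        "--metric-name", metric,
        # One Name=...,Value=... pair per dimension.
        "--dimensions", *(f"Name={k},Value={v}" for k, v in dimensions.items()),
        "--start-time", start.isoformat(),
        "--end-time", end.isoformat(),
        "--period", str(period),  # seconds per datapoint (300 = 5 min)
        "--statistics", *statistics,
        "--output", "json",
    ]


# Placeholder names: substitute your cluster, service and ALB identifiers.
ecs_cpu = metric_stats_cmd(
    "AWS/ECS", "CPUUtilization",
    {"ClusterName": "llmstxt-cluster", "ServiceName": "llmstxt-api"},
    ["Average", "Maximum"],
)
alb_requests = metric_stats_cmd(
    "AWS/ApplicationELB", "RequestCount",
    {"LoadBalancer": "app/llmstxt-alb/0123456789abcdef"},
    ["Sum"],  # RequestCount is a count, so Sum per period is the useful statistic
)
```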
CloudWatch Alarms
Terraform configures 10 alarms to detect and alert on critical issues.
Configured Alarms
| Alarm Name | Metric | Threshold | Description |
|---|---|---|---|
| llmstxt-ecs-no-running-tasks | RunningTaskCount | < 1 | ECS service has no active containers |
| llmstxt-alb-unhealthy-targets | UnHealthyHostCount | ≥ 1 | ALB has unhealthy targets |
| llmstxt-alb-high-5xx-errors | HTTPCode_Target_5XX_Count | > 10 in 5 min | High server error rate |
| llmstxt-lambda-errors | Lambda Errors | ≥ 1 | Lambda function errors |
| llmstxt-application-errors | Custom log metric filter | > 5 in 5 min | Application ERROR logs |
| llmstxt-ecs-high-cpu | CPUUtilization | > 80% for 15 min | High CPU usage |
| llmstxt-ecs-high-memory | MemoryUtilization | > 85% for 15 min | High memory usage |
| llmstxt-alb-high-response-time | TargetResponseTime | > 5 s for 10 min | Slow response times |
| llmstxt-lambda-duration-high | Lambda Duration | > 540 s (9 min) | Lambda near its 10-minute timeout |
| llmstxt-lambda-throttles | Lambda Throttles | ≥ 1 | Lambda invocations throttled |
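Thresholds written as "for 15 min" mean the condition must hold over several consecutive evaluation periods, not at a single instant. A simplified model of that check, assuming 5-minute periods (so "for 15 min" becomes 3 periods); the real period lengths live in the Terraform alarm definitions, and CloudWatch additionally supports M-of-N "datapoints to alarm" and configurable missing-data handling, which this sketch ignores.

```python
from __future__ import annotations

import operator

# Comparison operators as written in the Threshold column.
OPS = {">": operator.gt, "≥": operator.ge, ">=": operator.ge, "<": operator.lt}


def would_alarm(datapoints: list[float], op: str, threshold: float,
                evaluation_periods: int) -> bool | None:
    """Simplified alarm check: True if each of the last `evaluation_periods`
    datapoints breaches the threshold, None if there are too few datapoints."""
    if len(datapoints) < evaluation_periods:
        return None  # roughly what CloudWatch reports as INSUFFICIENT_DATA
    recent = datapoints[-evaluation_periods:]
    return all(OPS[op](value, threshold) for value in recent)


# "> 80% for 15 min", assuming 5-minute periods -> 3 evaluation periods.
cpu_percent = [62.0, 81.5, 88.0, 91.2]
print(would_alarm(cpu_percent, ">", 80, evaluation_periods=15 // 5))  # True
```

A single dip below the threshold within the window resets the condition, which is why short CPU spikes do not trigger llmstxt-ecs-high-cpu.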
View Alarm Status
Via Console
- Go to CloudWatch Console → Alarms → All alarms
- Filter by prefix: llmstxt-
- Check alarm states:
- 🟢 OK: Normal operation
- 🔴 ALARM: Issue detected
- 🔵 INSUFFICIENT_DATA: Collecting data
Via CLI
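aws cloudwatch describe-alarms accepts --alarm-name-prefix, so all llmstxt- alarms can be listed in one call. A minimal sketch that runs it and groups alarm names by state; summarize is pure Python, while fetch_alarm_states needs the AWS CLI and credentials. The helper names are illustrative.

```python
from __future__ import annotations

import json
import subprocess

DESCRIBE_ALARMS = [
    "aws", "cloudwatch", "describe-alarms",
    "--alarm-name-prefix", "llmstxt-",
    "--output", "json",
]


def summarize(describe_output: dict) -> dict[str, list[str]]:
    """Group alarm names by StateValue (OK, ALARM, INSUFFICIENT_DATA)."""
    by_state: dict[str, list[str]] = {}
    for alarm in describe_output.get("MetricAlarms", []):
        by_state.setdefault(alarm["StateValue"], []).append(alarm["AlarmName"])
    return by_state


def fetch_alarm_states() -> dict[str, list[str]]:
    """Run the CLI (requires AWS CLI and credentials) and summarize its output."""
    result = subprocess.run(DESCRIBE_ALARMS, check=True,
                            capture_output=True, text=True)
    return summarize(json.loads(result.stdout))
```

Adding --state-value ALARM to the command narrows the listing to alarms that are currently firing.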
Email Notifications
Alarms send notifications via Amazon SNS.
Confirm SNS Subscription
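After Terraform deployment, check your email for:
- Subject: “AWS Notification - Subscription Confirmation”
- From: no-reply@sns.amazonaws.com
Click Confirm subscription in that email. SNS does not deliver alarm notifications to an address until its subscription is confirmed.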
Add Additional Email Recipients
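Each extra recipient is another email subscription on the alarm topic. A sketch that builds the aws sns subscribe call for one or more addresses; the topic ARN is a placeholder (the real one can be found with aws sns list-topics or in the Terraform state), and every new address must click its own confirmation link before it receives alerts.

```python
from __future__ import annotations


def subscribe_email_cmd(topic_arn: str, email: str) -> list[str]:
    """Build `aws sns subscribe` for one additional email recipient."""
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    return [
        "aws", "sns", "subscribe",
        "--topic-arn", topic_arn,
        "--protocol", "email",
        "--notification-endpoint", email,
    ]


# Placeholder ARN: look up the real topic with `aws sns list-topics`.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:llmstxt-alerts"
for address in ["oncall@example.com", "team-lead@example.com"]:
    print(" ".join(subscribe_email_cmd(TOPIC_ARN, address)))
```

Subscriptions added this way live outside Terraform; declaring them in the Terraform configuration instead (for example as additional aws_sns_topic_subscription resources) keeps them under Terraform management.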
CloudWatch Dashboard
Terraform creates a dashboard named llmstxt-overview with key metrics.
Access Dashboard
- Go to CloudWatch Console → Dashboards
- Click llmstxt-overview
Dashboard Widgets
ECS Service - CPU & Memory
- CPU utilization percentage (0-100%)
- Memory utilization percentage (0-100%)
- 5-minute intervals
ALB - Requests & Errors
- Total request count
- 2xx success responses
- 5xx error responses
- 5-minute intervals
Lambda - Invocations & Errors
- Total invocations
- Error count
- 1-hour intervals
Lambda - Duration
- Average duration (ms)
- Maximum duration (ms)
- 1-hour intervals
Customize Dashboard
Add custom widgets:
- Click Actions → Add widget
- Choose widget type (Line, Number, etc.)
- Select metrics
- Click Create widget
- Click Save dashboard
Application Error Log Filter
Terraform creates a metric filter to count ERROR log entries.
View Error Metric
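The filter's metric namespace and name are defined in terraform/monitoring.tf; rather than guessing them, aws logs describe-metric-filters reports them for the log group. A sketch that extracts (filter name, pattern, namespace, metric name) from that output; the helper names are illustrative, and fetch_metric_targets needs the AWS CLI and credentials.

```python
from __future__ import annotations

import json
import subprocess

DESCRIBE_FILTERS = [
    "aws", "logs", "describe-metric-filters",
    "--log-group-name", "/ecs/llmstxt-api",
    "--output", "json",
]


def metric_targets(describe_output: dict) -> list[tuple[str, str, str, str]]:
    """Return (filterName, filterPattern, metricNamespace, metricName) rows."""
    rows = []
    for flt in describe_output.get("metricFilters", []):
        for tr in flt.get("metricTransformations", []):
            rows.append((flt["filterName"], flt.get("filterPattern", ""),
                         tr["metricNamespace"], tr["metricName"]))
    return rows


def fetch_metric_targets() -> list[tuple[str, str, str, str]]:
    """Run the CLI (requires AWS CLI and credentials) and extract the targets."""
    out = subprocess.run(DESCRIBE_FILTERS, check=True,
                         capture_output=True, text=True).stdout
    return metric_targets(json.loads(out))
```

With the namespace and metric name in hand, get-metric-statistics with --statistics Sum and --period 300 returns per-5-minute error counts comparable to the llmstxt-application-errors threshold; no --dimensions argument is needed unless the filter defines dimensions.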
Modify Filter Pattern
Edit terraform/monitoring.tf:
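Before changing the pattern in Terraform, CloudWatch Logs can evaluate a candidate against sample messages with aws logs test-metric-filter, without touching the deployed filter. A sketch; the candidate pattern (?A ?B matches events containing either term) and the sample messages are illustrative, and check_pattern needs the AWS CLI and credentials.

```python
from __future__ import annotations

import json
import subprocess


def filter_check_cmd(pattern: str, messages: list[str]) -> list[str]:
    """Build `aws logs test-metric-filter`; each message is one argv element."""
    return [
        "aws", "logs", "test-metric-filter",
        "--filter-pattern", pattern,
        "--log-event-messages", *messages,
        "--output", "json",
    ]


def matched_messages(test_output: dict) -> list[str]:
    """Return the sample messages that the pattern matched."""
    return [m["eventMessage"] for m in test_output.get("matches", [])]


def check_pattern(pattern: str, messages: list[str]) -> list[str]:
    """Run the test (requires AWS CLI and credentials)."""
    out = subprocess.run(filter_check_cmd(pattern, messages), check=True,
                         capture_output=True, text=True).stdout
    return matched_messages(json.loads(out))


candidate = "?ERROR ?CRITICAL"  # illustrative: match either term
samples = [
    "ERROR crawl failed for https://example.com",
    "INFO crawl finished",
    "CRITICAL worker crashed",
]
```

Once the matches look right, update the pattern in terraform/monitoring.tf and run terraform plan to confirm that only the metric filter changes.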
Performance Monitoring
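Both checks in this section come down to reading datapoints returned by aws cloudwatch get-metric-statistics (CPUUtilization and MemoryUtilization for the ECS service; TargetResponseTime and RequestCount for the ALB). A sketch that summarizes such output against the alarm thresholds from the table (for example 80% CPU, 5 s response time); it assumes the CLI's JSON output, where each datapoint carries a Timestamp and one field per requested statistic. The sample values are illustrative, not real data.

```python
from __future__ import annotations


def summarize_datapoints(output: dict, stat: str, threshold: float) -> dict:
    """Mean, peak, and timestamps above `threshold` for one statistic."""
    # Datapoints are not guaranteed to come back in time order; ISO-8601
    # strings with the same UTC offset sort chronologically.
    points = sorted(output.get("Datapoints", []), key=lambda p: p["Timestamp"])
    values = [p[stat] for p in points]
    if not values:
        return {"mean": None, "peak": None, "above": []}
    return {
        "mean": sum(values) / len(values),
        "peak": max(values),
        "above": [p["Timestamp"] for p in points if p[stat] > threshold],
    }


# Illustrative get-metric-statistics output (not real data).
cpu = {
    "Label": "CPUUtilization",
    "Datapoints": [
        {"Timestamp": "2024-01-01T09:05:00+00:00", "Average": 85.0, "Unit": "Percent"},
        {"Timestamp": "2024-01-01T09:00:00+00:00", "Average": 40.0, "Unit": "Percent"},
    ],
}
print(summarize_datapoints(cpu, "Average", threshold=80))
```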
ECS Task Performance
Monitor container resource usage through the ECS service's CPUUtilization and MemoryUtilization metrics.
ALB Performance
Check response times and throughput through the load balancer's TargetResponseTime and RequestCount metrics.
Log Retention
By default, logs are retained for 14 days.
Change Retention Period
Edit the retention period in terraform/main.tf and terraform/ecs.tf.
Longer retention increases CloudWatch Logs costs. 14 days is recommended for production.
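CloudWatch Logs only accepts a fixed set of retention periods, so a value like 10 days is rejected at apply time. A sketch that checks a candidate before editing the Terraform files; it assumes the setting in main.tf and ecs.tf is the retention_in_days argument of aws_cloudwatch_log_group, and the helper name is illustrative.

```python
# Retention periods (days) accepted by CloudWatch Logs' PutRetentionPolicy.
ALLOWED_RETENTION_DAYS = (1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400,
                          545, 731, 1096, 1827, 2192, 2557, 2922, 3288, 3653)


def check_retention(days: int) -> int:
    """Return `days` if CloudWatch accepts it, else raise with the nearest valid value."""
    if days not in ALLOWED_RETENTION_DAYS:
        nearest = min(ALLOWED_RETENTION_DAYS, key=lambda d: abs(d - days))
        raise ValueError(f"{days} is not a valid retention period; nearest is {nearest}")
    return days


for candidate in (7, 14, 30):
    print(candidate, "->", check_retention(candidate))
```

After changing the value, terraform plan should show only in-place updates to the two log groups.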
Troubleshooting with Logs
Common Issues
High 5xx error rate
Check the application logs for errors.
Common causes:
- Database connection failures (Supabase)
- R2 storage authentication errors
- Invalid environment variables
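To check the application logs for a burst of 5xx errors, aws logs filter-log-events returns matching events as JSON, with --start-time given in milliseconds since the Unix epoch. A sketch that looks back a fixed window for ERROR lines and formats each with its log stream, so database (Supabase), R2, or configuration errors can be traced to a task; the 60-minute default and helper names are illustrative, and recent_errors needs the AWS CLI and credentials.

```python
from __future__ import annotations

import json
import subprocess
import time


def start_time_ms(minutes: int, now_ms: int | None = None) -> int:
    """Epoch milliseconds `minutes` ago (filter-log-events expects milliseconds)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - minutes * 60_000


def error_events_cmd(log_group: str = "/ecs/llmstxt-api", minutes: int = 60,
                     pattern: str = "ERROR", now_ms: int | None = None) -> list[str]:
    """Build `aws logs filter-log-events` for ERROR lines in the last `minutes`."""
    return [
        "aws", "logs", "filter-log-events",
        "--log-group-name", log_group,
        "--filter-pattern", pattern,
        "--start-time", str(start_time_ms(minutes, now_ms)),
        "--output", "json",
    ]


def error_lines(output: dict) -> list[str]:
    """Format each matching event as '<timestamp-ms> <stream>: <message>'."""
    return [f'{e["timestamp"]} {e["logStreamName"]}: {e["message"].rstrip()}'
            for e in output.get("events", [])]


def recent_errors(minutes: int = 60) -> list[str]:
    """Run the query (requires AWS CLI and credentials)."""
    out = subprocess.run(error_events_cmd(minutes=minutes), check=True,
                         capture_output=True, text=True).stdout
    return error_lines(json.loads(out))
```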
ECS task keeps restarting
Check the reason the task stopped, then check the logs for that task.
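ECS keeps stopped tasks visible only for a limited time, so this check is best run soon after a restart. A sketch chaining aws ecs list-tasks --desired-status STOPPED into aws ecs describe-tasks and extracting stoppedReason plus each container's exit code; the cluster and service names are placeholders, and stopped_task_report needs the AWS CLI and credentials.

```python
from __future__ import annotations

import json
import subprocess

CLUSTER = "llmstxt-cluster"  # placeholder: your ECS cluster name
SERVICE = "llmstxt-api"      # placeholder: your ECS service name


def list_stopped_cmd(cluster: str = CLUSTER, service: str = SERVICE) -> list[str]:
    return ["aws", "ecs", "list-tasks", "--cluster", cluster,
            "--service-name", service, "--desired-status", "STOPPED",
            "--output", "json"]


def describe_tasks_cmd(task_arns: list[str], cluster: str = CLUSTER) -> list[str]:
    return ["aws", "ecs", "describe-tasks", "--cluster", cluster,
            "--tasks", *task_arns, "--output", "json"]


def stop_reasons(describe_output: dict) -> list[tuple]:
    """(task_id, stoppedReason, [(container, exitCode, reason), ...]) per task."""
    rows = []
    for task in describe_output.get("tasks", []):
        task_id = task["taskArn"].rsplit("/", 1)[-1]
        containers = [(c.get("name"), c.get("exitCode"), c.get("reason"))
                      for c in task.get("containers", [])]
        rows.append((task_id, task.get("stoppedReason"), containers))
    return rows


def _run(cmd: list[str]) -> dict:
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def stopped_task_report(cluster: str = CLUSTER, service: str = SERVICE) -> list[tuple]:
    """Requires AWS CLI and credentials; describe-tasks takes at most 100 ARNs."""
    arns = _run(list_stopped_cmd(cluster, service)).get("taskArns", [])
    return stop_reasons(_run(describe_tasks_cmd(arns[:100], cluster))) if arns else []
```

Each task ID also identifies the task's log stream (ecs/llmstxt-api/<task-id>), which can be followed with aws logs tail /ecs/llmstxt-api --log-stream-names.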
Lambda timeouts
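The Lambda log group name implies the function is llmstxt-auto-update (Lambda log groups are named /aws/lambda/<function-name>). A sketch that requests hourly Maximum and Average Duration for the past week and reports what fraction of the 600000 ms timeout the slowest run used; the 7-day window and helper names are illustrative.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

FUNCTION = "llmstxt-auto-update"
TIMEOUT_MS = 600_000  # 10 min


def lambda_duration_cmd(days: int = 7, now: datetime | None = None) -> list[str]:
    """Build get-metric-statistics for hourly Lambda Duration over `days` days."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return [
        "aws", "cloudwatch", "get-metric-statistics",
        "--namespace", "AWS/Lambda", "--metric-name", "Duration",
        "--dimensions", f"Name=FunctionName,Value={FUNCTION}",
        "--start-time", start.isoformat(), "--end-time", end.isoformat(),
        "--period", "3600",  # 1-hour buckets, as on the dashboard
        "--statistics", "Maximum", "Average",
        "--output", "json",
    ]


def peak_timeout_fraction(output: dict, timeout_ms: int = TIMEOUT_MS) -> float | None:
    """Fraction of the timeout used by the slowest run in the window."""
    peaks = [p["Maximum"] for p in output.get("Datapoints", [])]
    return max(peaks) / timeout_ms if peaks else None
```

A fraction at or above 0.9 corresponds to the 540 s threshold of llmstxt-lambda-duration-high.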
Check the Lambda Duration metric. If the maximum duration is consistently near 600000 ms (10 min), increase the timeout in terraform/main.tf.
Cost Optimization
Reduce Log Costs
- Decrease retention period (7 days instead of 14)
- Reduce log verbosity in application
- Use log sampling for high-volume debug logs
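Sampling can be done inside the FastAPI application with the standard logging module. A minimal sketch of a handler filter that passes every record above DEBUG but keeps only one in every N DEBUG records; the class name, logger name, and rate are hypothetical. Assuming the container logs to stdout/stderr through the awslogs driver (as the /ecs/llmstxt-api log group suggests), records dropped here are never ingested into CloudWatch.

```python
import logging
import sys


class SampleDebugFilter(logging.Filter):
    """Pass records above DEBUG; keep only one in every `every` DEBUG records."""

    def __init__(self, every: int = 100) -> None:
        super().__init__()
        if every < 1:
            raise ValueError("every must be >= 1")
        self.every = every
        self._debug_seen = 0

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno > logging.DEBUG:
            return True  # never drop INFO, WARNING, ERROR, CRITICAL
        self._debug_seen += 1
        # Keep the 1st, (every+1)th, (2*every+1)th ... DEBUG record.
        return (self._debug_seen - 1) % self.every == 0


# Hypothetical wiring: filter on the handler so propagated records are sampled too.
handler = logging.StreamHandler(sys.stdout)
handler.addFilter(SampleDebugFilter(every=100))
crawler_log = logging.getLogger("llmstxt.crawler")  # hypothetical logger name
crawler_log.addHandler(handler)
crawler_log.setLevel(logging.DEBUG)
```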
Monitor CloudWatch Costs
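Cost Explorer can break spending down by service. A sketch that builds a month-to-date aws ce get-cost-and-usage query filtered to CloudWatch; the service value AmazonCloudWatch is an assumption worth confirming against the SERVICE values Cost Explorer shows for your account, and Cost Explorer API requests are billed per request.

```python
from __future__ import annotations

import json
from datetime import date, timedelta


def month_to_date(today: date) -> tuple[str, str]:
    """Cost Explorer periods are [Start, End) with End exclusive."""
    start = today.replace(day=1)
    if start == today:  # on the 1st, report the previous full month instead
        start = (today - timedelta(days=1)).replace(day=1)
    return start.isoformat(), today.isoformat()


def cloudwatch_cost_cmd(today: date) -> list[str]:
    """Build a daily UnblendedCost query for CloudWatch, month to date."""
    start, end = month_to_date(today)
    # Assumed SERVICE value; verify in Cost Explorer before relying on it.
    service_filter = {"Dimensions": {"Key": "SERVICE", "Values": ["AmazonCloudWatch"]}}
    return [
        "aws", "ce", "get-cost-and-usage",
        "--time-period", f"Start={start},End={end}",
        "--granularity", "DAILY",
        "--metrics", "UnblendedCost",
        "--filter", json.dumps(service_filter),
    ]


print(" ".join(cloudwatch_cost_cmd(date.today())[:6]))
```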
Summary
You now have comprehensive monitoring with:
- Real-time logs accessible via Console and CLI
- 10 CloudWatch alarms for critical issues
- Email notifications via SNS
- Visual dashboard for system health
- Metric tracking for performance analysis
Complete Deployment
Your llms.txt Generator is fully deployed and monitored! Return to the Deployment Overview for next steps.