
Overview

The Terraform deployment includes comprehensive monitoring with CloudWatch logs, metrics, alarms, and a dashboard. This section covers accessing logs, understanding alerts, and monitoring system health.

Monitoring Components

CloudWatch Logs

Application logs from ECS containers and Lambda functions

CloudWatch Metrics

CPU, memory, request counts, and response times

CloudWatch Alarms

10 automated alerts for critical issues

CloudWatch Dashboard

Visual overview of system health and performance

CloudWatch Log Groups

Terraform creates two log groups with 14-day retention:

ECS Application Logs

Log group: /ecs/llmstxt-api

Contains:
  • FastAPI application logs
  • Crawling progress and results
  • Error messages and stack traces
  • HTTP request/response logs

Lambda Function Logs

Log group: /aws/lambda/llmstxt-auto-update

Contains:
  • Lambda execution logs
  • Cron trigger events
  • Recrawl endpoint responses
  • Errors and timeouts

View Logs

Via AWS Console

1. Open CloudWatch Console

Navigate to the CloudWatch Console.

2. Access Log Groups

Click Logs → Log groups in the left sidebar, then select a log group:
  • /ecs/llmstxt-api for application logs
  • /aws/lambda/llmstxt-auto-update for Lambda logs

3. View Log Streams

Each container/Lambda execution creates a separate log stream:
  • ECS: ecs/llmstxt-api/[task-id]
  • Lambda: [date]/[execution-id]
Click a stream to view logs.

4. Filter Logs

Use the filter box to search, for example:
[ERROR]
Or:
"Failed to crawl"

Via AWS CLI

Tail ECS Logs (Live)

aws logs tail /ecs/llmstxt-api \
  --follow \
  --format short \
  --region us-east-1
Press Ctrl+C to stop tailing. --follow keeps the stream open for new logs.

Tail Lambda Logs

aws logs tail /aws/lambda/llmstxt-auto-update \
  --follow \
  --format short \
  --region us-east-1

Filter for Errors

aws logs filter-log-events \
  --log-group-name /ecs/llmstxt-api \
  --filter-pattern "[ERROR]" \
  --start-time $(date -u -d '1 hour ago' +%s)000 \
  --region us-east-1
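Note that `--start-time` expects epoch milliseconds, which is why the command above appends `000` to the epoch-seconds output of `date`. A quick sanity check of that construction:

```shell
# date +%s prints epoch seconds; CloudWatch Logs expects milliseconds,
# so three zeros are appended to convert seconds to ms.
start_ms="$(date -u -d '1 hour ago' +%s)000"
echo "$start_ms"
```

This uses GNU date, as do the other commands in this guide; on macOS, `date -u -v-1H +%s` is the equivalent.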

Get Recent Logs

# Last hour of logs
aws logs tail /ecs/llmstxt-api \
  --since 1h \
  --region us-east-1

# Last 5 minutes
aws logs tail /ecs/llmstxt-api \
  --since 5m \
  --region us-east-1

CloudWatch Metrics

Key Metrics

The deployment tracks these critical metrics:

ECS service:
  • CPUUtilization: Percentage of allocated CPU used
  • MemoryUtilization: Percentage of allocated memory used
  • RunningTaskCount: Number of active containers
  • DesiredTaskCount: Target number of containers

Application Load Balancer:
  • RequestCount: Total HTTP requests
  • HTTPCode_Target_2XX_Count: Successful responses
  • HTTPCode_Target_5XX_Count: Server errors
  • TargetResponseTime: Average response time in seconds
  • UnHealthyHostCount: Number of failing targets
  • HealthyHostCount: Number of healthy targets

Lambda:
  • Invocations: Number of executions
  • Errors: Failed executions
  • Duration: Execution time in milliseconds
  • Throttles: Rate-limited invocations

View Metrics in Console

1. Open CloudWatch Metrics

Go to CloudWatch Console → Metrics → All metrics.

2. Browse by Namespace

  • AWS/ECS: ECS service metrics
  • AWS/ApplicationELB: Load balancer metrics
  • AWS/Lambda: Lambda function metrics

3. Select Metrics

  1. Choose a namespace
  2. Select a dimension (e.g., Service/Cluster, LoadBalancer, Function)
  3. Check the metrics to graph

4. Customize Graph

  • Change the time range (1h, 3h, 12h, 1d, 1w)
  • Adjust the statistic (Average, Sum, Min, Max)
  • Set the refresh interval

View Metrics via CLI

ECS CPU Utilization

aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ServiceName,Value=llmstxt-api-service Name=ClusterName,Value=llmstxt-cluster \
  --start-time $(date -u -d '1 hour ago' --iso-8601=seconds) \
  --end-time $(date -u --iso-8601=seconds) \
  --period 300 \
  --statistics Average \
  --region us-east-1
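If you only want the most recent datapoint rather than the full JSON, a JMESPath `--query` can be appended. This is a sketch using the same names as above; it requires credentials against the deployed stack:

```shell
# Print only the newest datapoint's Average value
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ServiceName,Value=llmstxt-api-service Name=ClusterName,Value=llmstxt-cluster \
  --start-time $(date -u -d '1 hour ago' --iso-8601=seconds) \
  --end-time $(date -u --iso-8601=seconds) \
  --period 300 \
  --statistics Average \
  --query 'sort_by(Datapoints, &Timestamp)[-1].Average' \
  --output text \
  --region us-east-1
```

`get-metric-statistics` returns datapoints in no guaranteed order, hence the `sort_by` before taking the last element.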

ALB Request Count

aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name RequestCount \
  --dimensions Name=LoadBalancer,Value=$(cd terraform && terraform output -raw alb_arn | cut -d: -f6) \
  --start-time $(date -u -d '1 hour ago' --iso-8601=seconds) \
  --end-time $(date -u --iso-8601=seconds) \
  --period 300 \
  --statistics Sum \
  --region us-east-1

CloudWatch Alarms

Terraform configures 10 alarms to detect and alert on critical issues.

Configured Alarms

| Alarm Name | Metric | Threshold | Description |
|---|---|---|---|
| llmstxt-ecs-no-running-tasks | RunningTaskCount | < 1 | ECS service has no active containers |
| llmstxt-alb-unhealthy-targets | UnHealthyHostCount | ≥ 1 | ALB has unhealthy targets |
| llmstxt-alb-high-5xx-errors | HTTPCode_Target_5XX | > 10 in 5 min | High server error rate |
| llmstxt-lambda-errors | Lambda Errors | ≥ 1 | Lambda function errors |
| llmstxt-application-errors | Custom log filter | > 5 in 5 min | Application ERROR logs |
| llmstxt-ecs-high-cpu | CPUUtilization | > 80% for 15 min | High CPU usage |
| llmstxt-ecs-high-memory | MemoryUtilization | > 85% for 15 min | High memory usage |
| llmstxt-alb-high-response-time | TargetResponseTime | > 5s for 10 min | Slow response times |
| llmstxt-lambda-duration-high | Lambda Duration | > 540s (9 min) | Lambda near timeout |
| llmstxt-lambda-throttles | Lambda Throttles | ≥ 1 | Lambda rate limited |

View Alarm Status

Via Console

  1. Go to CloudWatch Console → Alarms → All alarms
  2. Filter by prefix: llmstxt-
  3. Check alarm states:
    • 🟢 OK: Normal operation
    • 🔴 ALARM: Issue detected
    • 🔵 INSUFFICIENT_DATA: Collecting data

Via CLI

# List all alarms
aws cloudwatch describe-alarms \
  --alarm-name-prefix llmstxt- \
  --region us-east-1

# Get alarm state
aws cloudwatch describe-alarms \
  --alarm-names llmstxt-ecs-no-running-tasks \
  --query 'MetricAlarms[0].StateValue' \
  --output text \
  --region us-east-1
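To surface only the alarms that need attention, a JMESPath filter can be applied client-side. This is a sketch; it requires credentials against the deployed stack:

```shell
# List llmstxt- alarms whose state is not OK (i.e., ALARM or INSUFFICIENT_DATA)
aws cloudwatch describe-alarms \
  --alarm-name-prefix llmstxt- \
  --query "MetricAlarms[?StateValue!='OK'].[AlarmName,StateValue]" \
  --output text \
  --region us-east-1
```

Empty output means every alarm is in the OK state.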

Email Notifications

Alarms send notifications via Amazon SNS.
1. Confirm SNS Subscription

After Terraform deployment, check your email for a subscription confirmation message from AWS Notifications and click the "Confirm subscription" link.

2. Verify Subscription

aws sns list-subscriptions-by-topic \
  --topic-arn $(cd terraform && terraform output -raw sns_topic_arn) \
  --region us-east-1
Check that SubscriptionArn is not PendingConfirmation.

3. Receive Alerts

When an alarm changes state, you'll receive an email:
  • ALARM state: Issue detected
  • OK state: Issue resolved
You won't receive alert emails until the SNS subscription is confirmed!

Add Additional Email Recipients

aws sns subscribe \
  --topic-arn $(cd terraform && terraform output -raw sns_topic_arn) \
  --protocol email \
  --notification-endpoint [email protected] \
  --region us-east-1
Recipient must confirm subscription via email.

CloudWatch Dashboard

Terraform creates a dashboard named llmstxt-overview with key metrics.

Access Dashboard

  1. Go to CloudWatch Console → Dashboards
  2. Click llmstxt-overview

Dashboard Widgets

  • ECS utilization: CPU and memory utilization percentage (0-100%), 5-minute intervals
  • ALB requests: total request count, 2xx success responses, 5xx error responses, 5-minute intervals
  • Lambda invocations: total invocations and error count, 1-hour intervals
  • Lambda duration: average and maximum duration (ms), 1-hour intervals

Customize Dashboard

Add custom widgets:
  1. Click Actions → Add widget
  2. Choose widget type (Line, Number, etc.)
  3. Select metrics
  4. Click Create widget
  5. Click Save dashboard

Application Error Log Filter

Terraform creates a metric filter to count ERROR log entries.

View Error Metric

aws cloudwatch get-metric-statistics \
  --namespace LLMsTxt/Application \
  --metric-name ApplicationErrors \
  --start-time $(date -u -d '1 hour ago' --iso-8601=seconds) \
  --end-time $(date -u --iso-8601=seconds) \
  --period 300 \
  --statistics Sum \
  --region us-east-1

Modify Filter Pattern

Edit terraform/monitoring.tf:
resource "aws_cloudwatch_log_metric_filter" "ecs_application_errors" {
  name           = "llmstxt-ecs-application-errors"
  log_group_name = aws_cloudwatch_log_group.ecs_logs.name
  pattern        = "[ERROR]"  # Change this pattern

  metric_transformation {
    name      = "ApplicationErrors"
    namespace = "LLMsTxt/Application"
    value     = "1"
  }
}
Apply changes:
cd terraform
terraform apply
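Before applying a new pattern, you can try it against sample log lines with `aws logs test-metric-filter`, which needs AWS credentials but no deployed resources. The sample messages below are hypothetical:

```shell
# Returns the subset of the sample messages that the pattern would match
aws logs test-metric-filter \
  --filter-pattern "[ERROR]" \
  --log-event-messages \
    "[ERROR] Failed to crawl https://example.com" \
    "[INFO] Crawl completed"
```

This is a cheap way to catch pattern-syntax mistakes before a `terraform apply`.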

Performance Monitoring

ECS Task Performance

Monitor container resource usage:
# Get task ARN
TASK_ARN=$(aws ecs list-tasks \
  --cluster llmstxt-cluster \
  --service-name llmstxt-api-service \
  --query 'taskArns[0]' \
  --output text \
  --region us-east-1)

# Get task metrics
aws ecs describe-tasks \
  --cluster llmstxt-cluster \
  --tasks $TASK_ARN \
  --query 'tasks[0].containers[0].{CPU:cpu,Memory:memory,MemoryReservation:memoryReservation}' \
  --region us-east-1
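Note that `describe-tasks` reports the configured CPU/memory limits, while the CloudWatch MemoryUtilization metric is already a percentage. If you have a raw usage figure in MiB from another source, converting it to a percentage of the task limit is simple arithmetic (the values below are hypothetical):

```shell
# Hypothetical numbers: 512 MiB in use against a 2048 MiB task memory limit
used_mib=512
limit_mib=2048
pct=$(awk -v u="$used_mib" -v l="$limit_mib" 'BEGIN { printf "%.1f", u / l * 100 }')
echo "memory utilization: ${pct}%"
```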

ALB Performance

Check response times and throughput (substitute the LoadBalancer dimension value as in the ALB Request Count example above):
# Average response time (last hour)
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name TargetResponseTime \
  --dimensions Name=LoadBalancer,Value=... \
  --start-time $(date -u -d '1 hour ago' --iso-8601=seconds) \
  --end-time $(date -u --iso-8601=seconds) \
  --period 300 \
  --statistics Average Maximum \
  --region us-east-1

Log Retention

By default, logs are retained for 14 days.

Change Retention Period

Edit terraform/main.tf and terraform/ecs.tf:
resource "aws_cloudwatch_log_group" "ecs_logs" {
  name              = "/ecs/llmstxt-api"
  retention_in_days = 30  # Change from 14 to 30 days
}

resource "aws_cloudwatch_log_group" "lambda_logs" {
  name              = "/aws/lambda/llmstxt-auto-update"
  retention_in_days = 30  # Change from 14 to 30 days
}
Apply changes:
cd terraform
terraform apply
Longer retention increases CloudWatch Logs costs. 14 days is recommended for production.
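Retention can also be changed ad hoc with the CLI, though Terraform will revert out-of-band changes on the next apply, so prefer editing the .tf files:

```shell
aws logs put-retention-policy \
  --log-group-name /ecs/llmstxt-api \
  --retention-in-days 30 \
  --region us-east-1
```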

Troubleshooting with Logs

Common Issues

Application errors (5xx responses, failed crawls)

Check application logs for errors:
aws logs filter-log-events \
  --log-group-name /ecs/llmstxt-api \
  --filter-pattern "[ERROR]" \
  --start-time $(date -u -d '1 hour ago' +%s)000
Common causes:
  • Database connection failures (Supabase)
  • R2 storage authentication errors
  • Invalid environment variables

ECS tasks stopping or restarting

Check the stopped task reason:
aws ecs describe-tasks \
  --cluster llmstxt-cluster \
  --tasks $TASK_ARN \
  --query 'tasks[0].stoppedReason'
Then check logs for the stopped task.

Lambda timeouts

Check the Lambda Duration metric:
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=llmstxt-auto-update \
  --start-time $(date -u -d '6 hours ago' --iso-8601=seconds) \
  --end-time $(date -u --iso-8601=seconds) \
  --period 3600 \
  --statistics Maximum Average
If the duration is consistently near 600000 ms (10 min), increase the timeout in terraform/main.tf.

Cost Optimization

Reduce Log Costs

  • Decrease retention period (7 days instead of 14)
  • Reduce log verbosity in application
  • Use log sampling for high-volume debug logs

Monitor CloudWatch Costs

# Get CloudWatch Logs usage
aws cloudwatch get-metric-statistics \
  --namespace AWS/Logs \
  --metric-name IncomingBytes \
  --start-time $(date -u -d '1 month ago' --iso-8601=seconds) \
  --end-time $(date -u --iso-8601=seconds) \
  --period 2592000 \
  --statistics Sum \
  --region us-east-1
CloudWatch Logs pricing: ~$0.50 per GB ingested, $0.03 per GB stored.
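As a rough sanity check, the IncomingBytes sum can be converted into an estimated ingestion charge using the price quoted above (the byte count below is hypothetical):

```shell
# Hypothetical: 5 GiB ingested over the period, at ~$0.50 per GB ingestion
bytes=5368709120
cost=$(awk -v b="$bytes" 'BEGIN { printf "%.2f", b / (1024 ^ 3) * 0.50 }')
echo "estimated ingestion cost: \$${cost}"
```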

Summary

You now have comprehensive monitoring with:
  • Real-time logs accessible via Console and CLI
  • 10 CloudWatch alarms for critical issues
  • Email notifications via SNS
  • Visual dashboard for system health
  • Metric tracking for performance analysis

Complete Deployment

Your llms.txt Generator is fully deployed and monitored! Return to the Deployment Overview for next steps.
