Lambda functions can fail for many reasons: timeouts, memory limits, permission errors, cold starts, or integration problems. Clanker helps you diagnose and fix these issues quickly.
# Overall Lambda health
clanker ask "What's the status of my Lambda functions?"

# Functions with errors
clanker ask "Show me Lambda functions with high error rates"

# Specific function
clanker ask "What's the error rate for my-function in the last 24 hours?"

# Recent invocations
clanker ask "Show me the last 50 invocations of my-function"
Example output:
# Lambda Function Status: my-api-handler

## Metrics (Last 24 Hours)
- **Invocations**: 12,453
- **Errors**: 1,534 (12.3%)
- **Throttles**: 89
- **Duration (avg)**: 3,245ms
- **Timeout**: 3,000ms

## Error Breakdown

### Task timed out after 3.00 seconds (1,245 errors)
- **Pattern**: Occurs during database queries
- **First seen**: 2026-03-01 08:23:15
- **Peak**: 2026-03-01 14:00-16:00 (450 errors/hour)

### Unable to connect to RDS instance (289 errors)
- **Likely cause**: Security group misconfiguration
- **VPC subnet**: subnet-abc123 (no route to RDS)

## Recommendations
1. ⚠️ **Increase timeout** from 3s to 10s
2. 🔒 **Fix VPC networking** - ensure Lambda can reach RDS
3. 📈 **Add connection pooling** to reduce database connection overhead
4. 🚨 **Increase reserved concurrency** to prevent throttling
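The error percentage in a report like this is errors divided by invocations, so it is easy to sanity-check by hand. A minimal sketch using awk follows; `error_rate` is a hypothetical helper for illustration, not part of Clanker.

```shell
# Hypothetical helper (not part of Clanker): error rate as a percentage.
# Usage: error_rate ERRORS INVOCATIONS
error_rate() {
  awk -v errors="$1" -v invocations="$2" 'BEGIN {
    if (invocations == 0) { print "0.0"; exit }
    printf "%.1f\n", (errors / invocations) * 100
  }'
}

# Figures from the example report: 1,534 errors out of 12,453 invocations.
error_rate 1534 12453   # prints 12.3
```

The two error groups in the breakdown (1,245 + 289) add up to the 1,534 total, which is another quick consistency check.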
# Recent logs
clanker ask "Show me the last 100 log entries for my-function"

# Search for errors
clanker ask "Find ERROR in logs for my-function from the last hour"

# Specific time range
clanker ask "Show me logs for my-function between 2pm and 3pm today"

# Multiple functions
clanker ask "Search for 'timeout' across all Lambda functions"
# Duration trends
clanker ask "Show me duration trend for my-function over the last 7 days"

# Memory usage
clanker ask "What's the max memory used by my-function?"

# Concurrent executions
clanker ask "Show me concurrent execution count for my-function"

# Cold start analysis
clanker ask "How many cold starts did my-function have today?"
Symptom: Task timed out after X.XX seconds

Diagnosis:

# Check current timeout and duration
clanker ask "What's the timeout and average duration for my-function?"

# Analyze duration distribution
clanker ask "Show me p50, p90, p99 duration for my-function"
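Percentiles matter here because an average can look healthy while the slowest requests are the ones hitting the timeout. As a rough illustration of what p50, p90, and p99 mean, the sketch below computes nearest-rank percentiles from a list of durations in milliseconds. The `percentile` helper and the sample durations are hypothetical, and Clanker may calculate percentiles differently.

```shell
# Hypothetical helper: nearest-rank percentile of newline-separated numbers on stdin.
# Usage: percentile P < durations.txt
percentile() {
  sort -n | awk -v p="$1" '
    { values[NR] = $1 }
    END {
      if (NR == 0) exit 1
      exact = p * NR / 100
      rank = int(exact)
      if (rank < exact) rank++      # round up to the next whole rank
      if (rank < 1) rank = 1
      print values[rank]
    }'
}

# Illustrative durations in milliseconds (made up for this example).
durations="120 3000 135 210 150 2800 180 400 250 900"
for p in 50 90 99; do
  printf 'p%s: ' "$p"
  printf '%s\n' $durations | percentile "$p"
done
# prints:
# p50: 210
# p90: 2800
# p99: 3000
```

In this sample the median is only 210 ms, yet p90 and p99 sit at or near a 3,000 ms timeout, which is the pattern to look for before raising the limit.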
Fix with maker:
# Increase timeout to 10 seconds and save the generated plan
clanker ask --maker "increase timeout for my-function to 10 seconds" > plan.json

# Review plan
cat plan.json

# Apply
clanker ask --apply < plan.json
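Because applying a plan changes live infrastructure, it can help to put the review step and the apply step behind a single confirmation prompt. The wrapper below is a hypothetical sketch, not a Clanker feature. It assumes the plan was saved to a file first (for example with `clanker ask --maker "..." > plan.json`) and uses only the `--apply` invocation shown in this guide.

```shell
# Hypothetical wrapper: show a saved plan and apply it only after explicit confirmation.
# Usage: review_and_apply plan.json
review_and_apply() {
  plan_file="$1"
  if [ ! -s "$plan_file" ]; then
    echo "Plan file '$plan_file' is missing or empty; nothing to apply." >&2
    return 1
  fi
  cat "$plan_file"
  printf 'Apply this plan? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) clanker ask --apply < "$plan_file" ;;
    *)   echo "Aborted; no changes applied."; return 1 ;;
  esac
}
```

Calling `review_and_apply plan.json` prints the plan, waits for `y`, and otherwise exits without touching the function.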
Symptom: Function runs out of memory or is slow

Diagnosis:

# Check current memory and usage
clanker ask "What's the memory allocation and max memory used for my-function?"
Fix:
# Increase memory allocation and save the generated plan
clanker ask --maker "increase memory for my-function to 1024 MB" > plan.json

# Apply
clanker ask --apply < plan.json
Increasing Lambda memory also increases CPU allocation. If your function is CPU-bound, more memory can improve performance even if you’re not hitting memory limits.
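The cost side of this trade-off can be worked out directly, since Lambda's compute charge is based on GB-seconds (allocated memory multiplied by duration). The sketch below uses a hypothetical `gb_seconds` helper and made-up durations to show that a CPU-bound function that runs twice as fast with twice the memory consumes roughly the same billed compute.

```shell
# Hypothetical helper: billed compute for one invocation in GB-seconds.
# Usage: gb_seconds MEMORY_MB DURATION_MS
gb_seconds() {
  awk -v mem="$1" -v ms="$2" 'BEGIN { printf "%.3f\n", (mem / 1024) * (ms / 1000) }'
}

# Illustrative numbers only: the same work at two memory settings.
gb_seconds 512 800    # prints 0.400
gb_seconds 1024 400   # prints 0.400
```

Whether duration really halves depends on the workload, so measure before and after the change rather than assuming it.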
Symptom: AccessDeniedException, User is not authorized

Diagnosis:

# Check current IAM role
clanker ask "What IAM role does my-function use?"

# View attached policies
clanker ask "What policies are attached to my-function's role?"

# Check recent permission errors
clanker ask "Show me AccessDenied errors for my-function"
Fix:
# Add S3 read permissions
clanker ask --maker "add S3 read permissions to my-function"

# Add DynamoDB access
clanker ask --maker "grant my-function access to DynamoDB table my-table"

# Add SQS permissions
clanker ask --maker "allow my-function to read from SQS queue my-queue"
Clanker will generate appropriate attach-role-policy or put-role-policy commands.
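When reviewing a generated plan, it helps to know what a narrowly scoped policy looks like. The sketch below prints a minimal read-only S3 policy document of the kind a `put-role-policy` call takes. The `s3_read_policy` helper and the bucket name are hypothetical, and the policy Clanker actually generates may differ.

```shell
# Hypothetical helper: print a minimal read-only S3 policy for one bucket.
# Usage: s3_read_policy BUCKET_NAME
s3_read_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::${bucket}",
        "arn:aws:s3:::${bucket}/*"
      ]
    }
  ]
}
EOF
}

# Placeholder bucket name for illustration.
s3_read_policy my-bucket
```

Saved to a file, a document like this can be passed to the AWS CLI with `aws iam put-role-policy --role-name <role> --policy-name <name> --policy-document file://policy.json`, where the role and policy names are placeholders. A plan that grants `s3:*` on all resources is broader than a read-only request needs.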
Symptom: Cannot connect to RDS, ElastiCache, or other VPC resources

Diagnosis:

# Check VPC configuration
clanker ask "What VPC settings does my-function use?"

# Check security groups
clanker ask "What security groups are attached to my-function?"

# Verify RDS accessibility
clanker ask "Can my-function reach RDS instance my-db?"
Fix:
# Add to VPC
clanker ask --maker "add my-function to VPC vpc-123 in private subnets"

# Update security group
clanker ask --maker "add security group rule allowing my-function to access RDS on port 5432"
# Analyze init duration
clanker ask "Show me init duration for my-function over the last 24 hours"

# Compare cold vs warm starts
clanker ask "What's the duration difference between cold and warm starts for my-function?"
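For context on where these numbers come from: Lambda writes a REPORT line to the function's logs for every invocation, and that line includes an `Init Duration` field only when the invocation was a cold start. The sketch below counts cold starts and averages their init time from such lines. The `cold_start_summary` helper and the sample log lines are hypothetical, and this is not necessarily how Clanker derives its answer.

```shell
# Hypothetical helper: summarize cold starts from Lambda REPORT log lines on stdin.
# Prints: <cold starts> <total invocations> <average init duration in ms>
cold_start_summary() {
  awk '
    /^REPORT/ {
      total++
      if (match($0, /Init Duration: [0-9.]+ ms/)) {
        cold++
        # Skip the 15-character "Init Duration: " prefix and drop the trailing " ms".
        init_ms += substr($0, RSTART + 15, RLENGTH - 18)
      }
    }
    END {
      avg = (cold > 0) ? init_ms / cold : 0
      printf "%d %d %.1f\n", cold, total, avg
    }'
}

# Made-up sample lines in the REPORT format.
cold_start_summary <<'EOF'
REPORT RequestId: a1 Duration: 512.30 ms Billed Duration: 513 ms Memory Size: 512 MB Max Memory Used: 180 MB Init Duration: 400.00 ms
REPORT RequestId: a2 Duration: 95.10 ms Billed Duration: 96 ms Memory Size: 512 MB Max Memory Used: 181 MB
REPORT RequestId: a3 Duration: 530.00 ms Billed Duration: 530 ms Memory Size: 512 MB Max Memory Used: 182 MB Init Duration: 600.00 ms
EOF
# prints: 2 3 500.0
```

Here two of three invocations were cold starts averaging 500 ms of init time, which would account for most of the gap between the slow and fast invocations.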
Fixes:
Provisioned concurrency (keeps functions warm):
clanker ask --maker "add provisioned concurrency of 5 to my-function"
Reduce package size:
clanker ask "What's the deployment package size for my-function?"
Symptom: Lambda not processing SQS/Kinesis/DynamoDB events

Diagnosis:

# Check event source mappings
clanker ask "What event sources are configured for my-function?"

# Check mapping status
clanker ask "Show me event source mapping status for my-function"
Fix:
# Create SQS event source mapping
clanker ask --maker "configure my-function to process messages from SQS queue my-queue with batch size 10"

# Update mapping
clanker ask --maker "update event source mapping for my-function to use batch size 5"
# Enable X-Ray
clanker ask --maker "enable X-Ray tracing for my-function"

# After enabling, query traces
clanker ask "Show me X-Ray traces for my-function with errors"
# Configure DLQ for failed async invocations
clanker ask --maker "configure dead letter queue for my-function using SQS queue my-dlq"

# Check DLQ messages
clanker ask "How many messages are in my-dlq?"
# Alarm for errors
clanker ask --maker "create CloudWatch alarm for my-function errors exceeding 10 per minute"

# Alarm for duration
clanker ask --maker "create alarm when my-function duration exceeds 5 seconds"

# Alarm for throttles
clanker ask --maker "alert me when my-function has more than 5 throttles in 5 minutes"
# Current alarms
clanker ask "Show me CloudWatch alarms for my Lambda functions"

# Recent alarm triggers
clanker ask "What alarms fired in the last 24 hours?"
1. Identify the issue

# Get overview
clanker ask "Show me Lambda function health status"

# Focus on failing function
clanker ask "Show me errors for my-function in the last hour"

2. Analyze logs and metrics

# Check logs for error patterns
clanker ask "Find ERROR in my-function logs from the last 2 hours"

# Check metrics
clanker ask "Show me duration, memory, and error metrics for my-function"

3. Generate fix plan

# Example: timeout issue
clanker ask --maker "increase timeout for my-function to 30 seconds and memory to 1024 MB" > fix-plan.json

# Review plan
cat fix-plan.json

4. Apply fix

# Apply the fix
clanker ask --apply < fix-plan.json

# Verify
clanker ask "What's the current configuration for my-function?"

5. Monitor results

# Wait 10-15 minutes for new data
sleep 900

# Check if errors decreased
clanker ask "Show me error rate for my-function in the last 15 minutes"
# 1. Identify issue
$ clanker ask "Show me Lambda functions with high error rates"
# Output shows 'email-processor' has 15% error rate

# 2. Investigate
$ clanker ask "Show me errors for email-processor in the last hour"
# Output shows 'Task timed out after 3.00 seconds'

# 3. Check configuration
$ clanker ask "What's the timeout and average duration for email-processor?"
# Timeout: 3s, Avg duration: 2.8s (very close to timeout)

# 4. Generate fix
$ clanker ask --maker "increase timeout for email-processor to 10 seconds" > fix.json

# 5. Review and apply
$ cat fix.json
$ clanker ask --apply < fix.json

# 6. Verify (after a few minutes)
$ clanker ask "Show me error rate for email-processor in the last 10 minutes"
# Error rate: 0.5% (issue resolved)
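The "very close to timeout" observation in step 3 can be expressed as a number: the share of the configured timeout that an average invocation consumes. The sketch below uses a hypothetical `timeout_usage` helper with the figures from this walkthrough.

```shell
# Hypothetical helper: how much of the configured timeout the average invocation uses.
# Usage: timeout_usage AVG_DURATION_MS TIMEOUT_MS
timeout_usage() {
  awk -v avg="$1" -v limit="$2" 'BEGIN { printf "%.0f%%\n", (avg / limit) * 100 }'
}

timeout_usage 2800 3000    # before the fix: prints 93%
timeout_usage 2800 10000   # after raising the timeout to 10s: prints 28%
```

At 93% there is almost no room for a slow invocation, which fits the timeout errors seen in step 2.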
- File paths: Lambda runs from /var/task, not your local directory
- Permissions: Local AWS credentials vs. Lambda execution role
- Dependencies: Missing packages in deployment zip
# Check environment variables
clanker ask "What environment variables are set for my-function?"

# Check execution role
clanker ask "What IAM role and policies does my-function have?"
Intermittent failures
Intermittent failures often indicate:
- Concurrent execution limits: Check throttles
- Downstream service issues: RDS, API endpoints
- Cold starts: First invocations may time out
# Check for throttles
clanker ask "Show me throttle count for my-function over the last 24 hours"

# Check concurrent executions
clanker ask "What's the max concurrent executions for my-function today?"
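Throttling and slow downstream services are often linked, because the concurrency a function needs is roughly its request rate multiplied by its average duration in seconds. The sketch below works through that arithmetic with a hypothetical `estimated_concurrency` helper and illustrative traffic numbers.

```shell
# Hypothetical helper: estimated concurrent executions (requests/second x average seconds).
# Usage: estimated_concurrency REQUESTS_PER_SECOND AVG_DURATION_MS
estimated_concurrency() {
  awk -v rps="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", rps * (ms / 1000) }'
}

# Illustrative numbers only: the same traffic with a healthy and a slow dependency.
estimated_concurrency 50 200    # prints 10.0
estimated_concurrency 50 3000   # prints 150.0
```

If a database slowdown stretches invocations from 200 ms to 3 s, the same traffic needs fifteen times the concurrency, so throttles can appear even though request volume has not changed.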
Can't update function code
If updates fail:
# Check function state
clanker ask "What's the current state of my-function?"

# If state is 'Pending', wait for update to complete
# If state is 'Failed', check last update status
clanker ask "Show me the last update status for my-function"

# Manual retry
clanker ask --maker --apply "update code for my-function from S3"