Overview

The GovTech platform uses AWS Cost Explorer, Cost Anomaly Detection, and infrastructure best practices to maintain cost efficiency while meeting performance and compliance requirements.

Cost Monitoring

AWS Cost Anomaly Detection

Automated anomaly detection alerts on unusual spending patterns:
terraform/modules/security/aws.tf
resource "aws_ce_anomaly_monitor" "govtech" {
  name              = "govtech-cost-monitor-prod"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

resource "aws_ce_anomaly_subscription" "govtech_alert" {
  name      = "govtech-cost-alert-prod"
  frequency = "DAILY"
  
  monitor_arn_list = [aws_ce_anomaly_monitor.govtech.arn]
  
  # Alert only when an anomaly's total impact is $50 or more
  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      values        = ["50"]
      match_options = ["GREATER_THAN_OR_EQUAL"]
    }
  }
  
  subscriber {
    type    = "EMAIL"
    address = "alerts@example.com"  # replace with the team's alert address
  }
}
Cost Anomaly Detection uses machine learning to establish baseline spending patterns and alerts when actual costs deviate significantly.
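As a rough illustration of the subscription's filter, this sketch reproduces the `GREATER_THAN_OR_EQUAL` comparison on `ANOMALY_TOTAL_IMPACT_ABSOLUTE` locally. The `should_alert` function is hypothetical and not part of the AWS CLI; AWS evaluates the real threshold server-side.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the subscription's threshold_expression:
# total impact >= threshold -> "alert", otherwise "ignore".
# awk handles the comparison because dollar amounts can be fractional
# and bash arithmetic is integer-only.
should_alert() {
  local impact="$1" threshold="${2:-50}"   # default matches the $50 threshold above
  awk -v i="$impact" -v t="$threshold" \
    'BEGIN { if (i + 0 >= t + 0) print "alert"; else print "ignore" }'
}

should_alert 75.20      # prints: alert
should_alert 49.99 50   # prints: ignore
```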

What Triggers Alerts

Spending spikes in existing services:
  • EKS costs increase from $200/month to $600/month
  • S3 data transfer jumps from 100GB to 2TB
  • RDS costs spike due to increased IOPS
Action: Investigate service-specific usage in Cost Explorer
Detection of previously unused services:
  • Someone launches EC2 instances (not part of architecture)
  • NAT Gateway data transfer increases significantly
  • New AWS service activated
Action: Review CloudTrail to identify who/what initiated the change
Unexpected costs in non-primary regions:
  • Resources created in wrong region
  • Cross-region data transfer
Action: Use CloudTrail to find resource creation events
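When triaging an alert, it can help to quantify the jump yourself before digging into Cost Explorer. A minimal sketch, assuming you already have baseline and current spend figures; `pct_change` is a hypothetical helper, and a zero baseline (a brand-new service) is reported as `n/a`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage change from a baseline to current spend.
# awk is used because inputs may be fractional dollar (or GB) amounts.
pct_change() {
  local baseline="$1" current="$2"
  awk -v b="$baseline" -v c="$current" \
    'BEGIN { if (b == 0) print "n/a"; else printf "%.0f\n", (c - b) / b * 100 }'
}

pct_change 200 600    # EKS example: prints 200 (a 200% increase)
pct_change 100 2048   # S3 transfer, 100 GB to 2 TB (2048 GB): prints 1948
```

For the CloudTrail step, `aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=RunInstances` lists recent instance launches; the lookup is scoped to one region, so repeat it per region when chasing regional anomalies.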

Cost Breakdown by Environment

Development Environment

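| Service | Resource | Monthly Cost | Annual |
|---|---|---|---|
| EKS | Control plane | $73 | $876 |
| EC2 | 2x t3.medium nodes | $60 | $720 |
| RDS | db.t3.micro | $15 | $180 |
| ALB | Application Load Balancer | $18 | $216 |
| NAT Gateway | 1 NAT GW | $32 | $384 |
| S3 | Storage + requests | $5 | $60 |
| Total | | ~$203/month | ~$2,436/year |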

Production Environment

| Service | Resource | Monthly Cost | Annual |
|---|---|---|---|
| EKS | Control plane | $73 | $876 |
| EC2 | 3-10x t3.medium nodes (avg 5) | $150 | $1,800 |
| RDS | db.t3.small Multi-AZ | $75 | $900 |
| ALB | Application Load Balancer | $25 | $300 |
| NAT Gateway | 3 NAT GW (Multi-AZ) | $96 | $1,152 |
| S3 | Storage + backups | $20 | $240 |
| CloudWatch | Logs + metrics | $15 | $180 |
| Secrets Manager | Secrets storage | $5 | $60 |
| Total | | ~$459/month | ~$5,508/year |
Actual costs vary based on:
  • Traffic volume (ALB processing, NAT Gateway data transfer)
  • Auto-scaling (EKS node count)
  • Storage growth (RDS, S3)
  • Data transfer out to internet
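The monthly totals can be re-derived from the line items, which is worth doing whenever a row changes. A small sketch using the tables' own estimates; `sum_monthly` and `annualize` are hypothetical helpers, and whole-dollar inputs are assumed so integer shell arithmetic suffices:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: total whole-dollar monthly line items and annualize them.
sum_monthly() {
  local total=0 item
  for item in "$@"; do
    total=$(( total + item ))
  done
  echo "$total"
}

annualize() {
  echo $(( $1 * 12 ))
}

dev=$(sum_monthly 73 60 15 18 32 5)            # development line items
prod=$(sum_monthly 73 150 75 25 96 20 15 5)    # production line items
echo "dev:  \$${dev}/month, \$$(annualize "$dev")/year"
echo "prod: \$${prod}/month, \$$(annualize "$prod")/year"
```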

Cost Optimization Strategies

1. Right-Sizing EC2 Instances

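Step 1: Monitor Utilization

Check actual CPU usage (EC2 publishes CPUUtilization by default; memory metrics require the CloudWatch agent or Container Insights):
# Daily average and peak CPU for one node over the last 7 days (GNU date syntax)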
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-xxxxx \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Average,Maximum
Step 2: Analyze Recommendations

Use AWS Compute Optimizer:
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[*].{Instance:instanceArn,Current:currentInstanceType,Recommended:recommendationOptions[0].instanceType}'
Step 3: Adjust Node Group

Update EKS node group instance type:
terraform/modules/kubernetes-cluster/aws.tf
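resource "aws_eks_node_group" "main" {
  # cluster_name, node_role_arn, subnet_ids, and other arguments omitted
  instance_types = ["t3.small"]  # Downsize from t3.medium if average utilization stays below 40%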
  
  scaling_config {
    min_size     = 2
    max_size     = 8
    desired_size = 3
  }
}
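The utilization query and the node group change can be connected with a small filter that averages the daily CPU datapoints and applies the 40% rule from the node group comment. A sketch, assuming the AWS CLI's text output of `Datapoints[].Average`; `cpu_decision` is hypothetical, and CPU alone is a partial signal (memory pressure can still rule out a smaller instance):

```shell
#!/usr/bin/env bash
# Hypothetical helper: average daily CPU percentages read from stdin and flag
# the node as a downsizing candidate when the average is below the threshold.
cpu_decision() {
  local threshold="${1:-40}"   # 40% matches the node group comment
  awk -v t="$threshold" '
    { for (i = 1; i <= NF; i++) { sum += $i; n++ } }
    END {
      if (n == 0) { print "no-data"; exit }
      avg = sum / n
      printf "avg=%.1f%% %s\n", avg, (avg < t ? "downsize" : "keep")
    }'
}

# Usage (i-xxxxx is the placeholder instance ID from the command above):
# aws cloudwatch get-metric-statistics \
#   --namespace AWS/EC2 --metric-name CPUUtilization \
#   --dimensions Name=InstanceId,Value=i-xxxxx \
#   --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S)" \
#   --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
#   --period 86400 --statistics Average \
#   --query 'Datapoints[].Average' --output text | cpu_decision 40
```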

2. Optimize RDS Instances

Use Multi-AZ Only in Production

Multi-AZ doubles RDS costs. Only enable for production:
multi_az = var.environment == "prod"  # the comparison already yields true/false
Savings: ~50% for dev/staging

Enable Storage Autoscaling

Start small, grow as needed:
allocated_storage     = 20
max_allocated_storage = 100
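Savings: Pay for a small initial allocation that grows automatically up to the 100 GiB cap, instead of pre-provisioning headroom (allocated RDS storage cannot be reduced later, so starting small matters)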
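To see how a 20 GiB allocation grows toward the 100 GiB cap, here is a simplified model of storage autoscaling. It assumes scaling triggers when free space falls to 10% of allocated storage and that each step is the larger of 10 GiB or 10% of the current allocation; it ignores the predicted-growth rule and the cooldown between modifications, so treat it as an approximation rather than RDS's exact behavior. `rds_next_allocation` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Simplified, assumption-based model of RDS storage autoscaling (not AWS's exact logic):
#   - scales when free space <= 10% of allocated storage
#   - step = max(min_step, 10% of allocation), min_step assumed to be 10 GiB
#   - never exceeds max_allocated_storage
# All values are whole GiB.
rds_next_allocation() {
  local alloc="$1" free="$2" max="$3" min_step="${4:-10}"
  if (( free * 10 > alloc )); then
    echo "$alloc"                          # more than 10% free: no change
    return
  fi
  local step=$(( (alloc + 9) / 10 ))       # 10% of allocation, rounded up
  (( step < min_step )) && step=$min_step
  local next=$(( alloc + step ))
  (( next > max )) && next=$max
  echo "$next"
}

rds_next_allocation 20 1 100    # 20 GiB allocated, 1 GiB free: prints 30
rds_next_allocation 20 10 100   # half free: prints 20 (no change)
rds_next_allocation 95 5 100    # capped by max_allocated_storage: prints 100
```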

3. S3 Cost Optimization

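Automatically transition objects to cheaper storage classes (this uses the inline `lifecycle_rule` block of `aws_s3_bucket`; AWS provider v4 and later deprecate it in favor of the separate `aws_s3_bucket_lifecycle_configuration` resource):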
terraform/modules/storage/aws.tf
lifecycle_rule {
  enabled = true
  
  transition {
    days          = 30
    storage_class = "STANDARD_IA"  # 50% cheaper
  }
  
  transition {
    days          = 90
    storage_class = "GLACIER_IR"   # 80% cheaper
  }
  
  expiration {
    days = 365  # Delete after 1 year
  }
}
Savings: 50-80% on older backups
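The lifecycle schedule amounts to a price multiplier by object age. A sketch using this page's approximate figures (STANDARD_IA about 50% and GLACIER_IR about 80% cheaper than STANDARD); the `storage_factor` helper and the $0.023/GB-month STANDARD price are assumptions, and retrieval fees, request charges, and minimum storage durations are ignored:

```shell
#!/usr/bin/env bash
# Hypothetical helper: storage price relative to STANDARD for an object of a
# given age under the lifecycle rule above (30d -> STANDARD_IA, 90d -> GLACIER_IR,
# 365d -> expired).
storage_factor() {
  local age_days="$1"
  if   (( age_days >= 365 )); then echo "0.00"   # expired (deleted)
  elif (( age_days >= 90 ));  then echo "0.20"   # GLACIER_IR, ~80% cheaper
  elif (( age_days >= 30 ));  then echo "0.50"   # STANDARD_IA, ~50% cheaper
  else                             echo "1.00"   # STANDARD
  fi
}

# Example: monthly storage cost of 100 GB of backups at each age.
standard_price=0.023   # assumed STANDARD price per GB-month; check your region
for age in 10 45 120 400; do
  awk -v f="$(storage_factor "$age")" -v p="$standard_price" -v a="$age" \
    'BEGIN { printf "age %3d days: $%.2f/month for 100 GB\n", a, 100 * p * f }'
done
```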
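For unpredictable access patterns, S3 Intelligent-Tiering moves objects between access tiers automatically. This configuration additionally enables the optional Archive Access tier for objects not accessed for 90 days; it applies only to objects stored in the INTELLIGENT_TIERING storage class: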
resource "aws_s3_bucket_intelligent_tiering_configuration" "backups" {
  bucket = aws_s3_bucket.storage.id
  name   = "backups-tiering"
  
  tiering {
    access_tier = "ARCHIVE_ACCESS"
    days        = 90
  }
}
Clean up incomplete multipart uploads, whose uploaded parts are billed as storage until they are aborted:
lifecycle_rule {
  enabled = true
  
  abort_incomplete_multipart_upload {
    days_after_initiation = 7
  }
}
Savings: Small but eliminates waste

4. NAT Gateway Optimization

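NAT Gateways are expensive: about $32/month per gateway plus a per-GB data processing charge.
Options:
Production: 3 NAT Gateways (one per AZ)
  • High availability
  • No cross-AZ data transfer costs
  • Cost: $96/month + data processing
Development: 1 NAT Gateway (shared by all AZs)
  • Single point of failure: an outage in its AZ cuts off outbound traffic for every AZ
  • Cross-AZ data transfer charges for traffic from the other AZs
  • Cost: $32/month + data processing ($64/month less than production)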
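Gateway count drives the fixed part of the bill, while data processing scales with traffic no matter how many gateways exist; routing S3 traffic through a gateway VPC endpoint (which has no hourly charge) keeps that traffic off the NAT. A sketch of the arithmetic; `nat_monthly_cost` is hypothetical and the $0.045/GB processing rate is an assumed example value to replace with your region's price:

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimated monthly NAT Gateway cost.
#   $1 gateways      number of NAT Gateways
#   $2 gb            GB processed per month across all gateways
#   $3 rate          data processing price in USD/GB (assumption: use your region's rate)
#   $4 per_gw        monthly charge per gateway (defaults to the ~$32 used on this page)
nat_monthly_cost() {
  local gateways="$1" gb="$2" rate="$3" per_gw="${4:-32}"
  awk -v g="$gateways" -v d="$gb" -v r="$rate" -v m="$per_gw" \
    'BEGIN { printf "%.2f\n", g * m + d * r }'
}

echo "prod (3 gateways, 500 GB): \$$(nat_monthly_cost 3 500 0.045)"
echo "dev  (1 gateway,  500 GB): \$$(nat_monthly_cost 1 500 0.045)"
```

Egress to the internet is billed separately as data transfer and is not included here.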

5. EKS Cost Optimization

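Step 1: Enable Cluster Autoscaler

Scale nodes based on pod resource requests: the autoscaler adds nodes when pods cannot be scheduled and removes underutilized nodes whose pods fit elsewhere: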
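# Install Cluster Autoscaler; set CLUSTER_NAME first (the manifest has a <YOUR CLUSTER NAME> placeholder, and the autoscaler needs IAM permissions)
curl -sL https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml \
  | sed "s/<YOUR CLUSTER NAME>/${CLUSTER_NAME}/" | kubectl apply -f -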
Savings: Only run nodes when needed (especially off-hours)
Step 2: Use Spot Instances for Non-Critical Workloads

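resource "aws_eks_node_group" "spot" {
  # cluster_name, node_role_arn, subnet_ids, and other arguments omitted
  capacity_type  = "SPOT"
  instance_types = ["t3.medium", "t3a.medium"]  # Several types improve the odds of getting Spot capacity
  
  # Runs alongside the on-demand "main" node group, which keeps critical pods on stable capacity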
  scaling_config {
    min_size     = 0
    max_size     = 5
    desired_size = 2
  }
}
Savings: typically 60-90% off On-Demand prices for these nodes (Spot capacity can be reclaimed with a two-minute warning)
Step 3: Set Pod Resource Requests Accurately

Prevent over-provisioning:
resources:
  requests:
    memory: "256Mi"  # Based on observed usage (e.g. kubectl top pods), not a guess
    cpu: "100m"
  limits:
    memory: "512Mi"  # 2x requests for burst
    cpu: "500m"
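Requests, not actual usage, decide how many pods the scheduler packs onto a node, and therefore how many nodes the Cluster Autoscaler keeps. A sketch of that packing arithmetic; `pods_per_node` is hypothetical, the allocatable figures for a t3.medium are assumptions (read real ones from the Allocatable section of `kubectl describe node`), and the default cap of 17 pods reflects the VPC CNI limit for a t3.medium without prefix delegation:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replicas of one pod that fit on a node, using requests only.
# Integer arguments: allocatable CPU (millicores), allocatable memory (MiB),
# CPU request (millicores), memory request (MiB), optional max pods per node.
pods_per_node() {
  local alloc_cpu="$1" alloc_mem="$2" req_cpu="$3" req_mem="$4" max_pods="${5:-17}"
  local fit=$(( alloc_cpu / req_cpu ))
  local by_mem=$(( alloc_mem / req_mem ))
  (( by_mem < fit )) && fit=$by_mem
  (( max_pods < fit )) && fit=$max_pods
  echo "$fit"
}

# Assumed allocatable capacity of a t3.medium after system reservations (illustrative).
node_cpu_m=1930
node_mem_mi=3300
echo "100m/256Mi requests: $(pods_per_node "$node_cpu_m" "$node_mem_mi" 100 256) pods per node"
echo "500m/512Mi requests: $(pods_per_node "$node_cpu_m" "$node_mem_mi" 500 512) pods per node"
```

With these assumed figures the memory request, not CPU, is the binding constraint for the first case, so overstating memory requests translates directly into extra nodes.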

6. Reserved Instances and Savings Plans

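For stable production workloads, commit to 1- or 3-year terms with Reserved Instances or Savings Plans (Compute Savings Plans also cover changes of instance type, which suits EKS node groups):

| Commitment | Discount | Best For |
|---|---|---|
| 1-year No Upfront | 20-40% | Predictable baseline |
| 1-year All Upfront | 30-50% | Known requirements |
| 3-year All Upfront | 40-60% | Long-term stable workloads |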
Recommended Strategy:
  • Reserve capacity for minimum baseline (e.g., 2 nodes always running)
  • Use on-demand/spot for auto-scaling
# Analyze recommendations
aws ce get-reservation-purchase-recommendation \
  --service "Amazon Elastic Compute Cloud - Compute" \
  --lookback-period-in-days SIXTY_DAYS
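Before buying, estimate what the commitment saves on the baseline alone. A sketch that multiplies the baseline's on-demand cost by an assumed discount from the table above; `commitment_savings` is hypothetical, the ~$30/node/month figure comes from the cost tables (2x t3.medium at $60), and real savings depend on the exact offering and on the baseline actually running for the whole term:

```shell
#!/usr/bin/env bash
# Hypothetical helper: savings from committing an always-on baseline.
#   $1 monthly on-demand cost of the baseline (USD)
#   $2 expected discount in percent
#   $3 months evaluated (default 12; use 36 for a 3-year term)
commitment_savings() {
  local monthly="$1" discount="$2" months="${3:-12}"
  awk -v m="$monthly" -v d="$discount" -v n="$months" \
    'BEGIN { printf "%.2f\n", m * n * d / 100 }'
}

# Baseline of 2 x t3.medium at ~$30/node/month = $60/month.
for pct in 20 40 60; do
  echo "${pct}% discount: \$$(commitment_savings 60 "$pct") saved per year"
done
```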

Cost Monitoring Tools

AWS Cost Explorer

Access: AWS Console > Cost Management > Cost Explorer

Key Reports:
  1. Monthly costs by service: Identify largest cost drivers
  2. Cost by tag: Track costs per environment (Environment: prod/dev)
  3. Daily spend trend: Detect sudden increases
  4. Reserved Instance utilization: Ensure RIs are being used

Cost Allocation Tags

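Ensure all resources are tagged. User-defined tags show up in Cost Explorer and Budgets only after they are activated as cost allocation tags in the Billing console: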
Common tags
tags = {
  Environment = var.environment  # prod, staging, dev
  Project     = "govtech"
  ManagedBy   = "terraform"
  CostCenter  = "engineering"
  Owner       = "devops-team"
}
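The monthly tag compliance audit can start with a check like this. A sketch, assuming tags arrive as space-separated `Key=Value` pairs with no spaces inside values; `missing_tags` and the `REQUIRED_TAGS` list are hypothetical and simply mirror the common tags above:

```shell
#!/usr/bin/env bash
# Hypothetical compliance check: required tag keys mirror the common tags above.
REQUIRED_TAGS="Environment Project ManagedBy CostCenter Owner"

# missing_tags "<space-separated Key=Value pairs>"
# Prints each missing key on its own line; prints nothing when compliant.
missing_tags() {
  local present=" " pair key
  for pair in $1; do
    present+="${pair%%=*} "        # keep only the key part of Key=Value
  done
  for key in $REQUIRED_TAGS; do
    case "$present" in
      *" $key "*) ;;               # key present
      *) echo "$key" ;;            # key missing
    esac
  done
}

missing_tags "Environment=prod Project=govtech"
```

Tag data for existing resources could come from `aws resourcegroupstaggingapi get-resources`, which returns resource ARNs together with their tags.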

Budgets and Alerts

Set up AWS Budgets for proactive monitoring:
# Create monthly budget with alerts
aws budgets create-budget \
  --account-id 835960996869 \
  --budget file://budget.json \
  --notifications-with-subscribers file://notifications.json
budget.json:
{
  "BudgetName": "GovTech-Prod-Monthly",
  "BudgetLimit": {
    "Amount": "400",
    "Unit": "USD"
  },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST",
  "CostFilters": {
    "TagKeyValue": ["user:Project$govtech", "user:Environment$prod"]
  }
}
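The command also reads a `notifications.json` file that is not shown above. A sketch that generates one, assuming a single email alert when actual spend passes a percentage of the budget limit; `notifications_json` and the example address are placeholders, and the field names follow the AWS Budgets `NotificationWithSubscribers` structure:

```shell
#!/usr/bin/env bash
# Hypothetical generator for the --notifications-with-subscribers file:
# one ACTUAL-spend alert at the given percentage of the budget limit, sent by email.
notifications_json() {
  local threshold="$1" email="$2"
  cat <<EOF
[
  {
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": ${threshold},
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      { "SubscriptionType": "EMAIL", "Address": "${email}" }
    ]
  }
]
EOF
}

# Usage (placeholder address):
# notifications_json 80 alerts@example.com > notifications.json
```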

Quick Wins Checklist

  • Enable S3 lifecycle policies for backups (50% savings)
  • Use single NAT Gateway in dev environment ($64/month savings)
  • Disable Multi-AZ RDS in dev/staging (50% RDS savings)
  • Delete unattached EBS volumes and old snapshots
  • Enable Cost Anomaly Detection alerts
  • Set up AWS Budgets for each environment
  • Review and delete unused Elastic IPs ($3.60/month each)
  • Enable EKS cluster autoscaler
  • Use Spot instances for dev/staging workloads
  • Create VPC endpoints for S3 and ECR

Cost Optimization Review Schedule

| Activity | Frequency | Owner |
|---|---|---|
| Review Cost Explorer | Weekly | DevOps |
| Analyze anomaly alerts | As triggered | DevOps |
| Right-sizing analysis | Monthly | Infrastructure |
| Reserved Instance review | Quarterly | Finance + DevOps |
| Budget vs. actual review | Monthly | Project Lead |
| Tag compliance audit | Monthly | DevOps |

Useful Cost Queries

# Get current month spend
aws ce get-cost-and-usage \
  --time-period Start=$(date +%Y-%m-01),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Get costs by environment tag
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=TAG,Key=Environment

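# Forecast next month (GetCostForecast rejects periods that start before today)
aws ce get-cost-forecast \
  --time-period Start=$(date -d "$(date +%Y-%m-01) +1 month" +%Y-%m-01),End=$(date -d "$(date +%Y-%m-01) +2 months" +%Y-%m-01) \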
  --metric BLENDED_COST \
  --granularity MONTHLY
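To turn the first query into a ranked list of cost drivers, its output can be reduced to `service<TAB>amount` rows and sorted. A sketch; `top_services` is hypothetical, and the `--query` expression in the comment assumes the standard `ResultsByTime[].Groups[]` response shape of `get-cost-and-usage`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank "service<TAB>amount" lines by amount, highest first.
# Usage with the month-to-date query above:
#   aws ce get-cost-and-usage ... --group-by Type=DIMENSION,Key=SERVICE \
#     --query 'ResultsByTime[0].Groups[].[Keys[0], Metrics.BlendedCost.Amount]' \
#     --output text | top_services 5
top_services() {
  local n="${1:-5}"
  # Tab separator keeps service names with spaces intact; C locale for "." decimals.
  LC_ALL=C sort -t $'\t' -k2,2nr | head -n "$n"
}
```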
