High Availability Best Practices
Multi-AZ Deployment
Always deploy across at least 3 Availability Zones for production workloads.
Benefits:
- Survives the failure of any single AZ
- Meets most compliance requirements (SOC2, HIPAA)
- Required for Amazon RDS Multi-AZ with read replicas
- Supports Kubernetes/EKS quorum-based systems
NAT Gateway Redundancy
Use one NAT Gateway per Availability Zone for production.
Impact of a single NAT Gateway failure:
- All private subnet outbound traffic fails
- Applications can’t download updates, access external APIs, or send notifications
- Impact duration: 3-5 minutes for AWS to detect and replace the NAT Gateway
Cost comparison:
- Single NAT: $32.40/month (1 gateway)
- HA NAT (3 AZs): $97.20/month (3 gateways)
- Additional cost: $64.80/month for 99.99% availability SLA
NAT Gateway is covered by AWS’s 99.99% availability SLA only when deployed across multiple Availability Zones. Single NAT Gateway deployments have no SLA.
Security Best Practices
Network Segmentation
Implement strict network tier isolation using the module’s four subnet types.
Public Subnets (aws_subnet.public):
- Purpose: Internet-facing load balancers and NAT Gateways only
- Never deploy: Application servers, databases, or compute instances
- Security groups: Restrict to ports 80/443 from 0.0.0.0/0
- Resources: ALB, NLB, NAT Gateways
Private Subnets (aws_subnet.private):
- Purpose: Application tier (EC2, ECS, EKS, Lambda)
- Security groups: Allow inbound only from load balancer security groups
- Outbound: Internet via NAT Gateway for external API calls
- No public IPs: Instances unreachable from internet
Database Subnets (aws_subnet.database):
- Purpose: RDS, Aurora, Redshift
- Security groups: Allow inbound only from application tier security groups
- Port restrictions: Only database ports (3306, 5432, etc.)
- Multi-AZ: RDS automatically creates standby in different AZ
ElastiCache Subnets (aws_subnet.elasticache):
- Purpose: Redis, Memcached clusters
- Security groups: Allow inbound only from application tier
- Port restrictions: 6379 (Redis), 11211 (Memcached)
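The tier rules above can be read as an allow-list: each tier accepts inbound traffic from exactly one source, on a limited set of ports. The sketch below models that in plain Python so a proposed flow can be checked quickly. The tier names, the `INBOUND_RULES` table, and `is_allowed` are hypothetical illustrations, not part of the module, and the database port set only covers the two ports named above.

```python
# Hypothetical model of the four-tier isolation rules described above.
# destination tier -> (only allowed source, allowed ports or None for any port)
INBOUND_RULES = {
    "public": ("internet", {80, 443}),
    "private": ("public", None),            # from load balancers only
    "database": ("private", {3306, 5432}),  # MySQL, PostgreSQL
    "elasticache": ("private", {6379, 11211}),  # Redis, Memcached
}


def is_allowed(source: str, destination: str, port: int) -> bool:
    """Return True if the flow matches the tier's single allowed source and port set."""
    allowed_source, allowed_ports = INBOUND_RULES[destination]
    if source != allowed_source:
        return False
    return allowed_ports is None or port in allowed_ports


print(is_allowed("internet", "public", 443))     # HTTPS to a load balancer
print(is_allowed("internet", "database", 5432))  # direct internet access to a database
print(is_allowed("private", "database", 5432))   # application tier to PostgreSQL
```

In a real deployment these rules are enforced by security groups that reference each other; the table is only a way to reason about which flows the design intends to permit.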
DNS Configuration
Enable both DNS settings for production VPCs.
Required Configuration:
- enable_dns_support: Enables the VPC DNS resolver at 169.254.169.253 (main.tf:5)
- enable_dns_hostnames: Assigns DNS names to instances with public IPs (main.tf:4)
Required for:
- Route 53 private hosted zones
- Service discovery (ECS, EKS)
- RDS endpoint resolution
- VPC endpoint DNS names
- AWS Systems Manager Session Manager
Without enable_dns_hostnames = true, instances receive public IPs but no public DNS names, breaking many AWS service integrations.
VPC Endpoint Security
Use VPC endpoints to avoid internet routing for AWS service traffic.
S3 Endpoint (enable for all production VPCs):
- S3 traffic never traverses the internet or NAT Gateway (main.tf:130-149)
- Supports S3 bucket policies restricting access to a specific VPC endpoint
- Prevents data exfiltration through unauthorized S3 buckets
- Audit all S3 access via VPC Flow Logs
DynamoDB Endpoint (enable when any of the following apply):
- Application makes high-volume DynamoDB API calls
- Compliance requires AWS traffic to stay on the AWS network
- NAT Gateway data processing costs need to be reduced
Cost Optimization
NAT Gateway Cost Management
NAT Gateways are typically the highest VPC cost component.
Pricing (us-east-1):
- Hourly: $0.045/hour ($32.40/month per gateway)
- Data processing: $0.045/GB
Example monthly cost (3 NAT Gateways, 500 GB processed):
- NAT Gateways: 3 × $32.40 = $97.20
- Data processing (500 GB): 500 × $0.045 = $22.50
- Total: $119.70/month
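The monthly total above is an hourly charge per gateway plus a per-GB processing charge. A minimal sketch of that arithmetic follows, assuming the us-east-1 rates of $0.045/hour and $0.045/GB and a 720-hour month (which is what yields $32.40 per gateway). The function name `nat_monthly_cost` is illustrative only.

```python
HOURS_PER_MONTH = 720    # assumption: 30-day month, matches $32.40/gateway
NAT_HOURLY_USD = 0.045   # assumed us-east-1 hourly rate per NAT Gateway
NAT_PER_GB_USD = 0.045   # assumed us-east-1 data processing rate


def nat_monthly_cost(gateways: int, processed_gb: float) -> float:
    """Estimate the monthly NAT Gateway bill in USD."""
    hourly_charge = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH
    processing_charge = processed_gb * NAT_PER_GB_USD
    return round(hourly_charge + processing_charge, 2)


# Three gateways processing 500 GB in total, as in the example above.
print(nat_monthly_cost(3, 500))
```

Changing the first argument to 1 gives the single-gateway figure, which makes it easy to compare production and non-production layouts for a given traffic volume.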
Optimization Strategy 1: VPC Endpoints
Problem: Application transfers 2 TB/month to S3 through the NAT Gateway.
Cost without VPC endpoint:
- NAT processing: 2,048 GB × $0.045 = $92.16/month
Cost with VPC endpoint:
- S3 endpoint: $0 (no charge)
- Savings: $92.16/month
Optimization Strategy 2: PrivateLink vs NAT Gateway
For high-traffic AWS services, compare VPC PrivateLink interface endpoints to NAT Gateway routing.
NAT Gateway Route (current module default):
- Services: EC2 API, ECS API, Secrets Manager, etc.
- Cost: $0.045/GB NAT Gateway data processing, plus $0.01/GB interface endpoint data (if applicable)
PrivateLink Interface Endpoint Route:
- Cost: $0.01/hour ($7.20/month per endpoint per AZ) + $0.01/GB
- Break-even: ~450 GB/month per endpoint
Recommendation:
- High-traffic APIs (>500 GB/month): Use interface endpoints
- Low-traffic APIs: Use NAT Gateway (module default)
- S3 and DynamoDB: Always use gateway endpoints (free)
This module creates S3 and DynamoDB gateway endpoints but does not create interface endpoints. For PrivateLink interface endpoints (EC2, ECS, Secrets Manager, etc.), create those separately after the VPC is provisioned.
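The break-even point between the two routes depends on how many AZs the interface endpoint spans, since the hourly charge is per endpoint per AZ. The sketch below works it out under stated assumptions: $0.045/GB for NAT processing, $0.01/hour and $0.01/GB for an interface endpoint, a 720-hour month, and a NAT Gateway that stays in place for other traffic (so its hourly charge is not counted as a saving). `break_even_gb` is a hypothetical helper.

```python
HOURS_PER_MONTH = 720        # assumption: 30-day month
NAT_PER_GB_USD = 0.045       # assumed NAT Gateway data processing rate
ENDPOINT_HOURLY_USD = 0.01   # assumed interface endpoint rate, per AZ
ENDPOINT_PER_GB_USD = 0.01   # assumed interface endpoint data rate


def break_even_gb(az_count: int) -> float:
    """Monthly GB at which an interface endpoint costs the same as NAT routing."""
    fixed_cost = az_count * ENDPOINT_HOURLY_USD * HOURS_PER_MONTH
    saved_per_gb = NAT_PER_GB_USD - ENDPOINT_PER_GB_USD
    return fixed_cost / saved_per_gb


for azs in (1, 2, 3):
    print(f"{azs} AZ(s): break-even at about {break_even_gb(azs):.0f} GB/month")
```

Under these assumptions the threshold ranges from roughly 206 GB/month for a single-AZ endpoint to roughly 617 GB/month across three AZs, so the ~450 GB figure is best treated as a rule of thumb and recomputed for the actual AZ count and regional prices.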
Optimization Strategy 3: Single NAT for Non-Production
For development and staging environments, a single NAT Gateway is usually sufficient.
Cost comparison:
- Production (3 NAT): $97.20/month
- Non-Production (1 NAT): $32.40/month
- Savings: $64.80/month per environment
Trade-offs:
- No AZ redundancy (acceptable for dev/test)
- Cross-AZ data transfer charges apply
- All outbound traffic flows through a single gateway
Optimization Strategy 4: Right-Size Subnet CIDRs
Avoid over-provisioning IP addresses.
Why this matters:
- AWS charges per ENI, not per IP
- Smaller subnets enable better VPC peering and security group planning
- Leaves room for future subnet types (ML, analytics, etc.)
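When sizing subnets it helps to see how many usable addresses each prefix length actually yields. AWS reserves five addresses in every subnet, so a /24 offers 251 usable IPs rather than 256. The sketch below uses Python's standard `ipaddress` module to carve subnets out of a VPC range. The 10.0.0.0/16 range and the `plan_subnets` helper are illustrative assumptions, not values taken from the module.

```python
import ipaddress
from itertools import islice

AWS_RESERVED_PER_SUBNET = 5  # network, VPC router, DNS, future use, broadcast


def plan_subnets(vpc_cidr: str, new_prefix: int, count: int):
    """Return the first `count` subnets of the VPC with their usable IP counts."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = islice(vpc.subnets(new_prefix=new_prefix), count)
    return [(str(s), s.num_addresses - AWS_RESERVED_PER_SUBNET) for s in subnets]


# Three /24 subnets (one per AZ) from a hypothetical 10.0.0.0/16 VPC.
for cidr, usable in plan_subnets("10.0.0.0/16", 24, 3):
    print(cidr, usable)
```

Running the same function with a larger or smaller prefix shows the trade-off directly: each extra prefix bit halves the subnet size and doubles the number of subnets that fit in the VPC.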
Monitoring and Observability
VPC Flow Logs
Enable VPC Flow Logs for security and troubleshooting. Flow Logs are configured separately, after the VPC is created.
Use cases:
- Detect security group misconfigurations (rejected traffic)
- Troubleshoot connectivity issues (route table problems)
- Audit compliance (who accessed what)
- Identify top talkers for cost optimization
CloudWatch Metrics
Monitor critical VPC metrics.
NAT Gateway Metrics:
- BytesOutToDestination: Outbound traffic volume
- BytesInFromDestination: Response traffic volume
- PacketsDropCount: Dropped packets (potential capacity issue)
- ErrorPortAllocation: Port exhaustion warning
Disaster Recovery
VPC Design for Multi-Region DR
Design the primary and DR region VPCs around these principles:
- Non-overlapping CIDR blocks (enables VPC peering or Transit Gateway)
- Identical subnet structure (simplifies infrastructure-as-code)
- Same security group port ranges (enables reusable Terraform modules)
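The non-overlap requirement is easy to verify before provisioning. A minimal sketch using Python's standard `ipaddress` module is shown here; the CIDR values and the `cidrs_overlap` helper are hypothetical examples, not ranges taken from the module.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share any address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical primary and DR ranges: distinct /16 blocks do not overlap.
print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))
# A DR range carved from inside the primary range does overlap.
print(cidrs_overlap("10.0.0.0/16", "10.0.128.0/17"))
```

A check like this can run in CI against the planned CIDR variables for every region, which catches conflicts before peering or Transit Gateway attachments are attempted.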
Backup VPC State
Store Terraform state remotely with versioning.
Why this matters:
- VPC deletion is catastrophic (all resources must be recreated)
- State file corruption can prevent VPC modifications
- Versioning enables rollback after accidental changes
Compliance Considerations
Tagging Strategy
Implement comprehensive tagging using the module’s tag variables.
Compliance use cases:
- PCI-DSS: Tag database subnets with cardholder data classification
- HIPAA: Tag subnets containing PHI
- SOC2: Tag with data classification and responsible team
- GDPR: Tag subnets containing EU personal data
All tagging variables (main.tf:7, 15, 24, 49, 59, 69, 89, 108) merge tags, so resource-specific tags are combined with global tags. This enables both organizational and compliance tagging.
Network ACLs
The module uses VPC default NACLs (allowing all traffic). For compliance, create custom NACLs, for example to restrict database subnet access.
Use custom NACLs for:
- PCI-DSS compliance requirements
- Defense-in-depth alongside security groups
- Subnet-level DDoS protection
- Explicit deny rules for known malicious IPs