
VPC Architecture Overview

The GovTech platform uses a private-by-default architecture. All application components run in private subnets with no direct internet access.

Network Principle

No critical resources (EKS nodes, RDS) are accessible directly from the internet. All traffic enters through the Application Load Balancer in public subnets.

VPC CIDR Ranges

| Environment | VPC CIDR | Available IPs | Purpose |
|---|---|---|---|
| Development | 10.0.0.0/16 | 65,536 | govtech-dev cluster |
| Staging | 10.1.0.0/16 | 65,536 | govtech-staging cluster |
| Production | 10.2.0.0/16 | 65,536 | govtech-prod cluster |
Separate VPCs per environment provide complete network isolation: a breach in dev cannot reach production.
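Each row in the table above corresponds to one Terraform module call per environment. A sketch of the development variant, assuming it mirrors the production configuration shown later on this page (AZ count and subnet CIDRs for dev are illustrative):

```hcl
# terraform/environments/dev/main.tf (assumed to mirror the prod layout)
module "networking" {
  source = "../../modules/networking"

  environment  = "dev"
  region       = "us-east-1"
  project_name = "govtech"
  vpc_cidr     = "10.0.0.0/16"   # dev range from the table above

  # Fewer AZs in dev reduces NAT Gateway cost (see cost optimization below)
  availability_zones   = ["us-east-1a", "us-east-1b"]
  public_subnet_cidrs  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnet_cidrs = ["10.0.10.0/24", "10.0.11.0/24"]
}
```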

Subnet Layout

Production VPC (10.2.0.0/16)

VPC: 10.2.0.0/16 (65,536 IPs)

├── PUBLIC SUBNETS (ALB, NAT Gateway)
│   ├── us-east-1a: 10.2.1.0/24  (256 IPs)
│   ├── us-east-1b: 10.2.2.0/24  (256 IPs)
│   └── us-east-1c: 10.2.3.0/24  (256 IPs)

└── PRIVATE SUBNETS (EKS nodes, RDS)
    ├── us-east-1a: 10.2.10.0/24 (256 IPs)
    ├── us-east-1b: 10.2.11.0/24 (256 IPs)
    └── us-east-1c: 10.2.12.0/24 (256 IPs)
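Inside the networking module, this layout is typically generated by iterating over the AZ and CIDR lists. A minimal sketch, with variable and resource names assumed to match the module interface used on this page:

```hcl
resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidrs[count.index]   # 10.2.1.0/24, ...
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true   # public subnets assign public IPs
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]        # 10.2.10.0/24, ...
  availability_zone = var.availability_zones[count.index]
}
```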

Terraform Configuration

terraform/environments/prod/main.tf
module "networking" {
  source = "../../modules/networking"

  environment  = "prod"
  region       = "us-east-1"
  project_name = "govtech"
  vpc_cidr     = "10.2.0.0/16"

  availability_zones   = ["us-east-1a", "us-east-1b", "us-east-1c"]
  public_subnet_cidrs  = ["10.2.1.0/24", "10.2.2.0/24", "10.2.3.0/24"]
  private_subnet_cidrs = ["10.2.10.0/24", "10.2.11.0/24", "10.2.12.0/24"]
}
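Downstream stacks (EKS node groups, RDS) consume the IDs this module creates. A plausible outputs block, assuming the resource names used elsewhere on this page and a conventional file name:

```hcl
# terraform/modules/networking/outputs.tf (assumed file name)
output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id   # consumed by the ALB
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id  # consumed by EKS nodes and RDS
}
```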

Network Components

Internet Gateway

Connects the VPC to the internet:
terraform/modules/networking/aws.tf
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "govtech-igw-prod"
    Environment = "prod"
  }
}

Purpose

Allows resources in public subnets to communicate with the internet

Usage

  • ALB receives HTTPS traffic from internet
  • NAT Gateways route private subnet traffic out

NAT Gateways (One per AZ)

Enables private subnet resources to access the internet:
terraform/modules/networking/aws.tf
# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"

  depends_on = [aws_internet_gateway.main]
}

# NAT Gateway per AZ for high availability
resource "aws_nat_gateway" "main" {
  count         = length(var.availability_zones)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "govtech-nat-${var.availability_zones[count.index]}-prod"
  }
}
High Availability: If the NAT Gateway in us-east-1a fails, pods in us-east-1b and us-east-1c continue functioning.

Cost: Each NAT Gateway costs ~$32/month plus data transfer. Development can use a single NAT Gateway to save costs.

Use Cases:
  • EKS pods downloading packages from npm, PyPI
  • Backend calling external APIs (payment, email)
  • Container images pulled from ECR
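The single-NAT option for development can be expressed as a module flag. A sketch, where `single_nat_gateway` is an assumed variable that is not part of the module as shown:

```hcl
variable "single_nat_gateway" {
  description = "Use one shared NAT Gateway instead of one per AZ (dev cost saving)"
  type        = bool
  default     = false
}

locals {
  # 1 shared NAT Gateway for dev, one per AZ otherwise
  nat_count = var.single_nat_gateway ? 1 : length(var.availability_zones)
}

# The aws_eip.nat and aws_nat_gateway.main resources above would then use
# count = local.nat_count, and the private route tables would all point at
# aws_nat_gateway.main[0] when only one gateway exists.
```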

Routing Tables

Public Subnet Route Table

All internet traffic goes through Internet Gateway:
terraform/modules/networking/aws.tf
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"                 # All traffic
    gateway_id = aws_internet_gateway.main.id # to Internet Gateway
  }

  tags = {
    Name = "govtech-rt-public-prod"
  }
}

resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

Private Subnet Route Tables (One per AZ)

Each AZ routes through its own NAT Gateway:
resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"                      # All external traffic
    nat_gateway_id = aws_nat_gateway.main[count.index].id # to NAT in same AZ
  }

  tags = {
    Name = "govtech-rt-private-${var.availability_zones[count.index]}-prod"
  }
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

Security Groups

EKS Cluster Security Group

Controls traffic to/from EKS nodes:
terraform/modules/networking/aws.tf
resource "aws_security_group" "eks_cluster" {
  name        = "govtech-eks-sg-prod"
  description = "Security group for EKS cluster"
  vpc_id      = aws_vpc.main.id

  # INGRESS: Allow HTTPS from anywhere (ALB)
  ingress {
    description = "HTTPS from internet"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # INGRESS: Allow HTTP (redirects to HTTPS)
  ingress {
    description = "HTTP redirect to HTTPS"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # INGRESS: Allow internal cluster communication
  ingress {
    description = "Internal EKS communication"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"  # All protocols
    self        = true  # Only from this security group
  }

  # EGRESS: Allow all outbound (for updates, AWS APIs)
  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

RDS Security Group

Database only accessible from EKS:
terraform/modules/networking/aws.tf
resource "aws_security_group" "rds" {
  name        = "govtech-rds-sg-prod"
  description = "Security group for RDS PostgreSQL - only EKS access"
  vpc_id      = aws_vpc.main.id

  # INGRESS: PostgreSQL only from EKS security group
  ingress {
    description     = "PostgreSQL from EKS only"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.eks_cluster.id]
  }

  # EGRESS: Allow outbound (for replication if Multi-AZ)
  egress {
    description = "Outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
RDS security group only accepts connections from the EKS security group. No direct internet access possible.
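Placing RDS in the private subnets also requires a DB subnet group spanning them. A minimal sketch of a resource not shown elsewhere on this page (names assumed):

```hcl
resource "aws_db_subnet_group" "main" {
  name       = "govtech-db-subnets-prod"
  subnet_ids = aws_subnet.private[*].id   # private subnets only: no internet path

  tags = {
    Name = "govtech-db-subnets-prod"
  }
}
```

The RDS instance would then reference this subnet group and the `rds` security group above, and set `publicly_accessible = false`.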

Traffic Flow

User Request Flow

Internet → Internet Gateway → ALB (public subnets) → EKS pods (private subnets) → RDS (private subnets)

Pod Outbound Request Flow

Pod (private subnet) → private route table → NAT Gateway (public subnet, same AZ) → Internet Gateway → external service

Subnet Tagging for EKS

EKS Load Balancer Controller uses tags to discover subnets:
terraform/modules/networking/aws.tf
# Public subnets (for ALB); vpc_id, cidr_block, and availability_zone omitted for brevity
resource "aws_subnet" "public" {
  tags = {
    "kubernetes.io/role/elb"             = "1"       # Used by ALB controller
    "kubernetes.io/cluster/govtech-prod" = "shared"
  }
}

# Private subnets (for internal load balancers); other arguments omitted for brevity
resource "aws_subnet" "private" {
  tags = {
    "kubernetes.io/role/internal-elb"    = "1"       # For internal LBs
    "kubernetes.io/cluster/govtech-prod" = "shared"
  }
}

Multi-AZ High Availability

1. Availability Zone Failure

If us-east-1a fails completely:
  • ALB stops routing to pods in AZ-a
  • Pods in us-east-1b and us-east-1c continue serving traffic
  • NAT Gateways in AZ-b and AZ-c still operational
  • RDS fails over to standby in different AZ (Multi-AZ enabled)
2. Automatic Recovery

  • EKS Auto Scaling Group launches new nodes in healthy AZs
  • Kubernetes reschedules pods from failed AZ
  • Total downtime: 2-5 minutes for pod rescheduling
3. RDS Failover

  • RDS automatically fails over to standby instance
  • DNS endpoint updated to point to new primary
  • Application connection briefly drops, then reconnects
  • Failover time: 60-120 seconds

Network Performance

| Metric | Value | Notes |
|---|---|---|
| VPC Bandwidth | Up to 100 Gbps | Instance type dependent |
| NAT Gateway | Up to 100 Gbps | Scales automatically |
| ALB | Auto-scales | No bandwidth limit |
| Inter-AZ Latency | < 2 ms | us-east-1 region |
| RDS Multi-AZ Sync | Synchronous | < 1 ms replication lag |

VPC Endpoints (Optional Enhancement)

For enhanced security and reduced NAT costs:
# S3 VPC Endpoint (free, no NAT required)
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.us-east-1.s3"
  
  route_table_ids = aws_route_table.private[*].id
}

# ECR VPC Endpoint (saves NAT bandwidth costs)
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.eks_cluster.id]
  private_dns_enabled = true
}
VPC Endpoints allow pods to access AWS services without using NAT Gateway, reducing costs and improving security.
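Note that pulling container images privately from ECR requires the `ecr.api` interface endpoint and the S3 gateway endpoint in addition to `ecr.dkr`, because image layers are served from S3. A sketch of the companion endpoint, mirroring the `ecr.dkr` resource above:

```hcl
# ECR API VPC Endpoint (pairs with ecr.dkr for private image pulls)
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.eks_cluster.id]
  private_dns_enabled = true
}
```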

Network Cost Optimization

Development
  • Single NAT Gateway: use one shared NAT Gateway instead of 3
  • 2 AZs: deploy across 2 AZs instead of 3
  • Savings: ~$64/month (2 fewer NAT Gateways at ~$32/month each)
  • Trade-off: no AZ-level redundancy for NAT

Staging
  • 3 NAT Gateways: full redundancy
  • 3 AZs: production-like architecture
  • VPC Endpoints: S3 and ECR endpoints
  • Cost: ~$96/month for NAT

Production
  • 3 NAT Gateways: one per AZ for redundancy
  • 3 AZs: us-east-1a, 1b, 1c
  • VPC Endpoints: all AWS services
  • Cost: ~$96/month + data transfer
  • Benefit: zero downtime during AZ failure

Verification Commands

# List VPC
aws ec2 describe-vpcs --filters Name=tag:Project,Values=govtech

# List subnets
aws ec2 describe-subnets --filters Name=vpc-id,Values=vpc-xxx

# Check route tables
aws ec2 describe-route-tables --filters Name=vpc-id,Values=vpc-xxx

# Verify NAT Gateways
aws ec2 describe-nat-gateways --filter Name=vpc-id,Values=vpc-xxx

# Test connectivity from pod
kubectl exec -it backend-pod -- curl https://api.github.com
The network architecture provides defense-in-depth with private-by-default resources, multi-AZ redundancy, and fine-grained security group controls.
