Unkey is deployed globally across multiple regions to provide low-latency access to API verification and management operations. The platform uses a distributed architecture that separates control plane operations from data plane execution.

Deployment Model

Unkey operates on a multi-region, edge-optimized deployment model:

Architecture Layers

Edge Layer

Frontline services deployed globally for TLS termination and intelligent routing to regional data planes

Regional Data Plane

Sentinel and API services with regional databases and caches for fast key verification

Control Plane

Centralized Control API and Worker for deployment orchestration and management

Regional Availability

Unkey Cloud Regions

Unkey Cloud is deployed in the following regions:
| Region | Location | Services | Primary |
|---|---|---|---|
| us-east-1 | US East (Virginia) | Full Stack | ✓ |
| us-west-2 | US West (Oregon) | Data Plane | |
| eu-west-1 | Europe (Ireland) | Data Plane | |
| ap-southeast-1 | Asia Pacific (Singapore) | Data Plane | |
| ap-northeast-1 | Asia Pacific (Tokyo) | Data Plane | |
Full Stack: Control Plane + Data Plane
Data Plane: API, Sentinel, Frontline, Regional Cache/DB

Service Distribution

The primary region hosts the complete Unkey stack:

Control Plane:
  • Control API (deployment management)
  • Control Worker (orchestration workflows)
  • MySQL Primary (source of truth)
  • Restate (workflow engine)
  • Vault (encryption service)
Data Plane:
  • API Service (key verification)
  • Sentinel (environment gateway)
  • Frontline (edge ingress)
  • Redis (cache and rate limiting)
  • MySQL Replica (regional reads)
  • ClickHouse (analytics)
Secondary regions host data plane services for low-latency verification:

Deployed Services:
  • API Service
  • Sentinel
  • Frontline
  • Redis (regional instance)
  • MySQL Replica (read replica from primary)
Not Deployed:
  • Control API (centralized in primary)
  • Control Worker (centralized in primary)
  • Vault (accessed via RPC from primary)
  • Restate (centralized in primary)
Secondary regions can verify keys and enforce rate limits independently, even if the primary region is unavailable.
Edge locations run lightweight Frontline instances:

Responsibilities:
  • TLS termination
  • SNI-based routing
  • Certificate caching
  • Geo-routing to nearest data plane
No Data Persistence:
  • All edge instances are stateless
  • Route decisions cached in-memory
  • Automatic failover to alternate regions

Routing and Traffic Flow

Request Path

  1. DNS Resolution: Client resolves api.unkey.com to nearest edge location via GeoDNS
  2. Edge Termination: Frontline terminates TLS and reads SNI hostname
  3. Environment Routing: Frontline queries routing table and selects target Sentinel
  4. Regional Gateway: Sentinel resolves deployment ID and forwards to API instance
  5. Key Verification: API service processes request with regional cache/database
  6. Response: Response flows back through Sentinel → Frontline → Client
The entire request path from edge to API service typically completes in <10ms at P50 and <50ms at P99.
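
Step 3 of the path above reduces to a hostname lookup: Frontline reads the SNI hostname and selects a regional Sentinel target. A minimal sketch in Go; the hostnames and targets are illustrative, and the real routing table is database-backed with an in-memory cache:

```go
package main

import (
	"fmt"
	"strings"
)

// routingTable maps an SNI hostname to a regional Sentinel address.
// Entries here are made up for illustration.
var routingTable = map[string]string{
	"acme.unkey.app":   "sentinel.us-east-1.internal",
	"globex.unkey.app": "sentinel.eu-west-1.internal",
}

// routeBySNI resolves the hostname read from the TLS handshake to a
// target Sentinel, reporting whether a route exists.
func routeBySNI(sni string) (string, bool) {
	target, ok := routingTable[strings.ToLower(sni)]
	return target, ok
}

func main() {
	if target, ok := routeBySNI("acme.unkey.app"); ok {
		fmt.Println("forwarding to", target)
	}
}
```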

Cross-Region Routing

Frontline automatically routes requests across regions when necessary:

Routing Logic:
  • Prefers same-region routing when environment exists in client region
  • Falls back to cross-region when environment only exists in different region
  • Retries alternate regions on failure (circuit breaker)
  • Caches routing decisions to avoid database lookups
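
The prefer-same-region-then-fall-back behavior can be sketched as a per-region circuit breaker. The threshold and names below are illustrative, not Unkey internals:

```go
package main

import (
	"fmt"
	"sync"
)

// regionBreaker marks a region unhealthy after `threshold` consecutive
// failures; a success while unhealthy resets it.
type regionBreaker struct {
	mu        sync.Mutex
	failures  map[string]int
	threshold int
}

func newRegionBreaker(threshold int) *regionBreaker {
	return &regionBreaker{failures: map[string]int{}, threshold: threshold}
}

// Report records the outcome of a request to a region.
func (b *regionBreaker) Report(region string, ok bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if ok {
		b.failures[region] = 0
		return
	}
	b.failures[region]++
}

func (b *regionBreaker) Healthy(region string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.failures[region] < b.threshold
}

// pick prefers the client's own region and falls back to the first
// healthy alternate, mirroring the routing logic described above.
func (b *regionBreaker) pick(preferred string, alternates []string) (string, bool) {
	if b.Healthy(preferred) {
		return preferred, true
	}
	for _, r := range alternates {
		if b.Healthy(r) {
			return r, true
		}
	}
	return "", false
}

func main() {
	b := newRegionBreaker(3)
	for i := 0; i < 3; i++ {
		b.Report("us-east-1", false)
	}
	region, _ := b.pick("us-east-1", []string{"us-west-2", "eu-west-1"})
	fmt.Println("routing to", region) // routing to us-west-2
}
```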

Regional Failover

Unkey implements automatic regional failover:
  1. Health Monitoring: Continuous health checks on all regional services
  2. Circuit Breaking: Failing regions marked unhealthy after threshold
  3. Automatic Rerouting: Frontline redirects traffic to healthy regions
  4. Gradual Recovery: Failed regions gradually receive traffic after recovery
During regional failover, cross-region requests may experience higher latency (typically +50-100ms) but remain functional.

Latency Characteristics

Typical Latencies by Region

When client and data plane are in the same region:
| Operation | P50 | P95 | P99 |
|---|---|---|---|
| Key Verification (cached) | 5ms | 15ms | 30ms |
| Key Verification (uncached) | 12ms | 25ms | 45ms |
| Rate Limit Check | 8ms | 18ms | 35ms |
| Key Creation | 25ms | 50ms | 80ms |
| Key Lookup | 10ms | 22ms | 40ms |

Latency Optimization Strategies

Regional Deployment

Deploy your application in the same region as Unkey data plane for minimal latency

Cache Warming

Pre-warm caches during deployment to avoid cold start penalties

Connection Pooling

Maintain persistent HTTP/2 connections to avoid TLS handshake overhead

Batch Operations

Use batch verification endpoints when verifying multiple keys
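
A batch call packs several keys into a single request body so one round trip replaces N. The field names and the batch endpoint itself are assumptions for illustration here; consult the current Unkey API reference for the real request shape:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// batchVerifyBody is a hypothetical payload carrying multiple keys
// for a single verification round trip.
type batchVerifyBody struct {
	Keys []string `json:"keys"`
}

// marshalBatch builds the JSON body for a batch verification request.
func marshalBatch(keys []string) ([]byte, error) {
	return json.Marshal(batchVerifyBody{Keys: keys})
}

func main() {
	body, _ := marshalBatch([]string{"key_abc", "key_def"})
	fmt.Println(string(body)) // {"keys":["key_abc","key_def"]}
}
```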

Infrastructure Components

Compute

Kubernetes Clusters:
  • EKS (Elastic Kubernetes Service) in AWS regions
  • Node pools with Karpenter autoscaling
  • gVisor runtime for workload isolation
  • Multi-zone distribution for high availability
Service Sizing:
| Service | CPU per Pod | Memory per Pod | Min Replicas | Max Replicas |
|---|---|---|---|---|
| API | 1000m | 1Gi | 3 | 50 |
| Frontline | 500m | 512Mi | 3 | 30 |
| Sentinel | 500m | 512Mi | 2 | 20 |
| Control API | 1000m | 1Gi | 2 | 10 |
| Vault | 500m | 512Mi | 2 | 10 |
| Krane | 250m | 256Mi | 1 | 3 |

Database Infrastructure

MySQL:
  • RDS Multi-AZ deployment in primary region
  • Read replicas in each secondary region
  • Automated backups with point-in-time recovery
  • Connection pooling via ProxySQL
Redis:
  • Dragonfly or Redis in cluster mode
  • Regional instances per data plane
  • Persistence disabled (cache and ephemeral counters only)
  • Automatic failover with Sentinel
ClickHouse:
  • Replicated tables across shards
  • Separate instance per region (optional)
  • 90-day retention for analytics events
  • Asynchronous writes (fire-and-forget)
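
The fire-and-forget write path can be modeled as a bounded buffer that drops events rather than blocking the hot verification path. A minimal sketch with illustrative names:

```go
package main

import "fmt"

// asyncWriter buffers analytics events for a background flusher.
// When the buffer is full the event is dropped instead of blocking
// the caller — the "analytics loss acceptable" trade-off above.
type asyncWriter struct {
	events chan string
}

func newAsyncWriter(size int) *asyncWriter {
	return &asyncWriter{events: make(chan string, size)}
}

// Write never blocks: it enqueues if there is room and reports
// whether the event was accepted.
func (w *asyncWriter) Write(event string) bool {
	select {
	case w.events <- event:
		return true
	default:
		return false // buffer full: drop silently
	}
}

func main() {
	w := newAsyncWriter(2)
	fmt.Println(w.Write("verify:key_abc")) // true
	fmt.Println(w.Write("verify:key_def")) // true
	fmt.Println(w.Write("verify:key_ghi")) // false (buffer full, dropped)
}
```

In production the consumer side would drain the channel in batches and ship them to ClickHouse; that part is omitted here.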

Network Architecture

Load Balancing:
  • AWS Network Load Balancer (NLB) for Frontline
  • Internal Kubernetes Services for inter-service communication
  • Cilium CNI for network policies
DNS Configuration:
  • Route53 GeoDNS for regional routing
  • Health-check based failover
  • Automatic TLS certificate provisioning via cert-manager
Security:
  • TLS 1.3 for all external connections
  • mTLS between services (optional)
  • Cilium network policies for pod-to-pod traffic
  • AWS Security Groups for infrastructure isolation

Storage

Object Storage (Vault):
  • S3 buckets in primary region
  • Optional cross-region replication
  • Versioning enabled for key recovery
  • Lifecycle policies for old key rotation
Persistent Volumes:
  • EBS volumes for database storage
  • Snapshots for backup/restore
  • Encryption at rest with KMS

Monitoring and Observability

Metrics Collection

All services expose Prometheus metrics:
  • Request Metrics: Latency histograms, error rates, throughput
  • System Metrics: CPU, memory, network I/O
  • Business Metrics: Key verifications, rate limit hits, cache hit rates
  • Deployment Metrics: Pod count, rollout status, health checks
Retention: 90 days in Prometheus, 1 year in long-term storage (Thanos/Mimir)

Distributed Tracing

OpenTelemetry traces span the entire request path:
Client Request
  → Frontline (span: tls_termination, routing)
    → Sentinel (span: deployment_lookup, middleware)
      → API (span: key_verification, rate_limit, audit_log)
        → Vault (span: decrypt)
        → MySQL (span: query)
        → Redis (span: cache_get)
      → ClickHouse (span: analytics_write)
Sampling: 1% of production traffic, 100% of errors
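
A keep-all-errors, ~1%-of-the-rest policy can be made deterministic by hashing the trace ID, so every span in a trace reaches the same decision. This is a hash-mod sketch, not OpenTelemetry's actual sampler:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sampleTrace keeps every error trace and roughly 1% of the rest,
// deciding from the trace ID so the decision is stable per trace.
func sampleTrace(traceID string, isError bool) bool {
	if isError {
		return true
	}
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return h.Sum32()%100 == 0 // ~1% of trace IDs
}

func main() {
	fmt.Println(sampleTrace("trace-123", true)) // true: errors always kept
	// Non-error traces are kept only if their ID hashes into the 1%.
	fmt.Println(sampleTrace("trace-123", false))
}
```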

Health Checks

All services expose standard health endpoints:
  • /health/live: Liveness probe (process alive)
  • /health/ready: Readiness probe (ready to serve traffic)
  • /health/startup: Startup probe (initialization complete)
Kubernetes Configuration:
  • Liveness: 30s timeout, 3 failures = restart
  • Readiness: 10s timeout, 3 failures = remove from service
  • Startup: 60s timeout, 10 failures = mark unhealthy

Disaster Recovery

Backup Strategy

MySQL Backups:
  • Automated daily snapshots retained for 30 days
  • Transaction logs for point-in-time recovery
  • Cross-region backup replication
  • 4-hour RPO (Recovery Point Objective)
ClickHouse Backups:
  • Weekly full backups
  • Daily incremental backups
  • 90-day retention
  • Asynchronous, analytics loss acceptable
Vault Key Backups:
  • S3 versioning enabled
  • Cross-region replication
  • Encrypted with separate KMS key
  • Immutable backup mode (WORM)

Recovery Procedures

Regional Outage:
  1. Automatic failover triggers within 60 seconds
  2. Frontline redirects traffic to healthy regions
  3. API services serve from read replicas
  4. Degraded mode: Read-only operations continue
  5. Write operations queued or rejected (depending on configuration)
  6. Recovery: Restore region, replay transaction logs, resume normal operation
RTO (Recovery Time Objective): 15 minutes for data plane, 1 hour for control plane
Database Failure:
  1. RDS automatic failover to standby (Multi-AZ)
  2. Application connection retry with backoff
  3. Read replicas promoted if primary unavailable
  4. Point-in-time restore from backup if corruption detected
RTO: 5 minutes for automatic failover, 2 hours for restore from backup
Primary Region Failure:
  1. Manual intervention to verify scope
  2. Promote read replica in alternate region to primary
  3. Update DNS to redirect all traffic
  4. Scale up alternate region capacity
  5. Rebuild failed region from backups
RTO: 2-4 hours for full recovery

Scaling Characteristics

Horizontal Scaling

Automatic Scaling Triggers:
| Service | Scale Up Threshold | Scale Down Threshold |
|---|---|---|
| API | CPU >70% or RPS >8k | CPU <30% and RPS <2k |
| Sentinel | CPU >60% or Connections >5k | CPU <20% |
| Frontline | CPU >60% | CPU <20% |
Scaling Behavior:
  • Scale-up: Add 50% more replicas (min +1, max +10 per event)
  • Scale-down: Remove 25% of replicas (max -2 per event)
  • Cooldown: 3 minutes between scale events
  • Pod disruption budgets prevent scaling during deployments
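
The scale-up and scale-down rules above reduce to a small clamping function (replica bounds taken from the sizing table; otherwise illustrative):

```go
package main

import "fmt"

// nextReplicas applies the step rules: scale-up adds 50% more
// replicas (at least +1, at most +10 per event); scale-down removes
// 25% (at most -2 per event); the result is clamped to [min, max].
func nextReplicas(current, min, max int, up bool) int {
	var next int
	if up {
		delta := current / 2 // +50%
		if delta < 1 {
			delta = 1
		}
		if delta > 10 {
			delta = 10
		}
		next = current + delta
	} else {
		delta := current / 4 // -25%
		if delta > 2 {
			delta = 2
		}
		next = current - delta
	}
	if next < min {
		next = min
	}
	if next > max {
		next = max
	}
	return next
}

func main() {
	fmt.Println(nextReplicas(3, 3, 50, true))   // 3 + 1 = 4
	fmt.Println(nextReplicas(30, 3, 50, true))  // 30 + 10 = 40 (capped step)
	fmt.Println(nextReplicas(12, 3, 50, false)) // 12 - 2 = 10 (capped step)
}
```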

Vertical Scaling

Services use Vertical Pod Autoscaler (VPA) in recommendation mode:
  • VPA monitors resource usage over 7 days
  • Recommends CPU/memory adjustments
  • Manual review and apply during maintenance window
  • Typically adjusted quarterly

Capacity Planning

Current capacity per region:
  • Peak RPS: 100,000+ requests per second
  • Key Verifications: 50M+ per minute
  • Concurrent Connections: 500,000+
  • API Keys: 10M+ active keys
Capacity planning is based on p95 latency targets. Burst capacity allows 2x peak load for short periods.

Cost Optimization

Unkey’s deployment model optimizes for cost efficiency:
  1. Regional Caching: Reduce database load by 95%+ with aggressive caching
  2. Spot Instances: Use spot nodes for non-critical workloads (30-50% cost savings)
  3. Autoscaling: Scale down during off-peak hours
  4. Storage Tiering: Move old analytics to cold storage (S3 Glacier)
  5. Compression: Enable compression for ClickHouse and object storage

Self-Hosted Deployments

For self-hosted deployments, you can choose your deployment model:
  • Single Region: Simplest setup, lower cost, higher latency for distant users
  • Multi-Region Data Plane: Deploy API services in multiple regions with primary control plane
  • Full Multi-Region: Replicate entire stack across regions for maximum availability
See Self-Hosting Guide for detailed setup instructions.
