This guide covers production deployment patterns for Zero Cache and your API server.

Architecture

Production Zero deployment:
┌─────────────────┐
│   Clients       │
└────────┬────────┘

    ┌────▼────────────────┐
    │ Load Balancer       │
    └────┬────────────────┘

    ┌────▼─────────────────────────┐
    │ View Syncers (ECS/Fargate)  │
    │ - Handle client connections  │
    │ - Process queries/mutations  │
    │ - Maintain client views      │
    └────┬─────────────────────────┘

    ┌────▼─────────────────────────┐
    │ Replication Manager          │
    │ - PostgreSQL replication     │
    │ - Change streaming           │
    │ - Litestream backup          │
    └────┬─────────────────────────┘

    ┌────▼──────────┐    ┌──────────────┐
    │  PostgreSQL   │    │  S3 Backup   │
    └───────────────┘    └──────────────┘

Deployment Options

Single-Node Deployment

Simplest deployment for small to medium applications:
# Single server runs replication + sync workers
ZERO_UPSTREAM_DB=postgresql://...
ZERO_MUTATE_URL=https://api.example.com/mutate
ZERO_QUERY_URL=https://api.example.com/query

zero-cache
Pros: Simple, low cost
Cons: No horizontal scaling, single point of failure

Multi-Node Deployment

Separate the replication manager from the view syncers:

Replication Manager

# Dedicated replication manager
ZERO_NUM_SYNC_WORKERS=0
ZERO_UPSTREAM_DB=postgresql://...
ZERO_LITESTREAM_BACKUP_URL=s3://my-bucket/backup

zero-cache

View Syncers

# Multiple view syncers
ZERO_CHANGE_STREAMER_URI=http://replication-manager:4849
ZERO_UPSTREAM_DB=postgresql://...
ZERO_CVR_DB=postgresql://...
ZERO_MUTATE_URL=https://api.example.com/mutate
ZERO_QUERY_URL=https://api.example.com/query

zero-cache
Pros: Horizontal scaling, fault tolerance
Cons: More complex, higher cost
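The role split above can be checked before a container starts. The following is a minimal sketch, not part of Zero: `detectRole` and `missingVars` are hypothetical helpers, and the "required" lists simply mirror the variables set in the examples above rather than anything zero-cache itself enforces.

```typescript
// Hypothetical helpers: classify a zero-cache environment by role and report
// which variables from the examples above are absent.
type Env = Record<string, string | undefined>;
type Role = 'replication-manager' | 'view-syncer' | 'single-node';

function detectRole(env: Env): Role {
  // The dedicated replication manager runs with zero sync workers.
  if (env.ZERO_NUM_SYNC_WORKERS === '0') return 'replication-manager';
  // View syncers point at a separate change streamer.
  if (env.ZERO_CHANGE_STREAMER_URI) return 'view-syncer';
  return 'single-node';
}

function missingVars(env: Env): string[] {
  // Assumption: these lists mirror the example configs, nothing more.
  const expected: Record<Role, string[]> = {
    'replication-manager': ['ZERO_UPSTREAM_DB', 'ZERO_LITESTREAM_BACKUP_URL'],
    'view-syncer': [
      'ZERO_UPSTREAM_DB',
      'ZERO_CVR_DB',
      'ZERO_CHANGE_STREAMER_URI',
      'ZERO_MUTATE_URL',
      'ZERO_QUERY_URL',
    ],
    'single-node': ['ZERO_UPSTREAM_DB', 'ZERO_MUTATE_URL', 'ZERO_QUERY_URL'],
  };
  return expected[detectRole(env)].filter(name => !env[name]);
}
```

Calling `missingVars(process.env)` at startup and failing fast on a non-empty result is one way to catch a view syncer deployed without `ZERO_CVR_DB`.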

AWS ECS/Fargate Deployment

Zero includes SST configuration for AWS deployment:
// sst.config.ts
export default $config({
  app(input) {
    return {
      name: 'my-app',
      removal: input?.stage === 'production' ? 'retain' : 'remove',
      home: 'aws',
    };
  },
  async run() {
    const vpc = new sst.aws.Vpc('vpc', {az: 2, nat: 'ec2'});
    const cluster = new sst.aws.Cluster('cluster', {vpc});
    
    const replicationBucket = new sst.aws.Bucket('replication-bucket');
    
    // Replication Manager
    const replicationManager = new sst.aws.Service('replication-manager', {
      cluster,
      cpu: '2 vCPU',
      memory: '8 GB',
      image: 'my-zero-image',
      environment: {
        ZERO_UPSTREAM_DB: process.env.ZERO_UPSTREAM_DB!,
        ZERO_NUM_SYNC_WORKERS: '0',
        ZERO_LITESTREAM_BACKUP_URL: $interpolate`s3://${replicationBucket.name}/backup`,
        ZERO_MUTATE_URL: process.env.ZERO_MUTATE_URL!,
        ZERO_QUERY_URL: process.env.ZERO_QUERY_URL!,
      },
      loadBalancer: {
        public: false,
        ports: [{listen: '80/http', forward: '4849/http'}],
      },
    });
    
    // View Syncers
    const viewSyncer = new sst.aws.Service('view-syncer', {
      cluster,
      cpu: '8 vCPU',
      memory: '16 GB',
      image: 'my-zero-image',
      environment: {
        ZERO_UPSTREAM_DB: process.env.ZERO_UPSTREAM_DB!,
        ZERO_CVR_DB: process.env.ZERO_CVR_DB!,
        ZERO_CHANGE_STREAMER_URI: replicationManager.url,
        ZERO_MUTATE_URL: process.env.ZERO_MUTATE_URL!,
        ZERO_QUERY_URL: process.env.ZERO_QUERY_URL!,
        ZERO_UPSTREAM_MAX_CONNS: '15',
        ZERO_CVR_MAX_CONNS: '160',
      },
      loadBalancer: {
        public: true,
        domain: {name: 'sync.example.com'},
        ports: [
          {listen: '443/https', forward: '4848/http'},
        ],
      },
      transform: {
        target: {
          stickiness: {
            enabled: true,
            type: 'lb_cookie',
            cookieDuration: 120,
          },
        },
        autoScalingTarget: {
          minCapacity: 2,
          maxCapacity: 10,
        },
      },
    });
  },
});
Deploy:
sst deploy --stage production

Docker Image

Create a Dockerfile for Zero Cache:
FROM node:22-alpine

WORKDIR /app

# Copy package files
COPY package*.json ./

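# Install dependencies (dev dependencies included, since the build may need them)
RUN npm ci

# Copy application code
COPY . .

# Build if needed, then drop dev dependencies from the image
RUN npm run build && npm prune --omit=dev

EXPOSE 4848 4849

# Assuming zero-cache is a project dependency, its binary lives in
# node_modules/.bin, so run it through npx
CMD ["npx", "zero-cache"]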
Build and push:
docker build -t my-zero-image .
docker tag my-zero-image:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/my-zero-image:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/my-zero-image:latest

PostgreSQL Configuration

Connection Pooling

Use separate databases for different concerns:
# Upstream: Source of truth
ZERO_UPSTREAM_DB=postgresql://user:<password>@<upstream-host>/app

# CVR: Client view records (high connection usage)
ZERO_CVR_DB=postgresql://user:<password>@<cvr-host>/cvr

# Change: Change data capture
ZERO_CHANGE_DB=postgresql://user:<password>@<change-host>/change

Connection Limits

Set connection limits based on worker count:
# 8 vCPU = ~8 workers
# Upstream: 2 connections per worker
ZERO_UPSTREAM_MAX_CONNS=16

# CVR: 20 connections per worker (high usage)
ZERO_CVR_MAX_CONNS=160

# Change: Shared across workers
ZERO_CHANGE_MAX_CONNS=5
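The arithmetic behind these values can be written down as a small helper. This is a sketch that assumes the ratios in the comments above (2 upstream and 20 CVR connections per worker, a fixed pool of 5 for the change database); `connLimits` and `toEnv` are illustrative names, not part of Zero.

```typescript
// Ratios taken from the comments above; adjust them if your workload differs.
interface ConnLimits {
  upstream: number;
  cvr: number;
  change: number;
}

function connLimits(workers: number): ConnLimits {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new RangeError('workers must be a positive integer');
  }
  return {upstream: 2 * workers, cvr: 20 * workers, change: 5};
}

// Render the limits under the environment variable names used in this guide.
function toEnv(limits: ConnLimits): Record<string, string> {
  return {
    ZERO_UPSTREAM_MAX_CONNS: String(limits.upstream),
    ZERO_CVR_MAX_CONNS: String(limits.cvr),
    ZERO_CHANGE_MAX_CONNS: String(limits.change),
  };
}

console.log(toEnv(connLimits(8)));
```

For 8 workers this reproduces the values shown above: 16 upstream, 160 CVR, and 5 change connections.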

RDS Configuration

Recommended RDS settings:
# Enable logical replication
rds.logical_replication = 1

# Replication slots
max_replication_slots = 10
max_wal_senders = 10

# Connection limits (adjust based on instance size)
max_connections = 500

# Performance
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
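It can help to check `max_connections` against the per-task pool sizes before raising the auto scaling ceiling. The sketch below assumes that each view syncer task opens up to its configured pool size against a given database server, and that some connections are held back for other clients; `maxTasksWithinBudget` and the `reserved` figure are hypothetical.

```typescript
// How many tasks fit under a server's max_connections, given the pool size
// each task may open against that server and a reserve for other clients.
function maxTasksWithinBudget(
  maxConnections: number,
  perTaskConnections: number,
  reserved = 0,
): number {
  if (perTaskConnections <= 0) {
    throw new RangeError('perTaskConnections must be positive');
  }
  const usable = maxConnections - reserved;
  return Math.max(0, Math.floor(usable / perTaskConnections));
}

// CVR server with max_connections = 500, 160 connections per task, and
// 20 connections reserved (an illustrative figure) for admin and monitoring.
console.log(maxTasksWithinBudget(500, 160, 20));
```

Under these assumptions only three tasks fit, so a higher task count would need a larger `max_connections`, smaller per-task pools, or a connection pooler in front of the database.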

Aurora PostgreSQL

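Aurora requires logical replication to be enabled through a custom DB cluster parameter group. These are AWS API operations rather than SQL statements. With the AWS CLI (the parameter group family and the writer instance name are placeholders for your own values):
# Create custom cluster parameter group
aws rds create-db-cluster-parameter-group \
  --db-cluster-parameter-group-name aurora-zero-params \
  --db-parameter-group-family aurora-postgresql16 \
  --description "Zero logical replication"

# Set logical replication (static parameter, takes effect after a reboot)
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name aurora-zero-params \
  --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

# Apply to cluster
aws rds modify-db-cluster \
  --db-cluster-identifier my-cluster \
  --db-cluster-parameter-group-name aurora-zero-params \
  --apply-immediately

# Reboot the writer instance
aws rds reboot-db-instance --db-instance-identifier my-cluster-instance-1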

Litestream Backup

S3 Configuration

# Backup location
ZERO_LITESTREAM_BACKUP_URL=s3://my-bucket/zero-replica/backup

# AWS credentials (via IAM role preferred)
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1

# Backup intervals
ZERO_LITESTREAM_INCREMENTAL_BACKUP_INTERVAL_MINUTES=15
ZERO_LITESTREAM_SNAPSHOT_BACKUP_INTERVAL_HOURS=12

IAM Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket/zero-replica/*",
        "arn:aws:s3:::my-bucket"
      ]
    }
  ]
}

EBS Volumes (High Performance)

For faster replica file I/O, attach a managed EBS volume to the service:
// SST config
transform: {
  service: {
    volumeConfiguration: {
      name: 'replication-data',
      managedEbsVolume: {
        volumeType: 'gp3',
        sizeInGb: 30,
        iops: 15000,
        fileSystemType: 'ext4',
      },
    },
  },
}
Point the replica file at the mounted volume:
ZERO_REPLICA_FILE=/data/sync-replica.db

Load Balancing

Health Checks

Configure the health check endpoint:
HealthCheck:
  Path: /keepalive
  Protocol: HTTP
  Interval: 5
  Timeout: 3
  HealthyThreshold: 2
  UnhealthyThreshold: 3
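The two thresholds behave like a small state machine: a target changes state only after enough consecutive probes disagree with its current state. The sketch below models that logic for reasoning about failover time. It approximates how load balancer health checks are commonly described and is not AWS code; `trackHealth` is a hypothetical name.

```typescript
type Health = 'healthy' | 'unhealthy';

// Replays a sequence of probe results (true = HTTP 200 from /keepalive)
// and returns the resulting target state.
function trackHealth(
  probes: boolean[],
  healthyThreshold = 2,
  unhealthyThreshold = 3,
  initial: Health = 'unhealthy',
): Health {
  let state: Health = initial;
  let streak = 0; // consecutive probes that contradict the current state
  for (const ok of probes) {
    const contradicts = (state === 'healthy') !== ok;
    streak = contradicts ? streak + 1 : 0;
    const needed = state === 'healthy' ? unhealthyThreshold : healthyThreshold;
    if (streak >= needed) {
      state = ok ? 'healthy' : 'unhealthy';
      streak = 0;
    }
  }
  return state;
}

console.log(trackHealth([true, true, false, false, false]));
```

With `Interval: 5` and `UnhealthyThreshold: 3`, a failed view syncer is taken out of rotation after roughly 15 seconds of failing probes, and a new one starts receiving traffic after about 10 seconds of passing probes.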

Sticky Sessions

Enable sticky sessions so that a client's requests keep routing to the same view syncer:
TargetGroupAttributes:
  - Key: stickiness.enabled
    Value: 'true'
  - Key: stickiness.type
    Value: 'lb_cookie'
  - Key: stickiness.lb_cookie.duration_seconds
    Value: '120'

Idle Timeout

Increase the idle timeout so that long-lived WebSocket connections are not dropped:
LoadBalancerAttributes:
  - Key: idle_timeout.timeout_seconds
    Value: '3600'  # 1 hour

Auto Scaling

CPU-Based Scaling

AutoScaling:
  MinCapacity: 2
  MaxCapacity: 10
  TargetTrackingScaling:
    TargetValue: 70  # 70% CPU
    ScaleInCooldown: 300
    ScaleOutCooldown: 60
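Target tracking adjusts capacity roughly in proportion to how far the metric sits from its target. AWS does not publish the exact algorithm, so the following is only an approximation for sizing discussions; `desiredCapacity` is a hypothetical helper using the values from the configuration above.

```typescript
// Proportional estimate: scale the current task count by metric / target,
// round up, and clamp to the configured bounds.
function desiredCapacity(
  currentTasks: number,
  cpuPercent: number,
  target = 70,
  min = 2,
  max = 10,
): number {
  const raw = Math.ceil((currentTasks * cpuPercent) / target);
  return Math.min(max, Math.max(min, raw));
}

console.log(desiredCapacity(4, 90)); // 4 tasks at 90% CPU
console.log(desiredCapacity(8, 100)); // clamped to MaxCapacity
```

Four tasks at 90% CPU work out to six tasks under this estimate, while eight tasks at 100% would call for twelve and are capped at the maximum of ten. The cooldowns above then limit how quickly successive adjustments can happen.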

Connection-Based Scaling

Scale based on WebSocket connections:
AutoScaling:
  TargetTrackingScaling:
    CustomizedMetricSpecification:
      MetricName: ActiveConnectionCount
      Namespace: AWS/ApplicationELB
      Statistic: Average
    TargetValue: 1000  # 1000 connections per instance

Monitoring

CloudWatch Metrics

Key metrics to monitor:
  • CPU Utilization: Target 60-80%
  • Memory Utilization: Watch for OOM
  • Connection Count: Ensure within limits
  • Replication Lag: Should be near zero

OpenTelemetry

Export traces to an observability platform:
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector:4318
OTEL_EXPORTER_OTLP_HEADERS="x-api-key=..."
OTEL_RESOURCE_ATTRIBUTES="service.name=zero-cache,deployment.environment=production"

Logging

# JSON logs for CloudWatch
ZERO_LOG_FORMAT=json
ZERO_LOG_LEVEL=info

# Enable slow query logging
ZERO_QUERY_HYDRATION_STATS=true

Security

Network Security

# Restrict replication manager to internal access
ZERO_CHANGE_STREAMER_URI=http://internal-replication-manager:4849

# Public view syncers
ZERO_PORT=4848

Secrets Management

Use AWS Secrets Manager:
# Don't hardcode in environment
ZERO_UPSTREAM_DB=$(aws secretsmanager get-secret-value \
  --secret-id prod/zero/db-url \
  --query SecretString \
  --output text)

Admin Password

Always set in production:
ZERO_ADMIN_PASSWORD=$(openssl rand -base64 32)

API Server Deployment

Deploy your mutation/query API server:

Lambda Functions

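// Serverless mutation handler (API Gateway HTTP API or Lambda function URL,
// payload format 2.0). handleMutateRequest, dbProvider, and mutators come
// from your Zero server setup.
export const handler = async (event) => {
  // Request needs an absolute URL, so rebuild it from the event.
  const query = event.rawQueryString ? `?${event.rawQueryString}` : '';
  const url = `https://${event.requestContext.domainName}${event.rawPath}${query}`;
  const body = event.isBase64Encoded
    ? Buffer.from(event.body ?? '', 'base64').toString('utf8')
    : event.body;

  const response = await handleMutateRequest(
    dbProvider,
    async (transact, mutation) => {
      const mutator = mutators[mutation.name];
      return await transact(async (tx, name, args) => {
        await mutator(tx, args);
      });
    },
    new Request(url, {
      method: event.requestContext.http.method,
      headers: event.headers,
      body,
    })
  );

  return {
    statusCode: 200,
    headers: {'content-type': 'application/json'},
    body: JSON.stringify(response),
  };
};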

ECS Service

Run alongside Zero Cache or separately:
# Same VPC as Zero Cache
# Access via internal URL
ZERO_MUTATE_URL=http://api-server.internal:3000/mutate
ZERO_QUERY_URL=http://api-server.internal:3000/query

Performance Tuning

Worker Count

# Auto-detect (recommended): leave unset to default to the CPU count
# ZERO_NUM_SYNC_WORKERS=

# Or set explicitly
ZERO_NUM_SYNC_WORKERS=8

Query Planner

# Enable query optimization (default)
ZERO_ENABLE_QUERY_PLANNER=true

# Adjust yield threshold for responsiveness
ZERO_YIELD_THRESHOLD_MS=10

Memory Management

# Node.js heap size
NODE_OPTIONS=--max-old-space-size=4096

# Back pressure limit (proportion of heap)
ZERO_CHANGE_STREAMER_BACK_PRESSURE_LIMIT_HEAP_PROPORTION=0.04
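To see what the proportion means in absolute terms, multiply it by the configured heap size. The sketch below assumes the limit is applied against the `--max-old-space-size` value, which is an interpretation of the comment above rather than documented zero-cache behavior; both function names are hypothetical.

```typescript
// Extract the old-space size (in MiB) from a NODE_OPTIONS string, if present.
function maxOldSpaceMiB(nodeOptions: string): number | undefined {
  const match = /--max-old-space-size=(\d+)/.exec(nodeOptions);
  return match ? Number(match[1]) : undefined;
}

// Convert a heap proportion into a byte budget.
function backPressureLimitBytes(heapMiB: number, proportion: number): number {
  return Math.floor(heapMiB * 1024 * 1024 * proportion);
}

const heapMiB = maxOldSpaceMiB('--max-old-space-size=4096') ?? 0;
const limitBytes = backPressureLimitBytes(heapMiB, 0.04);
console.log((limitBytes / (1024 * 1024)).toFixed(1), 'MiB');
```

With a 4096 MiB heap and a proportion of 0.04, the budget comes to about 164 MiB of buffered changes before back pressure applies.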

Disaster Recovery

Backup Strategy

  1. Litestream: Automatic SQLite replica backup
  2. PostgreSQL: Regular RDS snapshots
  3. CVR Database: Can be rebuilt from replica

Recovery Process

  1. Restore PostgreSQL from snapshot
  2. Deploy new Zero Cache instances
  3. Litestream restores replica from S3
  4. Clients reconnect and sync

Testing Recovery

# Simulate replica loss (use the path set in ZERO_REPLICA_FILE)
rm /data/sync-replica.db*

# Restart - Litestream restores from backup
zero-cache

Cost Optimization

Right-Sizing

  • Start with smaller instances (2 vCPU, 8 GB)
  • Monitor CPU/memory usage
  • Scale up only if needed

Reserved Instances

For production, use reserved capacity:
  • 1-year commitment: ~30% savings
  • 3-year commitment: ~50% savings

Spot Instances

Use for non-critical view syncers:
CapacityProviderStrategy:
  - CapacityProvider: FARGATE_SPOT
    Weight: 2
  - CapacityProvider: FARGATE
    Weight: 1
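The weights set the ratio in which tasks are spread across the two capacity providers. As a rough sketch that ignores the optional `base` setting and the way ECS rounds individual placements, the expected split can be estimated like this (`expectedSplit` is a hypothetical helper):

```typescript
interface Provider {
  name: string;
  weight: number;
}

// Expected number of tasks per provider, proportional to weight.
function expectedSplit(tasks: number, providers: Provider[]): Record<string, number> {
  const total = providers.reduce((sum, p) => sum + p.weight, 0);
  if (total <= 0) {
    throw new RangeError('at least one provider needs a positive weight');
  }
  const split: Record<string, number> = {};
  for (const p of providers) {
    split[p.name] = (tasks * p.weight) / total;
  }
  return split;
}

console.log(
  expectedSplit(9, [
    {name: 'FARGATE_SPOT', weight: 2},
    {name: 'FARGATE', weight: 1},
  ]),
);
```

With weights of 2 and 1, about two thirds of the view syncers run on Spot, so a Spot interruption can remove most of the fleet at once. Keeping the on-demand share large enough to carry baseline traffic is a reasonable safeguard.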

Troubleshooting

High Memory Usage

# Reduce workers
ZERO_NUM_SYNC_WORKERS=4

# Lower back pressure limit
ZERO_CHANGE_STREAMER_BACK_PRESSURE_LIMIT_HEAP_PROPORTION=0.03

Connection Pool Exhaustion

# Increase limits
ZERO_UPSTREAM_MAX_CONNS=30
ZERO_CVR_MAX_CONNS=200

Slow Replication

Check:
  • PostgreSQL CPU usage
  • Network latency
  • Replication slot lag
Inspect the replication slots:
SELECT * FROM pg_replication_slots;
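Slot lag is the distance between the server's current WAL position (`pg_current_wal_lsn()`) and the slot's `confirmed_flush_lsn` column. PostgreSQL prints both as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit position. The sketch below converts such values into a byte difference on the client; `lagBytes` is a hypothetical helper and assumes the difference stays below 2^53 bytes.

```typescript
// Split an LSN such as "16/B374D848" into its high and low 32-bit halves.
function parseLsn(lsn: string): {high: number; low: number} {
  const match = /^([0-9A-Fa-f]{1,8})\/([0-9A-Fa-f]{1,8})$/.exec(lsn);
  if (!match) throw new Error(`invalid LSN: ${lsn}`);
  return {
    high: parseInt(match[1] ?? '0', 16),
    low: parseInt(match[2] ?? '0', 16),
  };
}

// Bytes of WAL between the current position and the slot's confirmed flush.
function lagBytes(currentLsn: string, confirmedFlushLsn: string): number {
  const cur = parseLsn(currentLsn);
  const slot = parseLsn(confirmedFlushLsn);
  return (cur.high - slot.high) * 2 ** 32 + (cur.low - slot.low);
}

console.log(lagBytes('0/3000060', '0/3000000'));
```

A value that keeps growing while writes continue suggests the replication manager is not consuming changes fast enough, or has stopped.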
