The archiver is a standalone service that moves old check results out of the database and into archive storage, written as Parquet files with day-based partitioning, to either S3 or the local filesystem.

Overview

The archiver:
  • Archives check results older than a configured retention period
  • Exports data as Parquet files for efficient storage and querying
  • Uses day-based partitioning (year=YYYY/month=MM/day=DD/)
  • Runs on a configurable cron schedule
  • Supports both S3 and local filesystem storage
  • Processes data in configurable batch sizes
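The day-based partition layout can be sketched like this (illustrative Python, not the archiver's actual code):

```python
from datetime import date

def partition_path(d: date) -> str:
    """Build the day-based partition prefix: year=YYYY/month=MM/day=DD/"""
    return f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}/"

print(partition_path(date(2025, 12, 3)))  # year=2025/month=12/day=03/
```

Zero-padded month and day keep partitions lexically sortable, which is what makes glob patterns and partition pruning work cleanly.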

Running the Archiver

Standalone Process

Start the archiver in a separate terminal:
bun archiver
The archiver will:
  1. Connect to your database (using DATABASE_URL)
  2. Start the HTTP API server (default port 3002)
  3. Run archival jobs according to ARCHIVAL_CRON schedule

Docker

The Docker entrypoint can auto-start the archiver:
docker run -e ARCHIVAL_ENABLED=true pongo
Or in your docker-compose.yml:
docker-compose.yml
services:
  archiver:
    build: .
    command: ["bun", "archiver"]
    environment:
      DATABASE_URL: postgres://postgres:password@db:5432/pongo
      ARCHIVAL_ENABLED: "true"
      ARCHIVAL_RETENTION_DAYS: "90"
      S3_BUCKET: pongo-archives
      S3_REGION: us-east-1
      S3_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      S3_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
    depends_on: [db]

Fly.io

The included fly.toml handles archiver deployment:
fly secrets set ARCHIVAL_ENABLED=true
fly secrets set S3_BUCKET=pongo-archives
fly secrets set S3_REGION=us-east-1
fly secrets set S3_ACCESS_KEY_ID=...
fly secrets set S3_SECRET_ACCESS_KEY=...
fly deploy

Environment Variables

Core Configuration

ARCHIVAL_ENABLED
boolean
default:"false"
Enable automatic data archival. Must be set to true for the archiver to run.
ARCHIVAL_ENABLED=true
ARCHIVAL_RETENTION_DAYS
number
default:"30"
Number of days before check results are archived. Results older than this are moved to archive storage.
ARCHIVAL_RETENTION_DAYS=90  # Archive after 90 days
ARCHIVER_PORT
number
default:"3002"
HTTP API server port for health checks and status.
ARCHIVER_PORT=3002

Schedule Configuration

ARCHIVAL_CRON
string
default:"0 3 * * *"
Cron expression for the archival schedule. The default runs daily at 3 AM. Examples:
ARCHIVAL_CRON="0 3 * * *"      # Daily at 3 AM
ARCHIVAL_CRON="0 */6 * * *"    # Every 6 hours
ARCHIVAL_CRON="0 2 * * 0"      # Weekly on Sunday at 2 AM
ARCHIVAL_CRON="0 4 1 * *"      # Monthly on 1st at 4 AM
ARCHIVAL_BATCH_SIZE
number
default:"10000"
Number of rows to process per batch. Larger batches are more efficient but use more memory.
ARCHIVAL_BATCH_SIZE=50000  # Process 50k rows per batch
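Retention and batch size interact roughly as sketched below (hypothetical Python; the archiver's real implementation may differ):

```python
from datetime import datetime, timedelta, timezone

def archival_cutoff(retention_days: int, now: datetime) -> datetime:
    """Rows checked before this timestamp are eligible for archival."""
    return now - timedelta(days=retention_days)

def batch_ranges(total_rows: int, batch_size: int):
    """Yield (offset, limit) pairs that cover all eligible rows."""
    for offset in range(0, total_rows, batch_size):
        yield offset, min(batch_size, total_rows - offset)

now = datetime(2025, 12, 8, tzinfo=timezone.utc)
print(archival_cutoff(90, now).date())     # 2025-09-09
print(list(batch_ranges(25_000, 10_000)))  # [(0, 10000), (10000, 10000), (20000, 5000)]
```

Larger batches mean fewer round trips to the database and fewer, larger Parquet files; smaller batches cap peak memory.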

S3 Configuration

Configure S3 for cloud-based archival storage.
S3_BUCKET
string
required
S3 bucket name for storing archived data.
S3_BUCKET=pongo-archives
S3_REGION
string
required
AWS region where the S3 bucket is located.
S3_REGION=us-east-1
S3_ACCESS_KEY_ID
string
required
AWS access key ID for S3 authentication.
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY
string
required
AWS secret access key for S3 authentication.
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3_PREFIX
string
Prefix for S3 object keys. Useful for organizing archives in a shared bucket.
S3_PREFIX=production/archives
# Results in: s3://pongo-archives/production/archives/year=2025/month=12/...
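Putting the prefix and partitioning together, a full object key can be sketched as follows (illustrative; the file-name format assumes the `checks_YYYYMMDD_batchNNN.parquet` pattern shown under Directory Structure):

```python
from datetime import date

def s3_key(prefix: str, d: date, batch: int) -> str:
    """Join the optional prefix, day partition, and batch file name into an object key."""
    partition = f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}"
    filename = f"checks_{d:%Y%m%d}_batch{batch:03d}.parquet"
    parts = [prefix, partition, filename] if prefix else [partition, filename]
    return "/".join(parts)

print(s3_key("production/archives", date(2025, 12, 3), 1))
# production/archives/year=2025/month=12/day=03/checks_20251203_batch001.parquet
```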

S3 Bucket Setup

Create an S3 bucket for archives:
aws s3 mb s3://pongo-archives --region us-east-1
Create an IAM policy with required permissions:
iam-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::pongo-archives",
        "arn:aws:s3:::pongo-archives/*"
      ]
    }
  ]
}
Create IAM user and attach policy:
aws iam create-user --user-name pongo-archiver
aws iam put-user-policy --user-name pongo-archiver --policy-name PongoArchiveAccess --policy-document file://iam-policy.json
aws iam create-access-key --user-name pongo-archiver

Local Archival

Archive to local filesystem instead of S3.
ARCHIVAL_LOCAL_PATH
string
default:"./archives"
Local filesystem path for archived data. Used when S3 is not configured.
ARCHIVAL_LOCAL_PATH=/data/pongo/archives

Directory Structure

Local archives use the same day-based partitioning as S3:
archives/
└── year=2025/
    └── month=12/
        ├── day=01/
        │   └── checks_20251201_batch001.parquet
        ├── day=02/
        │   └── checks_20251202_batch001.parquet
        └── day=03/
            ├── checks_20251203_batch001.parquet
            └── checks_20251203_batch002.parquet
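With that layout, all files for a given month can be found with a simple glob (illustrative sketch):

```python
from pathlib import Path

def month_files(root: str, year: int, month: int) -> list[Path]:
    """Find all Parquet files archived under a given month, sorted by day and batch."""
    pattern = f"year={year:04d}/month={month:02d}/day=*/*.parquet"
    return sorted(Path(root).glob(pattern))
```

The same pattern works as the path argument to DuckDB's `read_parquet`, shown later under Querying Archives.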

Docker Volume Mount

Mount a volume for persistent local archives:
docker-compose.yml
services:
  archiver:
    build: .
    command: ["bun", "archiver"]
    environment:
      ARCHIVAL_ENABLED: "true"
      ARCHIVAL_LOCAL_PATH: /data/archives
    volumes:
      - archive-data:/data/archives

volumes:
  archive-data:

Complete Configuration Examples

S3 Archival (Production)

.env
# Database
DATABASE_URL=postgres://user:password@db.example.com:5432/pongo

# Archival
ARCHIVAL_ENABLED=true
ARCHIVAL_RETENTION_DAYS=90
ARCHIVAL_CRON="0 3 * * *"
ARCHIVAL_BATCH_SIZE=50000

# S3
S3_BUCKET=pongo-archives
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3_PREFIX=production/archives

# API
ARCHIVER_PORT=3002

Local Archival (Development)

.env
# Database
DATABASE_URL=file:./pongo/pongo.db

# Archival
ARCHIVAL_ENABLED=true
ARCHIVAL_RETENTION_DAYS=30
ARCHIVAL_CRON="0 2 * * *"
ARCHIVAL_BATCH_SIZE=10000
ARCHIVAL_LOCAL_PATH=./archives

# API
ARCHIVER_PORT=3002

Docker Compose (Full Stack)

docker-compose.yml
services:
  pongo:
    build: .
    ports: ["3000:3000"]
    environment:
      DATABASE_URL: postgres://postgres:password@db:5432/pongo
      ACCESS_CODE: your-password
    depends_on: [db]

  scheduler:
    build: .
    command: ["bun", "scheduler"]
    environment:
      DATABASE_URL: postgres://postgres:password@db:5432/pongo
      SCHEDULER_PORT: "3001"
    depends_on: [db]

  archiver:
    build: .
    command: ["bun", "archiver"]
    environment:
      DATABASE_URL: postgres://postgres:password@db:5432/pongo
      ARCHIVAL_ENABLED: "true"
      ARCHIVAL_RETENTION_DAYS: "90"
      ARCHIVAL_CRON: "0 3 * * *"
      ARCHIVAL_BATCH_SIZE: "50000"
      S3_BUCKET: pongo-archives
      S3_REGION: us-east-1
      S3_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      S3_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
      ARCHIVER_PORT: "3002"
    depends_on: [db]

  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: password
      POSTGRES_DB: pongo
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Parquet File Format

Archived data is stored as Parquet files for efficient storage and querying.

Schema

Archived check results include:
  • id - Check result ID
  • monitor_id - Monitor identifier
  • status - Check status (up/down/degraded)
  • response_time_ms - Response time in milliseconds
  • status_code - HTTP status code (if applicable)
  • message - Additional message or error details
  • checked_at - Timestamp of the check
  • region - Region where check was performed

Querying Archives

Use tools like DuckDB, AWS Athena, or Apache Spark to query archived Parquet files:
DuckDB Example
-- Query local archives
SELECT 
  monitor_id,
  AVG(response_time_ms) as avg_response_time,
  COUNT(*) as total_checks,
  SUM(CASE WHEN status = 'down' THEN 1 ELSE 0 END) as failures
FROM read_parquet('archives/year=2025/month=12/*/*.parquet')
WHERE checked_at >= '2025-12-01'
GROUP BY monitor_id;
AWS Athena Example
-- Query S3 archives
CREATE EXTERNAL TABLE pongo_archives (
  id VARCHAR,
  monitor_id VARCHAR,
  status VARCHAR,
  response_time_ms INT,
  checked_at TIMESTAMP
)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET
LOCATION 's3://pongo-archives/production/archives/';

-- Load partitions
MSCK REPAIR TABLE pongo_archives;

-- Query
SELECT monitor_id, AVG(response_time_ms)
FROM pongo_archives
WHERE year=2025 AND month=12
GROUP BY monitor_id;

Monitoring the Archiver

Health Check

Check archiver status:
curl http://localhost:3002/health
{
  "status": "healthy",
  "lastRun": "2025-12-08T03:00:00Z",
  "nextRun": "2025-12-09T03:00:00Z",
  "archivedRecords": 1500000,
  "storage": "s3"
}

Docker Health Check

docker-compose.yml
archiver:
  # ...
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:3002/health"]
    interval: 60s
    timeout: 10s
    retries: 3

Troubleshooting

Archiver Won’t Start

  1. Enable Archival: Ensure ARCHIVAL_ENABLED=true
  2. Database Connection: Verify DATABASE_URL is correct
  3. S3 Credentials: Check S3 environment variables if using S3

S3 Upload Failures

  1. Credentials: Verify S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY
  2. Bucket Permissions: Ensure IAM user has s3:PutObject permission
  3. Bucket Exists: Verify bucket exists in the specified region
  4. Network Access: Check firewall rules and network connectivity

High Memory Usage

  1. Reduce Batch Size: Lower ARCHIVAL_BATCH_SIZE
    ARCHIVAL_BATCH_SIZE=5000
    
  2. Adjust Schedule: Run archival more frequently with smaller batches
  3. Monitor Database: Check database memory usage during archival

Slow Archival

  1. Increase Batch Size: Raise ARCHIVAL_BATCH_SIZE if memory allows
  2. Database Performance: Optimize database indexes and query performance
  3. S3 Region: Use S3 region close to your archiver instance
  4. Network Bandwidth: Check network throughput to S3
Data Loss Prevention: Always verify your first archival run completed successfully before relying on automatic archival. Check both the database and archive storage to ensure data was correctly transferred.
Storage Costs: Parquet files are highly compressed, typically achieving 10-20x compression compared to raw database storage. Monitor your S3 storage costs and adjust retention periods as needed.
Testing: Test archival with a short retention period (e.g., 7 days) first to verify your configuration works correctly before setting longer retention periods.
