The archiver is a standalone service that archives old check results to S3 as Parquet files with day-based partitioning, or to local filesystem storage.
## Overview
The archiver:

- Archives check results older than a configured retention period
- Exports data as Parquet files for efficient storage and querying
- Uses day-based partitioning (`year=YYYY/month=MM/day=DD/`)
- Runs on a configurable cron schedule
- Supports both S3 and local filesystem storage
- Processes data in configurable batch sizes
## Running the Archiver

### Standalone Process

Start the archiver in a separate terminal:
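The entrypoint command mirrors the `command: ["bun", "archiver"]` used in the Docker examples on this page; environment variables such as `DATABASE_URL` must already be set in the shell:

```bash
ARCHIVAL_ENABLED=true bun archiver
```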
The archiver will:

- Connect to your database (using `DATABASE_URL`)
- Start the HTTP API server (default port 3002)
- Run archival jobs according to the `ARCHIVAL_CRON` schedule
### Docker

The Docker entrypoint can auto-start the archiver:

```bash
ARCHIVAL_ENABLED=true docker run pongo
```

Or in your `docker-compose.yml`:
```yaml
services:
  archiver:
    build: .
    command: ["bun", "archiver"]
    environment:
      DATABASE_URL: postgres://postgres:password@db:5432/pongo
      ARCHIVAL_ENABLED: "true"
      ARCHIVAL_RETENTION_DAYS: "90"
      S3_BUCKET: pongo-archives
      S3_REGION: us-east-1
      S3_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      S3_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
    depends_on: [db]
```
### Fly.io

The included `fly.toml` handles archiver deployment:

```bash
fly secrets set ARCHIVAL_ENABLED=true
fly secrets set S3_BUCKET=pongo-archives
fly secrets set S3_REGION=us-east-1
fly secrets set S3_ACCESS_KEY_ID=...
fly secrets set S3_SECRET_ACCESS_KEY=...
fly deploy
```
## Environment Variables

### Core Configuration

**`ARCHIVAL_ENABLED`** (boolean, default: `"false"`, required)

Enable automatic data archival. Must be `true` for the archiver to run.

**`ARCHIVAL_RETENTION_DAYS`** (number)

Number of days before check results are archived. Results older than this are moved to archive storage.

```bash
ARCHIVAL_RETENTION_DAYS=90  # Archive after 90 days
```

**`ARCHIVER_PORT`** (number, default: `3002`)

HTTP API server port for health checks and status.
### Schedule Configuration

**`ARCHIVAL_CRON`** (string, default: `"0 3 * * *"`)

Cron expression for the archival schedule. The default runs daily at 3 AM.

Examples:

```bash
ARCHIVAL_CRON="0 3 * * *"    # Daily at 3 AM
ARCHIVAL_CRON="0 */6 * * *"  # Every 6 hours
ARCHIVAL_CRON="0 2 * * 0"    # Weekly on Sunday at 2 AM
ARCHIVAL_CRON="0 4 1 * *"    # Monthly on the 1st at 4 AM
```

**`ARCHIVAL_BATCH_SIZE`** (number)

Number of rows to process per batch. Larger batches are more efficient but use more memory.

```bash
ARCHIVAL_BATCH_SIZE=50000  # Process 50k rows per batch
```
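The memory trade-off can be seen in a minimal sketch of batched archival (illustrative only, not the archiver's actual code): rows older than the cutoff are drained in `ARCHIVAL_BATCH_SIZE` chunks, so only one batch is materialized at a time.

```typescript
// Illustrative sketch: drain expired rows in fixed-size batches.
// `CheckResult` and `batches` are hypothetical names, not the archiver's API.
type CheckResult = { id: number; checkedAt: Date };

function* batches(rows: CheckResult[], cutoff: Date, batchSize: number) {
  const expired = rows.filter((r) => r.checkedAt < cutoff);
  for (let i = 0; i < expired.length; i += batchSize) {
    // Each yielded chunk would become one Parquet file write.
    yield expired.slice(i, i + batchSize);
  }
}

// 5 expired rows with a batch size of 2 produce batches of 2, 2, and 1.
const cutoff = new Date("2025-09-01");
const rows: CheckResult[] = Array.from({ length: 5 }, (_, i) => ({
  id: i,
  checkedAt: new Date("2025-08-01"),
}));
const sizes = [...batches(rows, cutoff, 2)].map((b) => b.length);
console.log(sizes); // [ 2, 2, 1 ]
```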
### S3 Configuration

Configure S3 for cloud-based archival storage.

**`S3_BUCKET`** (string)

S3 bucket name for storing archived data.

**`S3_REGION`** (string)

AWS region where the S3 bucket is located.

**`S3_ACCESS_KEY_ID`** (string)

AWS access key ID for S3 authentication.

```bash
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
```

**`S3_SECRET_ACCESS_KEY`** (string)

AWS secret access key for S3 authentication.

```bash
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```

**`S3_PREFIX`** (string)

Prefix for S3 object keys. Useful for organizing archives in a shared bucket.

```bash
S3_PREFIX=production/archives
# Results in: s3://pongo-archives/production/archives/year=2025/month=12/...
```
### S3 Bucket Setup

Create an S3 bucket for archives:

```bash
aws s3 mb s3://pongo-archives --region us-east-1
```

Create an IAM policy with the required permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::pongo-archives",
        "arn:aws:s3:::pongo-archives/*"
      ]
    }
  ]
}
```

Create an IAM user and attach the policy:

```bash
aws iam create-user --user-name pongo-archiver
aws iam put-user-policy --user-name pongo-archiver --policy-name PongoArchiveAccess --policy-document file://iam-policy.json
aws iam create-access-key --user-name pongo-archiver
```
## Local Archival

Archive to the local filesystem instead of S3.

**`ARCHIVAL_LOCAL_PATH`** (string, default: `"./archives"`)

Local filesystem path for archived data. Used when S3 is not configured.

```bash
ARCHIVAL_LOCAL_PATH=/data/pongo/archives
```
### Directory Structure

Local archives use the same day-based partitioning as S3:

```
archives/
└── year=2025/
    └── month=12/
        ├── day=01/
        │   └── checks_20251201_batch001.parquet
        ├── day=02/
        │   └── checks_20251202_batch001.parquet
        └── day=03/
            ├── checks_20251203_batch001.parquet
            └── checks_20251203_batch002.parquet
```
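A partition key for a given check date can be derived mechanically. This is a hypothetical helper (the archiver's real file-naming code may differ), assuming the `checks_YYYYMMDD_batchNNN.parquet` pattern shown above:

```typescript
// Hypothetical sketch: build the day-partitioned object key for a batch,
// following the layout shown in the directory tree above.
function archiveKey(checkedAt: Date, batch: number, prefix = ""): string {
  const year = checkedAt.getUTCFullYear();
  const month = String(checkedAt.getUTCMonth() + 1).padStart(2, "0");
  const day = String(checkedAt.getUTCDate()).padStart(2, "0");
  const batchTag = String(batch).padStart(3, "0");
  return (
    `${prefix}year=${year}/month=${month}/day=${day}/` +
    `checks_${year}${month}${day}_batch${batchTag}.parquet`
  );
}

// Third batch for 2025-12-03:
console.log(archiveKey(new Date(Date.UTC(2025, 11, 3)), 3));
// → year=2025/month=12/day=03/checks_20251203_batch003.parquet
```

The optional `prefix` argument corresponds to `S3_PREFIX` (e.g. `"production/archives/"`); for local storage it would be the `ARCHIVAL_LOCAL_PATH` root.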
### Docker Volume Mount

Mount a volume for persistent local archives:

```yaml
services:
  archiver:
    build: .
    command: ["bun", "archiver"]
    environment:
      ARCHIVAL_ENABLED: "true"
      ARCHIVAL_LOCAL_PATH: /data/archives
    volumes:
      - archive-data:/data/archives

volumes:
  archive-data:
```
## Complete Configuration Examples

### S3 Archival (Production)

```bash
# Database
DATABASE_URL=postgres://user:[email protected]:5432/pongo

# Archival
ARCHIVAL_ENABLED=true
ARCHIVAL_RETENTION_DAYS=90
ARCHIVAL_CRON="0 3 * * *"
ARCHIVAL_BATCH_SIZE=50000

# S3
S3_BUCKET=pongo-archives
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3_PREFIX=production/archives

# API
ARCHIVER_PORT=3002
```
### Local Archival (Development)

```bash
# Database
DATABASE_URL=file:./pongo/pongo.db

# Archival
ARCHIVAL_ENABLED=true
ARCHIVAL_RETENTION_DAYS=30
ARCHIVAL_CRON="0 2 * * *"
ARCHIVAL_BATCH_SIZE=10000
ARCHIVAL_LOCAL_PATH=./archives

# API
ARCHIVER_PORT=3002
```
### Docker Compose (Full Stack)

```yaml
services:
  pongo:
    build: .
    ports: ["3000:3000"]
    environment:
      DATABASE_URL: postgres://postgres:password@db:5432/pongo
      ACCESS_CODE: your-password
    depends_on: [db]

  scheduler:
    build: .
    command: ["bun", "scheduler"]
    environment:
      DATABASE_URL: postgres://postgres:password@db:5432/pongo
      SCHEDULER_PORT: "3001"
    depends_on: [db]

  archiver:
    build: .
    command: ["bun", "archiver"]
    environment:
      DATABASE_URL: postgres://postgres:password@db:5432/pongo
      ARCHIVAL_ENABLED: "true"
      ARCHIVAL_RETENTION_DAYS: "90"
      ARCHIVAL_CRON: "0 3 * * *"
      ARCHIVAL_BATCH_SIZE: "50000"
      S3_BUCKET: pongo-archives
      S3_REGION: us-east-1
      S3_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      S3_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
      ARCHIVER_PORT: "3002"
    depends_on: [db]

  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: password
      POSTGRES_DB: pongo
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```
## Archive Format

Archived data is stored as Parquet files for efficient storage and querying.

### Schema

Archived check results include:

- `id`: Check result ID
- `monitor_id`: Monitor identifier
- `status`: Check status (`up`/`down`/`degraded`)
- `response_time_ms`: Response time in milliseconds
- `status_code`: HTTP status code (if applicable)
- `message`: Additional message or error details
- `checked_at`: Timestamp of the check
- `region`: Region where the check was performed
### Querying Archives

Use tools like DuckDB, AWS Athena, or Apache Spark to query the archived Parquet files.

With DuckDB, query local archives directly:

```sql
SELECT
  monitor_id,
  AVG(response_time_ms) AS avg_response_time,
  COUNT(*) AS total_checks,
  SUM(CASE WHEN status = 'down' THEN 1 ELSE 0 END) AS failures
FROM read_parquet('archives/year=2025/month=12/*/*.parquet')
WHERE checked_at >= '2025-12-01'
GROUP BY monitor_id;
```

With Athena, register the S3 archives as an external table, then load the partitions and query:

```sql
CREATE EXTERNAL TABLE pongo_archives (
  id STRING,
  monitor_id STRING,
  status STRING,
  response_time_ms INT,
  checked_at TIMESTAMP
)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET
LOCATION 's3://pongo-archives/production/archives/';

-- Load partitions
MSCK REPAIR TABLE pongo_archives;

-- Query
SELECT monitor_id, AVG(response_time_ms)
FROM pongo_archives
WHERE year = 2025 AND month = 12
GROUP BY monitor_id;
```
## Monitoring the Archiver

### Health Check

Check the archiver status:

```bash
curl http://localhost:3002/health
```

```json
{
  "status": "healthy",
  "lastRun": "2025-12-08T03:00:00Z",
  "nextRun": "2025-12-09T03:00:00Z",
  "archivedRecords": 1500000,
  "storage": "s3"
}
```
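A monitoring script can act on this payload, for example to compute the run interval. A minimal sketch, assuming the exact field names shown in the sample response above:

```typescript
// Sketch: interpret the /health payload. The field names are taken from
// the sample response in this doc; adjust if your build returns more fields.
const payload = `{
  "status": "healthy",
  "lastRun": "2025-12-08T03:00:00Z",
  "nextRun": "2025-12-09T03:00:00Z",
  "archivedRecords": 1500000,
  "storage": "s3"
}`;

const health = JSON.parse(payload);
// Hours between the last completed run and the next scheduled one.
const hoursUntilNext =
  (Date.parse(health.nextRun) - Date.parse(health.lastRun)) / 3_600_000;

console.log(health.status, hoursUntilNext); // healthy 24
```

In practice the payload would come from `fetch("http://localhost:3002/health")` rather than an inline string.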
### Docker Health Check

```yaml
archiver:
  # ...
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:3002/health"]
    interval: 60s
    timeout: 10s
    retries: 3
```
## Troubleshooting

### Archiver Won't Start

- **Enable Archival**: Ensure `ARCHIVAL_ENABLED=true`
- **Database Connection**: Verify `DATABASE_URL` is correct
- **S3 Credentials**: Check the S3 environment variables if using S3

### S3 Upload Failures

- **Credentials**: Verify `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY`
- **Bucket Permissions**: Ensure the IAM user has the `s3:PutObject` permission
- **Bucket Exists**: Verify the bucket exists in the specified region
- **Network Access**: Check firewall rules and network connectivity

### High Memory Usage

- **Reduce Batch Size**: Lower `ARCHIVAL_BATCH_SIZE`
- **Adjust Schedule**: Run archival more frequently with smaller batches
- **Monitor Database**: Check database memory usage during archival

### Slow Archival

- **Increase Batch Size**: Raise `ARCHIVAL_BATCH_SIZE` if memory allows
- **Database Performance**: Optimize database indexes and query performance
- **S3 Region**: Use an S3 region close to your archiver instance
- **Network Bandwidth**: Check network throughput to S3
**Data Loss Prevention**: Always verify your first archival run completed successfully before relying on automatic archival. Check both the database and archive storage to ensure data was correctly transferred.

**Storage Costs**: Parquet files are highly compressed, typically achieving 10-20x compression compared to raw database storage. Monitor your S3 storage costs and adjust retention periods as needed.

**Testing**: Test archival with a short retention period (e.g., 7 days) first to verify your configuration works correctly before setting longer retention periods.