Overview
The archiver service automatically moves old check results from your database to S3 storage as compressed Parquet files. This keeps your database lean while preserving historical data for long-term analysis.

How It Works
- Scheduled Job: Runs on a cron schedule (default: 3am daily)
- Batch Processing: Archives check results older than the retention period in configurable batches
- Parquet Export: Converts rows to columnar Parquet format for efficient storage and querying
- S3 Upload: Uploads partitioned files to S3 with day-based folder structure
- Database Cleanup: Removes successfully archived rows from the database
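The steps above can be sketched as a single archival run. This is a minimal Python sketch; the function and field names are illustrative, not the service's actual code:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30   # ARCHIVAL_RETENTION_DAYS
BATCH_SIZE = 10_000   # ARCHIVAL_BATCH_SIZE

def archival_cutoff(now: datetime, retention_days: int = RETENTION_DAYS) -> datetime:
    """Rows with checkedAt older than this are eligible for archival."""
    return now - timedelta(days=retention_days)

def run_archival(rows: list[dict], now: datetime) -> tuple[list[list[dict]], list[dict]]:
    """Split eligible rows into batches; return (batches to archive, rows kept in the DB)."""
    cutoff = archival_cutoff(now)
    eligible = [r for r in rows if r["checkedAt"] < cutoff and r.get("archivedAt") is None]
    kept = [r for r in rows if r not in eligible]
    # Archive in fixed-size batches to bound memory use per run
    batches = [eligible[i:i + BATCH_SIZE] for i in range(0, len(eligible), BATCH_SIZE)]
    return batches, kept
```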
Partitioning Structure
Check results are organized by date for efficient querying:

- Partition pruning: Query engines can skip irrelevant date ranges
- Cost optimization: Only scan the data you need
- Easy deletion: Remove entire date partitions when no longer needed
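For illustration only, a day-partitioned S3 key could be built like this. Hive-style `year=/month=/day=` folders are an assumption; the service's exact layout may differ:

```python
from datetime import date

def s3_key(day: date, prefix: str = "pongo/archives") -> str:
    """Build a day-partitioned key; prefix corresponds to S3_PREFIX."""
    return (f"{prefix}/year={day.year:04d}/month={day.month:02d}/"
            f"day={day.day:02d}/check_results.parquet")
```

With this layout, a query engine scanning only `day=05` never touches objects under other days, which is what enables partition pruning.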
Archiver Service
The archiver runs as a standalone process.

Docker Integration

Enable archiving in Docker deployments by setting `ARCHIVAL_ENABLED=true`.
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `ARCHIVAL_ENABLED` | Enable archival service | `false` |
| `ARCHIVAL_RETENTION_DAYS` | Days to keep in database before archiving | `30` |
| `ARCHIVAL_CRON` | Cron schedule for archival job | `0 3 * * *` (3am daily) |
| `ARCHIVAL_BATCH_SIZE` | Rows to archive per batch | `10000` |
| `ARCHIVAL_LOCAL_PATH` | Temporary local storage for Parquet files | `./archives` |
| `ARCHIVER_PORT` | HTTP API port | `3002` |
S3 Configuration
| Variable | Description | Example |
|---|---|---|
| `S3_BUCKET` | S3 bucket name | `my-pongo-archives` |
| `S3_REGION` | AWS region | `us-east-1` |
| `S3_ACCESS_KEY_ID` | AWS access key ID | `AKIAIOSFODNN7EXAMPLE` |
| `S3_SECRET_ACCESS_KEY` | AWS secret access key | `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY` |
| `S3_PREFIX` | S3 key prefix (optional) | `pongo/archives` |
Example Configuration
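A minimal example, combining the variables above. Credential values are the placeholder examples from the tables, not real keys:

```shell
# Archival service
ARCHIVAL_ENABLED=true
ARCHIVAL_RETENTION_DAYS=30
ARCHIVAL_CRON="0 3 * * *"
ARCHIVAL_BATCH_SIZE=10000
ARCHIVAL_LOCAL_PATH=./archives
ARCHIVER_PORT=3002

# S3 target
S3_BUCKET=my-pongo-archives
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3_PREFIX=pongo/archives
```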
Archival Process
1. Select Eligible Rows
The archiver queries for check results where:

- `checkedAt < (now - ARCHIVAL_RETENTION_DAYS)`
- `archivedAt IS NULL` (not previously archived)
2. Mark as Archiving
Rows are immediately marked with `archivedAt = now()` to prevent duplicate archival in concurrent runs.
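Steps 1 and 2 can be sketched as a select-and-mark inside one transaction. This SQLite-based sketch uses hypothetical table and column names taken from the prose, not the service's actual schema or driver:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def claim_eligible(conn: sqlite3.Connection, retention_days: int, batch_size: int) -> list[str]:
    """Mark a batch of old, unarchived rows as archiving and return their ids."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=retention_days)).isoformat()
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # select + mark in one transaction so a second run can't re-claim the rows
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM check_results "
            "WHERE checkedAt < ? AND archivedAt IS NULL LIMIT ?",
            (cutoff, batch_size))]
        conn.executemany("UPDATE check_results SET archivedAt = ? WHERE id = ?",
                         [(now, i) for i in ids])
    return ids
```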
3. Partition by Day
Rows are grouped by the date of their `checkedAt` timestamp.
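The grouping might look like this in Python. Rows as dicts with a `checkedAt` datetime is an assumption about the in-memory shape:

```python
from collections import defaultdict
from datetime import date, datetime

def partition_by_day(rows: list[dict]) -> dict[date, list[dict]]:
    """Group rows by the calendar date of their checkedAt timestamp."""
    partitions: dict[date, list[dict]] = defaultdict(list)
    for row in rows:
        partitions[row["checkedAt"].date()].append(row)
    return dict(partitions)
```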
4. Write Parquet Files
Each partition is written to a local Parquet file.

5. Upload to S3
Parquet files are uploaded to S3 with the same partition structure.

6. Cleanup
- Successful rows: Deleted from the database
- Failed rows: `archivedAt` is reset to `NULL` for retry on the next run
- Local files: Deleted after successful S3 upload
Parquet Schema
Archived check results include:

| Column | Type | Description |
|---|---|---|
| `id` | string | Unique check result ID |
| `monitorId` | string | Monitor identifier |
| `status` | string | "up", "down", or "degraded" |
| `responseTimeMs` | number | Response time in milliseconds |
| `statusCode` | number | HTTP status code (if applicable) |
| `message` | string | Additional context or error message |
| `checkedAt` | timestamp | When the check was performed |
| `createdAt` | timestamp | When the record was created |
Querying Archived Data
Use AWS Athena, DuckDB, or any Parquet-compatible tool.

AWS Athena Example
DuckDB Example
Monitoring Archival
The archiver exposes an HTTP API on port 3002 (configurable via `ARCHIVER_PORT`):
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/trigger` | POST | Manually trigger archival |
Manual Trigger

Trigger an archival run on demand with `curl -X POST http://localhost:3002/trigger`.
Best Practices
Tune Retention Period
Balance database size with query performance needs. 30 days works for most use cases.
Monitor Batch Size
Larger batches reduce overhead but increase memory usage. Start with 10,000.
Schedule Off-Peak
Run archival during low-traffic periods to minimize database impact.
Set S3 Lifecycle Rules
Configure S3 lifecycle policies to automatically delete old archives or transition to Glacier.
Troubleshooting
Archival Not Running
- Verify `ARCHIVAL_ENABLED=true` is set
- Check that all S3 environment variables are set
- Review archiver logs for errors
Large Database Despite Archival
- Check that `ARCHIVAL_RETENTION_DAYS` is appropriate
- Verify the archival cron schedule is running
- Manually trigger archival to test: `curl -X POST http://localhost:3002/trigger`
S3 Upload Failures
- Verify IAM credentials have `s3:PutObject` permission
- Check that the bucket name and region are correct
- Ensure bucket exists and is accessible
Related
- Deployment: Deploy the archiver service in production
- Environment Variables: Complete list of configuration options
- Database: Learn about database schema and management