Overview
Database backups are automated using a custom Docker container that dumps all PostgreSQL databases to S3. The backup system runs as a Kubernetes CronJob and uses AWS IAM roles for secure access to S3 storage.
Backup Schedule
Backups run automatically via a Kubernetes CronJob with the following schedule:
21 2 * * *
This translates to 2:21 AM UTC daily.
Backup Process
Architecture
The backup system consists of:
- Docker Image: pennlabs/pg-s3-backup
- Storage: S3 bucket at s3://sql.pennlabs.org
- IAM Role: db-backup with S3 permissions
- CronJob: Kubernetes job scheduled daily
How It Works
The backup script (/home/daytona/workspace/source/docker/pg-s3-backup/backup:1-41):
- Lists all databases in the PostgreSQL cluster
- Excludes system databases (postgres, rdsadmin, template0, template1)
- Runs pg_dump on each database in parallel
- Compresses dumps using gzip
- Uploads to S3 with timestamp-based organization
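The steps above can be sketched in shell. This is an illustrative outline only, not the actual backup script; the database list and the commented-out dump/upload commands are placeholders:

```shell
#!/bin/sh
# Illustrative sketch of the backup flow; not the production script.
set -eu

# Timestamp prefix matching the observed S3 layout, e.g.
# "20240101 - Monday 01 January 2024 @ 02:21"
PREFIX="$(date -u '+%Y%m%d - %A %d %B %Y @ %H:%M')"

# System databases that are skipped.
EXCLUDED="postgres rdsadmin template0 template1"

# In the real script this list would come from the cluster, e.g.:
#   psql "$DATABASE_URL/postgres" -At -c "SELECT datname FROM pg_database"
DATABASES="database1 database2"   # placeholder list

for db in $DATABASES; do
  case " $EXCLUDED " in
    *" $db "*) continue ;;        # skip system databases
  esac
  echo "Creating dump of $db"
  # pg_dump "$DATABASE_URL/$db" | gzip | \
  #   aws s3 cp - "s3://$S3_BUCKET/$PREFIX/$PREFIX - $db.sql.gz"
done
echo "Backup finished successfully"
```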
Backup Structure
Backups are organized in S3 by timestamp:
s3://sql.pennlabs.org/
  YYYYMMDD - DayName DD Month YYYY @ HH:MM/
    YYYYMMDD - DayName DD Month YYYY @ HH:MM - database1.sql.gz
    YYYYMMDD - DayName DD Month YYYY @ HH:MM - database2.sql.gz
Configuration
Required Environment Variables
The backup container requires:
- AWS_DEFAULT_REGION: Set to us-east-1
- DATABASE_URL: PostgreSQL connection string
- S3_BUCKET: Target S3 bucket name
IAM credentials are provided via IRSA (IAM Roles for Service Accounts).
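For a local dry run, the same variables can be exported manually. The values below are illustrative only; in the cluster, DATABASE_URL comes from the db-backup secret and credentials come from IRSA, not from static exports:

```shell
# Illustrative values only; real credentials are injected by the
# db-backup secret and IRSA, never exported statically.
export AWS_DEFAULT_REGION="us-east-1"
export DATABASE_URL="postgres://backup_user:changeme@db.example.com:5432"  # hypothetical
export S3_BUCKET="sql.pennlabs.org"
```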
IAM Permissions
The db-backup role has the following S3 permissions (/home/daytona/workspace/source/terraform/db-backup.tf:1-16):
actions = ["s3:PutObject", "s3:Get*"]
resources = [
"arn:aws:s3:::sql.pennlabs.org/*",
"arn:aws:s3:::sql.pennlabs.org"
]
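In Terraform, a policy fragment like this typically sits inside an aws_iam_policy_document data source. The sketch below is an assumption about the surrounding structure, not a copy of db-backup.tf; the data source name is hypothetical:

```hcl
# Sketch only; the data source name is an assumption.
data "aws_iam_policy_document" "db_backup" {
  statement {
    actions = ["s3:PutObject", "s3:Get*"]

    resources = [
      "arn:aws:s3:::sql.pennlabs.org/*", # objects in the bucket
      "arn:aws:s3:::sql.pennlabs.org",   # the bucket itself (for list operations)
    ]
  }
}
```

Note that both ARNs are needed: the bucket ARN covers bucket-level actions such as listing, while the `/*` ARN covers object-level reads and writes.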
Kubernetes Configuration
Deployed via Helm with configuration in /home/daytona/workspace/source/terraform/helm/db-backup.yaml:1-14:
cronjobs:
  - name: db-backup
    schedule: "21 2 * * *"
    secret: db-backup
    image: pennlabs/pg-s3-backup
    tag: 18806efdf96777fce2341b8eb81c95bf1a7d6897
    extraEnv:
      - name: AWS_DEFAULT_REGION
        value: "us-east-1"
rbac:
  createSA: true
  roleARN: ${roleARN}
Restore Procedures
Prerequisites
- AWS credentials with S3 read access
- PostgreSQL client tools installed
- Network access to target database
Restore Steps
1. List Available Backups
aws s3 ls s3://sql.pennlabs.org/ --recursive
2. Download Backup
aws s3 cp "s3://sql.pennlabs.org/YYYYMMDD - Day DD Month YYYY @ HH:MM/YYYYMMDD - Day DD Month YYYY @ HH:MM - dbname.sql.gz" - | gunzip > restore.sql
3. Restore Database
Warning: restoring will overwrite existing data in the target database. Always verify you are connected to the intended database before proceeding.
# Create new database if needed
psql $DATABASE_URL/postgres -c "CREATE DATABASE dbname;"
# Restore from dump
psql $DATABASE_URL/dbname < restore.sql
4. Verify Restoration
psql $DATABASE_URL/dbname -c "SELECT count(*) FROM pg_tables WHERE schemaname = 'public';"
Monitoring
Check Backup Status
View recent CronJob executions:
kubectl get cronjobs
kubectl get jobs --selector=cronjob=db-backup
View Backup Logs
# Get most recent job
JOB=$(kubectl get jobs --selector=cronjob=db-backup --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1:].metadata.name}')
# View logs
kubectl logs job/$JOB
Successful backups will show:
Starting backup
Creating dump of database1
Uploading dump of database1
...
Backup finished successfully
Total data backed up: XXX MB
Troubleshooting
Backup Job Fails
Check pod logs:
kubectl logs -l cronjob=db-backup --tail=100
Common issues:
- IAM role permissions incorrect
- Database connection timeout
- S3 bucket access denied
- Insufficient disk space on pod
Restore Fails
Error: “role does not exist”
# Create required roles before restoring
psql $DATABASE_URL/dbname -c "CREATE ROLE rolename;"
Error: “database already exists”
# Drop and recreate database
psql $DATABASE_URL/postgres -c "DROP DATABASE dbname;"
psql $DATABASE_URL/postgres -c "CREATE DATABASE dbname;"
Best Practices
- Test restores regularly - Verify backups are usable
- Monitor backup size - Unexpected size changes may indicate issues
- Set S3 lifecycle policies - Archive or delete old backups
- Encrypt sensitive backups - Use S3 encryption for compliance
- Document restore procedures - Keep runbooks updated for incident response
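As a concrete example of the lifecycle-policy practice above, a rule like the following transitions backups to Glacier after 30 days and deletes them after a year. The values are illustrative, not the bucket's actual configuration:

```json
{
  "Rules": [
    {
      "ID": "archive-and-expire-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

A policy in this shape can be applied with aws s3api put-bucket-lifecycle-configuration --bucket sql.pennlabs.org --lifecycle-configuration file://lifecycle.json.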