## Overview
CronJob Guardian uses a pluggable storage backend to persist execution history, alert records, and channel statistics. The storage layer is abstracted through an interface, allowing support for multiple database engines.

## Supported Databases
Three database engines are supported:

### SQLite (Default)
Use case: Development, small deployments, single-namespace monitoring

Characteristics:

- Pure Go implementation (no CGO required)
- File-based storage requiring persistent volume
- WAL (Write-Ahead Logging) mode enabled automatically
- Busy timeout set to 5000ms for better concurrency
- Suitable for under 500 CronJobs
- Not recommended for HA deployments
Example DSN:

```
/data/guardian.db?_journal_mode=WAL&_busy_timeout=5000
```
Helm chart persistence:
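For example, persistence might be enabled through the chart's values like this (key names are illustrative — check the chart's `values.yaml` for the actual schema):

```yaml
persistence:
  enabled: true
  size: 1Gi
  storageClass: standard  # illustrative; use a class available in your cluster
```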
### PostgreSQL
Use case: Production deployments, HA configurations, large scale (>500 CronJobs)

Characteristics:

- Full ACID compliance
- Native percentile functions (`PERCENTILE_CONT`) for O(1) memory usage
- Connection pooling support
- SSL/TLS support
- HA-ready with external database
- Recommended for production
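For reference, a keyword/value PostgreSQL DSN might look like the following (host, credentials, and `sslmode` value are illustrative):

```
host=postgres.default.svc.cluster.local port=5432 user=guardian password=change-me dbname=guardian sslmode=require
```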
### MySQL / MariaDB

Use case: Production deployments, existing MySQL infrastructure

Characteristics:

- Full ACID compliance
- Connection pooling support
- Compatible with MySQL 5.7+ and MariaDB 10.3+
- HA-ready with external database
Example DSN:

```
guardian:password@tcp(mysql.default.svc.cluster.local:3306)/guardian?parseTime=true
```
## Database Schema
The schema is automatically created and migrated on startup using GORM auto-migration.

### Tables
#### executions
Stores job execution history. Key columns:

- `id` (bigint, primary key) - Auto-increment ID
- `cronjob_ns` (varchar 253) - CronJob namespace
- `cronjob_name` (varchar 253) - CronJob name
- `cronjob_uid` (varchar 36) - CronJob UID for recreation detection
- `job_name` (varchar 253) - Kubernetes Job name
- `scheduled_time` (timestamp, nullable) - Scheduled start time from CronJob
- `start_time` (timestamp) - Actual start time
- `completion_time` (timestamp) - Completion time
- `duration_secs` (float, nullable) - Duration in seconds
- `succeeded` (boolean) - Success status
- `exit_code` (int) - Container exit code
- `reason` (varchar 255) - Failure reason
- `is_retry` (boolean) - Whether this execution is a retry
- `retry_of` (varchar 253) - Original job name if retry
- `logs` (text, nullable) - Pod logs (if enabled)
- `events` (text, nullable) - Kubernetes events (if enabled)
- `suggested_fix` (text) - AI-generated fix suggestion
- `created_at` (timestamp) - Record creation time
Indexes:

- `idx_cronjob_time` (cronjob_ns, cronjob_name, start_time DESC)
- `idx_cronjob_uid` (cronjob_ns, cronjob_name, cronjob_uid)
- `idx_cronjob_duration` (cronjob_ns, cronjob_name, start_time, duration_secs)
- `idx_job_name` (job_name)
- `idx_start_time` (start_time)
#### alert_history
Stores alert events and resolutions. Key columns:

- `id` (bigint, primary key) - Auto-increment ID
- `alert_type` (varchar 100) - Alert type (JobFailed, SLABreached, DeadManSwitch, etc.)
- `severity` (varchar 20) - Severity level (critical, warning, info)
- `title` (varchar 500) - Alert title
- `message` (text) - Alert message body
- `cronjob_ns` (varchar 253) - CronJob namespace
- `cronjob_name` (varchar 253) - CronJob name
- `monitor_ns` (varchar 253) - Monitor namespace
- `monitor_name` (varchar 253) - Monitor name
- `channels_notified` (text) - Comma-separated channel names
- `occurred_at` (timestamp) - Alert occurrence time
- `resolved_at` (timestamp, nullable) - Alert resolution time
- `exit_code` (int) - Exit code for failure alerts
- `reason` (varchar 255) - Failure reason
- `suggested_fix` (text) - Fix suggestion
Indexes:

- `idx_alert_cronjob` (cronjob_ns, cronjob_name)
- `idx_alert_cronjob_time` (cronjob_ns, cronjob_name, occurred_at DESC)
- `idx_alert_occurred` (occurred_at DESC)
- `idx_alert_severity` (severity)
- `idx_alert_resolve` (alert_type, cronjob_ns, cronjob_name, resolved_at)
- `idx_alert_unresolved` (resolved_at) - For filtering unresolved alerts
#### channel_stats
Stores per-channel alert statistics. Key columns:

- `id` (bigint, primary key) - Auto-increment ID
- `channel_name` (varchar 253, unique) - Channel name
- `alerts_sent_total` (bigint) - Total alerts sent successfully
- `alerts_failed_total` (bigint) - Total failed alert attempts
- `last_alert_time` (timestamp, nullable) - Last successful alert
- `last_failed_time` (timestamp, nullable) - Last failed alert
- `last_failed_error` (text) - Last failure error message
- `consecutive_failures` (int) - Consecutive failures counter
- `updated_at` (timestamp) - Last update time
Indexes:

- Unique index on `channel_name`
## Connection Pooling
Connection pooling is supported for PostgreSQL and MySQL to optimize database resource usage.

### Configuration Options
| Option | Default | Description |
|---|---|---|
| `max-idle-conns` | 10 | Maximum idle connections in the pool |
| `max-open-conns` | 100 | Maximum open connections |
| `conn-max-lifetime` | 1h | Maximum connection lifetime |
| `conn-max-idle-time` | 10m | Maximum idle time before a connection is closed |
### Tuning Guidelines

Small deployments (under 100 CronJobs): the defaults above are typically sufficient.

Multi-replica (HA) deployments:

- Multiply `max-open-conns` by the number of replicas
- Ensure the database server's `max_connections` exceeds the total connections from all replicas
- Example: 3 replicas × 100 conns = 300, so the database needs 300+ `max_connections`
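The sizing arithmetic above can be expressed as a small helper (the extra headroom for superuser/maintenance sessions is an assumption, not an operator setting):

```python
def required_max_connections(replicas: int, max_open_conns: int, headroom: int = 20) -> int:
    """Minimum database max_connections for an HA deployment:
    every replica may open up to max-open-conns connections, plus
    assumed headroom for superuser/maintenance sessions."""
    return replicas * max_open_conns + headroom

# 3 replicas x 100 conns = 300, plus headroom
print(required_max_connections(3, 100))  # 320
```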
## Retention Policies
### Execution History Retention
Execution records are automatically pruned based on age. Configuration:

- The history pruner runs every `prune-interval` (default: 1 hour)
- Deletes execution records older than `default-days`
- Per-monitor overrides are respected (via `CronJobMonitor.spec.dataRetention`)
- Maximum retention is capped at `max-days`, even with per-monitor overrides
### Log Retention
Logs can have a separate retention period from execution metadata. Configuration:

- If `log-retention-days: 0`, logs are kept for the same duration as executions
- If `log-retention-days > 0`, logs are pruned earlier than execution records
- Pruning only clears the `logs` and `events` columns; execution metadata remains
- Useful for keeping a long execution history without excessive storage
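Put together, the retention options above might appear in the operator's arguments like this (flag spellings follow the option names in this section and may differ from the actual CLI):

```yaml
args:
  - --prune-interval=1h        # how often the history pruner runs
  - --default-days=30          # default execution retention
  - --max-days=90              # hard cap, even with per-monitor overrides
  - --log-retention-days=7     # prune logs/events earlier than executions
```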
## Log and Event Storage
By default, logs and events are not stored in the database; storage is opt-in.

### Configuration
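A sketch of opting in, assuming Helm values under a `storage` block (key names are illustrative — check the chart's documentation for the actual settings):

```yaml
storage:
  store-logs: true       # persist pod logs with each execution record
  store-events: true     # persist Kubernetes events
  max-log-size-kb: 64    # cap stored log size per execution
```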
### Storage Impact
Enabling log storage significantly increases database size.

Without logs:

- ~1 KB per execution record
- 1000 executions = ~1 MB

With logs enabled:

- ~100 KB per execution record
- 1000 executions = ~100 MB
Recommendation: use a shorter `log-retention-days` than the execution history retention.
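As a rough capacity check, assuming the per-record sizes above:

```python
def estimated_db_mb(executions: int, kb_per_record: float) -> float:
    """Rough database size estimate in MB (decimal: 1 MB = 1000 KB)."""
    return executions * kb_per_record / 1000

print(estimated_db_mb(1000, 1))    # 1.0 MB without logs
print(estimated_db_mb(1000, 100))  # 100.0 MB with logs enabled
```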
## Database Setup
### PostgreSQL Setup
#### 1. Create Database and User
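A minimal sketch (database, user, and password are illustrative):

```sql
CREATE DATABASE guardian;
CREATE USER guardian WITH PASSWORD 'change-me';
GRANT ALL PRIVILEGES ON DATABASE guardian TO guardian;
-- On PostgreSQL 15+, connect to the guardian database first (\c guardian),
-- since CREATE on the public schema is no longer granted by default:
GRANT ALL ON SCHEMA public TO guardian;
```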
#### 2. Configure Operator
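A sketch of pointing the operator at PostgreSQL (flag and variable names are illustrative; keep the DSN in a Secret rather than inline):

```yaml
args:
  - --database-type=postgres   # illustrative flag name
env:
  - name: DATABASE_DSN         # illustrative variable name
    valueFrom:
      secretKeyRef:
        name: guardian-db
        key: dsn
```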
#### 3. SSL/TLS Setup (Recommended)
For production, use SSL certificates and require TLS in the DSN (e.g. `sslmode=verify-full` for PostgreSQL).

### MySQL Setup
#### 1. Create Database and User
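A minimal sketch (database, user, and password are illustrative):

```sql
CREATE DATABASE guardian CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'guardian'@'%' IDENTIFIED BY 'change-me';
GRANT ALL PRIVILEGES ON guardian.* TO 'guardian'@'%';
FLUSH PRIVILEGES;
```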
#### 2. Configure Operator

Point the operator at the database using the MySQL DSN format shown earlier.
### SQLite Setup
No manual setup is required; the database file is created automatically. Ensure a persistent volume is configured so the file survives pod restarts.

## Backup and Recovery
### SQLite Backup
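The data volume can be snapshotted with a Kubernetes VolumeSnapshot; a minimal sketch (class and claim names are illustrative):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: guardian-db-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass      # illustrative; requires a CSI driver
  source:
    persistentVolumeClaimName: guardian-data  # the PVC holding guardian.db
```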
Option 1: Volume snapshot — use Kubernetes VolumeSnapshot resources to snapshot the persistent volume holding the database file.

### PostgreSQL Backup
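A typical logical backup with `pg_dump` (namespace, deployment, and user names are illustrative):

```
kubectl exec -n database deploy/postgres -- \
  pg_dump -U guardian -d guardian -Fc > guardian-$(date +%F).dump
```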
Option 1: `pg_dump` — logical dumps restore cleanly across PostgreSQL versions.

### MySQL Backup
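A typical logical backup with `mysqldump` (namespace, deployment, and credentials are illustrative):

```
kubectl exec -n database deploy/mysql -- \
  mysqldump -u guardian -pchange-me --single-transaction guardian > guardian-$(date +%F).sql
```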
Option 1: `mysqldump` — logical dumps work the same way for MySQL/MariaDB.

## Migration Between Database Engines
Migrating from one database engine to another requires an export and import.

### Export Data
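One route is the operator's REST API; a sketch with a hypothetical endpoint and port (check the API reference for the actual paths):

```
kubectl port-forward svc/cronjob-guardian 8080:8080
curl -s "http://localhost:8080/api/v1/executions?format=json" > executions.json  # hypothetical endpoint
```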
Exporting via the REST API is the recommended route.

### Import Data
Manual import (SQL):

1. Start the new operator with the target database
2. Let auto-migration create the schema
3. Export data from the old database (CSV/JSON)
4. Import into the new database using standard database tools
## Troubleshooting
### Connection Issues
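A quick in-cluster connectivity check (image and service names are illustrative):

```
kubectl run db-check --rm -i --image=postgres:16 --restart=Never -- \
  pg_isready -h postgres.default.svc.cluster.local -p 5432
```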
PostgreSQL connection refused: verify the service DNS name and port, the credentials in the DSN, and that no NetworkPolicy blocks traffic between the operator and the database.

### Performance Issues
Slow queries:

- Reduce `storage.max-log-size-kb`
- Lower retention periods
- Increase pruning frequency
- Tune connection pool settings
### Storage Full
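After lowering retention and pruning, file space can be reclaimed with `VACUUM` (deployment name is illustrative, and this assumes the image ships the `sqlite3` CLI):

```
kubectl exec deploy/cronjob-guardian -- sqlite3 /data/guardian.db "VACUUM;"
```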
SQLite disk full: expand the persistent volume, lower retention settings, or prune old records, then reclaim file space with `VACUUM`.

## Best Practices
### Production Deployments
- Use PostgreSQL or MySQL - Better performance and HA support
- Enable SSL/TLS - Encrypt database connections
- Use connection pooling - Tune for your workload
- Regular backups - Automate database backups
- Monitor storage - Alert on disk usage
- Separate log retention - Keep logs shorter than executions
- External database - Don’t run database in same cluster
### High Availability
- External database cluster - PostgreSQL HA (Patroni, Stolon) or MySQL Galera
- Multiple operator replicas - Enable leader election
- Database connection limits - Set an appropriate `max-open-conns`
- Health monitoring - Monitor database health separately
### Storage Optimization
- Disable log storage - Unless required for auditing
- Aggressive log retention - 7 days or less
- Reasonable execution retention - 30-90 days
- Regular pruning - Run every hour
- Database maintenance - Regular VACUUM/OPTIMIZE
### Security
- Strong passwords - Use secret management
- Least privilege - Grant minimum database permissions
- Network policies - Restrict database access
- SSL/TLS - Encrypt all connections
- Audit logging - Enable database audit logs