Configuration Methods
CronJob Guardian supports three configuration methods with the following precedence (highest to lowest):
- Command-line flags - `--log-level=debug`
- Environment variables - `GUARDIAN_LOG_LEVEL=debug`
- Configuration file - `/etc/cronjob-guardian/config.yaml` or specified via `--config`
- Defaults - Built into the application
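The precedence order amounts to a layered lookup. A minimal sketch in Python, assuming each source has already been parsed into a dictionary keyed by option name. The dictionaries and sample values are illustrative and not the operator's actual code; only the defaults come from the reference tables on this page.

```python
from collections import ChainMap

# Hypothetical parsed sources, keyed by option name.
flags = {"log-level": "debug"}
env = {"log-level": "warn", "ui.port": "9090"}  # GUARDIAN_* names mapped back to options
file_values = {"ui.port": "8000", "storage.type": "postgres"}
defaults = {
    "log-level": "info",
    "ui.port": "8080",
    "storage.type": "sqlite",
    "ui.enabled": "true",
}

# ChainMap searches its maps left to right, so the first source that
# defines an option wins: flags, then env, then file, then defaults.
effective = ChainMap(flags, env, file_values, defaults)

print(effective["log-level"])     # debug    (flag beats env)
print(effective["ui.port"])       # 9090     (env beats file)
print(effective["storage.type"])  # postgres (file beats default)
print(effective["ui.enabled"])    # true     (default)
```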
Environment Variable Format
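Environment variables use the `GUARDIAN_` prefix, upper-case the option name, and replace dots and hyphens with underscores. For example, `log-level` becomes `GUARDIAN_LOG_LEVEL` and `storage.postgres.password` becomes `GUARDIAN_STORAGE_POSTGRES_PASSWORD`.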
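The naming rule can be expressed as a small helper. This is a sketch for illustration; the function name `env_var_name` is hypothetical and not part of the operator.

```python
def env_var_name(option: str, prefix: str = "GUARDIAN_") -> str:
    """Map a dotted option name to its environment variable name."""
    return prefix + option.upper().replace(".", "_").replace("-", "_")


for option in ("log-level", "storage.postgres.password", "scheduler.prune-interval"):
    print(f"{option} -> {env_var_name(option)}")
```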
Configuration File
The operator looks for `config.yaml` in these locations (in order):
- Path specified by the `--config` flag
- `/etc/cronjob-guardian/config.yaml`
- `./config.yaml` (current directory)
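The search order can be sketched as follows. The helper names are hypothetical. The page does not say whether a missing `--config` path is an error or falls through to the next location, so this sketch assumes it falls through.

```python
from pathlib import Path
from typing import List, Optional

# Default locations from the list above, in search order.
DEFAULT_LOCATIONS = ("/etc/cronjob-guardian/config.yaml", "./config.yaml")


def candidate_paths(flag_path: Optional[str] = None) -> List[Path]:
    """Return the locations to try, with the --config path (if any) first."""
    paths = [Path(flag_path)] if flag_path else []
    paths.extend(Path(p) for p in DEFAULT_LOCATIONS)
    return paths


def find_config(flag_path: Optional[str] = None) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    for path in candidate_paths(flag_path):
        if path.is_file():
            return path
    return None


print(find_config())  # None unless one of the default locations exists
```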
Example Configuration
Configuration Reference
Top-Level Options
| Option | Type | Default | Description |
|---|---|---|---|
| `log-level` | string | info | Logging level: debug, info, warn, error |
| `config` | string | - | Path to config file (CLI flag only) |
Scheduler Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `scheduler.dead-man-switch-interval` | duration | 1m | How often to check dead-man’s switches |
| `scheduler.sla-recalculation-interval` | duration | 5m | How often to recalculate SLA metrics |
| `scheduler.prune-interval` | duration | 1h | How often to prune old execution history |
| `scheduler.startup-grace-period` | duration | 30s | Delay after startup before sending alerts |
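Duration values such as `1m` and `30s` look like Go-style duration strings. The sketch below assumes that format (hours, minutes, seconds, and milliseconds only) and shows how a startup grace period could gate alerts. Both function names are hypothetical, and the operator's real parsing and gating logic may differ.

```python
import re
from datetime import datetime, timedelta, timezone

_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")  # "ms" must be tried before "m"


def parse_duration(text: str) -> timedelta:
    """Parse strings like '30s', '5m', or '1h30m' into a timedelta."""
    pos, total = 0, 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:  # unexpected characters between parts
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=total)


def in_grace_period(started_at: datetime, now: datetime, grace: str = "30s") -> bool:
    """True while alerts should still be held back after startup."""
    return now - started_at < parse_duration(grace)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(in_grace_period(start, start + timedelta(seconds=10)))  # True
print(in_grace_period(start, start + timedelta(seconds=45)))  # False
```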
Storage Configuration
General Storage Options
| Option | Type | Default | Description |
|---|---|---|---|
| `storage.type` | string | sqlite | Storage backend: sqlite, postgres, mysql |
| `storage.log-storage-enabled` | bool | false | Store pod logs in database (opt-in) |
| `storage.event-storage-enabled` | bool | false | Store Kubernetes events in database (opt-in) |
| `storage.max-log-size-kb` | int | 100 | Maximum log size to store per execution (KB) |
| `storage.log-retention-days` | int | 0 | Log retention period (0 = use history retention) |
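The two log options interact with history retention as sketched below. This is an illustration under assumptions: 1 KB is taken as 1024 bytes, and an oversized log is assumed to keep its tail (the most recent output). The page states neither detail, and both function names are hypothetical.

```python
def effective_log_retention_days(log_retention_days: int, history_retention_days: int) -> int:
    """0 means 'use the history retention period'."""
    return history_retention_days if log_retention_days == 0 else log_retention_days


def truncate_log(log: bytes, max_log_size_kb: int = 100) -> bytes:
    """Keep at most max_log_size_kb of the log (assumed: the tail)."""
    limit = max_log_size_kb * 1024
    if len(log) <= limit:
        return log
    return log[len(log) - limit:]


print(effective_log_retention_days(0, 30))  # 30
print(effective_log_retention_days(7, 30))  # 7
print(len(truncate_log(b"x" * 200_000)))    # 102400
```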
SQLite Options
| Option | Type | Default | Description |
|---|---|---|---|
| `storage.sqlite.path` | string | /data/guardian.db | Path to SQLite database file |
- Uses pure Go driver (no CGO required)
- WAL mode enabled automatically for better concurrency
- Requires persistent volume for data persistence
- Suitable for small to medium deployments (under 500 CronJobs)
- Not recommended for HA deployments (file-based)
PostgreSQL Options
| Option | Type | Default | Description |
|---|---|---|---|
| `storage.postgres.host` | string | - | PostgreSQL host |
| `storage.postgres.port` | int | 5432 | PostgreSQL port |
| `storage.postgres.database` | string | - | Database name |
| `storage.postgres.username` | string | - | Database username |
| `storage.postgres.password` | string | - | Database password (use env var instead) |
| `storage.postgres.ssl-mode` | string | require | SSL mode: disable, require, verify-ca, verify-full |
| `storage.postgres.pool.max-idle-conns` | int | 10 | Maximum idle connections in pool |
| `storage.postgres.pool.max-open-conns` | int | 100 | Maximum open connections in pool |
| `storage.postgres.pool.conn-max-lifetime` | duration | 1h | Maximum connection lifetime |
| `storage.postgres.pool.conn-max-idle-time` | duration | 10m | Maximum idle time before closing |
- Recommended for production deployments
- Supports native percentile functions for better performance
- HA-ready with connection pooling
- Use the `GUARDIAN_STORAGE_POSTGRES_PASSWORD` environment variable for the password
MySQL Options
| Option | Type | Default | Description |
|---|---|---|---|
| `storage.mysql.host` | string | - | MySQL host |
| `storage.mysql.port` | int | 3306 | MySQL port |
| `storage.mysql.database` | string | - | Database name |
| `storage.mysql.username` | string | - | Database username |
| `storage.mysql.password` | string | - | Database password (use env var instead) |
| `storage.mysql.pool.max-idle-conns` | int | 10 | Maximum idle connections in pool |
| `storage.mysql.pool.max-open-conns` | int | 100 | Maximum open connections in pool |
| `storage.mysql.pool.conn-max-lifetime` | duration | 1h | Maximum connection lifetime |
| `storage.mysql.pool.conn-max-idle-time` | duration | 10m | Maximum idle time before closing |
- Supports both MySQL and MariaDB
- HA-ready with connection pooling
- Use the `GUARDIAN_STORAGE_MYSQL_PASSWORD` environment variable for the password
History Retention
| Option | Type | Default | Description |
|---|---|---|---|
| `history-retention.default-days` | int | 30 | Default retention period in days |
| `history-retention.max-days` | int | 90 | Maximum retention period allowed |
- Execution records older than `default-days` are automatically pruned
- Logs can have separate retention via `storage.log-retention-days`
- Per-monitor overrides respected (up to `max-days`)
- Pruning runs every `scheduler.prune-interval`
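The retention rules can be sketched as follows. This assumes a per-monitor override above `max-days` is clamped rather than rejected, which the page does not state explicitly. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_retention_days(override_days: Optional[int],
                             default_days: int = 30,
                             max_days: int = 90) -> int:
    """Use the per-monitor override if set, capped at max_days (assumed clamp)."""
    if override_days is None:
        return default_days
    return min(override_days, max_days)


def prune_cutoff(now: datetime, retention_days: int) -> datetime:
    """Records older than this timestamp are eligible for pruning."""
    return now - timedelta(days=retention_days)


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
print(effective_retention_days(None))  # 30
print(effective_retention_days(365))   # 90
print(prune_cutoff(now, effective_retention_days(None)).date())  # 2024-03-01
```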
Rate Limits
| Option | Type | Default | Description |
|---|---|---|---|
| `rate-limits.max-alerts-per-minute` | int | 50 | Maximum alerts per minute (all channels) |
| `rate-limits.burst-limit` | int | 10 | Maximum burst of alerts allowed |
| `rate-limits.default-suppress-duplicates-for` | duration | 1h | Default duplicate suppression window |
- Uses token bucket algorithm
- Applies globally across all channels
- Duplicate suppression per alert type + CronJob combination
- Per-monitor overrides available in `CronJobMonitor.spec.alerting`
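For readers unfamiliar with token buckets, here is a minimal sketch of how the two rate-limit options could map onto one. It assumes `max-alerts-per-minute` is the refill rate and `burst-limit` is the bucket capacity. The class is illustrative and takes explicit timestamps so the behavior is easy to follow; the operator's own implementation is not shown on this page.

```python
class TokenBucket:
    """Token bucket: refills continuously, holds at most `burst` tokens."""

    def __init__(self, max_per_minute: float = 50, burst: int = 10, now: float = 0.0):
        self.refill_per_second = max_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True and consume a token if an alert may be sent at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket()  # 50 per minute, burst of 10
sent = sum(bucket.allow(0.0) for _ in range(15))
print(sent)               # 10: the burst is spent, the remaining 5 are dropped
print(bucket.allow(2.0))  # True: about 1.67 tokens have refilled after 2 seconds
print(bucket.allow(2.0))  # False: less than one token is left
```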
UI Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `ui.enabled` | bool | true | Enable the web UI and REST API |
| `ui.port` | int | 8080 | Port to listen on |
- Embedded React SPA (built into binary)
- RESTful API at `/api/v1/*`
- Swagger/OpenAPI docs at `/swagger/`
- Dashboard, charts, heatmaps, execution history
- Export to CSV/JSON
Metrics Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `metrics.bind-address` | string | :8443 | Metrics endpoint address (use 0 to disable) |
| `metrics.secure` | bool | true | Enable HTTPS for metrics |
| `metrics.cert-path` | string | - | TLS certificate directory |
| `metrics.cert-name` | string | tls.crt | TLS certificate filename |
| `metrics.cert-key` | string | tls.key | TLS key filename |
- HTTPS enabled by default
- Supports authentication via SubjectAccessReview
- Certificate rotation via cert-watcher
- See Prometheus Metrics for details
Probes Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `probes.bind-address` | string | :8081 | Health probe bind address |
- `GET /healthz` - Liveness probe
- `GET /readyz` - Readiness probe
Leader Election
| Option | Type | Default | Description |
|---|---|---|---|
| `leader-election.enabled` | bool | false | Enable leader election (required for HA) |
| `leader-election.lease-duration` | duration | 15s | How long a leader holds the lease |
| `leader-election.renew-deadline` | duration | 10s | Leader must renew within this time |
| `leader-election.retry-period` | duration | 2s | How often to retry lease acquisition |
- Required for running multiple replicas
- Only leader executes schedulers
- All replicas serve metrics, UI, and handle controller reconciliations
- Uses Kubernetes Lease resources for coordination
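The three timing options constrain each other. The sketch below assumes the operator relies on the Kubernetes client-go leader election package, which requires the lease duration to exceed the renew deadline and the renew deadline to exceed the retry period multiplied by a jitter factor of 1.2. That dependency is an assumption, and the function name is hypothetical.

```python
from typing import List

JITTER_FACTOR = 1.2  # jitter factor applied by client-go's leaderelection package


def validate_leader_election(lease_duration_s: float,
                             renew_deadline_s: float,
                             retry_period_s: float) -> List[str]:
    """Return a list of problems; an empty list means the timings are consistent."""
    problems = []
    if lease_duration_s <= renew_deadline_s:
        problems.append("lease-duration must be greater than renew-deadline")
    if renew_deadline_s <= JITTER_FACTOR * retry_period_s:
        problems.append("renew-deadline must be greater than 1.2 x retry-period")
    return problems


print(validate_leader_election(15, 10, 2))  # [] (the documented defaults pass)
print(validate_leader_election(10, 10, 2))
```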
Webhook Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `webhook.cert-path` | string | - | Webhook TLS certificate directory |
| `webhook.cert-name` | string | tls.crt | TLS certificate filename |
| `webhook.cert-key` | string | tls.key | TLS key filename |
| `webhook.enable-http2` | bool | false | Enable HTTP/2 (disabled for security) |
- HTTP/2 disabled by default due to CVE-2023-44487 (HTTP/2 Rapid Reset)
- Certificate rotation supported via cert-watcher
- Used for validating webhooks (future feature)
Helm Chart Configuration
When deploying with Helm, use `values.yaml` to configure the operator. The Helm chart automatically generates the config file and environment variables.
Example Helm Values
Database Connection Strings
The operator constructs database DSNs automatically from the configuration:
SQLite
PostgreSQL
MySQL
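The exact DSN formats are not shown on this page. As a rough illustration only, the sketch below assumes a plain file path for SQLite, a libpq-style keyword/value string for PostgreSQL, and the Go MySQL driver's `user:password@tcp(host:port)/database` form for MySQL. The function names are hypothetical, and values containing spaces or special characters would need escaping that is omitted here.

```python
def sqlite_dsn(path: str = "/data/guardian.db") -> str:
    """Assumed: the DSN is just the database file path."""
    return path


def postgres_dsn(host: str, database: str, username: str, password: str,
                 port: int = 5432, ssl_mode: str = "require") -> str:
    """Assumed: libpq keyword/value connection string."""
    return (f"host={host} port={port} user={username} password={password} "
            f"dbname={database} sslmode={ssl_mode}")


def mysql_dsn(host: str, database: str, username: str, password: str,
              port: int = 3306) -> str:
    """Assumed: Go MySQL driver DSN form."""
    return f"{username}:{password}@tcp({host}:{port})/{database}"


print(sqlite_dsn())
print(postgres_dsn("db.example.internal", "guardian", "guardian", "<password>"))
print(mysql_dsn("db.example.internal", "guardian", "guardian", "<password>"))
```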
Best Practices
Security
- Use environment variables for sensitive values (passwords, API tokens)
- Never commit passwords to version control
- Enable TLS for metrics and webhooks in production
- Use SSL/TLS for PostgreSQL/MySQL connections
Performance
- Use PostgreSQL or MySQL for large deployments (>500 CronJobs)
- Tune connection pool settings based on workload
- Adjust retention periods to balance history vs storage
- Disable log/event storage unless needed (increases DB size)
High Availability
- Enable leader election with 3+ replicas
- Use external database (PostgreSQL/MySQL) for shared state
- Configure appropriate resource limits
- Use Pod Disruption Budgets (PDB) for planned disruptions
Alerting
- Start with conservative rate limits
- Use `startup-grace-period` to avoid restart alert floods
- Configure duplicate suppression per use case
- Test alert channels before production deployment