Overview

The CronJob Guardian operator can be configured via:
  1. Config file (YAML) - /etc/cronjob-guardian/config.yaml or via --config flag
  2. Environment variables - Prefix the key with GUARDIAN_, uppercase it, and replace dots and hyphens with underscores (for example, log-level becomes GUARDIAN_LOG_LEVEL)
  3. Command-line flags - Direct flags to the operator binary
Precedence (highest to lowest): CLI flags → Environment variables → Config file → Defaults
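The naming rule for environment variables can be sketched as a small shell helper. The function name guardian_env_name is hypothetical and not part of the operator; the uppercasing step is inferred from the variable names listed under Configuration Options.

```shell
# Hypothetical helper: derive the environment variable name for a config key.
# Uppercases the key, replaces dots and hyphens with underscores,
# and adds the GUARDIAN_ prefix.
guardian_env_name() {
  printf 'GUARDIAN_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')"
}

guardian_env_name scheduler.dead-man-switch-interval   # prints GUARDIAN_SCHEDULER_DEAD_MAN_SWITCH_INTERVAL
guardian_env_name storage.postgres.ssl-mode            # prints GUARDIAN_STORAGE_POSTGRES_SSL_MODE
```

As a worked case of the precedence order: with log-level: info in the config file, GUARDIAN_LOG_LEVEL=debug in the environment, and --log-level=warn on the command line, the operator would run at warn, because CLI flags rank highest.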

Configuration File Format

# config.yaml
log-level: info

scheduler:
  dead-man-switch-interval: 1m
  sla-recalculation-interval: 5m
  prune-interval: 1h
  startup-grace-period: 30s

storage:
  type: sqlite
  sqlite:
    path: /data/guardian.db

history-retention:
  default-days: 30
  max-days: 90

rate-limits:
  max-alerts-per-minute: 50
  burst-limit: 10
  default-suppress-duplicates-for: 1h

ui:
  enabled: true
  port: 8080

metrics:
  bind-address: ":8443"
  secure: true

probes:
  bind-address: ":8081"

leader-election:
  enabled: false
  lease-duration: 15s
  renew-deadline: 10s
  retry-period: 2s

webhook:
  enable-http2: false

Configuration Options

log-level

log-level
string
default: "info"
Logging level for operator output.
Valid values: debug, info, warn, error
Environment variable: GUARDIAN_LOG_LEVEL
CLI flag: --log-level

Scheduler Configuration

Controls background task execution intervals.

scheduler.dead-man-switch-interval
duration
default: "1m"
How often to check dead-man’s switches across all monitors.
Environment variable: GUARDIAN_SCHEDULER_DEAD_MAN_SWITCH_INTERVAL
CLI flag: --scheduler.dead-man-switch-interval

scheduler.sla-recalculation-interval
duration
default: "5m"
How often to recalculate SLA metrics (success rates, duration percentiles).
Environment variable: GUARDIAN_SCHEDULER_SLA_RECALCULATION_INTERVAL
CLI flag: --scheduler.sla-recalculation-interval

scheduler.prune-interval
duration
default: "1h"
How often to prune old execution history based on retention policies.
Environment variable: GUARDIAN_SCHEDULER_PRUNE_INTERVAL
CLI flag: --scheduler.prune-interval

scheduler.startup-grace-period
duration
default: "30s"
Delay after operator startup before sending alerts. Prevents alert floods when the operator restarts and reconciles existing resources.
Environment variable: GUARDIAN_SCHEDULER_STARTUP_GRACE_PERIOD
CLI flag: --scheduler.startup-grace-period

Storage Configuration

Configures the database backend for execution history and metrics.

storage.type
string
default: "sqlite"
Storage backend type.
Valid values: sqlite, postgres, mysql
Environment variable: GUARDIAN_STORAGE_TYPE
CLI flag: --storage.type

SQLite Configuration

Used when storage.type is sqlite.

storage.sqlite.path
string
default: "/data/guardian.db"
Path to the SQLite database file. Requires a persistent volume in Kubernetes.
Environment variable: GUARDIAN_STORAGE_SQLITE_PATH
CLI flag: --storage.sqlite.path

PostgreSQL Configuration

Used when storage.type is postgres.

storage.postgres.host
string
PostgreSQL server hostname.
Environment variable: GUARDIAN_STORAGE_POSTGRES_HOST
CLI flag: --storage.postgres.host

storage.postgres.port
integer
default: "5432"
PostgreSQL server port.
Environment variable: GUARDIAN_STORAGE_POSTGRES_PORT
CLI flag: --storage.postgres.port

storage.postgres.database
string
PostgreSQL database name.
Environment variable: GUARDIAN_STORAGE_POSTGRES_DATABASE
CLI flag: --storage.postgres.database

storage.postgres.username
string
PostgreSQL username.
Environment variable: GUARDIAN_STORAGE_POSTGRES_USERNAME
CLI flag: --storage.postgres.username

storage.postgres.password
string
PostgreSQL password. Set this through the environment variable rather than the config file so the secret is not stored in plain text alongside other settings.
Environment variable: GUARDIAN_STORAGE_POSTGRES_PASSWORD
CLI flag: --storage.postgres.password

storage.postgres.ssl-mode
string
default: "require"
PostgreSQL SSL mode.
Valid values: disable, require, verify-ca, verify-full
Environment variable: GUARDIAN_STORAGE_POSTGRES_SSL_MODE
CLI flag: --storage.postgres.ssl-mode

storage.postgres.pool.max-idle-conns
integer
default: "10"
Maximum number of idle connections in the pool.
Environment variable: GUARDIAN_STORAGE_POSTGRES_POOL_MAX_IDLE_CONNS
CLI flag: --storage.postgres.pool.max-idle-conns

storage.postgres.pool.max-open-conns
integer
default: "100"
Maximum number of open connections.
Environment variable: GUARDIAN_STORAGE_POSTGRES_POOL_MAX_OPEN_CONNS
CLI flag: --storage.postgres.pool.max-open-conns

storage.postgres.pool.conn-max-lifetime
duration
default: "1h"
Maximum lifetime of a connection.
Environment variable: GUARDIAN_STORAGE_POSTGRES_POOL_CONN_MAX_LIFETIME
CLI flag: --storage.postgres.pool.conn-max-lifetime

storage.postgres.pool.conn-max-idle-time
duration
default: "10m"
Maximum idle time for a connection before it’s closed.
Environment variable: GUARDIAN_STORAGE_POSTGRES_POOL_CONN_MAX_IDLE_TIME
CLI flag: --storage.postgres.pool.conn-max-idle-time

MySQL Configuration

Used when storage.type is mysql.

storage.mysql.host
string
MySQL server hostname.
Environment variable: GUARDIAN_STORAGE_MYSQL_HOST
CLI flag: --storage.mysql.host

storage.mysql.port
integer
default: "3306"
MySQL server port.
Environment variable: GUARDIAN_STORAGE_MYSQL_PORT
CLI flag: --storage.mysql.port

storage.mysql.database
string
MySQL database name.
Environment variable: GUARDIAN_STORAGE_MYSQL_DATABASE
CLI flag: --storage.mysql.database

storage.mysql.username
string
MySQL username.
Environment variable: GUARDIAN_STORAGE_MYSQL_USERNAME
CLI flag: --storage.mysql.username

storage.mysql.password
string
MySQL password. Set this through the environment variable rather than the config file so the secret is not stored in plain text alongside other settings.
Environment variable: GUARDIAN_STORAGE_MYSQL_PASSWORD
CLI flag: --storage.mysql.password

storage.mysql.pool.max-idle-conns
integer
default: "10"
Maximum number of idle connections in the pool.
Environment variable: GUARDIAN_STORAGE_MYSQL_POOL_MAX_IDLE_CONNS
CLI flag: --storage.mysql.pool.max-idle-conns

storage.mysql.pool.max-open-conns
integer
default: "100"
Maximum number of open connections.
Environment variable: GUARDIAN_STORAGE_MYSQL_POOL_MAX_OPEN_CONNS
CLI flag: --storage.mysql.pool.max-open-conns

storage.mysql.pool.conn-max-lifetime
duration
default: "1h"
Maximum lifetime of a connection.
Environment variable: GUARDIAN_STORAGE_MYSQL_POOL_CONN_MAX_LIFETIME
CLI flag: --storage.mysql.pool.conn-max-lifetime

storage.mysql.pool.conn-max-idle-time
duration
default: "10m"
Maximum idle time for a connection before it’s closed.
Environment variable: GUARDIAN_STORAGE_MYSQL_POOL_CONN_MAX_IDLE_TIME
CLI flag: --storage.mysql.pool.conn-max-idle-time

Storage Features

Cluster-wide defaults for log and event storage. Can be overridden per-monitor.

storage.log-storage-enabled
boolean
default: "false"
Cluster-wide default for storing job logs in the database. Individual monitors can override this via dataRetention.storeLogs.
Environment variable: GUARDIAN_STORAGE_LOG_STORAGE_ENABLED
CLI flag: --storage.log-storage-enabled

storage.event-storage-enabled
boolean
default: "false"
Cluster-wide default for storing Kubernetes events in the database. Individual monitors can override this via dataRetention.storeEvents.
Environment variable: GUARDIAN_STORAGE_EVENT_STORAGE_ENABLED
CLI flag: --storage.event-storage-enabled

storage.max-log-size-kb
integer
default: "100"
Maximum log size to store per execution in KB. Logs exceeding this size are truncated.
Environment variable: GUARDIAN_STORAGE_MAX_LOG_SIZE_KB
CLI flag: --storage.max-log-size-kb

storage.log-retention-days
integer
default: "0"
How long to keep stored logs, in days. If 0, the history-retention.default-days value is used.
Environment variable: GUARDIAN_STORAGE_LOG_RETENTION_DAYS
CLI flag: --storage.log-retention-days
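The fallback described for storage.log-retention-days can be sketched as follows. The helper name is hypothetical; it only illustrates the documented rule that a value of 0 defers to history-retention.default-days, and is not the operator's own code.

```shell
# Hypothetical helper: resolve the effective log retention in days.
# $1 = storage.log-retention-days
# $2 = history-retention.default-days
effective_log_retention_days() {
  if [ "$1" -eq 0 ]; then
    echo "$2"   # 0 means "use the history retention default"
  else
    echo "$1"
  fi
}

effective_log_retention_days 0 30    # prints 30
effective_log_retention_days 14 30   # prints 14
```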

History Retention

Cluster-wide defaults for execution history retention.

history-retention.default-days
integer
default: "30"
Default retention period in days for execution history. Individual monitors can override via dataRetention.retentionDays.
Environment variable: GUARDIAN_HISTORY_RETENTION_DEFAULT_DAYS
CLI flag: --history-retention.default-days

history-retention.max-days
integer
default: "90"
Maximum allowed retention period in days. Prevents monitors from retaining data indefinitely.
Environment variable: GUARDIAN_HISTORY_RETENTION_MAX_DAYS
CLI flag: --history-retention.max-days
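How the default and the maximum interact with a per-monitor override can be sketched as below. This page does not say whether an over-limit dataRetention.retentionDays is clamped or rejected; the sketch assumes clamping, and the helper name is hypothetical.

```shell
# Hypothetical helper: resolve a monitor's retention in days.
# $1 = per-monitor dataRetention.retentionDays (empty when unset)
# $2 = history-retention.default-days
# $3 = history-retention.max-days
effective_retention_days() {
  days=${1:-$2}                 # fall back to the cluster-wide default
  if [ "$days" -gt "$3" ]; then
    days=$3                     # assumption: over-limit values are clamped
  fi
  echo "$days"
}

effective_retention_days "" 30 90    # prints 30
effective_retention_days 365 30 90   # prints 90
```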

Rate Limits

Global rate limits to prevent alert storms.

rate-limits.max-alerts-per-minute
integer
default: "50"
Maximum alerts per minute across all channels and monitors.
Environment variable: GUARDIAN_RATE_LIMITS_MAX_ALERTS_PER_MINUTE
CLI flag: --rate-limits.max-alerts-per-minute

rate-limits.burst-limit
integer
default: "10"
Maximum burst of alerts allowed in a short window.
Environment variable: GUARDIAN_RATE_LIMITS_BURST_LIMIT
CLI flag: --rate-limits.burst-limit

rate-limits.default-suppress-duplicates-for
duration
default: "1h"
Default duration to suppress duplicate alerts. Individual monitors can override via alerting.suppressDuplicatesFor.
Environment variable: GUARDIAN_RATE_LIMITS_DEFAULT_SUPPRESS_DUPLICATES_FOR
CLI flag: --rate-limits.default-suppress-duplicates-for
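Duplicate suppression can be pictured as a simple time-window check. The sketch below is an illustration only: the helper name is hypothetical, and how the operator identifies two alerts as duplicates, or treats the exact window boundary, is not specified on this page.

```shell
# Hypothetical helper: decide whether an alert should be sent.
# $1 = current time (epoch seconds)
# $2 = time the identical alert was last sent (empty if never)
# $3 = suppression window in seconds (1h = 3600)
should_send_alert() {
  [ -z "$2" ] && return 0          # never sent before: send
  [ $(( $1 - $2 )) -ge "$3" ]      # send only once the window has elapsed
}

# 3000 seconds after the previous identical alert, with a 1h window:
if should_send_alert 1700003000 1700000000 3600; then
  echo send
else
  echo suppress                    # this branch runs
fi
```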

UI Server

Configures the web UI and REST API server.

ui.enabled
boolean
default: "true"
Enable the UI server (serves both web UI and REST API).
Environment variable: GUARDIAN_UI_ENABLED
CLI flag: --ui.enabled

ui.port
integer
default: "8080"
Port for the UI server.
Environment variable: GUARDIAN_UI_PORT
CLI flag: --ui.port

Metrics Server

Configures the Prometheus metrics endpoint.

metrics.bind-address
string
default: "0"
Address to bind the metrics endpoint. Use "0" to disable metrics, or ":8443" to bind to port 8443 on all interfaces.
Environment variable: GUARDIAN_METRICS_BIND_ADDRESS
CLI flag: --metrics.bind-address

metrics.secure
boolean
default: "true"
Enable HTTPS for the metrics endpoint.
Environment variable: GUARDIAN_METRICS_SECURE
CLI flag: --metrics.secure

metrics.cert-path
string
Directory containing TLS certificates for the metrics endpoint.
Environment variable: GUARDIAN_METRICS_CERT_PATH
CLI flag: --metrics.cert-path

metrics.cert-name
string
default: "tls.crt"
Certificate file name within cert-path.
Environment variable: GUARDIAN_METRICS_CERT_NAME
CLI flag: --metrics.cert-name

metrics.cert-key
string
default: "tls.key"
Private key file name within cert-path.
Environment variable: GUARDIAN_METRICS_CERT_KEY
CLI flag: --metrics.cert-key

Health Probes

Configures liveness and readiness probe endpoints.

probes.bind-address
string
default: ":8081"
Address to bind health probe endpoints (/healthz and /readyz).
Environment variable: GUARDIAN_PROBES_BIND_ADDRESS
CLI flag: --probes.bind-address
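A quick manual check of both probe endpoints might look like the sketch below. It assumes the probes are served over plain HTTP and that curl is installed; the helper name check_probes and the base URL in the usage comment are illustrative, not part of the operator.

```shell
# Hypothetical helper: query /healthz and /readyz under a base URL.
# $1 = base URL of the probe server, e.g. http://localhost:8081
check_probes() {
  for path in healthz readyz; do
    if curl --fail --silent --show-error --max-time 5 "$1/$path" >/dev/null; then
      echo "$path: ok"
    else
      echo "$path: failed"
      return 1
    fi
  done
}

# Usage, with the default bind address and a locally reachable operator:
#   check_probes http://localhost:8081
```

The function stops at the first failing endpoint and returns a non-zero status, so it can be used directly in scripts.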

Leader Election

Configures leader election for high-availability deployments.

leader-election.enabled
boolean
default: "false"
Enable leader election. Required when running multiple operator replicas.
Environment variable: GUARDIAN_LEADER_ELECTION_ENABLED
CLI flag: --leader-election.enabled

leader-election.lease-duration
duration
default: "15s"
Duration that non-leader candidates wait before attempting to force-acquire leadership.
Environment variable: GUARDIAN_LEADER_ELECTION_LEASE_DURATION
CLI flag: --leader-election.lease-duration

leader-election.renew-deadline
duration
default: "10s"
Duration the leader keeps retrying to refresh its leadership before giving up.
Environment variable: GUARDIAN_LEADER_ELECTION_RENEW_DEADLINE
CLI flag: --leader-election.renew-deadline

leader-election.retry-period
duration
default: "2s"
Duration candidates wait between leadership acquisition attempts.
Environment variable: GUARDIAN_LEADER_ELECTION_RETRY_PERIOD
CLI flag: --leader-election.retry-period
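When changing these three durations, it may help to sanity-check their ordering. The sketch below assumes the operator hands the values to the Kubernetes client-go leader election library, which requires lease-duration to be greater than renew-deadline and renew-deadline to be greater than 1.2 times retry-period. That dependency is an assumption, not something this page confirms, and the helper name is hypothetical.

```shell
# Hypothetical check; all arguments are whole seconds.
# $1 = lease-duration, $2 = renew-deadline, $3 = retry-period
check_leader_election_timing() {
  if [ "$1" -le "$2" ]; then
    echo "lease-duration must be greater than renew-deadline"
    return 1
  fi
  # renew-deadline > 1.2 * retry-period, scaled by 10 to stay in integer math
  if [ $(( $2 * 10 )) -le $(( $3 * 12 )) ]; then
    echo "renew-deadline must be greater than 1.2 x retry-period"
    return 1
  fi
  echo ok
}

check_leader_election_timing 15 10 2   # defaults from this page; prints ok
```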

Webhook Server

Configures the validating/mutating webhook server.

webhook.cert-path
string
Directory containing TLS certificates for the webhook server.
Environment variable: GUARDIAN_WEBHOOK_CERT_PATH
CLI flag: --webhook.cert-path

webhook.cert-name
string
default: "tls.crt"
Certificate file name within cert-path.
Environment variable: GUARDIAN_WEBHOOK_CERT_NAME
CLI flag: --webhook.cert-name

webhook.cert-key
string
default: "tls.key"
Private key file name within cert-path.
Environment variable: GUARDIAN_WEBHOOK_CERT_KEY
CLI flag: --webhook.cert-key

webhook.enable-http2
boolean
default: "false"
Enable HTTP/2 for the webhook server. Disabled by default for security.
Environment variable: GUARDIAN_WEBHOOK_ENABLE_HTTP2
CLI flag: --webhook.enable-http2

Complete Example

PostgreSQL with High Availability

log-level: info

scheduler:
  dead-man-switch-interval: 1m
  sla-recalculation-interval: 5m
  prune-interval: 1h
  startup-grace-period: 30s

storage:
  type: postgres
  postgres:
    host: postgres.database.svc.cluster.local
    port: 5432
    database: cronjob_guardian
    username: guardian
    # Password via environment: GUARDIAN_STORAGE_POSTGRES_PASSWORD
    ssl-mode: require
    pool:
      max-idle-conns: 10
      max-open-conns: 100
      conn-max-lifetime: 1h
      conn-max-idle-time: 10m
  log-storage-enabled: true
  event-storage-enabled: true
  max-log-size-kb: 200
  log-retention-days: 30

history-retention:
  default-days: 30
  max-days: 90

rate-limits:
  max-alerts-per-minute: 100
  burst-limit: 20
  default-suppress-duplicates-for: 1h

ui:
  enabled: true
  port: 8080

metrics:
  bind-address: ":8443"
  secure: true
  cert-path: /etc/certs/metrics

probes:
  bind-address: ":8081"

leader-election:
  enabled: true
  lease-duration: 15s
  renew-deadline: 10s
  retry-period: 2s

webhook:
  cert-path: /etc/certs/webhook
  enable-http2: false

Environment Variable Example

# Operator deployment with environment variables
export GUARDIAN_LOG_LEVEL=debug
export GUARDIAN_STORAGE_TYPE=postgres
export GUARDIAN_STORAGE_POSTGRES_HOST=postgres.database.svc.cluster.local
export GUARDIAN_STORAGE_POSTGRES_DATABASE=guardian
export GUARDIAN_STORAGE_POSTGRES_USERNAME=guardian
export GUARDIAN_STORAGE_POSTGRES_PASSWORD=secret-password
export GUARDIAN_LEADER_ELECTION_ENABLED=true

./cronjob-guardian
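The same pattern should carry over to the MySQL backend using the variable names listed under MySQL Configuration. The host, database, username, and password values below are placeholders chosen for illustration.

```shell
# MySQL variant of the environment variable setup (placeholder values).
export GUARDIAN_STORAGE_TYPE=mysql
export GUARDIAN_STORAGE_MYSQL_HOST=mysql.database.svc.cluster.local
export GUARDIAN_STORAGE_MYSQL_PORT=3306
export GUARDIAN_STORAGE_MYSQL_DATABASE=guardian
export GUARDIAN_STORAGE_MYSQL_USERNAME=guardian
export GUARDIAN_STORAGE_MYSQL_PASSWORD=secret-password

# Then start the operator as in the PostgreSQL example:
#   ./cronjob-guardian
```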
