Monitoring CronJobs

CronJob Guardian provides flexible monitoring options to track your CronJobs across different scopes and selectors.

Monitoring Strategies

You can monitor CronJobs in several ways:

Single namespace: Monitor all or selected CronJobs in one namespace
Multiple namespaces: Monitor specific namespaces by listing them
Namespace selector: Dynamically discover namespaces by labels
Cluster-wide: Monitor all CronJobs across all namespaces
Label selector: Filter CronJobs by labels

Basic Namespace Monitoring

The simplest way to monitor CronJobs is within a single namespace.

Create a CronJobMonitor in your namespace

Deploy a monitor in the same namespace as your CronJobs. The example below selects only CronJobs labeled tier: critical; an empty selector monitors all CronJobs in that namespace.

apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: critical-jobs
  namespace: production
spec:
  selector:
    matchLabels:
      tier: critical
  deadManSwitch:
    enabled: true
    maxTimeSinceLastSuccess: 25h
  sla:
    enabled: true
    minSuccessRate: 99
    windowDays: 7
  alerting:
    channelRefs:
      - name: slack-ops
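
To monitor every CronJob in the namespace rather than only those labeled tier: critical, leave the selector empty. The following is a minimal sketch, assuming an empty selector is written as `selector: {}`; the monitor name all-production-jobs is illustrative and not taken from the project.

```yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: all-production-jobs   # illustrative name
  namespace: production
spec:
  # Assumption: an empty selector is expressed as {} and matches
  # all CronJobs in the monitor's own namespace
  selector: {}
  deadManSwitch:
    enabled: true
    maxTimeSinceLastSuccess: 25h
  alerting:
    channelRefs:
      - name: slack-ops
```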

Apply the monitor

kubectl apply -f monitor.yaml

Verify the monitor is active

kubectl get cronjobmonitor -n production

Check the status to see discovered CronJobs:

kubectl describe cronjobmonitor critical-jobs -n production

Label-Based Monitoring

Monitor CronJobs that match specific labels using matchLabels or matchExpressions.

Using matchLabels

apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: critical-jobs
  namespace: production
spec:
  selector:
    matchLabels:
      tier: critical

Using matchExpressions

For more advanced filtering:

apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: high-priority-jobs
  namespace: production
spec:
  selector:
    matchExpressions:
      - key: tier
        operator: In
        values: [critical, high]
      - key: backup
        operator: Exists

Supported operators: In, NotIn, Exists, DoesNotExist
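
The negative operators can be combined in the same way as the positive ones. A short sketch, assuming the operators follow standard Kubernetes label-selector semantics; the monitor name and the label keys and values (tier: low, experimental) are illustrative:

```yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: non-experimental-jobs   # illustrative name
  namespace: production
spec:
  selector:
    matchExpressions:
      # Skip CronJobs whose tier label is "low"
      - key: tier
        operator: NotIn
        values: [low]
      # Skip CronJobs that carry an "experimental" label at all
      - key: experimental
        operator: DoesNotExist
```

Under standard Kubernetes semantics, NotIn also matches objects that do not have the key at all, so this selector would include CronJobs with no tier label.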

Monitoring Specific CronJobs by Name

You can explicitly list CronJob names to monitor (only valid for single-namespace monitoring):

apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: backup-jobs
  namespace: databases
spec:
  selector:
    matchNames:
      - daily-backup
      - weekly-report
      - monthly-archive

Cluster-Wide Monitoring

Monitor all CronJobs across all namespaces (except globally ignored ones).

apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: cluster-wide-monitor
  namespace: cronjob-guardian
spec:
  selector:
    # Watch all namespaces
    allNamespaces: true
    # Optionally filter by labels
    matchLabels:
      tier: critical
  deadManSwitch:
    enabled: true
    maxTimeSinceLastSuccess: 25h
  alerting:
    channelRefs:
      - name: pagerduty-critical
        severities: [critical]
      - name: slack-ops
        severities: [critical, warning]

Cluster-wide monitoring requires appropriate RBAC permissions. The operator’s service account must have cluster-wide read access to CronJobs.
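
If your installation does not already grant this access, the required permission could look roughly like the sketch below. It uses the standard Kubernetes RBAC API. The role and binding names, and the service account name cronjob-guardian in the cronjob-guardian namespace, are assumptions; check the names your installation actually uses.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cronjob-guardian-reader   # hypothetical name
rules:
  # Read-only access to CronJobs in every namespace
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cronjob-guardian-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cronjob-guardian-reader
subjects:
  - kind: ServiceAccount
    name: cronjob-guardian        # assumed service account name
    namespace: cronjob-guardian   # assumed operator namespace
```

The operator may need further permissions (for example on Jobs or Namespaces) that are not covered here.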

Namespace Selector Monitoring

Dynamically discover and monitor namespaces that match specific labels.

apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: production-jobs
  namespace: cronjob-guardian
spec:
  selector:
    # Select namespaces by their labels
    namespaceSelector:
      matchLabels:
        environment: production
    # Optionally filter CronJobs within matching namespaces
    matchLabels:
      monitored: "true"
  sla:
    enabled: true
    minSuccessRate: 95
    windowDays: 7
  alerting:
    channelRefs:
      - name: slack-ops

Label your namespaces

kubectl label namespace prod-app environment=production
kubectl label namespace prod-api environment=production
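
If you manage namespaces declaratively, the same labels can be set in the Namespace manifests instead of with kubectl label. A minimal sketch using the two namespaces from the commands above:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod-app
  labels:
    environment: production   # matched by the monitor's namespaceSelector
---
apiVersion: v1
kind: Namespace
metadata:
  name: prod-api
  labels:
    environment: production
```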

Create the monitor with namespace selector

The monitor will automatically discover all namespaces labeled environment=production and watch CronJobs within them.

Verify discovered namespaces

kubectl describe cronjobmonitor production-jobs -n cronjob-guardian

Multi-Namespace Monitoring

Explicitly list multiple namespaces to monitor:

apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: multi-namespace-monitor
  namespace: cronjob-guardian
spec:
  selector:
    namespaces:
      - production
      - staging
      - qa
    matchLabels:
      monitored: "true"

Real-World Example: Database Backups

Here’s a complete example monitoring critical database backup jobs:

apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: database-backups
  namespace: databases
spec:
  selector:
    matchLabels:
      type: backup
  deadManSwitch:
    enabled: true
    maxTimeSinceLastSuccess: 25h  # Daily backups with 1h buffer
  sla:
    enabled: true
    minSuccessRate: 100  # Backups must never fail
    maxDuration: 1h      # Alert if backup takes too long
  alerting:
    channelRefs:
      - name: pagerduty-dba
        severities: [critical]
    severityOverrides:
      jobFailed: critical
      deadManTriggered: critical
    # Custom fix suggestion for backup failures
    suggestedFixPatterns:
      - name: disk-full
        match:
          logPattern: "No space left on device|disk full"
        suggestion: "Backup storage is full. Check PVC usage: kubectl get pvc -n {{.Namespace}}"
        priority: 150

Monitoring Best Practices

Start Small

Begin with namespace-scoped monitors before moving to cluster-wide monitoring.

Use Labels

Label your CronJobs consistently (e.g., tier: critical, type: backup) for easier monitoring.
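
As an illustration, a CronJob carrying both example labels might look like the sketch below. It assumes monitors match labels on the CronJob object itself (metadata.labels), not on the pod template; the job name, schedule, image, and command are placeholders.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup          # placeholder name
  namespace: databases
  labels:
    tier: critical            # matched by selectors on tier
    type: backup              # matched by selectors on type
spec:
  schedule: "0 2 * * *"       # placeholder: every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: busybox:1.36   # placeholder image
              command: ["sh", "-c", "echo running backup"]
```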

Avoid Overlap

Ensure monitors don’t overlap unnecessarily. If multiple monitors watch the same CronJob, you’ll get duplicate alerts.
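
One way to keep two monitors in the same namespace disjoint is to give them complementary selectors on a single label. The sketch below is illustrative only: the monitor names are made up, and it assumes NotIn follows standard Kubernetes semantics (it also matches CronJobs without the tier label).

```yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: critical-jobs
  namespace: production
spec:
  selector:
    matchExpressions:
      # Only CronJobs labeled tier=critical
      - key: tier
        operator: In
        values: [critical]
---
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: other-jobs            # illustrative name
  namespace: production
spec:
  selector:
    matchExpressions:
      # Everything else: tier is not "critical" or is unset
      - key: tier
        operator: NotIn
        values: [critical]
```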

Monitor the Monitor

Run kubectl get cronjobmonitor regularly to verify that monitors are in the Active phase.

Checking Monitor Status

View all monitors and their status:

kubectl get cronjobmonitor -A

Example output (values will differ in your cluster):

NAMESPACE           NAME                    CRONJOBS   HEALTHY   WARNING   CRITICAL   ALERTS   AGE
production          critical-jobs           5          4         1         0          1        2d
cronjob-guardian    cluster-wide-monitor    42         39        2         1          3        5d
databases           database-backups        3          3         0         0          0        10d

View detailed status for a specific monitor:

kubectl describe cronjobmonitor critical-jobs -n production

Ignored Namespaces

By default, these namespaces are ignored (configured in config.yaml):

ignored-namespaces:
  - kube-system
  - kube-public
  - kube-node-lease

To override this globally, update the operator configuration:

# values.yaml for Helm chart
config:
  ignoredNamespaces:
    - kube-system
    - kube-public


Next Steps

Configure Alerts

SLA Configuration
