Introduction

What is CronJob Guardian?

CronJob Guardian is a Kubernetes operator that monitors CronJobs with SLA tracking, intelligent alerting, and a built-in dashboard. It ensures your critical scheduled jobs run successfully and alerts you when something goes wrong.

The Problem

CronJobs power critical operations such as backups, ETL pipelines, and reports, but Kubernetes provides no built-in alerting for them. When jobs fail silently or stop running, you often find out only after the damage is done. Common issues that go undetected:

Silent failures: Jobs fail but no one knows until data is missing
Jobs stop running: Schedule issues or resource constraints prevent execution
Performance degradation: Jobs slow down gradually over time
Resource leaks: Failed jobs consume cluster resources

How Guardian Helps

CronJob Guardian watches your CronJobs and alerts you when something goes wrong, with rich context to help you diagnose and fix issues quickly.

Key Features

Dead-Man's Switch

Alert when CronJobs don’t run within expected windows. Thresholds are calculated automatically from each cron schedule, or you can set custom intervals.
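The exact algorithm Guardian uses to derive thresholds is not described on this page. One plausible approach is to take the longest gap between consecutive scheduled runs and add a buffer. The sketch below assumes a standard five-field cron expression and supports only a subset of the syntax (numbers, `*`, ranges, steps, and comma lists; no names like `MON` and no `@daily` macros). `parse_field`, `fire_times`, and `dead_man_threshold` are hypothetical helpers, not Guardian APIs.

```python
from collections.abc import Iterator
from datetime import datetime, timedelta


def parse_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field (*, n, a-b, */s, a-b/s, comma lists) into its values."""
    values: set[int] = set()
    for part in field.split(","):
        rng, _, step_str = part.partition("/")
        step = int(step_str) if step_str else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            end = hi if step_str else start  # "n/s" means n, n+s, ... up to hi
        values.update(range(start, end + 1, step))
    return values


def fire_times(expr: str, start: datetime, end: datetime) -> Iterator[datetime]:
    """Yield every whole minute from start's minute up to end that matches the expression."""
    minute, hour, dom, month, dow = expr.split()
    minutes = parse_field(minute, 0, 59)
    hours = parse_field(hour, 0, 23)
    doms = parse_field(dom, 1, 31)
    months = parse_field(month, 1, 12)
    dows = {d % 7 for d in parse_field(dow, 0, 6)}  # cron also accepts 7 for Sunday
    both_days_restricted = not dom.startswith("*") and not dow.startswith("*")
    t = start.replace(second=0, microsecond=0)
    while t <= end:
        dom_ok = t.day in doms
        dow_ok = (t.weekday() + 1) % 7 in dows  # Python: Monday=0; cron: Sunday=0
        # Classic cron rule: when both day fields are restricted, either may match.
        day_ok = (dom_ok or dow_ok) if both_days_restricted else (dom_ok and dow_ok)
        if day_ok and t.minute in minutes and t.hour in hours and t.month in months:
            yield t
        t += timedelta(minutes=1)


def dead_man_threshold(expr: str, buffer: timedelta, start: datetime, days: int = 8) -> timedelta:
    """Longest gap between consecutive scheduled runs in the scan window, plus a buffer."""
    times = list(fire_times(expr, start, start + timedelta(days=days)))
    if len(times) < 2:
        raise ValueError("scan window too short to see two runs; increase days")
    longest_gap = max(later - earlier for earlier, later in zip(times, times[1:]))
    return longest_gap + buffer
```

Using the longest gap rather than the average matters for irregular schedules: `0 9 * * 1-5` (weekdays at 09:00) yields 72h plus the buffer, because the Friday-to-Monday gap dominates. An average-based threshold would fire a false alert every weekend. Monthly schedules need a scan window of roughly two months (`days=62`).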

SLA Tracking

Monitor success rates, duration percentiles (P50/P95/P99), and detect regressions. Set minimum success rates and maximum duration thresholds.
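The percentile method Guardian uses is not stated on this page. As a sketch of the underlying arithmetic, the following computes a success rate and nearest-rank percentiles over a window of recorded runs. The function names and sample data are illustrative assumptions.

```python
import math
from collections.abc import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Percentage of runs that succeeded."""
    if not outcomes:
        raise ValueError("no executions recorded")
    return 100.0 * sum(outcomes) / len(outcomes)


def percentile(durations: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not durations:
        raise ValueError("no durations recorded")
    ordered = sorted(durations)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Ten runs (durations in seconds); the last one failed.
durations = [12, 15, 11, 40, 13, 14, 12, 16, 13, 90]
outcomes = [True] * 9 + [False]
summary = {
    "success_rate": success_rate(outcomes),
    "p50": percentile(durations, 50),
    "p95": percentile(durations, 95),
    "p99": percentile(durations, 99),
}
```

With only ten samples, P95 and P99 both land on the single slowest run (90 s), so high-percentile thresholds become meaningful only once enough history has accumulated.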

Intelligent Alerts

Alerts carry the context you need for diagnosis: pod logs, Kubernetes events, and suggested fixes.

Multiple Channels

Send alerts to Slack, PagerDuty, webhooks, or email. Route different severities to different channels.
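Routing is configured through AlertChannel resources, whose fields are not shown on this page. Conceptually, severity routing is a lookup from a severity level to a list of channels. The sketch below uses hypothetical severity names and channel names (only `pagerduty-oncall` appears in the examples on this page), and the fallback for unknown severities is an assumed design choice.

```python
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical severity levels and channel names, for illustration only.
ROUTES: Mapping[str, list[str]] = {
    "critical": ["pagerduty-oncall", "slack-alerts"],
    "warning": ["slack-alerts"],
    "info": ["team-webhook"],
}


@dataclass
class Alert:
    monitor: str
    severity: str
    message: str


def route(alert: Alert, routes: Mapping[str, list[str]] = ROUTES) -> list[str]:
    """Return the channel names an alert should be delivered to."""
    # Unknown severities fall back to the warning route instead of being dropped.
    return routes.get(alert.severity, routes["warning"])
```

Falling back to a default route rather than silently discarding an unrecognized severity keeps a misconfigured monitor from going quiet.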

Built-in Dashboard

Feature-rich web UI with charts, heatmaps, execution history, and CSV exports. No external tools required.

Prometheus Metrics

Expose metrics in Prometheus format so Guardian plugs into your existing observability stack.

Architecture

CronJob Guardian runs as a single operator pod in your cluster. The pod bundles three main components, and two custom resources configure what it does:

Operator: Watches CronJobs and Jobs, tracks execution history, calculates SLA metrics
Storage: SQLite (default), PostgreSQL, or MySQL for execution history and metrics
Dashboard: Embedded web UI for viewing metrics and execution history
Custom Resources: CronJobMonitor and AlertChannel define what to monitor and where to alert
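Recording one row per Job and deriving SLA figures with SQL fits the SQLite default well. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout and helper names are hypothetical illustrations, not Guardian's actual schema.

```python
import sqlite3
from typing import Optional

# Hypothetical table layout for illustration; not Guardian's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS executions (
    namespace  TEXT NOT NULL,
    cronjob    TEXT NOT NULL,
    job_name   TEXT NOT NULL,
    started_at TEXT NOT NULL,     -- ISO 8601 timestamp
    duration_s REAL NOT NULL,
    succeeded  INTEGER NOT NULL,  -- 1 = success, 0 = failure
    PRIMARY KEY (namespace, job_name)
);
"""


def record(conn: sqlite3.Connection, namespace: str, cronjob: str, job_name: str,
           started_at: str, duration_s: float, succeeded: bool) -> None:
    """Upsert one Job's outcome; observing the same Job again overwrites its row."""
    conn.execute(
        "INSERT OR REPLACE INTO executions VALUES (?, ?, ?, ?, ?, ?)",
        (namespace, cronjob, job_name, started_at, duration_s, int(succeeded)),
    )


def success_rate(conn: sqlite3.Connection, namespace: str, cronjob: str) -> Optional[float]:
    """Percentage of successful runs for one CronJob, or None if it has no history."""
    (rate,) = conn.execute(
        "SELECT 100.0 * SUM(succeeded) / COUNT(*) FROM executions "
        "WHERE namespace = ? AND cronjob = ?",
        (namespace, cronjob),
    ).fetchone()
    return rate


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for i, ok in enumerate([True, True, False, True]):
    record(conn, "prod", "nightly-backup", f"nightly-backup-{i}",
           f"2024-01-0{i + 1}T02:00:00Z", 300.0, ok)
```

Keying rows on namespace and Job name makes recording idempotent, which matters for an operator: its reconcile loop can observe the same Job several times.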

Who Should Use This?

CronJob Guardian is ideal for teams that:

Run critical scheduled jobs (backups, ETL, reports)
Need to maintain SLA commitments
Want to catch failures before customers notice
Need visibility into job performance and trends
Want centralized monitoring across multiple namespaces or clusters

Example Use Cases

Database Backups

Ensure nightly backups run successfully with 100% success rate monitoring:

spec:
  selector:
    matchLabels:
      type: backup
  deadManSwitch:
    enabled: true
    maxTimeSinceLastSuccess: 25h  # Daily + 1h buffer
  sla:
    enabled: true
    minSuccessRate: 100  # Backups must never fail
  alerting:
    channelRefs:
      - name: pagerduty-oncall
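With `maxTimeSinceLastSuccess: 25h`, a nightly backup can finish up to an hour later than usual before the switch fires. The check itself reduces to comparing elapsed time against the threshold. This is a minimal sketch, assuming the completion time of the last successful Job is already known. `parse_duration` handles only single-unit strings such as `25h` and is a hypothetical helper, not Guardian's parser.

```python
from datetime import datetime, timedelta, timezone

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse single-unit durations such as '25h', '30m' or '90s'."""
    value, unit = text[:-1], text[-1]
    if unit not in _UNITS or not value.isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[unit]: int(value)})


def is_overdue(last_success: datetime, now: datetime, max_since: timedelta) -> bool:
    """True when no successful run has completed within max_since."""
    return now - last_success > max_since


threshold = parse_duration("25h")
last = datetime(2024, 1, 1, 2, 30, tzinfo=timezone.utc)  # last backup finished 02:30 UTC
```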

ETL Pipelines

Monitor data pipelines with duration regression detection:

spec:
  selector:
    matchLabels:
      type: etl
  sla:
    enabled: true
    maxDuration: 30m
    durationRegression:
      enabled: true
      percentile: 95
      thresholdPercent: 50  # Alert if P95 increases by 50%
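This page does not specify how Guardian chooses the baseline and comparison windows. A minimal sketch, assuming a baseline window of older runs and a window of recent runs, compares the two P95 values and flags a regression when the recent value is at least `thresholdPercent` higher. It also includes the simpler per-run `maxDuration` check. The function names and window choice are hypothetical.

```python
import math
from collections.abc import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty sample."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(p * len(ordered) / 100)) - 1]


def duration_regressed(baseline: Sequence[float], recent: Sequence[float],
                       pct: float = 95, threshold_percent: float = 50) -> bool:
    """True when the recent percentile is at least threshold_percent above the baseline's."""
    before = percentile(baseline, pct)
    after = percentile(recent, pct)
    return after >= before * (1 + threshold_percent / 100)


def exceeds_max_duration(duration_minutes: float, max_minutes: float = 30) -> bool:
    """Per-run check corresponding to maxDuration: 30m."""
    return duration_minutes > max_minutes


baseline = [10, 11, 12, 10, 11, 12, 10, 11, 12, 13]  # older runs, minutes
```

Here the baseline P95 is 13 minutes, so a recent P95 of 19.5 minutes or more counts as a regression even though every run stays well under the 30-minute `maxDuration`. That gap is the case regression detection is meant to catch.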

Financial Reports

Quiet alerts during planned maintenance:

spec:
  selector:
    matchLabels:
      type: report
  maintenanceWindows:
    - name: monthly-maintenance
      schedule: "0 2 1 * *"  # First day of month at 2 AM
      duration: 4h
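A maintenance window is open from each scheduled start until `duration` later. Below is a minimal sketch of that check for the monthly schedule `0 2 1 * *`. It assumes the cron expression has already been reduced to a fixed day of month, hour, and minute (day 1 to 28, so every month has it) and that timestamps are in the time zone the schedule is evaluated in. The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def monthly_window_start(t: datetime, day: int, hour: int, minute: int) -> datetime:
    """Most recent window start at or before t for a schedule like 'M H day * *'."""
    start = t.replace(day=day, hour=hour, minute=minute, second=0, microsecond=0)
    if start > t:
        # This month's window has not started yet; use the previous month's.
        year, month = (t.year, t.month - 1) if t.month > 1 else (t.year - 1, 12)
        start = start.replace(year=year, month=month)
    return start


def in_maintenance(t: datetime, day: int = 1, hour: int = 2, minute: int = 0,
                   duration: timedelta = timedelta(hours=4)) -> bool:
    """True if t falls inside [start, start + duration) of the latest window."""
    start = monthly_window_start(t, day, hour, minute)
    return start <= t < start + duration
```

With the defaults above, 03:15 on the first of the month is inside the window, while 06:00 on the same day is already outside it.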

What’s Next?

Quickstart

Get CronJob Guardian running in 5 minutes

Installation

Detailed installation guide with all configuration options

Core Concepts

Learn about monitors, alert channels, and SLA tracking

Examples

Real-world configuration examples
