Never miss a CronJob failure again

Monitor your Kubernetes CronJobs with SLA tracking, dead-man’s switch detection, and intelligent alerts. Get notified when jobs fail, miss their schedule, or regress in performance.


Quick Start

Get CronJob Guardian running in your cluster in minutes

Install via Helm

Deploy the operator to your Kubernetes cluster using Helm:

helm install cronjob-guardian \
  oci://ghcr.io/illeniumstudios/charts/cronjob-guardian \
  --namespace cronjob-guardian \
  --create-namespace

The operator will start monitoring CronJobs immediately.
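To confirm the install succeeded, a check along these lines may help. It is a minimal sketch using standard kubectl commands; the function name guardian_check_install is hypothetical, and it assumes the chart runs the operator as one or more Deployments in the cronjob-guardian namespace.

```shell
# Hypothetical helper (bash): confirm the operator came up after `helm install`.
# Assumption: the chart creates Deployments in the target namespace.
guardian_check_install() {
  local ns="${1:-cronjob-guardian}"

  # List the operator pods; they should reach the Running state.
  kubectl get pods --namespace "$ns"

  # Block until every Deployment in the namespace reports Available,
  # or give up after two minutes.
  kubectl wait --for=condition=Available deployment --all \
    --namespace "$ns" --timeout=120s
}
```

Call guardian_check_install with no arguments to use the namespace from the install command, or pass a different namespace as the first argument.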

Configure alerts

Set up a Slack alert channel to receive notifications. First store the webhook URL in a Secret, then create an AlertChannel that references it:

kubectl create secret generic slack-webhook \
  --namespace cronjob-guardian \
  --from-literal=url=https://hooks.slack.com/services/YOUR/WEBHOOK/URL

apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: slack-alerts
  namespace: cronjob-guardian
spec:
  type: slack
  slack:
    webhookSecretRef:
      name: slack-webhook
      namespace: cronjob-guardian
      key: url
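The AlertChannel manifest above still has to be applied to the cluster. The sketch below assumes it was saved as alert-channel.yaml (a hypothetical filename) and first checks that the referenced Secret exists; the function name apply_alert_channel is likewise illustrative.

```shell
# Hypothetical helper (bash): apply the AlertChannel manifest once its Secret exists.
# Assumption: the manifest was saved as alert-channel.yaml unless a path is given.
apply_alert_channel() {
  local manifest="${1:-alert-channel.yaml}"

  # Fail early if the Secret referenced by webhookSecretRef is missing.
  kubectl get secret slack-webhook --namespace cronjob-guardian >/dev/null || {
    echo "secret slack-webhook not found in namespace cronjob-guardian" >&2
    return 1
  }

  # Create or update the AlertChannel resource.
  kubectl apply -f "$manifest"
}
```

Running apply_alert_channel with no arguments applies alert-channel.yaml from the current directory.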

Create a monitor

Monitor all CronJobs in a namespace with automatic dead-man’s switch detection:

apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: production-monitor
  namespace: production
spec:
  selector: {}  # empty = all CronJobs in namespace
  deadManSwitch:
    enabled: true
    autoFromSchedule:
      enabled: true
  alerting:
    channelRefs:
      - name: slack-alerts

Save the manifest as monitor.yaml and apply it with kubectl apply -f monitor.yaml. You’ll then be alerted when jobs fail or miss their schedule.
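Because an empty selector matches every CronJob in the namespace, it can be useful to see what that covers. The following sketch lists them with standard kubectl and fields from the batch/v1 CronJob API; the function name list_monitored_cronjobs is hypothetical, and the output reflects what Kubernetes reports, not the operator's own view.

```shell
# Hypothetical helper (bash): show the CronJobs an empty selector would match.
list_monitored_cronjobs() {
  local ns="${1:-production}"

  # Print each CronJob with its schedule, suspend flag, and last scheduled time.
  kubectl get cronjobs --namespace "$ns" \
    -o custom-columns='NAME:.metadata.name,SCHEDULE:.spec.schedule,SUSPENDED:.spec.suspend,LAST_SCHEDULED:.status.lastScheduleTime'
}
```

A CronJob whose LAST_SCHEDULED value is much older than its schedule implies is the kind of case a dead-man’s switch is meant to catch.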

Access the dashboard

View real-time status and metrics in the built-in web dashboard:

kubectl port-forward -n cronjob-guardian svc/cronjob-guardian 8080:8080

Open http://localhost:8080 to see your CronJobs, SLA metrics, and alert history.

Key Features

Everything you need to keep your CronJobs healthy and reliable

Dead-Man’s Switch

Automatically detect when CronJobs don’t run within expected windows. Get alerted before problems become incidents.
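The idea behind a dead-man’s switch can be sketched in a few lines of bash. This is an illustration of the concept only, not the operator's implementation; the function name is_overdue and its interval and grace parameters are assumptions made for the example.

```shell
# Illustrative sketch (bash): a job is overdue when the current time is past
# its last run plus the expected interval plus a grace period.
# Usage: is_overdue LAST_RUN_EPOCH INTERVAL_SECONDS GRACE_SECONDS [NOW_EPOCH]
is_overdue() {
  local last_run="$1" interval="$2" grace="$3"
  local now="${4:-$(date +%s)}"   # default to the current Unix time

  local deadline=$(( last_run + interval + grace ))

  # Succeeds (exit status 0) only when the deadline has passed.
  (( now > deadline ))
}
```

For example, is_overdue 1000 3600 300 5000 succeeds because the deadline is 4900 and the supplied current time is 5000, so an alert would be due.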

SLA Tracking

Monitor success rates and duration percentiles (P50/P95/P99). Set thresholds and get alerts when SLAs are breached.
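To make the percentile terminology concrete, here is a small sketch that computes a nearest-rank percentile over a list of job durations. It is an assumption-laden illustration: the function name percentile is hypothetical, and the operator may compute its P50/P95/P99 values differently.

```shell
# Illustrative sketch (bash + awk): nearest-rank percentile of integer durations.
# Usage: percentile P VALUE [VALUE...]   (P is an integer from 1 to 100)
percentile() {
  local p="$1"; shift
  [ "$#" -ge 1 ] || return 1   # need at least one sample

  # Sort the samples numerically, then pick the value at rank ceil(P/100 * N).
  printf '%s\n' "$@" | sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      r = int((p * NR + 99) / 100)   # integer ceiling of p*NR/100
      if (r < 1) r = 1
      print v[r]
    }'
}
```

With the durations 12, 7, 30, 9, and 15 seconds, percentile 50 12 7 30 9 15 prints 12 and percentile 95 12 7 30 9 15 prints 30.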

Intelligent Alerts

Rich alerts with pod logs, Kubernetes events, and AI-suggested fixes. Know exactly what went wrong and how to fix it.

Multiple Channels

Send alerts to Slack, PagerDuty, webhooks, or email. Route critical vs warning alerts to different channels.

Built-in Dashboard

Feature-rich web UI with charts, heatmaps, and execution history. Export data as CSV or JSON.

Prometheus Metrics

Export metrics to your existing monitoring infrastructure. Integrate with Grafana and Alertmanager.

Explore by Topic

Deep dive into specific areas

Core Concepts

Understand monitors, alert channels, and how the operator works

Configuration Guides

Step-by-step guides for common monitoring scenarios

Operations

Architecture, storage options, and production best practices

API Reference

Complete CRD documentation and configuration options

Examples

Ready-to-use YAML configurations for monitors and alerts

Troubleshooting

Common issues and how to resolve them

Ready to start monitoring?

Install CronJob Guardian in your cluster and get visibility into your CronJobs in minutes. There is no vendor lock-in: it runs entirely in your infrastructure.


99.9% Success Rate

<2s Alert Latency

24/7 Monitoring

No Vendor Lock-in