
Prerequisites

Before installing CronJob Guardian, ensure you have:
  • Kubernetes cluster version 1.26 or higher
  • kubectl configured to access your cluster
  • Helm 3 (recommended) or kubectl for manual installation
  • Cluster admin permissions to create CustomResourceDefinitions and ClusterRoles
For production deployments, review Security and Storage configuration before installing.
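The version requirement above can be checked from a script. The helper below is a minimal sketch and not part of CronJob Guardian: version_at_least is a hypothetical name, and it compares only the major and minor components. The commented usage assumes jq is installed.

```shell
# version_at_least CURRENT MINIMUM
# Succeeds when CURRENT (for example "1.28" or "v1.26.3") is >= MINIMUM.
# Only the major and minor components are compared.
version_at_least() (
  cur=${1#v}; min=${2#v}
  cur_major=${cur%%.*}; rest=${cur#*.}; cur_minor=${rest%%[!0-9]*}
  min_major=${min%%.*}; rest=${min#*.}; min_minor=${rest%%[!0-9]*}
  [ "$cur_major" -gt "$min_major" ] ||
    { [ "$cur_major" -eq "$min_major" ] && [ "$cur_minor" -ge "$min_minor" ]; }
)

# Example (requires cluster access and jq):
#   server=$(kubectl version -o json | jq -r '.serverVersion.gitVersion')
#   version_at_least "$server" 1.26 || echo "Kubernetes 1.26+ is required" >&2
```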

Installation Methods

Verify Installation

Check that all components are running:
kubectl get pods -n cronjob-guardian
Expected output (the pod name suffix and age will differ):
NAME                                READY   STATUS    RESTARTS   AGE
cronjob-guardian-7d9f8c5b6d-x4k2m   1/1     Running   0          1m
Verify CRDs are installed:
kubectl get crd | grep guardian
Expected output (the creation timestamps will differ):
alertchannels.guardian.illenium.net      2024-03-04T08:00:00Z
cronjobmonitors.guardian.illenium.net    2024-03-04T08:00:00Z
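For automated checks, grepping for "guardian" can be replaced by a test for the two exact CRD names shown above. This is a sketch: missing_crds is a hypothetical helper that reads the output of kubectl get crd -o name on stdin.

```shell
# missing_crds: reads `kubectl get crd -o name` output on stdin and prints
# every expected Guardian CRD that is absent. Prints nothing when both exist.
missing_crds() (
  found=$(cat)
  for crd in alertchannels.guardian.illenium.net \
             cronjobmonitors.guardian.illenium.net; do
    # `-o name` prefixes each CRD with its resource type and API group.
    printf '%s\n' "$found" |
      grep -qFx "customresourcedefinition.apiextensions.k8s.io/$crd" ||
      printf '%s\n' "$crd"
  done
)

# Example (requires cluster access):
#   kubectl get crd -o name | missing_crds
```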
Check operator logs:
kubectl logs -n cronjob-guardian deployment/cronjob-guardian
You should see:
INFO    setup   initialized store       {"type": "sqlite"}
INFO    setup   initialized SLA analyzer
INFO    setup   initialized alert dispatcher
INFO    setup   initialized dead-man scheduler
INFO    setup   starting manager
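The same check can be scripted by searching the logs for each startup message listed above. This is a minimal sketch: startup_complete is a hypothetical helper, and it assumes the message text matches the sample output exactly.

```shell
# startup_complete: reads operator logs on stdin and succeeds only if every
# startup message from the sample output is present.
startup_complete() (
  logs=$(cat)
  for msg in "initialized store" "initialized SLA analyzer" \
             "initialized alert dispatcher" "initialized dead-man scheduler" \
             "starting manager"; do
    printf '%s\n' "$logs" | grep -qF "$msg" ||
      { echo "missing: $msg" >&2; exit 1; }
  done
)

# Example (requires cluster access):
#   kubectl logs -n cronjob-guardian deployment/cronjob-guardian | startup_complete
```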

Configuration Options

Storage Configuration

CronJob Guardian supports three storage backends: SQLite, PostgreSQL, and MySQL.
SQLite is best for small to medium deployments (fewer than 100 CronJobs):
config:
  storage:
    type: sqlite
    sqlite:
      path: /data/guardian.db

persistence:
  enabled: true
  size: 1Gi
SQLite requires a persistent volume. The operator will fail to start if persistence is disabled.

Scheduler Configuration

Control how frequently Guardian checks for issues:
config:
  scheduler:
    # How often to check for dead-man's switch violations
    deadManSwitchInterval: 1m
    
    # How often to recalculate SLA metrics
    slaRecalculationInterval: 5m
    
    # How often to prune old execution history
    pruneInterval: 1h
    
    # Wait period after startup before sending alerts
    # (prevents alert floods on operator restart)
    startupGracePeriod: 30s

History Retention

Configure how long to keep execution history:
config:
  historyRetention:
    # Default retention for execution history
    defaultDays: 30
    
    # Maximum retention (monitors cannot exceed this)
    maxDays: 90
  
  storage:
    # Store pod logs in database (requires more storage)
    logStorageEnabled: false
    
    # Store Kubernetes events in database
    eventStorageEnabled: false
    
    # Maximum log size per execution (KB)
    maxLogSizeKB: 100
    
    # Log retention (0 = use defaultDays)
    logRetentionDays: 7
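Before enabling log storage, it helps to estimate a worst case for the volume it can consume. The arithmetic below is a rough sketch under stated assumptions: every execution stores a log of exactly maxLogSizeKB, 1 KB is treated as 1024 bytes, and database overhead is ignored. The function name and the workload figures (50 CronJobs, each running hourly) are illustrative.

```shell
# max_log_storage_mib CRONJOBS RUNS_PER_DAY MAX_LOG_KB RETENTION_DAYS
# Prints an upper bound, in whole MiB, for stored pod logs.
max_log_storage_mib() {
  echo $(( $1 * $2 * $3 * $4 / 1024 ))
}

# 50 CronJobs, 24 runs per day, maxLogSizeKB: 100, logRetentionDays: 7
max_log_storage_mib 50 24 100 7   # prints 820
```

Under these assumptions the worst case is about 820 MiB, which is close to a 1Gi volume, so a larger persistence size or a lower maxLogSizeKB may be worth considering.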

Rate Limiting

Prevent alert floods:
config:
  rateLimits:
    # Maximum alerts per minute across all channels
    maxAlertsPerMinute: 50
    
    # Maximum burst of alerts allowed
    burstLimit: 10
    
    # Default time to suppress duplicate alerts
    defaultSuppressDuplicatesFor: 1h

Resource Limits

Adjust based on your cluster size:
resources:
  limits:
    cpu: 500m        # Increase for > 200 CronJobs
    memory: 256Mi    # Increase if storing logs
  requests:
    cpu: 10m
    memory: 64Mi

Exposing the Dashboard

The dashboard is served on port 8080 by default. Choose an access method:
For local development and testing, forward the service port to your machine:
kubectl port-forward -n cronjob-guardian svc/cronjob-guardian 8080:8080
Access at http://localhost:8080

Prometheus Metrics

Enable ServiceMonitor for Prometheus Operator:
metrics:
  enabled: true
  secure: true  # Use HTTPS with authentication

serviceMonitor:
  enabled: true
  interval: 30s
  scrapeTimeout: 10s
  labels:
    release: prometheus  # Match your Prometheus selector
Available metrics:
  • cronjob_guardian_executions_total - Total execution count by status
  • cronjob_guardian_execution_duration_seconds - Execution duration histogram
  • cronjob_guardian_sla_success_rate - Current success rate percentage
  • cronjob_guardian_dead_man_switch_violations - Dead-man’s switch violations
  • cronjob_guardian_alerts_sent_total - Total alerts sent by channel
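For a quick check without Prometheus, the counters can be summed directly from the metrics text output. The helper below is a sketch: sum_metric is a hypothetical name, it assumes the standard Prometheus text exposition format without timestamps, and how you fetch the metrics (endpoint path, port, authentication) depends on your configuration and is not shown.

```shell
# sum_metric NAME: reads Prometheus text-format metrics on stdin and prints
# the sum of all samples of NAME across every label set.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)          # drop the label set, if any
      if (metric == name) total += $NF  # value is the last field (no timestamp)
    }
    END { print total + 0 }
  '
}

# Example, with the metrics text saved to metrics.txt:
#   sum_metric cronjob_guardian_alerts_sent_total < metrics.txt
```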

High Availability

Run multiple replicas with leader election:
replicaCount: 3

leaderElection:
  enabled: true
  leaseDuration: 15s
  renewDeadline: 10s
  retryPeriod: 2s

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: cronjob-guardian
          topologyKey: kubernetes.io/hostname
When using PostgreSQL or MySQL, you can run multiple replicas without leader election. With SQLite, leader election is required for multiple replicas.
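If you change the leader election timings, keep them in a consistent order. The check below is a sketch that assumes Guardian uses the Kubernetes client-go leader election library, which rejects a configuration unless leaseDuration is greater than renewDeadline and renewDeadline is greater than retryPeriod multiplied by a jitter factor of 1.2. The function name is hypothetical and values are given in whole seconds.

```shell
# valid_leader_timing LEASE RENEW RETRY   (all in whole seconds)
# Succeeds when LEASE > RENEW and RENEW > RETRY * 1.2.
valid_leader_timing() {
  # Multiply both sides by 10 to keep the 1.2 factor in integer arithmetic.
  [ "$1" -gt "$2" ] && [ $(( $2 * 10 )) -gt $(( $3 * 12 )) ]
}

# Values from the example above: 15s, 10s, 2s
valid_leader_timing 15 10 2 && echo "timings are consistent"
```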

Upgrading

To upgrade to a new version:
helm upgrade cronjob-guardian oci://ghcr.io/illeniumstudios/charts/cronjob-guardian \
  --namespace cronjob-guardian \
  --values values.yaml
Check the CHANGELOG for breaking changes before upgrading.

Uninstalling

Uninstalling will delete all execution history and metrics. Back up your data first if needed.

With Helm

helm uninstall cronjob-guardian --namespace cronjob-guardian

Remove CRDs

Helm does not automatically remove CRDs. Remove them manually, keeping in mind that deleting a CRD also deletes every custom resource of that type:
kubectl delete crd cronjobmonitors.guardian.illenium.net
kubectl delete crd alertchannels.guardian.illenium.net

Remove Namespace

kubectl delete namespace cronjob-guardian

Troubleshooting

Operator Won’t Start

Check the logs:
kubectl logs -n cronjob-guardian deployment/cronjob-guardian
Common issues:
  • “unable to create store”: Check storage configuration and credentials
  • “unable to initialize store”: Database connection failed or migrations failed
  • “admission webhook not ready”: CRDs may not be installed
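The list above can be turned into a small log triage helper. This is a sketch: triage_logs is a hypothetical name, and it assumes the error text appears in the logs exactly as quoted above.

```shell
# triage_logs: reads operator logs on stdin and prints one hint for each
# known startup error found. The hints restate the list above.
triage_logs() (
  logs=$(cat)
  hint() {
    if printf '%s\n' "$logs" | grep -qF "$1"; then printf '%s\n' "$2"; fi
  }
  hint "unable to create store" "Check storage configuration and credentials"
  hint "unable to initialize store" "Database connection failed or migrations failed"
  hint "admission webhook not ready" "CRDs may not be installed"
)

# Example (requires cluster access):
#   kubectl logs -n cronjob-guardian deployment/cronjob-guardian | triage_logs
```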

Dashboard Not Accessible

Verify the service:
kubectl get svc -n cronjob-guardian
Check whether the UI is enabled by inspecting the container arguments:
kubectl get deployment cronjob-guardian -n cronjob-guardian -o jsonpath='{.spec.template.spec.containers[0].args}'

High Memory Usage

If memory usage is high:
  1. Reduce history retention: historyRetention.defaultDays
  2. Disable log storage: storage.logStorageEnabled: false
  3. Run checks less often by increasing scheduler.deadManSwitchInterval and scheduler.slaRecalculationInterval
  4. Increase resource limits

Next Steps

  • Create Monitors: Start monitoring your CronJobs
  • Configure Alerts: Set up alert channels
  • Storage Guide: Learn about storage options and migration
  • Security: Secure your Guardian installation
