Using the Dashboard - CronJob Guardian

CronJob Guardian includes a built-in web dashboard for visualizing CronJob health, metrics, and alerts. The dashboard is served by the Guardian API service, so once that service is reachable you can open it at http://localhost:8080 (or your configured API port).

Accessing the Dashboard

For local access, first port-forward the API service:

kubectl port-forward -n cronjob-guardian svc/cronjob-guardian-api 8080:8080

Then open http://localhost:8080 in your browser and explore the dashboard. It automatically connects to the Guardian API running in your cluster.

For production deployments, expose the dashboard via an Ingress or LoadBalancer. See the production setup guide for details.

Dashboard Overview

The dashboard provides several key views:

Home / Statistics

The home page shows cluster-wide statistics:

Total Monitors: Number of active CronJobMonitor resources
Total CronJobs: Number of monitored CronJobs
Health Summary: Breakdown by status (healthy, warning, critical, suspended, running)
Active Alerts: Current open alerts across all monitors
Executions (24h): Number of job executions recorded in the last 24 hours

API Endpoint: GET /api/v1/stats

curl http://localhost:8080/api/v1/stats

Response:

{
  "totalMonitors": 5,
  "totalCronJobs": 42,
  "summary": {
    "healthy": 38,
    "warning": 3,
    "critical": 1,
    "suspended": 2,
    "running": 0
  },
  "activeAlerts": 4,
  "executionsRecorded24h": 1247
}
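For scripts or chat-ops checks, the same numbers can be pulled programmatically. The Python sketch below assumes the API is reachable at http://localhost:8080 through the port-forward described earlier and relies only on the field names in the example response; `fetch_stats` and `needs_attention` are hypothetical helper names, not part of Guardian.

```python
import json
import urllib.request

# Assumption: the API is reachable through the port-forward from "Accessing the Dashboard".
BASE_URL = "http://localhost:8080"


def fetch_stats(base_url=BASE_URL):
    """GET /api/v1/stats and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/stats", timeout=10) as resp:
        return json.load(resp)


def needs_attention(stats):
    """Return warning/critical counts and open alerts, keeping only non-zero values."""
    summary = stats.get("summary", {})
    counts = {
        "warning": summary.get("warning", 0),
        "critical": summary.get("critical", 0),
        "activeAlerts": stats.get("activeAlerts", 0),
    }
    return {key: value for key, value in counts.items() if value > 0}
```

For example, needs_attention(fetch_stats()) returns an empty dict when nothing is in a warning or critical state and no alerts are open.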

Monitors View

Lists all CronJobMonitor resources with their status. API Endpoint: GET /api/v1/monitors

curl http://localhost:8080/api/v1/monitors

Monitor Details

Click on a monitor to view:

Configuration: Selector, dead-man’s switch, SLA settings, alert channels
Monitored CronJobs: List of discovered CronJobs
Status Summary: Healthy, warning, critical counts
Recent Activity: Latest reconciliation time

API Endpoint: GET /api/v1/monitors/{namespace}/{name}

curl http://localhost:8080/api/v1/monitors/production/critical-jobs

CronJobs View

Lists all monitored CronJobs with real-time status. Features:

Search: Filter by CronJob name
Filter by Status: Show only healthy, warning, or critical jobs
Filter by Namespace: Narrow down to specific namespaces
Sort: By name, namespace, success rate, or last run

API Endpoint: GET /api/v1/cronjobs

# All CronJobs
curl http://localhost:8080/api/v1/cronjobs

# Filter by namespace
curl "http://localhost:8080/api/v1/cronjobs?namespace=production"

# Filter by status
curl "http://localhost:8080/api/v1/cronjobs?status=critical"

# Search
curl "http://localhost:8080/api/v1/cronjobs?search=backup"
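When filters come from user input, building the query string in code avoids shell quoting surprises (an unquoted ? or & is interpreted by the shell). A minimal sketch, assuming that the namespace, status, and search parameters can be combined in a single request; the examples above only show them one at a time, so treat combining as unverified. `cronjobs_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: local port-forward as described in "Accessing the Dashboard".
BASE_URL = "http://localhost:8080"


def cronjobs_url(base_url=BASE_URL, namespace=None, status=None, search=None):
    """Build a /api/v1/cronjobs URL, leaving out any filter that is not set."""
    params = {"namespace": namespace, "status": status, "search": search}
    # urlencode percent-encodes values, so spaces or "&" in a search term are safe.
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{base_url}/api/v1/cronjobs" + (f"?{query}" if query else "")
```

The resulting URL can be passed to curl (quoted) or to any HTTP client.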

CronJob Details Page

Click on a CronJob to view detailed information:

Overview Section

Name & Namespace: CronJob identification
Schedule: Cron expression and timezone
Status: Current health (healthy, warning, critical, suspended)
Monitor: Link to the CronJobMonitor watching this job
Next Run: Scheduled next execution time
Last Success: Timestamp of last successful run

Metrics Section

Success Rate: 7-day and 30-day success rates
Total Runs: Count over the SLA window
Duration Statistics: Average, P50, P95, P99
Visual Charts: Success rate trends, duration heatmaps

Active Jobs Section

Currently running job instances:

Job Name: Kubernetes Job resource name
Start Time: When the job started
Running Duration: How long it’s been running
Pod Phase: Pending, Running, etc.
Pod Name: Link to pod details

Active Alerts Section

Current alerts for this CronJob:

Alert Type: jobFailed, deadManTriggered, slaBreached, etc.
Severity: Critical or warning
Message: Alert description
Since: When the alert started
Context: Exit code, reason, suggested fix (if available)

Execution History Section

Paginated list of past executions (last 30 days by default):

Job Name: Kubernetes Job resource
Status: Success or failed
Start Time: When the job started
Duration: How long the job ran
Exit Code: Container exit code (for failures)
Actions: View logs, view details

API Endpoint: GET /api/v1/cronjobs/{namespace}/{name}

curl http://localhost:8080/api/v1/cronjobs/production/daily-report

Execution History

View detailed execution history for a specific CronJob. API Endpoint: GET /api/v1/cronjobs/{namespace}/{name}/executions

# Default (last 30 days, 20 per page)
curl http://localhost:8080/api/v1/cronjobs/production/daily-report/executions

# Pagination
curl "http://localhost:8080/api/v1/cronjobs/production/daily-report/executions?limit=50&offset=0"

# Filter by status
curl "http://localhost:8080/api/v1/cronjobs/production/daily-report/executions?status=failed"

# Filter by time
curl "http://localhost:8080/api/v1/cronjobs/production/daily-report/executions?since=2024-01-01T00:00:00Z"
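To walk the full history rather than a single page, a client can advance offset by limit until a short page comes back. This sketch keeps the paging logic independent of the HTTP call. `iter_paged` and `http_executions_page` are hypothetical, the "executions" key used to extract records from the response body is an assumption (the response shape is not shown here), and combining a status filter with limit/offset in one request is likewise unverified.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: local port-forward as described in "Accessing the Dashboard".
BASE_URL = "http://localhost:8080"


def iter_paged(fetch_page, limit=20):
    """Yield items page by page until a short (or empty) page signals the end.

    fetch_page(limit, offset) must return the list of items for that page.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit


def http_executions_page(namespace, name, base_url=BASE_URL, **filters):
    """Return a fetch_page callable for the executions endpoint.

    Hypothetical: assumes the body holds the records under an "executions" key.
    """
    def fetch_page(limit, offset):
        query = urlencode({**filters, "limit": limit, "offset": offset})
        url = f"{base_url}/api/v1/cronjobs/{namespace}/{name}/executions?{query}"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp).get("executions", [])

    return fetch_page
```

Usage might look like: for record in iter_paged(http_executions_page("production", "daily-report", status="failed"), limit=50): print(record). Stopping on a short page costs one extra request when the total is an exact multiple of limit, but it needs no total count from the server.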

Execution Details

Click on an execution to view:

Full Execution Details: Start time, completion time, duration, exit code, reason
Stored Logs: Container logs (if log storage is enabled)
Stored Events: Kubernetes events (if event storage is enabled)
Retry Information: If this execution is a retry, links to the original job

API Endpoint: GET /api/v1/cronjobs/{namespace}/{name}/executions/{jobName}

curl http://localhost:8080/api/v1/cronjobs/production/daily-report/executions/daily-report-28472918

Viewing Logs

View container logs for a specific job execution. API Endpoint: GET /api/v1/cronjobs/{namespace}/{name}/executions/{jobName}/logs

# Default (last 500 lines)
curl http://localhost:8080/api/v1/cronjobs/production/daily-report/executions/daily-report-28472918/logs

# Specify container
curl "http://localhost:8080/api/v1/cronjobs/production/daily-report/executions/daily-report-28472918/logs?container=main"

# Tail more lines
curl "http://localhost:8080/api/v1/cronjobs/production/daily-report/executions/daily-report-28472918/logs?tailLines=1000"

Alerts View

View all active alerts across all monitored CronJobs. Features:

Filter by Severity: Critical or warning
Filter by Type: jobFailed, deadManTriggered, slaBreached, etc.
Filter by Namespace or CronJob: Narrow down to specific resources
Sort: By severity, time, or CronJob name

API Endpoint: GET /api/v1/alerts

# All active alerts
curl http://localhost:8080/api/v1/alerts

# Filter by severity
curl "http://localhost:8080/api/v1/alerts?severity=critical"

# Filter by type
curl "http://localhost:8080/api/v1/alerts?type=jobFailed"

# Filter by CronJob
curl "http://localhost:8080/api/v1/alerts?namespace=production&cronjob=daily-report"
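A quick triage view can be produced by tallying alerts by severity or type. In this sketch, `fetch_alerts` and `count_by` are hypothetical helpers, and the {"alerts": [...]} response shape with severity and type fields on each alert is an assumption based on the fields the Alerts View lists.

```python
import json
import urllib.request
from collections import Counter
from urllib.parse import urlencode

# Assumption: local port-forward as described in "Accessing the Dashboard".
BASE_URL = "http://localhost:8080"


def fetch_alerts(base_url=BASE_URL, **filters):
    """GET /api/v1/alerts with optional filters (severity, type, namespace, cronjob).

    Hypothetical: assumes the body is {"alerts": [...]}.
    """
    query = urlencode(filters)
    url = f"{base_url}/api/v1/alerts" + (f"?{query}" if query else "")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("alerts", [])


def count_by(alerts, field):
    """Tally alerts by one field, e.g. "severity" or "type"; missing values count as "unknown"."""
    return Counter(alert.get(field, "unknown") for alert in alerts)
```

For example, count_by(fetch_alerts(namespace="production"), "type") shows which alert types dominate in one namespace.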

Alert History

View historical alerts (resolved or expired). API Endpoint: GET /api/v1/alerts/history

# Default (last 50 alerts)
curl http://localhost:8080/api/v1/alerts/history

# Pagination
curl "http://localhost:8080/api/v1/alerts/history?limit=100&offset=0"

# Filter by severity
curl "http://localhost:8080/api/v1/alerts/history?severity=critical"

# Filter by time
curl "http://localhost:8080/api/v1/alerts/history?since=2024-01-01T00:00:00Z"
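Hard-coded dates in since go stale quickly. A small helper can compute a rolling window instead, emitting timestamps in exactly the shape the examples use (UTC, second precision, trailing Z). Whether the API also accepts other RFC 3339 variants is not stated here, so the sketch sticks to that form; `since_param` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def since_param(days, now=None):
    """Return a UTC timestamp `days` before `now`, formatted like 2024-01-01T00:00:00Z."""
    now = now or datetime.now(timezone.utc)
    # Normalize aware datetimes in other zones to UTC before subtracting.
    start = now.astimezone(timezone.utc) - timedelta(days=days)
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, urlencode({"severity": "critical", "since": since_param(7)}) (from urllib.parse) yields a query string covering the last seven days of critical alert history.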

Alert Channels View

Manage and monitor alert channels (Slack, PagerDuty, email, webhooks). Features:

Channel Status: Ready or not ready
Statistics: Total alerts sent, failed alerts, consecutive failures
Last Alert Time: When the last alert was successfully sent
Test Alerts: Send a test alert to verify channel configuration

API Endpoint: GET /api/v1/channels

curl http://localhost:8080/api/v1/channels

Channel Details

View detailed information about a specific channel:

Type: Slack, PagerDuty, email, webhook
Configuration: Channel settings, with sensitive values redacted
Test Results: Last test time and result
Statistics: Detailed send/failure stats

API Endpoint: GET /api/v1/channels/{name}

curl http://localhost:8080/api/v1/channels/slack-alerts

Testing a Channel

Send a test alert to verify the channel is working. API Endpoint: POST /api/v1/channels/{name}/test

curl -X POST http://localhost:8080/api/v1/channels/slack-alerts/test

Response:

{
  "success": true,
  "message": "Test alert sent successfully"
}
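To check every channel in one pass, a script can call the test endpoint per channel and collect failures. The channel names are passed in explicitly because the shape of the GET /api/v1/channels response is not shown here; `post_json` and `failing_channels` are hypothetical helpers, and only the success and message fields from the example response are relied on.

```python
import json
import urllib.request

# Assumption: local port-forward as described in "Accessing the Dashboard".
BASE_URL = "http://localhost:8080"


def post_json(path, base_url=BASE_URL):
    """POST with an empty body and decode the JSON reply."""
    req = urllib.request.Request(f"{base_url}{path}", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def failing_channels(channel_names, post=post_json):
    """Send a test alert to each channel; return {name: message} for the failures."""
    failures = {}
    for name in channel_names:
        try:
            result = post(f"/api/v1/channels/{name}/test")
        except OSError as err:  # urllib's URLError and HTTPError subclass OSError
            failures[name] = str(err)
            continue
        if not result.get("success"):
            failures[name] = result.get("message", "no message")
    return failures
```

An empty result from failing_channels(["slack-alerts"]) means every listed channel accepted its test alert.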

Actions on CronJobs

The dashboard allows you to perform actions on CronJobs:

Trigger a CronJob Manually

Create a one-off Job from the CronJob on demand, outside its schedule (useful for testing). API Endpoint: POST /api/v1/cronjobs/{namespace}/{name}/trigger

curl -X POST http://localhost:8080/api/v1/cronjobs/production/daily-report/trigger

Response:

{
  "success": true,
  "jobName": "daily-report-manual-1705432890",
  "message": "Job created successfully"
}
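The trigger response includes the generated jobName, which plugs directly into the Execution Details endpoint. A minimal sketch with hypothetical helpers `trigger` and `execution_url`. The execution record may only appear once Guardian has observed the new Job, so a caller may need to retry the details request; that is an assumption, not documented behavior.

```python
import json
import urllib.request

# Assumption: local port-forward as described in "Accessing the Dashboard".
BASE_URL = "http://localhost:8080"


def trigger(namespace, name, base_url=BASE_URL):
    """POST .../trigger and return the decoded JSON reply."""
    url = f"{base_url}/api/v1/cronjobs/{namespace}/{name}/trigger"
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def execution_url(reply, namespace, name, base_url=BASE_URL):
    """Turn a trigger reply into the URL of that run's execution details."""
    if not reply.get("success"):
        raise RuntimeError(reply.get("message", "trigger failed"))
    return f"{base_url}/api/v1/cronjobs/{namespace}/{name}/executions/{reply['jobName']}"
```

For example, execution_url(trigger("production", "daily-report"), "production", "daily-report") returns the details URL for the run just started.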

Suspend a CronJob

Prevent scheduled runs. API Endpoint: POST /api/v1/cronjobs/{namespace}/{name}/suspend

curl -X POST http://localhost:8080/api/v1/cronjobs/production/daily-report/suspend

Resume a CronJob

Resume scheduled runs. API Endpoint: POST /api/v1/cronjobs/{namespace}/{name}/resume

curl -X POST http://localhost:8080/api/v1/cronjobs/production/daily-report/resume
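For maintenance windows, pairing the two calls so that resume always follows suspend avoids leaving a CronJob paused after a failed script. This sketch uses a Python context manager; `post_action` and `suspended` are hypothetical names, and because the suspend/resume reply bodies are not shown here, only the HTTP status is returned.

```python
import urllib.request
from contextlib import contextmanager

# Assumption: local port-forward as described in "Accessing the Dashboard".
BASE_URL = "http://localhost:8080"


def post_action(namespace, name, action, base_url=BASE_URL):
    """POST .../{action} where action is "suspend" or "resume"; return the HTTP status."""
    url = f"{base_url}/api/v1/cronjobs/{namespace}/{name}/{action}"
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


@contextmanager
def suspended(namespace, name, post=post_action):
    """Suspend a CronJob for the duration of a with-block, then resume it."""
    post(namespace, name, "suspend")
    try:
        yield
    finally:
        post(namespace, name, "resume")  # runs even if the block raises
```

It resumes unconditionally on exit, so it should not wrap a CronJob that was already suspended on purpose; checking the current status first would be a reasonable extension.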

Delete Execution History

Delete all execution history for a specific CronJob. API Endpoint: DELETE /api/v1/cronjobs/{namespace}/{name}/history

curl -X DELETE http://localhost:8080/api/v1/cronjobs/production/daily-report/history

Response:

{
  "success": true,
  "deletedCount": 150,
  "message": "Deleted 150 execution records"
}

This permanently deletes all execution records, logs, and events for the CronJob. Use with caution.

Testing Suggested Fix Patterns

Test custom suggested fix patterns before deploying them. API Endpoint: POST /api/v1/patterns/test

curl -X POST http://localhost:8080/api/v1/patterns/test \
  -H "Content-Type: application/json" \
  -d '{
    "pattern": {
      "name": "custom-oom",
      "match": {
        "exitCode": 137
      },
      "suggestion": "Increase memory limit for {{ .CronJob.Name }}"
    },
    "testContext": {
      "namespace": "production",
      "cronjobName": "daily-report",
      "exitCode": 137,
      "reason": "OOMKilled"
    }
  }'

Response:

{
  "matched": true,
  "suggestion": "Increase memory limit for daily-report"
}

Health Check

Check the health and status of the Guardian operator. API Endpoint: GET /api/v1/health

curl http://localhost:8080/api/v1/health

Response:

{
  "status": "healthy",
  "storage": "connected",
  "leader": true,
  "version": "v0.5.0",
  "uptime": "5h30m15s",
  "analyzerEnabled": true,
  "schedulersRunning": [
    "dead-man-switch",
    "sla-recalculation",
    "stuck-job-check",
    "prune"
  ]
}
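A probe built on this endpoint can check more than an HTTP 200. The sketch below flags a non-healthy status, disconnected storage, or missing background schedulers, using the field and scheduler names from the example response. `health_problems`, `main`, and `EXPECTED_SCHEDULERS` are hypothetical; if you run several replicas, it is likely that only the leader runs the schedulers, so the scheduler check may need to be skipped when leader is false (an assumption).

```python
import json
import sys
import urllib.request

# Assumption: local port-forward as described in "Accessing the Dashboard".
BASE_URL = "http://localhost:8080"

# Assumption: scheduler names copied from the example response above.
EXPECTED_SCHEDULERS = {"dead-man-switch", "sla-recalculation", "stuck-job-check", "prune"}


def health_problems(health, expected=EXPECTED_SCHEDULERS):
    """Return human-readable problems found in a /api/v1/health body."""
    problems = []
    if health.get("status") != "healthy":
        problems.append(f"status is {health.get('status')!r}")
    if health.get("storage") != "connected":
        problems.append(f"storage is {health.get('storage')!r}")
    missing = expected - set(health.get("schedulersRunning", []))
    if missing:
        problems.append("schedulers not running: " + ", ".join(sorted(missing)))
    return problems


def main(base_url=BASE_URL):
    """Print problems to stderr; return 0 when healthy and 1 otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/v1/health", timeout=5) as resp:
            problems = health_problems(json.load(resp))
    except OSError as err:  # connection errors and HTTP error statuses
        problems = [f"request failed: {err}"]
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Run it as sys.exit(main()) from a cron job or wrapper script; the exit code is 0 when healthy and 1 otherwise.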

Configuration View

View the operator’s current configuration (sensitive values redacted). API Endpoint: GET /api/v1/config

curl http://localhost:8080/api/v1/config

Admin Actions

Storage Statistics

View storage backend health and statistics. API Endpoint: GET /api/v1/admin/storage-stats

curl http://localhost:8080/api/v1/admin/storage-stats

Response:

{
  "executionCount": 125847,
  "storageType": "sqlite",
  "healthy": true,
  "retentionDays": 30,
  "logStorageEnabled": true
}

Manual Pruning

Trigger manual pruning of old execution records. API Endpoint: POST /api/v1/admin/prune

# Dry run (preview what would be deleted)
curl -X POST http://localhost:8080/api/v1/admin/prune \
  -H "Content-Type: application/json" \
  -d '{"olderThanDays": 30, "dryRun": true}'

# Actual prune
curl -X POST http://localhost:8080/api/v1/admin/prune \
  -H "Content-Type: application/json" \
  -d '{"olderThanDays": 30, "dryRun": false}'

# Prune logs only
curl -X POST http://localhost:8080/api/v1/admin/prune \
  -H "Content-Type: application/json" \
  -d '{"olderThanDays": 30, "pruneLogsOnly": true}'
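Because pruning deletes records, a wrapper can insist on a dry run first and only proceed after the preview has been reviewed. `post_prune` and `prune_with_preview` are hypothetical helpers; the dry-run response shape is not documented here, so the preview is handed to the `confirm` callback unchanged.

```python
import json
import urllib.request

# Assumption: local port-forward as described in "Accessing the Dashboard".
BASE_URL = "http://localhost:8080"


def post_prune(payload, base_url=BASE_URL):
    """POST a JSON body to /api/v1/admin/prune and decode the reply."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/admin/prune",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def prune_with_preview(older_than_days, confirm, post=post_prune):
    """Run a dry run first; prune for real only if confirm(preview) returns True."""
    preview = post({"olderThanDays": older_than_days, "dryRun": True})
    if not confirm(preview):
        return None
    return post({"olderThanDays": older_than_days, "dryRun": False})
```

Interactively, confirm could be lambda preview: input(f"{preview}\nProceed? [y/N] ").strip().lower() == "y".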

Exposing the Dashboard in Production

Using Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cronjob-guardian
  namespace: cronjob-guardian
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - guardian.example.com
      secretName: guardian-tls
  rules:
    - host: guardian.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cronjob-guardian-api
                port:
                  number: 8080

Using LoadBalancer

Update the Helm values:

# values.yaml
api:
  enabled: true
  service:
    type: LoadBalancer
    port: 8080

Dashboard Best Practices

Bookmark Key Views

Bookmark frequently used CronJob detail pages for quick access.

Set Up Monitoring Dashboards

Export Prometheus metrics and create Grafana dashboards for long-term trends.

Use Filters

Use namespace and status filters to focus on critical jobs.

Check Health Regularly

Monitor the /api/v1/health endpoint to ensure Guardian is running properly.

Next Steps

Troubleshooting: Common issues and solutions
REST API Reference: Complete API documentation
