http://localhost:8080 (or your configured API port).
Accessing the Dashboard
For production deployments, expose the dashboard via an Ingress or LoadBalancer. See the production setup guide for details.
Dashboard Overview
The dashboard provides several key views:
Home / Statistics
The home page shows cluster-wide statistics:
- Total Monitors: Number of active CronJobMonitor resources
- Total CronJobs: Number of monitored CronJobs
- Health Summary: Breakdown by status (healthy, warning, critical, suspended)
- Active Alerts: Current open alerts across all monitors
- Executions (24h): Number of job executions recorded in the last 24 hours
API Endpoint: GET /api/v1/stats
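As a minimal sketch, the statistics shown on the home page could also be read directly from this endpoint. The base URL assumes the local address mentioned earlier, the helper names `stats_url` and `fetch_stats` are illustrative, and the response is assumed (not confirmed here) to be a JSON object.

```python
import json
import urllib.request

# Assumption: the API is reachable locally on the default port.
BASE_URL = "http://localhost:8080"


def stats_url(base_url: str = BASE_URL) -> str:
    """Build the URL of the cluster-wide statistics endpoint."""
    return base_url.rstrip("/") + "/api/v1/stats"


def fetch_stats(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /api/v1/stats and decode the body, assuming it is a JSON object."""
    with urllib.request.urlopen(stats_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_stats()` against a running instance returns whatever the server sends; no field names are assumed in this sketch.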
Monitors View
Lists all CronJobMonitor resources with their status.
API Endpoint: GET /api/v1/monitors
Monitor Details
Click on a monitor to view:
- Configuration: Selector, dead-man’s switch, SLA settings, alert channels
- Monitored CronJobs: List of discovered CronJobs
- Status Summary: Healthy, warning, critical counts
- Recent Activity: Latest reconciliation time
API Endpoint: GET /api/v1/monitors/{namespace}/{name}
CronJobs View
Lists all monitored CronJobs with real-time status.
Features:
- Search: Filter by CronJob name
- Filter by Status: Show only healthy, warning, or critical jobs
- Filter by Namespace: Narrow down to specific namespaces
- Sort: By name, namespace, success rate, or last run
API Endpoint: GET /api/v1/cronjobs
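The search, filter, and sort features above can be approximated on the client once the list has been fetched. The sketch below is only an illustration: the field names `name`, `namespace`, and `status` are hypothetical and the real response schema may differ.

```python
from typing import Optional


def filter_cronjobs(
    cronjobs: list[dict],
    status: Optional[str] = None,
    namespace: Optional[str] = None,
    search: Optional[str] = None,
) -> list[dict]:
    """Keep entries matching every filter that is set, sorted by namespace then name."""
    matches = [
        cj for cj in cronjobs
        if (status is None or cj["status"] == status)
        and (namespace is None or cj["namespace"] == namespace)
        and (search is None or search.lower() in cj["name"].lower())
    ]
    return sorted(matches, key=lambda cj: (cj["namespace"], cj["name"]))


# Hypothetical sample data; field names are assumptions, not the documented schema.
sample = [
    {"name": "nightly-backup", "namespace": "prod", "status": "critical"},
    {"name": "report-export", "namespace": "prod", "status": "healthy"},
    {"name": "cache-warmup", "namespace": "staging", "status": "critical"},
]

critical = filter_cronjobs(sample, status="critical")
```

Here `critical` holds the two critical entries, with the `prod` job ordered before the `staging` one.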
CronJob Details Page
Click on a CronJob to view detailed information:
Overview Section
- Name & Namespace: CronJob identification
- Schedule: Cron expression and timezone
- Status: Current health (healthy, warning, critical, suspended)
- Monitor: Link to the CronJobMonitor watching this job
- Next Run: Scheduled next execution time
- Last Success: Timestamp of last successful run
Metrics Section
- Success Rate: 7-day and 30-day success rates
- Total Runs: Count over the SLA window
- Duration Statistics: Average, P50, P95, P99
- Visual Charts: Success rate trends, duration heatmaps
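To make the metrics above concrete, here is a small worked sketch of a success rate and of duration percentiles. It uses the nearest-rank percentile definition, which is one common convention and not necessarily the one Guardian applies; the sample durations are invented for illustration.

```python
import math


def success_rate(succeeded: int, total: int) -> float:
    """Percentage of successful runs; 0.0 when there were no runs."""
    return 100.0 * succeeded / total if total else 0.0


def percentile(durations: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not durations:
        raise ValueError("no durations recorded")
    ordered = sorted(durations)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical durations, in seconds, of ten runs (one of them very slow).
durations = [42, 45, 47, 50, 51, 53, 55, 60, 90, 300]

average = sum(durations) / len(durations)  # 79.3
p50 = percentile(durations, 50)            # 51
p95 = percentile(durations, 95)            # 300
```

The single 300-second run pulls the average well above the median, which is why the tail percentiles (P95, P99) are reported alongside the average.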
Active Jobs Section
Currently running job instances:
- Job Name: Kubernetes Job resource name
- Start Time: When the job started
- Running Duration: How long it’s been running
- Pod Phase: Pending, Running, etc.
- Pod Name: Link to pod details
Active Alerts Section
Current alerts for this CronJob:
- Alert Type: jobFailed, deadManTriggered, slaBreached, etc.
- Severity: Critical or warning
- Message: Alert description
- Since: When the alert started
- Context: Exit code, reason, suggested fix (if available)
Execution History Section
Paginated list of past executions (last 30 days by default):
- Job Name: Kubernetes Job resource
- Status: Success or failed
- Start Time: When the job started
- Duration: How long the job ran
- Exit Code: Container exit code (for failures)
- Actions: View logs, view details
API Endpoint: GET /api/v1/cronjobs/{namespace}/{name}
Execution History
View detailed execution history for a specific CronJob.
API Endpoint: GET /api/v1/cronjobs/{namespace}/{name}/executions
Execution Details
Click on an execution to view:
- Full Execution Details: Start time, completion time, duration, exit code, reason
- Stored Logs: Container logs (if log storage is enabled)
- Stored Events: Kubernetes events (if event storage is enabled)
- Retry Information: If this execution is a retry, links to the original job
API Endpoint: GET /api/v1/cronjobs/{namespace}/{name}/executions/{jobName}
Viewing Logs
View container logs for a specific job execution.
API Endpoint: GET /api/v1/cronjobs/{namespace}/{name}/executions/{jobName}/logs
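The three execution endpoints nest inside one another: the history list, a single execution, and that execution's logs. A minimal sketch of a URL builder that reflects this nesting follows; the function name `executions_url` and its parameters are illustrative and not part of the documented API.

```python
from typing import Optional
from urllib.parse import quote


def executions_url(
    base_url: str,
    namespace: str,
    name: str,
    job_name: Optional[str] = None,
    logs: bool = False,
) -> str:
    """Build the executions URL, optionally narrowed to one Job or to its logs."""
    if logs and job_name is None:
        raise ValueError("logs are only available for a specific job execution")
    # Percent-encode each segment so it cannot alter the path structure.
    parts = [quote(namespace, safe=""), quote(name, safe=""), "executions"]
    if job_name is not None:
        parts.append(quote(job_name, safe=""))
    if logs:
        parts.append("logs")
    return f"{base_url.rstrip('/')}/api/v1/cronjobs/" + "/".join(parts)
```

For example, `executions_url("http://localhost:8080", "prod", "nightly-backup")` yields the history URL, and adding a Job name with `logs=True` yields the logs URL for that execution.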
Alerts View
View all active alerts across all monitored CronJobs.
Features:
- Filter by Severity: Critical or warning
- Filter by Type: jobFailed, deadManTriggered, slaBreached, etc.
- Filter by Namespace or CronJob: Narrow down to specific resources
- Sort: By severity, time, or CronJob name
API Endpoint: GET /api/v1/alerts
Alert History
View historical alerts (resolved or expired).
API Endpoint: GET /api/v1/alerts/history
Alert Channels View
Manage and monitor alert channels (Slack, PagerDuty, email, webhooks).
Features:
- Channel Status: Ready or not ready
- Statistics: Total alerts sent, failed alerts, consecutive failures
- Last Alert Time: When the last alert was successfully sent
- Test Alerts: Send a test alert to verify channel configuration
API Endpoint: GET /api/v1/channels
Channel Details
View detailed information about a specific channel:
- Type: Slack, PagerDuty, email, webhook
- Configuration: Redacted sensitive values
- Test Results: Last test time and result
- Statistics: Detailed send/failure stats
API Endpoint: GET /api/v1/channels/{name}
Testing a Channel
Send a test alert to verify that the channel is working.
API Endpoint: POST /api/v1/channels/{name}/test
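A channel test could be scripted roughly as follows. This sketch assumes the endpoint needs no request body and that any 2xx status means the test alert was accepted; neither assumption is confirmed here, and the helper names are illustrative.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def channel_test_request(base_url: str, channel: str) -> urllib.request.Request:
    """Build a body-less POST request for the channel test endpoint."""
    url = f"{base_url.rstrip('/')}/api/v1/channels/{quote(channel, safe='')}/test"
    return urllib.request.Request(url, method="POST")


def send_test_alert(base_url: str, channel: str, timeout: float = 10.0) -> bool:
    """Return True on a 2xx response and False on an HTTP error status.

    Connection problems (urllib.error.URLError) are left to propagate.
    """
    request = channel_test_request(base_url, channel)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        return False
```

Keeping request construction separate from sending makes the URL and method easy to inspect without contacting the server.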
Actions on CronJobs
The dashboard allows you to perform actions on CronJobs:
Trigger a CronJob Manually
Create a Job manually (useful for testing).
API Endpoint: POST /api/v1/cronjobs/{namespace}/{name}/trigger
Suspend a CronJob
Prevent scheduled runs.
API Endpoint: POST /api/v1/cronjobs/{namespace}/{name}/suspend
Resume a CronJob
Resume scheduled runs.
API Endpoint: POST /api/v1/cronjobs/{namespace}/{name}/resume
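The trigger, suspend, and resume endpoints share one URL shape, so a single helper can cover all three. The sketch below is illustrative only: the function name is invented, and it assumes these POST endpoints take no request body.

```python
import urllib.request
from urllib.parse import quote

# The three actions documented above; anything else is rejected locally.
ACTIONS = ("trigger", "suspend", "resume")


def cronjob_action_request(
    base_url: str, namespace: str, name: str, action: str
) -> urllib.request.Request:
    """Build a POST request for one of the documented CronJob actions."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    path = f"/api/v1/cronjobs/{quote(namespace, safe='')}/{quote(name, safe='')}/{action}"
    return urllib.request.Request(base_url.rstrip("/") + path, method="POST")
```

The returned request can be passed to `urllib.request.urlopen` to perform the action. Validating the action name locally avoids sending a request to a path that is not documented.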
Delete Execution History
Delete all execution history for a specific CronJob.
API Endpoint: DELETE /api/v1/cronjobs/{namespace}/{name}/history
Testing Suggested Fix Patterns
Test custom suggested fix patterns before deploying them.
API Endpoint: POST /api/v1/patterns/test
Health Check
Check the health and status of the Guardian operator.
API Endpoint: GET /api/v1/health
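A simple probe against this endpoint might look like the sketch below. It assumes that an HTTP 200 response means the operator is healthy and treats any connection problem as unhealthy; the response body is not inspected because its format is not described here.

```python
import urllib.request


def is_healthy(base_url: str = "http://localhost:8080", timeout: float = 3.0) -> bool:
    """Return True only if GET /api/v1/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive from OSError.
        return False
```

A function like this can be called from a scheduled script or an external uptime checker.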
Configuration View
View the operator’s current configuration (sensitive values redacted).
API Endpoint: GET /api/v1/config
Admin Actions
Storage Statistics
View storage backend health and statistics.
API Endpoint: GET /api/v1/admin/storage-stats
Manual Pruning
Trigger manual pruning of old execution records.
API Endpoint: POST /api/v1/admin/prune
Exposing the Dashboard in Production
Using Ingress
Using LoadBalancer
Update the Helm values:
Dashboard Best Practices
Bookmark Key Views
Bookmark frequently used CronJob detail pages for quick access.
Set Up Monitoring Dashboards
Export Prometheus metrics and create Grafana dashboards for long-term trends.
Use Filters
Use namespace and status filters to focus on critical jobs.
Check Health Regularly
Monitor the /health endpoint to ensure Guardian is running properly.
Next Steps
- Troubleshooting: Common issues and solutions
- REST API Reference: Complete API documentation