The Operations API provides health monitoring and diagnostics for the planning refresh system.
All operations endpoints require the x-ops-token header. In production, requests without a valid token return 404.
Get planning system health
Get comprehensive health status of the planning refresh system including queue statistics, backup coverage, and failure tracking.
x-ops-token - Operations token for admin access. Must match the OPS_TOKEN environment variable.
curl -H "x-ops-token: YOUR_OPS_TOKEN" \
  https://planningsup.app/api/ops/plannings
Response
{
  "status": "healthy",
  "issues": [],
  "workers": {
    "backfill": "idle",
    "refreshWorker": "working"
  },
  "inQuietHours": false,
  "queue": {
    "depth": 15,
    "ready": 3,
    "locked": 1
  },
  "backups": {
    "total": 127,
    "covered": 125,
    "disabled": 2
  },
  "lastBackupWrite": {
    "planningFullId": "enscr.elevesing1iereannee",
    "changed": true,
    "nbEvents": 42,
    "at": "2026-03-02T10:30:00.000Z"
  },
  "failedHosts": [
    {
      "host": "fac-de-sciences",
      "count": 2,
      "lastFailure": "http_timeout"
    }
  ],
  "recentFailures": [
    {
      "planningFullId": "fac-de-sciences.master.m1info",
      "failureKind": "http_timeout",
      "failures": 3,
      "disabledUntil": "2026-03-02T11:00:00.000Z"
    }
  ]
}
Response fields
status - Overall system health status:
"healthy" - All systems functioning normally
"degraded" - Minor issues detected (see issues array)
"unhealthy" - Critical issues requiring attention
issues - List of detected health problems. Empty array when status is "healthy". Example issues:
"Jobs not started"
"Backfill stale (45m ago)"
"Queue stuck: 100 ready but worker idle"
"High disabled: 15/127 (12%)"
workers - Background worker states.
workers.backfill - Backfill job state (ensures all plannings have recent backups):
"starting" - Initializing
"idle" - Waiting for next cycle
"working" - Processing plannings
"paused" - Manually paused
"quiet_hours" - In quiet hours window
"stopped" - Stopped
"crashed" - Worker crashed
"unknown" - State unavailable
workers.refreshWorker - Refresh worker state (processes queue and retries failures). Takes the same states as the backfill worker.
inQuietHours - Whether the system is currently in quiet hours (reduced activity). Configured via JOBS_QUIET_HOURS (e.g., 21:00-06:00).
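To illustrate how a window such as 21:00-06:00 can wrap past midnight, here is a minimal sketch. The helper names parseQuietHours and isInQuietHours are hypothetical, and the boundary handling (start inclusive, end exclusive) and the use of minutes since local midnight are assumptions, not the service's confirmed behavior.

```javascript
// Hypothetical helpers, not part of the PlanningSup codebase.
// Parses "HH:MM-HH:MM" into minutes since midnight; returns null if malformed.
function parseQuietHours(spec) {
  const match = /^(\d{1,2}):(\d{2})-(\d{1,2}):(\d{2})$/.exec(spec.trim())
  if (!match) return null
  const [startH, startM, endH, endM] = match.slice(1).map(Number)
  return { start: startH * 60 + startM, end: endH * 60 + endM }
}

// minutesOfDay: minutes since midnight in whatever timezone the server uses (assumption).
function isInQuietHours(spec, minutesOfDay) {
  const range = parseQuietHours(spec)
  if (!range || range.start === range.end) return false
  return range.start < range.end
    ? minutesOfDay >= range.start && minutesOfDay < range.end // same-day window
    : minutesOfDay >= range.start || minutesOfDay < range.end // wraps past midnight
}
```

With the example value, 22:00 and 05:59 fall inside the window, while 06:00 and noon fall outside it.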
queue - Refresh queue statistics.
queue.depth - Total items in queue (excluding exhausted retries)
queue.ready - Items ready to be processed (unlocked and past next attempt time)
queue.locked - Items currently being processed by workers
backups - Backup coverage statistics.
backups.total - Total number of plannings
backups.covered - Number of plannings with database backups
backups.disabled - Number of plannings temporarily disabled due to repeated failures
lastBackupWrite - Information about the most recent backup write operation. null if no backups have been written yet.
lastBackupWrite.planningFullId - Full ID of the planning that was backed up
lastBackupWrite.changed - Whether the backup data changed from the previous version
lastBackupWrite.nbEvents - Number of events in the backup
lastBackupWrite.at - Timestamp when the backup was written (ISO 8601 format)
failedHosts - Top 5 hosts with the most disabled plannings. null if no hosts have failures.
failedHosts.host - Host name (e.g., "enscr", "fac-de-sciences")
failedHosts.count - Number of disabled plannings on this host
failedHosts.lastFailure - Most recent failure type:
"http_timeout" - Network timeout
"http_4xx" - Client error (planning likely doesn’t exist)
"http_5xx" - Server error
"parse_error" - ICS parsing failed
"network_error" - Other network error
recentFailures - The 5 most recently failed plannings. null if no recent failures.
recentFailures.failureKind - Type of failure (same values as failedHosts.lastFailure)
recentFailures.failures - Consecutive failure count
recentFailures.disabledUntil - When the planning will be re-enabled (ISO 8601 format). Uses exponential backoff: 5min, 15min, 1h, 4h, 12h, 24h (max)
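The backoff schedule above can be sketched as a lookup table indexed by the consecutive failure count. This is only an illustration: the function name disabledUntil, and the assumption that the first failure maps to the 5-minute step and that the delay is measured from the time of the last failure, are not confirmed by this page.

```javascript
// Delays from the documented schedule, in minutes: 5min, 15min, 1h, 4h, 12h, 24h (max).
const BACKOFF_MINUTES = [5, 15, 60, 240, 720, 1440]

// Hypothetical helper: returns the Date at which a planning would be re-enabled.
// Assumes failure #1 uses the first step and that counts past the table stay at 24h.
function disabledUntil(failures, lastFailureAt) {
  const step = Math.min(Math.max(failures, 1), BACKOFF_MINUTES.length) - 1
  return new Date(lastFailureAt.getTime() + BACKOFF_MINUTES[step] * 60 * 1000)
}
```

Under these assumptions, a third consecutive failure at 10:00 UTC would keep the planning disabled until 11:00 UTC, and any count of six or more would be capped at 24 hours.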
Example requests
curl -H "x-ops-token: YOUR_OPS_TOKEN" \
https://planningsup.app/api/ops/plannings
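The same request can be made from JavaScript. The sketch below wraps it in a small helper that surfaces the 404 returned for a missing or invalid token instead of trying to parse it as JSON. The name fetchOpsHealth and the injectable fetchImpl parameter are assumptions made for illustration and testing; they are not part of the API.

```javascript
// Hypothetical client helper. fetchImpl defaults to the global fetch
// (browsers, Node 18+) and can be swapped out in tests.
async function fetchOpsHealth(baseUrl, token, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`${baseUrl}/api/ops/plannings`, {
    headers: { 'x-ops-token': token }
  })
  if (response.status === 404) {
    // The endpoint answers 404 rather than 401 when the token is rejected.
    throw new Error('Ops endpoint returned 404: check the x-ops-token header and OPS_TOKEN')
  }
  if (!response.ok) {
    throw new Error(`Unexpected response status: ${response.status}`)
  }
  return response.json()
}
```

A call such as fetchOpsHealth('https://planningsup.app', process.env.OPS_TOKEN) would then resolve to the response object described above, or reject with a descriptive error.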
Health status logic
The API computes health status based on several checks:
Check if jobs are running
If RUN_JOBS=false or jobs are paused, set status to degraded/unhealthy
Check worker staleness
- Backfill worker: Should run every PLANNINGS_BACKFILL_INTERVAL_MS (default: 10 minutes)
- Refresh worker: Should poll every PLANNINGS_REFRESH_WORKER_MAX_POLL_MS (default: 30 seconds)
If either worker hasn’t run in 2.5x its expected interval, mark as stale
Check queue health
If queue has many ready items but worker is idle, mark as stuck
Check disabled percentage
If more than 10% of plannings are disabled, mark as degraded
Determine overall status
- Healthy: No issues detected
- Degraded: Minor issues (high disabled percentage)
- Unhealthy: Critical issues (jobs not started, workers stale, queue stuck)
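The checks above can be sketched as a single function over a snapshot of the system. This is an illustrative reconstruction, not the service's actual code: the snapshot shape, the stuckThreshold parameter (the page only says "many ready items"), and the wording of the refresh-worker staleness message are all assumptions. Treating "jobs not started" as critical follows the status list above.

```javascript
const STALE_FACTOR = 2.5 // a worker is stale after 2.5x its expected interval
const MAX_DISABLED_RATIO = 0.10 // more than 10% disabled plannings is degraded

// Hypothetical snapshot fields: jobsStarted, now, backfillLastRunAt, backfillIntervalMs,
// refreshLastPollAt, refreshIntervalMs, refreshWorkerState, queueReady, stuckThreshold,
// disabled, total. Timestamps are in milliseconds.
function computeHealth(s) {
  const issues = []
  let status = 'healthy'
  const critical = (msg) => { issues.push(msg); status = 'unhealthy' }
  const minor = (msg) => { issues.push(msg); if (status === 'healthy') status = 'degraded' }
  const minutesAgo = (ms) => `${Math.round(ms / 60000)}m ago`

  if (!s.jobsStarted) critical('Jobs not started')

  const backfillAge = s.now - s.backfillLastRunAt
  if (backfillAge > STALE_FACTOR * s.backfillIntervalMs) {
    critical(`Backfill stale (${minutesAgo(backfillAge)})`)
  }
  const refreshAge = s.now - s.refreshLastPollAt
  if (refreshAge > STALE_FACTOR * s.refreshIntervalMs) {
    critical(`Refresh worker stale (${minutesAgo(refreshAge)})`) // message format assumed
  }

  if (s.queueReady >= s.stuckThreshold && s.refreshWorkerState === 'idle') {
    critical(`Queue stuck: ${s.queueReady} ready but worker idle`)
  }

  const ratio = s.total > 0 ? s.disabled / s.total : 0
  if (ratio > MAX_DISABLED_RATIO) {
    minor(`High disabled: ${s.disabled}/${s.total} (${Math.round(ratio * 100)}%)`)
  }
  return { status, issues }
}
```

Keeping the critical and minor paths separate means a minor issue never downgrades a status that is already unhealthy, while every detected problem still appears in the issues array.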
Error responses
Unauthorized
Status: 404 Not Found
Causes:
- Missing x-ops-token header
- Invalid ops token
- OPS_TOKEN not configured (in production)
In non-production environments without OPS_TOKEN set, the endpoint allows unauthenticated access for easier debugging.
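The access rules above can be summarized as a small decision function. This is a sketch under stated assumptions: the name checkOpsAccess and its parameters are hypothetical, and the behavior for a wrong token in a non-production environment that does have OPS_TOKEN set is not described on this page, so the sketch assumes it is rejected in the same way.

```javascript
// Hypothetical helper: returns the HTTP status the endpoint would answer with.
function checkOpsAccess({ isProduction, configuredToken, providedToken }) {
  if (!configuredToken) {
    // OPS_TOKEN unset: hidden behind a 404 in production, open elsewhere for debugging.
    return isProduction ? 404 : 200
  }
  // OPS_TOKEN set: the x-ops-token header must match exactly.
  // Assumption: a mismatch yields 404 outside production as well.
  return providedToken === configuredToken ? 200 : 404
}
```

Answering 404 instead of 401 or 403 avoids revealing that an operations endpoint exists at that path.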
Use cases
Monitoring dashboard
Alerting system
Debugging failures
Display system health in a monitoring UI:
const health = await fetch('/api/ops/plannings', {
  headers: { 'x-ops-token': token }
}).then(r => r.json())
return (
  <div>
    <StatusBadge status={health.status} />
    <Metric label="Queue depth" value={health.queue.depth} />
    <Metric label="Ready" value={health.queue.ready} />
    <Metric label="Coverage"
      value={`${health.backups.covered}/${health.backups.total}`} />
    {health.issues.length > 0 && (
      <Alert severity="error">
        <ul>
          {/* Each list item needs a stable key */}
          {health.issues.map(issue => <li key={issue}>{issue}</li>)}
        </ul>
      </Alert>
    )}
  </div>
)
Trigger alerts when system becomes unhealthy:
// sendAlert is a placeholder for your own notification helper
async function checkHealth() {
  // Server-side fetch needs an absolute URL
  const health = await fetch('https://planningsup.app/api/ops/plannings', {
    headers: { 'x-ops-token': process.env.OPS_TOKEN }
  }).then(r => r.json())
  if (health.status === 'unhealthy') {
    await sendAlert({
      title: 'PlanningSup system unhealthy',
      message: health.issues.join(', '),
      severity: 'critical'
    })
  }
  // Fraction of plannings currently disabled (0 to 1)
  const disabledRatio = health.backups.disabled / health.backups.total
  if (disabledRatio > 0.15) {
    await sendAlert({
      title: 'High planning failure rate',
      message: `${health.backups.disabled}/${health.backups.total} plannings disabled`,
      severity: 'warning'
    })
  }
}
// Run every 5 minutes
setInterval(checkHealth, 5 * 60 * 1000)
Investigate planning failures:
// Server-side fetch needs an absolute URL
const health = await fetch('https://planningsup.app/api/ops/plannings', {
  headers: { 'x-ops-token': process.env.OPS_TOKEN }
}).then(r => r.json())
console.log('Failed hosts:')
// failedHosts and recentFailures can be null, so fall back to an empty list
for (const host of health.failedHosts || []) {
  console.log(`  ${host.host}: ${host.count} disabled plannings (${host.lastFailure})`)
}
console.log('\nRecent failures:')
for (const failure of health.recentFailures || []) {
  console.log(`  ${failure.planningFullId}:`)
  console.log(`    Kind: ${failure.failureKind}`)
  console.log(`    Count: ${failure.failures}`)
  console.log(`    Disabled until: ${failure.disabledUntil}`)
}
Next steps
Environment variables
Configure background jobs and operations
Architecture
Learn about the refresh system design