The Operations API provides health monitoring and diagnostics for the planning refresh system.
All operations endpoints require the x-ops-token header. In production, requests without a valid token return 404.

Get planning system health

Get comprehensive health status of the planning refresh system including queue statistics, backup coverage, and failure tracking.
GET /api/ops/plannings

Headers

x-ops-token
string
required
Operations token for admin access. Must match the OPS_TOKEN environment variable.
curl -H "x-ops-token: YOUR_OPS_TOKEN" \
  https://planningsup.app/api/ops/plannings

Response

{
  "status": "healthy",
  "issues": [],
  "workers": {
    "backfill": "idle",
    "refreshWorker": "working"
  },
  "inQuietHours": false,
  "queue": {
    "depth": 15,
    "ready": 3,
    "locked": 1
  },
  "backups": {
    "total": 127,
    "covered": 125,
    "disabled": 2
  },
  "lastBackupWrite": {
    "planningFullId": "enscr.elevesing1iereannee",
    "changed": true,
    "nbEvents": 42,
    "at": "2026-03-02T10:30:00.000Z"
  },
  "failedHosts": [
    {
      "host": "fac-de-sciences",
      "count": 2,
      "lastFailure": "http_timeout"
    }
  ],
  "recentFailures": [
    {
      "planningFullId": "fac-de-sciences.master.m1info",
      "failureKind": "http_timeout",
      "failures": 3,
      "disabledUntil": "2026-03-02T11:00:00.000Z"
    }
  ]
}
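
As one way to consume this response, the sketch below reduces it to a few dashboard-friendly numbers. The helper name `summarizeHealth` and its output field names are hypothetical and not part of the API; only the input field names come from the response above.

```javascript
// Hypothetical helper: reduces a GET /api/ops/plannings response to summary values.
function summarizeHealth(health) {
  const { total, covered, disabled } = health.backups
  // Guard against division by zero when no plannings are known yet.
  const pct = (n) => (total > 0 ? Math.round((n / total) * 100) : 0)
  return {
    status: health.status,
    coveragePct: pct(covered),
    disabledPct: pct(disabled),
    // failedHosts and recentFailures may be null, so fall back to empty arrays.
    failedHosts: (health.failedHosts ?? []).map((h) => h.host),
    failingPlannings: (health.recentFailures ?? []).map((f) => f.planningFullId),
  }
}
```

For the sample response above, this would report 98% coverage and 2% disabled.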

Response fields

status
string
required
Overall system health status:
  • "healthy" - All systems functioning normally
  • "degraded" - Minor issues detected (see issues array)
  • "unhealthy" - Critical issues requiring attention
issues
array
required
List of detected health problems. Empty array when status is "healthy". Example issues:
  • "Jobs not started"
  • "Backfill stale (45m ago)"
  • "Queue stuck: 100 ready but worker idle"
  • "High disabled: 15/127 (12%)"
workers
object
required
Background worker states.
inQuietHours
boolean
required
Whether the system is currently in quiet hours (reduced activity). Configured via JOBS_QUIET_HOURS (e.g., 21:00-06:00).
queue
object
required
Refresh queue statistics.
backups
object
required
Backup coverage statistics.
lastBackupWrite
object | null
required
Information about the most recent backup write operation. null if no backups have been written yet.
failedHosts
array | null
required
Top 5 hosts with the most disabled plannings. null if no hosts have failures.
recentFailures
array | null
required
The 5 most recently failed plannings. null if there are no recent failures.
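
The inQuietHours field reflects a window such as 21:00-06:00, which wraps past midnight. The sketch below shows one way such a window could be evaluated. It assumes an "HH:MM-HH:MM" 24-hour format, the server's local time zone, and an exclusive end time; none of these details are confirmed by this page, and `isInQuietHours` is a hypothetical name.

```javascript
// Assumed format: "HH:MM-HH:MM" in 24-hour time, as in the 21:00-06:00 example.
function parseMinutes(hhmm) {
  const [h, m] = hhmm.split(':').map(Number)
  return h * 60 + m
}

// Returns true when `now` (a Date, read in local time) falls inside the window.
function isInQuietHours(range, now) {
  const [start, end] = range.split('-').map(parseMinutes)
  const current = now.getHours() * 60 + now.getMinutes()
  // A window whose start is later than its end wraps past midnight.
  return start <= end
    ? current >= start && current < end
    : current >= start || current < end
}
```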

Health status logic

The API computes health status based on several checks:
1. Check if jobs are running

If RUN_JOBS=false or jobs are paused, record a "Jobs not started" issue. This is a critical issue, so the status becomes unhealthy.

2. Check worker staleness

  • Backfill worker: Should run every PLANNINGS_BACKFILL_INTERVAL_MS (default: 10 minutes)
  • Refresh worker: Should poll every PLANNINGS_REFRESH_WORKER_MAX_POLL_MS (default: 30 seconds)
If either worker hasn’t run within 2.5x its expected interval, mark it as stale.

3. Check queue health

If the queue has many ready items but the refresh worker is idle, mark the queue as stuck.

4. Check disabled percentage

If more than 10% of plannings are disabled, record a high-disabled issue.

5. Determine overall status

  • Healthy: No issues detected
  • Degraded: Minor issues (high disabled percentage)
  • Unhealthy: Critical issues (jobs not started, workers stale, queue stuck)
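
The checks above can be sketched as a single function. This is an illustration under stated assumptions, not the service's actual implementation: the input shape and every name are hypothetical, and the threshold for "many ready items" is not given on this page, so the value 50 is a placeholder. The 2.5x staleness factor and the 10% disabled threshold come from the steps above.

```javascript
// Sketch only: the input shape, names, and queue threshold are assumptions.
const STALE_FACTOR = 2.5          // from the staleness rule above
const MAX_DISABLED_RATIO = 0.1    // "more than 10%"
const QUEUE_STUCK_MIN_READY = 50  // hypothetical value for "many ready items"

function minutesAgo(timestampMs, now) {
  return Math.round((now - timestampMs) / 60000)
}

function computeHealth(s, now = Date.now()) {
  const critical = []
  const minor = []

  // Step 1: jobs must be running.
  if (!s.jobsRunning) critical.push('Jobs not started')

  // Step 2: a worker is stale after 2.5x its expected interval.
  if (now - s.backfillLastRunAt > STALE_FACTOR * s.backfillIntervalMs) {
    critical.push(`Backfill stale (${minutesAgo(s.backfillLastRunAt, now)}m ago)`)
  }
  if (now - s.refreshLastPollAt > STALE_FACTOR * s.refreshMaxPollMs) {
    critical.push(`Refresh worker stale (${minutesAgo(s.refreshLastPollAt, now)}m ago)`)
  }

  // Step 3: many ready items while the refresh worker sits idle.
  if (s.queueReady >= QUEUE_STUCK_MIN_READY && s.refreshWorkerState === 'idle') {
    critical.push(`Queue stuck: ${s.queueReady} ready but worker idle`)
  }

  // Step 4: more than 10% of plannings disabled is a minor issue.
  if (s.totalPlannings > 0 && s.disabledPlannings / s.totalPlannings > MAX_DISABLED_RATIO) {
    const pct = Math.round((s.disabledPlannings / s.totalPlannings) * 100)
    minor.push(`High disabled: ${s.disabledPlannings}/${s.totalPlannings} (${pct}%)`)
  }

  // Step 5: any critical issue wins over minor ones.
  const status = critical.length > 0 ? 'unhealthy' : minor.length > 0 ? 'degraded' : 'healthy'
  return { status, issues: [...critical, ...minor] }
}
```

Keeping critical and minor issues in separate lists makes the final status a simple precedence check while still returning every detected problem in issues.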

Error responses

Unauthorized

{
  "error": "NOT_FOUND"
}
Status: 404 Not Found
Causes:
  • Missing x-ops-token header
  • Invalid ops token
  • OPS_TOKEN not configured (in production)
In non-production environments without OPS_TOKEN set, the endpoint allows unauthenticated access for easier debugging.
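
The access rules above can be summarized as a small decision function. This is a sketch with hypothetical names, not the server's code. It assumes that once OPS_TOKEN is configured it is enforced in every environment, which this page does not state explicitly for non-production.

```javascript
// Sketch of the access rules described above; all names are hypothetical.
function authorizeOpsRequest({ headerToken, opsToken, isProduction }) {
  if (!opsToken) {
    // OPS_TOKEN unset: hidden behind 404 in production, open elsewhere for debugging.
    return isProduction
      ? { allowed: false, status: 404 }
      : { allowed: true, status: 200 }
  }
  // Assumption: a configured token is enforced in every environment.
  // A missing or mismatched header gets 404 rather than 401.
  if (headerToken !== opsToken) return { allowed: false, status: 404 }
  return { allowed: true, status: 200 }
}
```

Responding with 404 rather than 401 means callers without a valid token cannot tell that the endpoint exists.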

Use cases

Display system health in a monitoring UI:
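import { useEffect, useState } from 'react'

// StatusBadge, Metric, and Alert stand in for your own UI components.
function PlanningHealth({ token }) {
  const [health, setHealth] = useState(null)

  useEffect(() => {
    // Fetch on mount and again whenever the token changes.
    fetch('/api/ops/plannings', { headers: { 'x-ops-token': token } })
      .then(r => r.json())
      .then(setHealth)
  }, [token])

  // Render nothing until the first response arrives.
  if (!health) return null

  return (
    <div>
      <StatusBadge status={health.status} />
      <Metric label="Queue depth" value={health.queue.depth} />
      <Metric label="Ready" value={health.queue.ready} />
      <Metric label="Coverage"
        value={`${health.backups.covered}/${health.backups.total}`} />

      {health.issues.length > 0 && (
        <Alert severity="error">
          <ul>
            {health.issues.map(issue => <li key={issue}>{issue}</li>)}
          </ul>
        </Alert>
      )}
    </div>
  )
}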

Next steps

Environment variables

Configure background jobs and operations

Architecture

Learn about the refresh system design
