Operations API

The Operations API provides health monitoring and diagnostics for the planning refresh system.

All operations endpoints require the x-ops-token header. In production, requests without a valid token return 404.

Get planning system health

Get comprehensive health status of the planning refresh system including queue statistics, backup coverage, and failure tracking.

GET /api/ops/plannings

Headers

x-ops-token

string

required

Operations token for admin access. Must match the OPS_TOKEN environment variable.

curl -H "x-ops-token: YOUR_OPS_TOKEN" \
  https://planningsup.app/api/ops/plannings

Response

{
  "status": "healthy",
  "issues": [],
  "workers": {
    "backfill": "idle",
    "refreshWorker": "working"
  },
  "inQuietHours": false,
  "queue": {
    "depth": 15,
    "ready": 3,
    "locked": 1
  },
  "backups": {
    "total": 127,
    "covered": 125,
    "disabled": 2
  },
  "lastBackupWrite": {
    "planningFullId": "enscr.elevesing1iereannee",
    "changed": true,
    "nbEvents": 42,
    "at": "2026-03-02T10:30:00.000Z"
  },
  "failedHosts": [
    {
      "host": "fac-de-sciences",
      "count": 2,
      "lastFailure": "http_timeout"
    }
  ],
  "recentFailures": [
    {
      "planningFullId": "fac-de-sciences.master.m1info",
      "failureKind": "http_timeout",
      "failures": 3,
      "disabledUntil": "2026-03-02T11:00:00.000Z"
    }
  ]
}

Response fields

status

string

required

Overall system health status:

"healthy" - All systems functioning normally
"degraded" - Minor issues detected (see issues array)
"unhealthy" - Critical issues requiring attention

issues

array

required

List of detected health problems. Empty array when status is "healthy". Example issues:

"Jobs not started"
"Backfill stale (45m ago)"
"Queue stuck: 100 ready but worker idle"
"High disabled: 15/127 (12%)"

workers

object

required

Background worker states.

Worker fields

backfill

string

required

Backfill job state (ensures all plannings have recent backups):

"starting" - Initializing
"idle" - Waiting for next cycle
"working" - Processing plannings
"paused" - Manually paused
"quiet_hours" - In quiet hours window
"stopped" - Stopped
"crashed" - Worker crashed
"unknown" - State unavailable

refreshWorker

string

required

Refresh worker state (processes the queue and retries failures). Takes the same values as backfill.

inQuietHours

boolean

required

Whether the system is currently in quiet hours (reduced activity). Configured via JOBS_QUIET_HOURS (e.g., 21:00-06:00).
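A window such as 21:00-06:00 wraps past midnight, which is easy to get wrong when reproducing the check in a client. The following is a minimal sketch, not the server's implementation: the helper name isInQuietHours, the HH:MM-HH:MM parsing, the exclusive end bound, and the use of local time are all assumptions for illustration.

```javascript
// Hypothetical helper: checks whether a local time falls inside a window
// written as "HH:MM-HH:MM". The end bound is treated as exclusive (assumption).
function isInQuietHours(windowSpec, date) {
  const [start, end] = windowSpec.split('-').map((part) => {
    const [hours, minutes] = part.split(':').map(Number)
    return hours * 60 + minutes // minutes since midnight
  })
  const now = date.getHours() * 60 + date.getMinutes()

  // Same-day window (e.g. 01:00-05:00): start <= now < end
  if (start <= end) return now >= start && now < end
  // Window wrapping midnight (e.g. 21:00-06:00): late evening or early morning
  return now >= start || now < end
}
```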

queue

object

required

Refresh queue statistics.

Queue fields

depth

number

required

Total items in queue (excluding exhausted retries)

ready

number

required

Items ready to be processed (unlocked and past next attempt time)

locked

number

required

Items currently being processed by workers
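To make the relationship between the three counters concrete, here is a small sketch that derives them from a list of queue items. The item shape (lockedAt, nextAttemptAt, exhausted) and the function name queueStats are hypothetical; the actual storage schema is not described in this reference.

```javascript
// Hypothetical item shape (assumption, not the real schema):
//   { lockedAt: number | null, nextAttemptAt: number, exhausted: boolean }
function queueStats(items, now) {
  // depth excludes items whose retries are exhausted
  const active = items.filter((item) => !item.exhausted)
  return {
    depth: active.length,
    // ready: unlocked and past the next attempt time
    ready: active.filter((item) => item.lockedAt === null && item.nextAttemptAt <= now).length,
    // locked: currently held by a worker
    locked: active.filter((item) => item.lockedAt !== null).length,
  }
}
```

Under these assumptions, depth is at least ready + locked; the remainder are items still waiting for their next attempt time.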

backups

object

required

Backup coverage statistics.

Backup fields

total

number

required

Total number of plannings

covered

number

required

Number of plannings with database backups

disabled

number

required

Number of plannings temporarily disabled due to repeated failures

lastBackupWrite

object | null

required

Information about the most recent backup write operation. null if no backups have been written yet.

Backup write fields

planningFullId

string

required

Full ID of the planning that was backed up

changed

boolean

required

Whether the backup data changed from the previous version

nbEvents

number

required

Number of events in the backup

at

string

required

Timestamp when the backup was written (ISO 8601 format)

failedHosts

array | null

required

Top 5 hosts with the most disabled plannings. null if no hosts have failures.

Failed host fields

host

string

required

Host name (e.g., "enscr", "fac-de-sciences")

count

number

required

Number of disabled plannings on this host

lastFailure

string | null

required

Most recent failure type:

"http_timeout" - Network timeout
"http_4xx" - Client error (planning likely doesn’t exist)
"http_5xx" - Server error
"parse_error" - ICS parsing failed
"network_error" - Other network error

recentFailures

array | null

required

The 5 most recently failed plannings. null if there are no recent failures.

Recent failure fields

planningFullId

string

required

Full planning ID

failureKind

string | null

required

Type of failure (same values as failedHosts.lastFailure)

failures

number

required

Consecutive failure count

disabledUntil

string | null

required

When the planning will be re-enabled (ISO 8601 format). Uses exponential backoff: 5min, 15min, 1h, 4h, 12h, 24h (max)
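The backoff schedule for disabledUntil can be sketched as a lookup table. How the consecutive failure count maps onto a step is an assumption here (the first failure is taken to mean 5 minutes), and the names backoffDelayMs and disabledUntil are hypothetical helpers, not part of the API.

```javascript
// Schedule from the field description: 5min, 15min, 1h, 4h, 12h, 24h (max)
const BACKOFF_MINUTES = [5, 15, 60, 240, 720, 1440]

// Hypothetical helper. Assumes failure #1 maps to the first step and that
// every failure past the sixth stays at the 24h cap.
function backoffDelayMs(failures) {
  const index = Math.min(Math.max(failures, 1), BACKOFF_MINUTES.length) - 1
  return BACKOFF_MINUTES[index] * 60 * 1000
}

// Produces an ISO 8601 timestamp in the same format as disabledUntil
function disabledUntil(failures, now = new Date()) {
  return new Date(now.getTime() + backoffDelayMs(failures)).toISOString()
}
```

A client can use the same table to estimate when a planning that keeps failing will next be retried.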

Example requests

curl -H "x-ops-token: YOUR_OPS_TOKEN" \
  https://planningsup.app/api/ops/plannings

Health status logic

The API computes health status based on several checks:

Check if jobs are running

If RUN_JOBS=false or jobs are paused, set status to degraded/unhealthy

Check worker staleness

Backfill worker: Should run every PLANNINGS_BACKFILL_INTERVAL_MS (default: 10 minutes)
Refresh worker: Should poll every PLANNINGS_REFRESH_WORKER_MAX_POLL_MS (default: 30 seconds)

If either worker hasn’t run in 2.5x its expected interval, mark as stale

Check queue health

If queue has many ready items but worker is idle, mark as stuck

Check disabled percentage

If more than 10% of plannings are disabled, mark as degraded

Determine overall status

Healthy: No issues detected
Degraded: Minor issues (high disabled percentage)
Unhealthy: Critical issues (jobs not started, workers stale, queue stuck)
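The checks above can be sketched as a single function. This is an illustration under stated assumptions, not the server's code: the input field names, the computeHealth name, and the queue threshold of 50 are hypothetical (the description only says "many ready items"), and paused or disabled jobs are treated as critical here.

```javascript
const STALE_FACTOR = 2.5          // from the staleness rule
const DISABLED_RATIO_LIMIT = 0.1  // from the disabled-percentage rule
const STUCK_READY_THRESHOLD = 50  // assumed value for "many ready items"

// Hypothetical snapshot shape:
// { now, jobsRunning, workers: [{ name, lastRunAt, intervalMs }],
//   queueReady, refreshWorkerState, total, disabled }
function computeHealth(s) {
  const critical = []
  const minor = []

  if (!s.jobsRunning) critical.push('Jobs not started')

  // A worker is stale when it has not run within 2.5x its expected interval
  for (const w of s.workers) {
    const elapsed = s.now - w.lastRunAt
    if (elapsed > STALE_FACTOR * w.intervalMs) {
      critical.push(`${w.name} stale (${Math.round(elapsed / 60000)}m ago)`)
    }
  }

  if (s.queueReady >= STUCK_READY_THRESHOLD && s.refreshWorkerState === 'idle') {
    critical.push(`Queue stuck: ${s.queueReady} ready but worker idle`)
  }

  if (s.total > 0 && s.disabled / s.total > DISABLED_RATIO_LIMIT) {
    const pct = Math.round((s.disabled / s.total) * 100)
    minor.push(`High disabled: ${s.disabled}/${s.total} (${pct}%)`)
  }

  const status = critical.length > 0 ? 'unhealthy' : minor.length > 0 ? 'degraded' : 'healthy'
  return { status, issues: [...critical, ...minor] }
}
```

Keeping critical and minor issues in separate lists makes the final status a simple precedence rule: any critical issue wins over any number of minor ones.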

Error responses

Unauthorized

{
  "error": "NOT_FOUND"
}

Status: 404 Not Found

Causes:

Missing x-ops-token header
Invalid ops token
OPS_TOKEN not configured (in production)

In non-production environments without OPS_TOKEN set, the endpoint allows unauthenticated access for easier debugging.
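The access rules described in this section can be summarized as a small decision function. This is a sketch for reasoning about the behavior, not the actual middleware: the name opsAuthStatus and its parameters are hypothetical, and it assumes that once OPS_TOKEN is set, a missing or wrong header yields 404 in every environment.

```javascript
// Hypothetical helper: returns the HTTP status the endpoint would answer with.
function opsAuthStatus({ isProduction, configuredToken, headerToken }) {
  if (!configuredToken) {
    // OPS_TOKEN not set: hidden in production, open elsewhere for debugging
    return isProduction ? 404 : 200
  }
  // OPS_TOKEN set: the x-ops-token header must be present and match exactly
  return headerToken === configuredToken ? 200 : 404
}
```

Answering 404 instead of 401 or 403 means an unauthenticated caller cannot distinguish a protected endpoint from one that does not exist.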

Use cases

Monitoring dashboard
Alerting system
Debugging failures

Display system health in a monitoring UI:

const health = await fetch('/api/ops/plannings', {
  headers: { 'x-ops-token': token }
}).then(r => r.json())

return (
  <div>
    <StatusBadge status={health.status} />
    <Metric label="Queue depth" value={health.queue.depth} />
    <Metric label="Ready" value={health.queue.ready} />
    <Metric label="Coverage" 
      value={`${health.backups.covered}/${health.backups.total}`} />
    
    {health.issues.length > 0 && (
      <Alert severity="error">
        <ul>
          {health.issues.map(issue => <li key={issue}>{issue}</li>)}
        </ul>
      </Alert>
    )}
  </div>
)

Trigger alerts when the system becomes unhealthy:

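async function checkHealth() {
  // Node's fetch requires an absolute URL (host taken from the curl example)
  const res = await fetch('https://planningsup.app/api/ops/plannings', {
    headers: { 'x-ops-token': process.env.OPS_TOKEN }
  })
  // A missing or invalid token yields 404 with { "error": "NOT_FOUND" }
  if (!res.ok) throw new Error(`Health check failed: HTTP ${res.status}`)
  const health = await res.json()
  
  if (health.status === 'unhealthy') {
    // sendAlert is a placeholder for your own alerting integration
    await sendAlert({
      title: 'PlanningSup system unhealthy',
      message: health.issues.join(', '),
      severity: 'critical'
    })
  }
  
  // Guard against division by zero when no plannings exist
  const { disabled, total } = health.backups
  const disabledPercent = total > 0 ? disabled / total : 0
  if (disabledPercent > 0.15) {
    await sendAlert({
      title: 'High planning failure rate',
      message: `${disabled}/${total} plannings disabled`,
      severity: 'warning'
    })
  }
}

// Run every 5 minutes; log errors instead of leaving rejections unhandled
setInterval(() => checkHealth().catch(console.error), 5 * 60 * 1000)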

Investigate planning failures:

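// Node's fetch requires an absolute URL (host taken from the curl example)
const res = await fetch('https://planningsup.app/api/ops/plannings', {
  headers: { 'x-ops-token': process.env.OPS_TOKEN }
})
if (!res.ok) throw new Error(`Health check failed: HTTP ${res.status}`)
const health = await res.json()

console.log('Failed hosts:')
for (const host of health.failedHosts || []) {
  // count is the number of disabled plannings on the host
  console.log(`  ${host.host}: ${host.count} disabled plannings (${host.lastFailure})`)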
}

console.log('\nRecent failures:')
for (const failure of health.recentFailures || []) {
  console.log(`  ${failure.planningFullId}:`)
  console.log(`    Kind: ${failure.failureKind}`)
  console.log(`    Count: ${failure.failures}`)
  console.log(`    Disabled until: ${failure.disabledUntil}`)
}


Next steps

Environment variables

Architecture
