Overview
Loom’s cron monitoring system tracks scheduled jobs, background tasks, and cron jobs to detect missed runs, failures, and performance degradation. It supports both simple ping-based monitoring (like Healthchecks.io) and SDK-based check-ins (like Sentry Crons).
Cron monitoring integrates with crash tracking to link failed check-ins to crash events.
Key Features
Dual integration: simple HTTP pings for shell scripts, SDK for application code
Missed run detection with configurable grace periods
Timeout detection for long-running jobs
Schedule support for cron expressions and fixed intervals
Real-time alerts via SSE streaming
Core Concepts
Monitor
A monitored job or scheduled task:
pub struct Monitor {
    pub id: MonitorId,
    pub slug: String,                // "daily-cleanup"
    pub name: String,                // "Daily Cleanup Job"
    pub status: MonitorStatus,
    pub health: MonitorHealth,
    pub schedule: MonitorSchedule,
    pub timezone: String,            // "America/New_York"
    pub checkin_margin_minutes: u32, // Grace period (default: 5)
    pub max_runtime_minutes: Option<u32>,
    pub ping_key: String,            // UUID for /ping/{key}
}
Monitor statuses:
| Status | Description |
|---|---|
| Active | Monitoring enabled |
| Paused | Temporarily disabled (won’t alert) |
| Disabled | Fully disabled |
Monitor health:
| Health | Description |
|---|---|
| Healthy | Recent check-in was OK |
| Failing | Recent check-in was Error |
| Missed | Expected check-in didn’t arrive |
| Timeout | Job exceeded max_runtime |
| Unknown | No check-ins yet |
Check-in
A single job execution report:
pub struct CheckIn {
    pub id: CheckInId,
    pub monitor_id: MonitorId,
    pub status: CheckInStatus,
    pub started_at: Option<DateTime<Utc>>,
    pub finished_at: DateTime<Utc>,
    pub duration_ms: Option<u64>,
    pub exit_code: Option<i32>,
    pub output: Option<String>, // Max 10KB
    pub source: CheckInSource,
}
Check-in statuses:
| Status | Description |
|---|---|
| InProgress | Job started, not finished |
| Ok | Completed successfully |
| Error | Failed (explicit error) |
| Missed | System-generated (didn’t arrive) |
| Timeout | System-generated (max_runtime exceeded) |
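The health and check-in status values line up closely, so one plausible way to relate them is a small mapping function. The sketch below is illustrative only: `derive_health` is a hypothetical helper, not part of Loom, and the rule that an in-progress check-in leaves health unchanged is an assumption.

```rust
/// Mirrors the check-in statuses listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CheckInStatus {
    InProgress,
    Ok,
    Error,
    Missed,
    Timeout,
}

/// Mirrors the monitor health values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MonitorHealth {
    Healthy,
    Failing,
    Missed,
    Timeout,
    Unknown,
}

/// Hypothetical helper: derive a monitor's health from its previous health
/// and the most recent check-in, if there is one.
pub fn derive_health(previous: MonitorHealth, latest: Option<CheckInStatus>) -> MonitorHealth {
    match latest {
        // No check-ins yet.
        None => MonitorHealth::Unknown,
        Some(CheckInStatus::Ok) => MonitorHealth::Healthy,
        Some(CheckInStatus::Error) => MonitorHealth::Failing,
        Some(CheckInStatus::Missed) => MonitorHealth::Missed,
        Some(CheckInStatus::Timeout) => MonitorHealth::Timeout,
        // Assumption: a job that is still running does not change health.
        Some(CheckInStatus::InProgress) => previous,
    }
}
```

For example, `derive_health(MonitorHealth::Failing, Some(CheckInStatus::Ok))` yields `Healthy`, which would correspond to a monitor recovering from failure.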
Schedule Types
pub enum MonitorSchedule {
    // Cron expression: "0 0 * * *" (daily at midnight)
    Cron { expression: String },
    // Fixed interval: every 30 minutes
    Interval { minutes: u32 },
}
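For fixed-interval schedules, the next expected check-in can be computed with simple arithmetic. The following is a minimal sketch under assumptions: `next_expected` is a hypothetical helper, times are Unix seconds, and cron schedules are left unhandled because they need an expression parser.

```rust
/// Same shape as the `MonitorSchedule` enum above.
pub enum MonitorSchedule {
    Cron { expression: String },
    Interval { minutes: u32 },
}

/// Hypothetical helper: the next expected check-in time (Unix seconds) after
/// `last_expected`. Returns `None` for cron schedules, which this sketch
/// does not evaluate.
pub fn next_expected(schedule: &MonitorSchedule, last_expected: u64) -> Option<u64> {
    match schedule {
        MonitorSchedule::Interval { minutes } => {
            Some(last_expected + u64::from(*minutes) * 60)
        }
        MonitorSchedule::Cron { .. } => None,
    }
}
```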
Ping-Based Monitoring
For shell scripts and simple integrations:
Simple Ping URLs
# Success ping (job completed OK)
curl https://loom.example.com/ping/abc123-def456
# Start ping (job starting)
curl https://loom.example.com/ping/abc123-def456/start
# Fail ping (job failed)
curl https://loom.example.com/ping/abc123-def456/fail
# Ping with exit code
curl "https://loom.example.com/ping/abc123-def456?exit_code=1"
# Ping with output (POST)
curl -X POST https://loom.example.com/ping/abc123-def456 \
-d "Job completed. Processed 1000 records."
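Application code that cannot use the SDK could build the same URLs programmatically. This is a sketch only: the `Ping` enum and `ping_url` function are hypothetical names, and the URL shapes are taken from the curl examples above.

```rust
/// The three ping variants shown in the curl examples.
pub enum Ping {
    Success,
    Start,
    Fail,
}

/// Hypothetical helper: build a ping URL from the base URL, the monitor's
/// ping key, the ping variant, and an optional exit code.
pub fn ping_url(base_url: &str, ping_key: &str, ping: Ping, exit_code: Option<i32>) -> String {
    let suffix = match ping {
        Ping::Success => "",
        Ping::Start => "/start",
        Ping::Fail => "/fail",
    };
    // Tolerate a trailing slash on the base URL.
    let mut url = format!("{}/ping/{}{}", base_url.trim_end_matches('/'), ping_key, suffix);
    if let Some(code) = exit_code {
        url.push_str(&format!("?exit_code={}", code));
    }
    url
}
```

The resulting string can be handed to whatever HTTP client the application already uses.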
Shell Script Integration
Simple
#!/bin/bash
set -e

# Run job
/usr/local/bin/my-script.sh

# Success ping
curl -fsS https://loom.example.com/ping/xxx

With Error Handling
#!/bin/bash
# pipefail makes the pipeline below report the job's exit status, not tee's
set -eo pipefail

# Signal start
curl -fsS https://loom.example.com/ping/xxx/start

# Run job
if /usr/local/bin/my-script.sh 2>&1 | tee /tmp/job-output.txt; then
  # Success (--data-binary keeps newlines in the output; -d would strip them)
  curl -fsS -X POST https://loom.example.com/ping/xxx \
    --data-binary @/tmp/job-output.txt
else
  # Failure: $? still holds the pipeline's exit status here
  curl -fsS -X POST "https://loom.example.com/ping/xxx/fail?exit_code=$?" \
    --data-binary @/tmp/job-output.txt
fi

Kubernetes CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-cleanup
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: cleanup
              image: my-app:latest
              command:
                - /bin/sh
                - -c
                - |
                  curl -fsS https://loom.example.com/ping/xxx/start
                  if /app/cleanup.sh; then
                    curl -fsS https://loom.example.com/ping/xxx
                  else
                    curl -fsS https://loom.example.com/ping/xxx/fail
                  fi
          restartPolicy: OnFailure
SDK-Based Monitoring
For application code:
Rust SDK
[dependencies]
loom-crons = { version = "0.1" }
loom-crash = { version = "0.1" } # Optional
use std::time::Instant;

use loom_crons::{CronsClient, CheckInOk, CheckInError};

// Initialize (`crash` is an already-configured loom-crash client)
let crons = CronsClient::builder()
    .api_key("loom_crons_xxx")
    .base_url("https://loom.example.com")
    .crash_client(&crash) // Optional: link errors to crashes
    .build()?;

// Manual check-in pattern
let checkin_id = crons.checkin_start("daily-cleanup").await?;
let started = Instant::now(); // used to report duration_ms
match run_daily_cleanup().await {
    Ok(result) => {
        crons.checkin_ok(checkin_id, CheckInOk {
            duration_ms: Some(started.elapsed().as_millis() as u64),
            output: Some(format!("Processed {} records", result.count)),
        }).await?;
    }
    Err(e) => {
        let crash_id = crash.capture_error(&e).await?;
        crons.checkin_error(checkin_id, CheckInError {
            duration_ms: Some(started.elapsed().as_millis() as u64),
            exit_code: Some(1),
            output: Some(e.to_string()),
            crash_event_id: Some(crash_id),
        }).await?;
    }
}

// Convenience wrapper
crons.with_monitor("daily-cleanup", || async {
    run_daily_cleanup().await
}).await?;
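To show what a convenience wrapper of this kind does, here is a simplified synchronous sketch of the start, run, report pattern. It is not the SDK's implementation: the `CheckInReporter` trait, the `with_monitor` free function, and `RecordingReporter` are hypothetical names, and the real client is async and reports over HTTP.

```rust
use std::time::Instant;

/// Hypothetical reporting interface standing in for the real client.
pub trait CheckInReporter {
    fn start(&mut self, slug: &str) -> u64;
    fn ok(&mut self, checkin_id: u64, duration_ms: u64);
    fn error(&mut self, checkin_id: u64, duration_ms: u64, message: String);
}

/// Run `job`, reporting a start check-in first and an ok/error check-in
/// afterwards. The job's own result is passed back to the caller unchanged.
pub fn with_monitor<R, T, E, F>(reporter: &mut R, slug: &str, job: F) -> Result<T, E>
where
    R: CheckInReporter,
    E: std::fmt::Display,
    F: FnOnce() -> Result<T, E>,
{
    let checkin_id = reporter.start(slug);
    let started = Instant::now();
    let result = job();
    let duration_ms = started.elapsed().as_millis() as u64;
    match &result {
        Ok(_) => reporter.ok(checkin_id, duration_ms),
        Err(e) => reporter.error(checkin_id, duration_ms, e.to_string()),
    }
    result
}

/// In-memory reporter that records calls; useful for trying the wrapper out.
#[derive(Default)]
pub struct RecordingReporter {
    pub events: Vec<String>,
}

impl CheckInReporter for RecordingReporter {
    fn start(&mut self, slug: &str) -> u64 {
        self.events.push(format!("start:{}", slug));
        1 // fixed check-in id for the sketch
    }
    fn ok(&mut self, checkin_id: u64, _duration_ms: u64) {
        self.events.push(format!("ok:{}", checkin_id));
    }
    fn error(&mut self, checkin_id: u64, _duration_ms: u64, message: String) {
        self.events.push(format!("error:{}:{}", checkin_id, message));
    }
}
```

The point of the design is that the job's failure is both reported and returned, so wrapping a job does not change how the caller handles its errors.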
TypeScript SDK
import { CronsClient } from '@loom/crons';
import { CrashClient } from '@loom/crash';

// `crash` is assumed to be a CrashClient instance created elsewhere
const crons = new CronsClient({
  apiKey: 'loom_crons_xxx',
  baseUrl: 'https://loom.example.com',
  crashClient: crash, // Optional
});

// Manual pattern
const checkinId = await crons.checkinStart('email-digest');
const startTime = Date.now(); // used to report durationMs
try {
  await sendEmailDigest();
  await crons.checkinOk(checkinId, {
    durationMs: Date.now() - startTime,
    output: 'Sent 150 emails',
  });
} catch (error) {
  const crashId = await crash.captureException(error);
  await crons.checkinError(checkinId, {
    durationMs: Date.now() - startTime,
    // `error` is `unknown` in strict TypeScript, so narrow it first
    output: error instanceof Error ? error.message : String(error),
    crashEventId: crashId,
  });
}

// Convenience wrapper
await crons.withMonitor('email-digest', async () => {
  await sendEmailDigest();
});
Creating Monitors
Create via API:
curl -X POST https://loom.example.com/api/crons/monitors \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "daily-cleanup",
    "name": "Daily Cleanup Job",
    "schedule": {
      "type": "cron",
      "expression": "0 0 * * *"
    },
    "timezone": "America/New_York",
    "checkin_margin_minutes": 5,
    "max_runtime_minutes": 60,
    "environments": ["production"]
  }'
Response:
{
  "monitor": {
    "id": "mon_xxx",
    "slug": "daily-cleanup",
    "ping_url": "https://loom.example.com/ping/abc123-def456",
    "status": "active",
    "health": "unknown"
  }
}
Cron Expressions
Standard 5-field format:
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday = 0)
│ │ │ │ │
* * * * *
Examples:
| Expression | Description |
|---|---|
| 0 0 * * * | Daily at midnight |
| */15 * * * * | Every 15 minutes |
| 0 9 * * 1-5 | 9am on weekdays |
| 0 0 1 * * | First of every month |
| 30 2 * * 0 | 2:30am on Sundays |
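To make the field semantics concrete, the sketch below checks whether a given time matches a 5-field expression. It is a simplified illustration, not Loom's parser: `CronTime`, `field_matches`, and `cron_matches` are hypothetical names, and only `*`, single values, `a-b` ranges, comma lists, and `/step` on `*` or a range are handled (no month or weekday names).

```rust
/// A point in time broken into the five cron fields.
pub struct CronTime {
    pub minute: u32,
    pub hour: u32,
    pub day_of_month: u32,
    pub month: u32,
    pub day_of_week: u32, // Sunday = 0
}

/// True if `value` is selected by one field such as "*", "*/15", "1-5" or "0,30".
/// `min` is the lowest legal value for the field (0 or 1).
fn field_matches(field: &str, value: u32, min: u32) -> bool {
    field.split(',').any(|part| {
        // Split off an optional "/step" suffix.
        let (range, step) = match part.split_once('/') {
            Some((r, s)) => (r, s.parse::<u32>().unwrap_or(0)),
            None => (part, 1),
        };
        if step == 0 {
            return false; // malformed or zero step
        }
        let (lo, hi) = if range == "*" {
            (min, u32::MAX)
        } else if let Some((a, b)) = range.split_once('-') {
            match (a.parse::<u32>(), b.parse::<u32>()) {
                (Ok(a), Ok(b)) => (a, b),
                _ => return false,
            }
        } else {
            match range.parse::<u32>() {
                Ok(v) => (v, v),
                Err(_) => return false,
            }
        };
        value >= lo && value <= hi && (value - lo) % step == 0
    })
}

/// True if `t` matches the 5-field cron `expression`.
pub fn cron_matches(expression: &str, t: &CronTime) -> bool {
    let f: Vec<&str> = expression.split_whitespace().collect();
    if f.len() != 5 {
        return false;
    }
    let dom = field_matches(f[2], t.day_of_month, 1);
    let dow = field_matches(f[4], t.day_of_week, 0);
    // Classic cron rule: when both day fields are restricted, either may match.
    let day = if f[2] != "*" && f[4] != "*" { dom || dow } else { dom && dow };
    field_matches(f[0], t.minute, 0)
        && field_matches(f[1], t.hour, 0)
        && field_matches(f[3], t.month, 1)
        && day
}
```

A scheduler could step forward minute by minute from the last run and take the first matching time as the next expected check-in, after converting to the monitor's timezone.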
Grace Period & Timeouts
Grace Period
checkin_margin_minutes sets how long after the expected time a check-in may still arrive before the run is marked as missed:
Expected at: 00:00:00
Grace: 5 minutes
├──────────────────────┬───────────────────────┤
│ On-time window │ Grace period │
│ 00:00:00 │ 00:00:00 - 00:05:00 │
├──────────────────────┴───────────────────────┤
│ After 00:05:00 = MISSED │
└──────────────────────────────────────────────┘
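The window in the diagram can be expressed as a small classification function. This is a sketch under assumptions: `Arrival` and `classify_arrival` are hypothetical names, times are Unix seconds, and check-ins that arrive before the expected time are treated as on time.

```rust
/// How a check-in's arrival time relates to the expected time.
#[derive(Debug, PartialEq, Eq)]
pub enum Arrival {
    OnTime,      // at or before the expected time
    WithinGrace, // late, but inside the margin
    Missed,      // after the margin has passed
}

/// Hypothetical helper: classify an arrival time against the expected time
/// and the monitor's `checkin_margin_minutes`.
pub fn classify_arrival(expected_at: u64, checkin_margin_minutes: u32, arrived_at: u64) -> Arrival {
    let deadline = expected_at + u64::from(checkin_margin_minutes) * 60;
    if arrived_at <= expected_at {
        Arrival::OnTime
    } else if arrived_at <= deadline {
        Arrival::WithinGrace
    } else {
        Arrival::Missed
    }
}
```

With an expected time of 00:00:00 and a 5-minute margin, an arrival at 00:05:00 is still within grace, while 00:05:01 counts as missed, matching the diagram.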
Timeout Detection
If max_runtime_minutes is set, jobs exceeding this are marked as timeout:
// Monitor with 60-minute timeout
Monitor {
    max_runtime_minutes: Some(60),
    ...
}
// Job running for 61 minutes → status: Timeout
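The timeout rule can be sketched as a single comparison. `has_timed_out` is a hypothetical helper, times are assumed to be Unix seconds, and a monitor without `max_runtime_minutes` is assumed never to time out.

```rust
/// Hypothetical helper: has a job that started at `started_at` exceeded the
/// monitor's `max_runtime_minutes` by time `now`? Both times are Unix seconds.
pub fn has_timed_out(started_at: u64, now: u64, max_runtime_minutes: Option<u32>) -> bool {
    match max_runtime_minutes {
        // "Exceeding" is read as strictly longer than the limit.
        Some(limit) => now.saturating_sub(started_at) > u64::from(limit) * 60,
        // No limit configured: never a timeout.
        None => false,
    }
}
```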
API Endpoints
Ping Endpoints (No Auth)
GET /ping/:key/start
Job starting
Response:
{
  "checkin_id": "chk_xxx"
}
GET /ping/:key/fail
Job failed
Query Parameters:
exit_code - Exit code (optional)
POST /ping/:key
Ping with output body
Body: Plain text output (max 10KB)
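Since output is capped at 10KB, a client may want to trim long output before sending it. The sketch below is illustrative: `truncate_output` and `MAX_OUTPUT_BYTES` are hypothetical names, 10KB is assumed to mean 10,240 bytes, and how the server treats oversized bodies is not specified here.

```rust
/// Assumption: "10KB" means 10 * 1024 bytes.
pub const MAX_OUTPUT_BYTES: usize = 10 * 1024;

/// Hypothetical helper: return the longest prefix of `output` that fits in
/// `max_bytes` without splitting a multi-byte UTF-8 character.
pub fn truncate_output(output: &str, max_bytes: usize) -> &str {
    if output.len() <= max_bytes {
        return output;
    }
    let mut end = max_bytes;
    // Back up to the nearest character boundary (index 0 always is one).
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    &output[..end]
}
```

Keeping the tail instead of the head may be more useful for failing jobs, since errors tend to appear at the end of the output; that choice is left to the caller.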
Monitor Management
GET /api/crons/monitors
List monitors
Query Parameters:
status - Filter by status (active, paused, disabled)
health - Filter by health (healthy, failing, missed)
POST /api/crons/monitors
Create monitor
Request:
{
  "slug": "daily-backup",
  "name": "Daily Database Backup",
  "schedule": {
    "type": "interval",
    "minutes": 1440
  },
  "timezone": "UTC"
}
PATCH /api/crons/monitors/:slug
Update monitor
POST /api/crons/monitors/:slug/pause
Pause monitoring (won’t alert on missed runs)
POST /api/crons/monitors/:slug/resume
Resume monitoring
Check-in Management
POST /api/crons/monitors/:slug/checkins
Create check-in (SDK)
Request:
{
  "status": "in_progress",
  "started_at": "2026-01-18T12:00:00Z"
}
PATCH /api/crons/checkins/:id
Update check-in
Request:
{
  "status": "ok",
  "finished_at": "2026-01-18T12:05:00Z",
  "duration_ms": 300000,
  "output": "Processed 1000 records"
}
GET /api/crons/monitors/:slug/checkins
List check-ins for monitor
Query Parameters:
limit - Max results (default: 100)
status - Filter by status
Real-time Updates (SSE)
Subscribe to monitor events:
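// Note: the browser's built-in EventSource does not accept custom headers.
// This form assumes an EventSource implementation that supports a `headers`
// option (for example, a Node.js polyfill).
const eventSource = new EventSource(
  'https://loom.example.com/api/crons/stream',
  { headers: { Authorization: 'Bearer <token>' } }
);
eventSource.addEventListener('monitor.missed', (event) => {
  const data = JSON.parse(event.data);
  console.log(`Monitor ${data.monitor_name} missed check-in!`);
  sendAlert(data); // sendAlert is your own alerting function
});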
Events:
checkin.started - Job started
checkin.ok - Job completed successfully
checkin.error - Job failed
monitor.missed - Expected check-in didn’t arrive
monitor.timeout - Job exceeded max_runtime
monitor.healthy - Monitor recovered from failure
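A non-browser consumer has to read the SSE wire format itself. The sketch below parses one frame, following the standard `event:` and `data:` field rules of Server-Sent Events. `SseEvent`, `parse_sse_frame`, and `is_alert` are hypothetical names, and which events count as alerts is an assumption.

```rust
/// One parsed Server-Sent Events frame.
#[derive(Debug, PartialEq, Eq)]
pub struct SseEvent {
    pub event: String,
    pub data: String,
}

/// Parse one SSE frame (the lines between two blank lines).
/// Returns `None` when the frame carries no `data:` line.
pub fn parse_sse_frame(frame: &str) -> Option<SseEvent> {
    let mut event = String::from("message"); // default SSE event type
    let mut data_lines: Vec<&str> = Vec::new();
    for line in frame.lines() {
        if line.starts_with(':') {
            continue; // comment line, often used as a keep-alive
        }
        // A field is "name: value"; a single leading space in the value is dropped.
        let (field, value) = match line.split_once(':') {
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line, ""),
        };
        match field {
            "event" => event = value.to_string(),
            "data" => data_lines.push(value),
            _ => {} // id, retry and unknown fields are ignored in this sketch
        }
    }
    if data_lines.is_empty() {
        return None;
    }
    Some(SseEvent { event, data: data_lines.join("\n") })
}

/// Assumption: these three event types are the ones worth alerting on.
pub fn is_alert(event: &SseEvent) -> bool {
    matches!(
        event.event.as_str(),
        "monitor.missed" | "monitor.timeout" | "checkin.error"
    )
}
```

The `data` string would then be handed to a JSON parser to read fields such as `monitor_name`.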
Statistics
Get monitor stats:
# Quote the URL so the shell does not treat "?" as a glob character
curl "https://loom.example.com/api/crons/monitors/daily-cleanup/stats?period=week"
Response:
{
  "total_checkins": 168,
  "successful_checkins": 165,
  "failed_checkins": 2,
  "missed_checkins": 1,
  "timeout_checkins": 0,
  "avg_duration_ms": 45000,
  "p95_duration_ms": 120000,
  "uptime_percentage": 98.2
}
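The sample figures are consistent with uptime being the share of successful check-ins: 165 / 168 ≈ 98.2%. The sketch below reproduces that arithmetic. `uptime_percentage` is a hypothetical helper, and the formula (successful divided by total, rounded to one decimal) is an assumption inferred from the sample response rather than a documented definition.

```rust
/// Hypothetical helper: uptime as the percentage of successful check-ins,
/// rounded to one decimal place. Returns `None` when there are no check-ins.
pub fn uptime_percentage(successful_checkins: u64, total_checkins: u64) -> Option<f64> {
    if total_checkins == 0 {
        return None;
    }
    let pct = successful_checkins as f64 / total_checkins as f64 * 100.0;
    Some((pct * 10.0).round() / 10.0)
}
```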
Best Practices
Set Grace Periods - Use checkin_margin_minutes to avoid false alerts for jobs that vary slightly in timing
Monitor Timeouts - Set max_runtime_minutes to detect hung jobs
Environment Filters - Use environments to only monitor production runs
Capture Output - POST output to ping URLs for debugging failed runs
See Also
Crash Tracking - Link failed jobs to crash events
Analytics - Track job execution patterns
Sessions - Monitor job session health