The Monitor class collects and aggregates metrics from all orchestrator components, provides periodic health checks, and emits real-time metrics snapshots.

Monitor Class

Constructor

constructor(
  config: MonitorConfig,
  workerPool: WorkerPool,
  taskQueue: TaskQueue
)

  • config (MonitorConfig, required): Configuration for health checks and timeouts
  • workerPool (WorkerPool, required): Worker pool instance to monitor
  • taskQueue (TaskQueue, required): Task queue instance to track

Configuration

interface MonitorConfig {
  healthCheckInterval: number; // Polling interval in seconds
  workerTimeout: number;       // Worker timeout threshold in seconds
}

Methods

start()

Starts the monitoring loop with periodic health checks and metrics emission.
start(): void

stop()

Stops the monitoring loop and clears all timers.
stop(): void
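
The start/stop behavior described above can be pictured as a single interval timer. The following is a minimal sketch, not the actual Monitor implementation: the class name MonitorLoopSketch, the tick parameter, and the running getter are hypothetical, and the sketch assumes that calling start() twice should not stack timers.

```typescript
// Hypothetical sketch of a start/stop timer loop; not the real Monitor.
interface LoopConfigSketch {
  healthCheckInterval: number; // seconds, matching MonitorConfig
}

class MonitorLoopSketch {
  private timer: ReturnType<typeof setInterval> | null = null;
  private config: LoopConfigSketch;
  private tick: () => void;

  constructor(config: LoopConfigSketch, tick: () => void) {
    this.config = config;
    this.tick = tick;
  }

  start(): void {
    // Assumption: a second start() is a no-op rather than a second timer.
    if (this.timer !== null) return;
    // setInterval takes milliseconds, while the config value is in seconds.
    this.timer = setInterval(this.tick, this.config.healthCheckInterval * 1000);
  }

  stop(): void {
    if (this.timer === null) return;
    clearInterval(this.timer);
    this.timer = null;
  }

  get running(): boolean {
    return this.timer !== null;
  }
}
```

Converting seconds to milliseconds in one place keeps the configuration values readable while still matching what the timer API expects.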

getSnapshot()

Returns the current metrics snapshot.
getSnapshot(): MetricsSnapshot
Returns MetricsSnapshot (object): Complete metrics snapshot with current state
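
The full shape of MetricsSnapshot is not listed on this page. As a rough illustration, the sketch below declares only the three fields that the usage example on this page reads (activeWorkers, pendingTasks, totalTokensUsed). The interface name SnapshotFieldsSketch and the summarize helper are hypothetical, and the real type likely carries more fields.

```typescript
// Hypothetical partial view of a snapshot: only the fields read in the
// usage example on this page. The real MetricsSnapshot may differ.
interface SnapshotFieldsSketch {
  activeWorkers: number;
  pendingTasks: number;
  totalTokensUsed: number;
}

// Formats a one-line summary, e.g. for a log line.
function summarize(snapshot: SnapshotFieldsSketch): string {
  return (
    `workers=${snapshot.activeWorkers} ` +
    `pending=${snapshot.pendingTasks} ` +
    `tokens=${snapshot.totalTokensUsed}`
  );
}
```

In practice the argument would be the value returned by getSnapshot(), assuming it exposes these fields under the same names.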

Callback Registration

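The monitor exposes three callback registration methods for event-driven monitoring. onTimeout and onEmptyDiff receive the worker and task IDs involved; onMetrics receives each emitted metrics snapshot: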
onTimeout(callback: (workerId: string, taskId: string) => void): void
onEmptyDiff(callback: (workerId: string, taskId: string) => void): void
onMetrics(callback: (snapshot: MetricsSnapshot) => void): void
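
One common way to back registration methods like these is a list of listeners per event. The sketch below shows that pattern for onTimeout only. It is an illustration under assumptions, not the Monitor's internals: CallbackRegistrySketch and emitTimeout are hypothetical names, and whether the real Monitor keeps several callbacks per event or replaces the previous one is not stated here.

```typescript
type TimeoutCallback = (workerId: string, taskId: string) => void;

// Hypothetical listener registry; assumes multiple callbacks per event.
class CallbackRegistrySketch {
  private timeoutCallbacks: TimeoutCallback[] = [];

  // Same signature as the documented onTimeout.
  onTimeout(callback: TimeoutCallback): void {
    this.timeoutCallbacks.push(callback);
  }

  // Hypothetical internal helper: notifies every registered callback
  // in registration order.
  emitTimeout(workerId: string, taskId: string): void {
    for (const callback of this.timeoutCallbacks) {
      callback(workerId, taskId);
    }
  }
}
```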

Usage Example

import { Monitor } from "@longshot/orchestrator";

const monitor = new Monitor(
  { healthCheckInterval: 10, workerTimeout: 1800 },
  workerPool,
  taskQueue
);

// Register metrics callback
monitor.onMetrics((snapshot) => {
  console.log(`Active workers: ${snapshot.activeWorkers}`);
  console.log(`Pending tasks: ${snapshot.pendingTasks}`);
  console.log(`Tokens used: ${snapshot.totalTokensUsed}`);
});

// Register timeout callback
monitor.onTimeout((workerId, taskId) => {
  console.warn(`Worker ${workerId} timed out on task ${taskId}`);
});

monitor.start();

// Later...
monitor.stop();

Metrics Collection

The monitor automatically collects:
  • Task statistics: Pending, running, completed, failed counts
  • Worker statistics: Active workers, available capacity
  • Token usage: Total tokens consumed across all LLM calls
  • Cost tracking: Estimated USD cost based on token usage
  • Merge statistics: Merge attempts, successes, conflict rate
  • Finalization metrics: Build/test results from final sweep
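
Two of the derived values in the list above, estimated cost and conflict rate, are simple ratios. The sketch below shows one plausible way to compute them. The function names, the per-million-token price parameter, and the definition of conflict rate as conflicts divided by attempts are all assumptions; the actual formulas and pricing used by the monitor are not documented on this page.

```typescript
// Hypothetical: estimated cost from a flat price per million tokens.
function estimateCostUsd(totalTokens: number, usdPerMillionTokens: number): number {
  return (totalTokens / 1_000_000) * usdPerMillionTokens;
}

// Hypothetical: share of merge attempts that hit a conflict.
// Returns 0 when there were no attempts, to avoid dividing by zero.
function conflictRate(mergeAttempts: number, mergeConflicts: number): number {
  return mergeAttempts === 0 ? 0 : mergeConflicts / mergeAttempts;
}
```

A real cost model would likely price input and output tokens separately; the flat rate here only keeps the arithmetic visible.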

Health Checks

Periodic health checks detect:
  • Worker timeouts: Workers that exceed workerTimeout threshold
  • Empty diffs: Workers that complete with no file changes
  • Suspicious tasks: Tasks with anomalous behavior patterns
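
The worker timeout check can be sketched as a filter over running workers. This is a minimal illustration, assuming each running worker records a start time in milliseconds. The RunningWorkerSketch type, its fields, and findTimedOutSketch are hypothetical, not part of the Monitor API; only the workerTimeout threshold in seconds comes from MonitorConfig.

```typescript
// Hypothetical record of a worker that is currently running a task.
interface RunningWorkerSketch {
  workerId: string;
  taskId: string;
  startedAtMs: number; // epoch milliseconds when the task started
}

// Returns the workers whose elapsed time strictly exceeds the threshold.
function findTimedOutSketch(
  workers: RunningWorkerSketch[],
  workerTimeoutSeconds: number,
  nowMs: number
): RunningWorkerSketch[] {
  const limitMs = workerTimeoutSeconds * 1000;
  return workers.filter((worker) => nowMs - worker.startedAtMs > limitMs);
}
```

Passing the current time as a parameter rather than calling Date.now() inside the function keeps the check deterministic and easy to test. Each result could then be reported through a callback registered with onTimeout.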
