Dispatch provides robust error handling and recovery mechanisms. Workers can ask questions, mark blockers, and recover from failures — all without losing context.

Understanding Checklist Markers

The plan file uses four markers to track progress:
[x]
Meaning: Item completed successfully
Example:
- [x] Scan for hardcoded secrets — found 2 instances in config.ts
- [x] Review authentication logic — JWT implementation looks good
Workers can add notes after the marker to provide context.
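
As a rough illustration of how the markers can be read programmatically, the sketch below counts items per marker in a plan file. It assumes items are Markdown list lines of the form `- [m] text` and that the four markers are `[ ]`, `[x]`, `[?]`, and `[!]`; the function name `plan_summary` is hypothetical and not part of Dispatch.

```shell
# Summarize checklist markers in a plan file.
# Assumption: items are Markdown list lines of the form "- [m] text",
# and the marker set is ' ', 'x', '?', '!'.
plan_summary() {
  local plan="$1"
  local marker count
  for marker in ' ' 'x' '?' '!'; do
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that status
    count=$(grep -c -F -- "- [${marker}] " "$plan" || true)
    printf '[%s] %s\n' "$marker" "$count"
  done
}
```

Calling `plan_summary path/to/plan-file` prints one line per marker with its count. Fixed-string matching (`-F`) is used so that `?` and `[` are not treated as regex syntax.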

Blocked Items and Question Flow

When a worker needs clarification, it uses the IPC (Inter-Process Communication) system to ask questions without losing context.

Primary Flow: IPC (Monitor-Triggered)

This is the preferred method — the worker stays alive while waiting for an answer.

1. Worker Asks a Question

Worker encounters a blocker and writes a question:
# Worker writes atomically
echo "Found both JWT and session-based auth. Which should I focus on?" > \
  .dispatch/tasks/security-review/ipc/001.question.tmp
mv .dispatch/tasks/security-review/ipc/001.question.tmp \
  .dispatch/tasks/security-review/ipc/001.question
The worker then polls for .dispatch/tasks/security-review/ipc/001.answer.
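
The polling step might look something like the sketch below: a loop that waits for the answer file and gives up after a timeout. The function name, argument order, and the 3-second polling interval are assumptions for illustration; the 180-second default reflects the ~3 minute window described in the fallback flow.

```shell
# Poll for an answer file until it appears or a timeout expires.
# Hypothetical helper: the name, arguments, and defaults are illustrative.
wait_for_answer() {
  local answer_file="$1"
  local timeout="${2:-180}"   # ~3 minutes
  local interval="${3:-3}"    # polling interval in seconds (assumed)
  local start now
  start=$(date +%s)
  while [ ! -f "$answer_file" ]; do
    now=$(date +%s)
    if [ $(( now - start )) -ge "$timeout" ]; then
      return 1   # timed out: the caller should fall back to a context dump
    fi
    sleep "$interval"
  done
  cat "$answer_file"   # print the answer for the caller to capture
}
```

A worker could capture the answer with `answer=$(wait_for_answer .dispatch/tasks/security-review/ipc/001.answer)` and branch on the exit status.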

2. Monitor Detects the Question

A lightweight monitor script polls the IPC directory:
# Monitor checks for unanswered questions
for q in .dispatch/tasks/security-review/ipc/*.question; do
  [ -e "$q" ] || continue  # Skip the literal pattern when no questions exist
  seq=$(basename "$q" .question)
  if [ ! -f ".dispatch/tasks/security-review/ipc/${seq}.answer" ]; then
    exit 0  # Triggers <task-notification>
  fi
done
When it finds an unanswered question, it exits — triggering a notification.

3. Dispatcher Surfaces the Question

The dispatcher reads the question and shows it to you:
Worker is asking: "Found both JWT and session-based auth. Which should I focus on?"

4. You Answer

You provide an answer:
Focus on JWT — session-based auth is deprecated.

5. Dispatcher Writes the Answer

The answer is written atomically:
echo "Focus on JWT — session-based auth is deprecated." > \
  .dispatch/tasks/security-review/ipc/001.answer.tmp
mv .dispatch/tasks/security-review/ipc/001.answer.tmp \
  .dispatch/tasks/security-review/ipc/001.answer
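
The write-then-rename pattern is used so that a process polling for the final file name never sees a partially written file. Since both the worker and the dispatcher use it, it could be wrapped in a small helper like the sketch below. The name `atomic_write` and its argument order are hypothetical.

```shell
# Write content to a file atomically: write a temp file, then rename it.
# Hypothetical helper; the name and argument order are illustrative.
atomic_write() {
  local target="$1"
  local content="$2"
  local tmp="${target}.tmp"
  # The rename only happens if the write succeeded
  printf '%s\n' "$content" > "$tmp" && mv "$tmp" "$target"
}

# Usage, mirroring the paths used in this flow:
# atomic_write .dispatch/tasks/security-review/ipc/001.answer "Focus on JWT"
```

The temp file lives in the same directory as the target so the `mv` is a rename within one filesystem.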

6. Worker Continues

Worker detects the answer, acknowledges it, and continues:
# Worker acknowledges
touch .dispatch/tasks/security-review/ipc/001.done

# Continues with the task using the answer
Critical: The worker never lost context. It stayed alive the entire time, preserving all state in memory.

Fallback Flow: Context Dump ([?] Marker)

If no answer arrives within ~3 minutes, the worker falls back to the legacy behavior:
  1. Dumps context to .dispatch/tasks/<task-id>/context.md
  2. Marks the item [?] with the question
  3. Exits cleanly
Example plan file after timeout:
# Security Review

- [x] Scan for hardcoded secrets — found 2 instances
- [?] Review authentication logic

> **Blocker:** Found both JWT and session-based auth. Which should I focus on?
You can answer and re-dispatch:
/dispatch continue security-review from the blocked item.
Answer: Focus on JWT; session-based auth is deprecated.
The new worker:
  1. Reads the plan file
  2. Reads context.md (preserves the previous worker’s state)
  3. Uses your answer to continue from the blocked item
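
The worker's side of this fallback could be sketched roughly as follows: save the context, rewrite the unchecked item as `[?]`, and append the blocker note. This assumes the item was still unchecked (`- [ ] <text>`) when the worker got stuck; the function name, its arguments, and the plain-text context format are illustrative, not Dispatch's actual implementation.

```shell
# Mark a plan item as blocked and save context before exiting.
# Assumptions: unchecked items look like "- [ ] <text>"; the function name,
# arguments, and the context.md contents are illustrative.
mark_blocked() {
  local task_dir="$1" plan="$2" item="$3" question="$4" context="$5"
  local tmp="${plan}.tmp"

  # 1. Dump the worker's context for the next worker to read
  printf '%s\n' "$context" > "${task_dir}/context.md"

  # 2. Rewrite the first matching "- [ ]" line as "- [?]" and add the blocker
  awk -v item="- [ ] $item" -v q="$question" '
    $0 == item && !marked {
      print "- [?] " substr($0, 7)   # keep the item text after "- [ ] "
      print ""
      print "> **Blocker:** " q
      marked = 1
      next
    }
    { print }
  ' "$plan" > "$tmp" && mv "$tmp" "$plan"
}
```

After calling `mark_blocked`, the worker would exit with status 0 to complete step 3.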

Proactive Recovery When Workers Fail

When a worker fails to start or errors immediately, Dispatch proactively offers recovery.

Scenario: CLI Unavailable

/dispatch use gpt-5.3-codex to review the auth module
Worker fails:
Error: codex: command not found
Dispatcher’s proactive recovery:
The Codex CLI is no longer available. Alternatives from your config:

1. opus (Claude)
2. sonnet (Claude)  
3. gemini-3.1-pro (Cursor)

Want me to switch your default and retry with one of these?
You choose:
Use opus
Dispatcher updates config and retries:
# ~/.dispatch/config.yaml
default: opus  # Updated from gpt-5.3-codex

models:
  opus:   { backend: claude }
  sonnet: { backend: claude }
  # Removed: gpt-5.3-codex (CLI unavailable)
Updated default to opus. Retrying security-review...
Dispatched security-review using opus.
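
A "command not found" failure can also be caught before dispatching by checking whether the backend's CLI is on `PATH`. The sketch below returns the first available command from a list of candidates. The helper name is hypothetical, and which command backs which model (for example, `claude` for opus) is an assumption; only `codex` appears in the error shown here.

```shell
# Return the first CLI from the argument list that is on PATH.
# Hypothetical helper: which command backs which model is an assumption.
first_available_cli() {
  local cli
  for cli in "$@"; do
    if command -v "$cli" >/dev/null 2>&1; then
      printf '%s\n' "$cli"
      return 0
    fi
  done
  return 1   # none of the candidates are installed
}

# Example: prefer codex, fall back to claude (command names assumed)
# first_available_cli codex claude
```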

Scenario: Auth Error

/dispatch use claude-opus-4.5 to do a security review
Worker fails:
Error: Authentication failed. Claude Opus 4.5 requires upgraded subscription.
Dispatcher’s proactive recovery:
Claude Opus 4.5 returned an auth error. Alternatives:

1. sonnet (Claude) — works on your plan
2. gemini-3.1-pro (Cursor)

Switch to one of these and retry?
You choose:
Sonnet
Dispatcher updates config and retries:
# Removes the broken model to prevent future failures
models:
  sonnet: { backend: claude }
  haiku:  { backend: claude }
  # Removed: claude-opus-4.5 (auth failed)

Timeout Behavior

Workers and monitors have timeouts to prevent hanging indefinitely.

Monitor Timeout

The monitor script times out after 30 minutes:
TIMEOUT=1800  # 30 minutes
START=$(date +%s)

while true; do
  # Check for .done or questions...
  
  ELAPSED=$(( $(date +%s) - START ))
  [ "$ELAPSED" -ge "$TIMEOUT" ] && exit 1
  
  sleep 3
done
If the worker is still running when the 30 minutes elapse, the monitor exits with an error (status 1).
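
Putting the question check and the timeout together, a complete monitor loop might look like the sketch below. It is written as a function with a configurable timeout so it can be tried with short values. The function name and arguments are assumptions, and completion detection is left out because the completion marker is not specified here.

```shell
# Monitor sketch: return 0 when an unanswered question exists, 1 on timeout.
# Assumptions: the function name and arguments are illustrative; completion
# detection is omitted because the completion marker is not specified.
monitor_ipc() {
  local ipc_dir="$1"
  local timeout="${2:-1800}"  # 30 minutes
  local interval="${3:-3}"    # seconds between polls
  local start q seq
  start=$(date +%s)
  while true; do
    for q in "$ipc_dir"/*.question; do
      [ -e "$q" ] || continue            # glob matched nothing
      seq=$(basename "$q" .question)
      # A question without a matching answer file is unanswered
      [ -f "$ipc_dir/$seq.answer" ] || return 0
    done
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

For a quick local check, `monitor_ipc .dispatch/tasks/security-review/ipc 10 1` polls once per second and gives up after 10 seconds.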

IPC Answer Timeout

Workers wait ~3 minutes for an answer to their question. If no answer arrives:
  1. Worker dumps context to context.md
  2. Marks the item [?] with the question
  3. Exits cleanly
This prevents workers from hanging indefinitely when you’re away.

Handling Multiple Failures

When multiple parallel workers fail, Dispatch handles each independently:
Error report:

● security-review (opus) — FAILED
  Worker exited with error on item 2:
  [!] Review authentication logic
  > Error: auth/ directory not found

● api-tests (gpt-5.3-codex) — FAILED
  Worker failed to start: codex: command not found

● update-readme (gemini) — COMPLETE
  ✓ All items checked
Dispatcher’s recovery:
1. security-review failed due to missing directory. Fix it and I can retry.
2. api-tests failed because Codex CLI is unavailable. Switch to opus or sonnet?
3. update-readme completed successfully.
You respond:
Retry api-tests with sonnet
Dispatcher updates and retries:
Updated api-tests to use sonnet. Dispatching...
Dispatched api-tests using sonnet.
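
To get a similar per-task overview from the plan files yourself, something like the sketch below could work. It assumes each task directory contains a plan file named `plan.md` (the file name is not stated on this page) and that `[!]`, `[?]`, and `[ ]` mark failed, blocked, and unfinished items respectively. Workers that fail to start leave no marker, so they would not show up as FAILED here.

```shell
# Classify a task from the markers in its plan file.
# Assumptions: "[!]" marks a failed item, "[?]" a blocked item,
# and "- [ ]" an item that is not yet done.
task_status() {
  local plan="$1"
  if grep -q -F -- '- [!] ' "$plan"; then
    echo FAILED
  elif grep -q -F -- '- [?] ' "$plan"; then
    echo BLOCKED
  elif grep -q -F -- '- [ ] ' "$plan"; then
    echo PENDING
  else
    echo COMPLETE
  fi
}

# Print one status line per task. The plan.md file name is an assumption.
report_tasks() {
  local tasks_dir="$1"
  local dir
  for dir in "$tasks_dir"/*/; do
    [ -f "${dir}plan.md" ] || continue
    printf '%s: %s\n' "$(basename "$dir")" "$(task_status "${dir}plan.md")"
  done
}
```

Running `report_tasks .dispatch/tasks` prints one `task-id: STATUS` line per task directory.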

Error Recovery Best Practices

Answer Questions Quickly

Workers stay alive for ~3 minutes waiting for answers. Respond within this window so the worker can continue in place instead of exiting and being re-dispatched from context.md.

Review Context Dumps

If a worker times out, check context.md before re-dispatching. It contains the worker’s state before exit.

Fix Root Causes

If a worker fails because of missing files or a broken CLI, fix the underlying issue before retrying.

Use Aliases for Fallbacks

Create aliases with fallback models so you can quickly switch when a primary model fails.
See the Configuration guide for managing model fallbacks and the Parallel Tasks guide for handling errors in concurrent workers.
