Worker Agent - Longshot

The worker is the only agent that writes code. It receives a task, implements it on an isolated branch in a sandbox environment, and returns a detailed handoff report.

Core Workflow

Workers follow a strict Plan → Execute → Reflect → Verify → Commit cycle:

1. Plan (Before Writing Code)

Read the task description and acceptance criteria completely
Explore relevant files — read the code in scope, search for patterns
Form a concrete approach: what to change, what to create, what to call

2. Execute

Implement the solution
After each significant change, immediately verify:
- Compile: npx tsc --noEmit
- Run relevant tests if they exist
Fix before continuing — do not accumulate unverified changes

3. Reflect (After Every Significant Change)

Before moving to full verification, pause and check:

Am I still solving the task described in the acceptance criteria?
Have I drifted into fixing things outside my scope?
Is my approach consistent with the patterns I found during exploration?

Scope creep is the #1 worker failure mode.

4. Verify (Multi-Pass)

After implementation is complete, run the full verification cycle:

Compile the project
Run all tests in scope
Check for edge cases the task description may not have mentioned
Look for similar patterns elsewhere that your change should also address

If anything fails, fix it and re-verify, for up to 3 full fix cycles. After 3 failed cycles, stop and report the task as “blocked.”
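The fix-cycle budget can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the worker's actual implementation: `runVerifyLoop`, `VerifyResult`, `LoopOutcome`, and the `verify`/`fix` callbacks are hypothetical names introduced for illustration.

```typescript
// Hypothetical types for illustration; not part of the Longshot codebase.
interface VerifyResult {
  ok: boolean;
  errors: string[];
}

interface LoopOutcome {
  status: "complete" | "blocked";
  fixCycles: number;
  lastErrors: string[];
}

// Runs verification once, then alternates fix and re-verify until it passes
// or the fix-cycle budget (3 by default) is used up.
function runVerifyLoop(
  verify: () => VerifyResult,
  fix: (errors: string[]) => void,
  maxFixCycles = 3,
): LoopOutcome {
  let result = verify();
  let fixCycles = 0;
  while (!result.ok && fixCycles < maxFixCycles) {
    fix(result.errors);
    fixCycles += 1;
    result = verify();
  }
  return {
    status: result.ok ? "complete" : "blocked",
    fixCycles,
    lastErrors: result.errors,
  };
}

// Example: two errors are outstanding and each fix cycle resolves one.
let outstanding = 2;
const outcome = runVerifyLoop(
  () => ({
    ok: outstanding === 0,
    errors: outstanding === 0 ? [] : [`${outstanding} error(s) left`],
  }),
  () => {
    outstanding -= 1;
  },
);
console.log(outcome.status, outcome.fixCycles); // complete 2
```

Keeping the last error list in the outcome gives a "blocked" report something concrete to say about what went wrong.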

5. Commit and Handoff

Commit all work to your branch
Write a thorough handoff (see Handoff Protocol)

Implementation

Location: packages/sandbox/src/worker-runner.ts

Sandbox Execution

Workers run in isolated Modal sandboxes with full filesystem access:

export async function runWorker(): Promise<void> {
  const startTime = Date.now();

  // 1. Read task payload from /workspace/task.json
  const raw = readFileSync(TASK_PATH, "utf-8");
  const payload: TaskPayload = JSON.parse(raw);
  const { task, systemPrompt, llmConfig } = payload;

  // 2. Enable distributed tracing
  enableTracing("/workspace");
  let workerSpan: Span | undefined;
  if (payload.trace) {
    const tracer = Tracer.fromPropagated(payload.trace);
    workerSpan = tracer.startSpan("sandbox.worker", {
      taskId: task.id,
      agentId: `sandbox-${task.id}`,
    });
  }

  // 3. Write worker instructions as AGENTS.md in parent dir
  //    (Pi auto-discovers AGENTS.md by walking up from cwd)
  if (systemPrompt) {
    writeFileSync(WORKER_AGENTS_MD_PATH, systemPrompt, "utf-8");
  }

  // 4. Register LLM model with Pi's ModelRegistry
  const authStorage = Reflect.construct(AuthStorage, []) as AuthStorage;
  const modelRegistry = new ModelRegistry(authStorage);
  modelRegistry.registerProvider("glm5", {
    baseUrl: llmConfig.endpoint,
    apiKey: llmConfig.apiKey || "no-key-needed",
    api: "openai-completions",
    models: [{
      id: llmConfig.model,
      name: llmConfig.model,
      reasoning: false,
      input: ["text"],
      cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      contextWindow: 131072,
      maxTokens: llmConfig.maxTokens,
    }],
  });

  const model = modelRegistry.find("glm5", llmConfig.model);
  if (!model) {
    throw new Error(`Model "${llmConfig.model}" not found in registry`);
  }

  const startSha = safeExec("git rev-parse HEAD", WORK_DIR);

  // 5. Create Pi agent session with FULL tool suite
  const { session } = await createAgentSession({
    cwd: WORK_DIR,
    model,
    tools: fullPiTools,  // [read, write, edit, bash, grep, find, ls]
    authStorage,
    modelRegistry,
    sessionManager: SessionManager.inMemory(),
    settingsManager: SettingsManager.inMemory(),
    thinkingLevel: "off",
  });

  let toolCallCount = 0;
  let lastAssistantMessage = "";

  // 6. Subscribe to Pi events to track tool calls
  //    (event is `unknown`, so it must be narrowed before `.type` is read; guards not shown)
  session.subscribe((event: unknown) => {
    if (event.type === "tool_execution_start") {
      toolCallCount++;
    }
    if (event.type === "message_end" && event.message.role === "assistant") {
      // Extract final assistant text
      lastAssistantMessage = extractTextFromContent(event.message.content);
    }
  });

  // 7. Prompt agent with task
  const prompt = buildTaskPrompt(task);
  await session.prompt(prompt);

  const stats = session.getSessionStats();
  const tokensUsed = stats.tokens.total;
  session.dispose();

  // 8. Detect empty LLM responses (bug fix)
  const isEmptyResponse = tokensUsed === 0 && toolCallCount === 0;
  if (isEmptyResponse) {
    log("WARNING: LLM returned empty response (0 tokens, 0 tool calls). Marking as failed.");
  }

  // 9. Ensure .gitignore exists (scaffold safety)
  if (!existsSync(`${WORK_DIR}/.gitignore`)) {
    writeFileSync(`${WORK_DIR}/.gitignore`, GITIGNORE_ESSENTIALS, "utf-8");
  }

  // 10. Safety-net commit (only if agent did work)
  if (!isEmptyResponse) {
    safeExec("git add -A", WORK_DIR);
    const stagedFiles = safeExec("git diff --cached --name-only", WORK_DIR);
    if (stagedFiles) {
      safeExec(`git commit -m "feat(${task.id}): auto-commit uncommitted changes"`, WORK_DIR);
    }
  }

  // 11. Post-agent build check
  let buildExitCode: number | null = null;
  if (!isEmptyResponse && existsSync(`${WORK_DIR}/tsconfig.json`)) {
    try {
      execSync("npx tsc --noEmit", { cwd: WORK_DIR, encoding: "utf-8", timeout: 60_000 });
      buildExitCode = 0;
    } catch (buildErr: unknown) {
      buildExitCode = hasStatusCode(buildErr) ? buildErr.status : 1;
    }
  }

  // 12. Extract diff stats (exclude artifacts)
  const diff = safeExec(`git diff ${startSha} --no-color -- . ':!node_modules'`, WORK_DIR);
  const numstat = safeExec(`git diff ${startSha} --numstat`, WORK_DIR);
  const filesCreatedRaw = safeExec(`git diff ${startSha} --diff-filter=A --name-only`, WORK_DIR);
  const filesChangedRaw = safeExec(`git diff ${startSha} --name-only`, WORK_DIR);

  const filesChanged = filesChangedRaw.split("\n").filter(Boolean).filter((f) => !isArtifact(f));
  const filesCreated = filesCreatedRaw.split("\n").filter(Boolean).filter((f) => !isArtifact(f));

  // Parse numstat for line counts
  let linesAdded = 0;
  let linesRemoved = 0;
  if (numstat) {
    for (const line of numstat.split("\n")) {
      const [addedRaw, removedRaw, filePath] = line.split("\t");
      if (filePath && !isArtifact(filePath)) {
        linesAdded += parseInt(addedRaw, 10) || 0;
        linesRemoved += parseInt(removedRaw, 10) || 0;
      }
    }
  }

  // 13. Build handoff
  const handoff: Handoff = {
    taskId: task.id,
    status: isEmptyResponse ? "failed" : "complete",
    summary: isEmptyResponse
      ? "Task failed: LLM returned empty response (0 tokens, 0 tool calls). Possible API/endpoint failure."
      : lastAssistantMessage || "Task completed (no final message captured).",
    diff,
    filesChanged,
    concerns: isEmptyResponse
      ? ["Empty LLM response — possible API failure or model endpoint issue"]
      : buildExitCode !== null && buildExitCode !== 0
        ? [`Post-agent build check failed (tsc exit code ${buildExitCode})`]
        : [],
    suggestions: isEmptyResponse ? ["Check LLM endpoint connectivity"] : [],
    buildExitCode,
    metrics: {
      linesAdded,
      linesRemoved,
      filesCreated: filesCreated.length,
      filesModified: Math.max(0, filesChanged.length - filesCreated.length),
      tokensUsed,
      toolCallCount,
      durationMs: Date.now() - startTime,
    },
  };

  writeResult(handoff);
}
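In the listing above, the subscriber receives `event` as `unknown`, so TypeScript requires narrowing before `event.type` or `event.message` can be read. One way to do that is with small type guards. The event shapes below are assumptions inferred from the fields the listing reads (`type`, `message.role`, `message.content`); they are not Pi's published types, and `summarizeEvents` is a hypothetical helper.

```typescript
// Assumed event shapes, inferred from the fields used in runWorker above.
interface ToolExecutionStart {
  type: "tool_execution_start";
}

interface AssistantMessageEnd {
  type: "message_end";
  message: { role: "assistant"; content: unknown };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isToolExecutionStart(e: unknown): e is ToolExecutionStart {
  return isRecord(e) && e.type === "tool_execution_start";
}

function isAssistantMessageEnd(e: unknown): e is AssistantMessageEnd {
  if (!isRecord(e) || e.type !== "message_end") return false;
  const message = e.message;
  return isRecord(message) && message.role === "assistant";
}

// Counts tool calls and assistant turns in a stream of untyped events.
function summarizeEvents(events: unknown[]): { toolCalls: number; assistantTurns: number } {
  let toolCalls = 0;
  let assistantTurns = 0;
  for (const e of events) {
    if (isToolExecutionStart(e)) toolCalls += 1;
    if (isAssistantMessageEnd(e)) assistantTurns += 1;
  }
  return { toolCalls, assistantTurns };
}

const sampleEvents: unknown[] = [
  { type: "tool_execution_start" },
  { type: "message_end", message: { role: "user", content: "hi" } },
  { type: "message_end", message: { role: "assistant", content: "done" } },
  null,
];
console.log(summarizeEvents(sampleEvents)); // { toolCalls: 1, assistantTurns: 1 }
```

Malformed or unrelated events (such as the `null` entry) fall through both guards and are ignored instead of throwing.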

Full Pi Tool Suite

Workers get all 7 Pi tools, not just the limited 4-tool codingTools set:

import {
  codingTools,    // [read, bash, edit, write]
  grepTool,       // Ripgrep-powered content search
  findTool,       // Glob-based file search
  lsTool,         // Directory listing
} from "@mariozechner/pi-coding-agent";

const fullPiTools = [...codingTools, grepTool, findTool, lsTool];

Why this matters:

grep enables fast content search (e.g., “find all uses of this function”)
find enables file discovery (e.g., “find all test files”)
ls enables directory exploration before reading files

These tools dramatically improve agent autonomy compared to the minimal set.

Task Prompt Construction

export function buildTaskPrompt(task: Task): string {
  const parts: string[] = [
    `## Task: ${task.id}`,
    `**Description:** ${task.description}`,
    `**Scope (files to focus on):** ${task.scope.join(", ")}`,
    `**Acceptance criteria:** ${task.acceptance}`,
    `**Branch:** ${task.branch}`,
    "",
    "Complete this task. Commit your changes when done. Stay focused on the scoped files.",
  ];

  return parts.join("\n");
}
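To make the resulting prompt concrete, the sketch below repeats the function and calls it with a sample task. The `Task` interface here is an assumed minimal shape containing only the fields `buildTaskPrompt` reads, and the task values are hypothetical.

```typescript
// Assumed minimal Task shape: only the fields buildTaskPrompt reads.
interface Task {
  id: string;
  description: string;
  scope: string[];
  acceptance: string;
  branch: string;
}

// Same body as the listing above, repeated so the example is self-contained.
function buildTaskPrompt(task: Task): string {
  const parts: string[] = [
    `## Task: ${task.id}`,
    `**Description:** ${task.description}`,
    `**Scope (files to focus on):** ${task.scope.join(", ")}`,
    `**Acceptance criteria:** ${task.acceptance}`,
    `**Branch:** ${task.branch}`,
    "",
    "Complete this task. Commit your changes when done. Stay focused on the scoped files.",
  ];
  return parts.join("\n");
}

// Hypothetical task, for illustration only.
const sampleTask: Task = {
  id: "task-042",
  description: "Implement JWT token generation",
  scope: ["src/auth/token.ts", "src/auth/middleware.ts"],
  acceptance: "generateAccessToken returns a signed JWT; npx tsc --noEmit passes",
  branch: "worker/task-042",
};

console.log(buildTaskPrompt(sampleTask));
```

The prompt is seven lines long. Scope entries are joined with a comma and a space, so a multi-file scope appears on a single line.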

Artifact Filtering

Build artifacts are excluded from diff stats:

const ARTIFACT_PATTERNS = [
  /^node_modules\//,
  /^\.next\//,
  /^dist\//,
  /^build\//,
  /^out\//,
  /^\.turbo\//,
  /\.tsbuildinfo$/,  // matches e.g. tsconfig.tsbuildinfo, like "*.tsbuildinfo" in .gitignore
  /^package-lock\.json$/,
  /^pnpm-lock\.yaml$/,
  /^yarn\.lock$/,
  /^\.pnpm-store\//,
];

function isArtifact(filePath: string): boolean {
  return ARTIFACT_PATTERNS.some((p) => p.test(filePath));
}
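The sketch below shows how the filter interacts with `git diff --numstat` parsing. It uses a subset of the patterns above; `sumNumstat` is a hypothetical name for the numstat loop in `runWorker`, and the sample numstat text is made up.

```typescript
// A subset of the artifact patterns listed above, repeated for self-containment.
const ARTIFACT_PATTERNS: RegExp[] = [/^node_modules\//, /^dist\//, /^package-lock\.json$/];

function isArtifact(filePath: string): boolean {
  return ARTIFACT_PATTERNS.some((p) => p.test(filePath));
}

// Sums added/removed lines from numstat output, skipping artifacts.
// Binary files appear as "-\t-\tpath"; parseInt yields NaN, which counts as 0.
function sumNumstat(numstat: string): { linesAdded: number; linesRemoved: number } {
  let linesAdded = 0;
  let linesRemoved = 0;
  for (const line of numstat.split("\n")) {
    const [addedRaw, removedRaw, filePath] = line.split("\t");
    if (filePath && !isArtifact(filePath)) {
      linesAdded += parseInt(addedRaw, 10) || 0;
      linesRemoved += parseInt(removedRaw, 10) || 0;
    }
  }
  return { linesAdded, linesRemoved };
}

// Made-up numstat output: one source file, one lockfile, one binary, one build output.
const sampleNumstat = [
  "12\t3\tsrc/auth/token.ts",
  "4000\t0\tpackage-lock.json",
  "-\t-\tassets/logo.png",
  "5\t1\tdist/index.js",
].join("\n");

console.log(sumNumstat(sampleNumstat)); // { linesAdded: 12, linesRemoved: 3 }
console.log(isArtifact("packages/web/dist/index.js")); // false
```

Because the directory patterns are anchored with `^`, they match only at the repository root. A nested build directory such as `packages/web/dist/` is not filtered, which may matter in a monorepo.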

Prompt Engineering

Location: prompts/worker.md

1. Workflow: Plan → Execute → Verify

### 1. Plan (before writing any code)
- Read the task description and acceptance criteria completely.
- Explore relevant files — read the code in scope, search for patterns, understand how it connects.
- Form a concrete approach: what to change, what to create, what to call.

### 2. Execute
- Implement the solution.
- After each significant change (new function, modified interface, added file), immediately verify:
  - Compile: `npx tsc --noEmit` (or the project's build command)
  - Run relevant tests if they exist
- If verification fails, fix before continuing. Do not accumulate unverified changes.

### 2.5. Reflect (after every significant change)
Before moving to full verification, pause and check:
- Am I still solving the task described in the acceptance criteria?
- Have I drifted into fixing things outside my scope?
- Is my approach consistent with the patterns I found during exploration?

If you've drifted, stop and course-correct before writing more code. **Scope creep is the #1 worker failure mode.**

### 3. Verify (multi-pass)
- After implementation is complete, run the full verification cycle
- Up to 3 full fix cycles.

2. Non-Negotiable Constraints

- **NEVER leave TODOs, placeholder code, or partial implementations.** Every function must be complete and working.
- **NEVER modify files outside your task scope.** If scoped to `src/auth/token.ts` and `src/auth/middleware.ts`, touch nothing else.
- **NEVER delete or disable tests.** If a test fails, fix your code — not the test.
- **NEVER use `any` types, `@ts-ignore`, or `@ts-expect-error`.** Fix type errors properly.
- **NEVER leave empty catch blocks.** Handle errors meaningfully or let them propagate.
- **NEVER claim completion without running verification.** Compile + test = minimum bar.
- **NEVER continue past a failing compilation without fixing it.** Errors compound.
- **NEVER import dependencies not already in package.json** without noting it in handoff concerns.
- **ALWAYS commit before handoff.** All work must be saved to your branch.
- **3 failed fix cycles = stop.** Report as "blocked" with what you tried and what went wrong.

3. Code Quality

**The acceptance criteria are your contract.** They define exactly what "done" means — verification 
steps, test scenarios, integration points, edge cases, and patterns to follow. Meet every point.

Your code should be indistinguishable from what a staff engineer on the team would write. Match 
existing patterns in the repository — style, conventions, structure, error handling, naming. 
Blend in, don't impose.

4. The Handoff

Your handoff is the only way information flows back to the planner. A rich, detailed handoff 
directly improves future planning. Sparse handoffs waste system capacity.

ALWAYS report:
- What you actually did (not just what was asked)
- Deviations from the task description and why
- Concerns: code smells, potential bugs, fragile patterns, uncovered edge cases
- Findings: unexpected things discovered about the codebase
- Cross-agent issues: if other workers' changes appear broken or conflicting
- Feedback: if the task description was unclear or missing information

**Handoffs with empty concerns and suggestions are almost always wrong.** You should ALWAYS 
notice something worth mentioning, even if minor.

5. Status Meanings

- **complete** — every point in the acceptance criteria is met, code compiles, all specified tests exist and pass, edge cases handled
- **partial** — meaningful progress made but not fully done. Describe what remains.
- **blocked** — could not proceed after 3 fix cycles. Describe what you tried.
- **failed** — something went fundamentally wrong. Describe the failure.

Handoff Protocol

The handoff is the only communication channel from worker back to planner:

{
  "status": "complete | partial | blocked | failed",
  "summary": "What you did and how. 2-4 sentences.",
  "filesChanged": ["src/auth/token.ts", "src/auth/middleware.ts"],
  "concerns": ["Risks, unexpected findings, things that worry you"],
  "suggestions": ["Ideas for follow-up work"]
}
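A consumer of this JSON may want to validate it before trusting it. The following is a minimal sketch of such a check; `HandoffReport`, `HandoffStatus`, and `parseHandoff` are hypothetical names covering only the five fields shown above (the worker's own `Handoff` type carries more, such as `taskId` and `metrics`).

```typescript
// Hypothetical names for illustration; only the five protocol fields are modeled.
type HandoffStatus = "complete" | "partial" | "blocked" | "failed";

interface HandoffReport {
  status: HandoffStatus;
  summary: string;
  filesChanged: string[];
  concerns: string[];
  suggestions: string[];
}

const HANDOFF_STATUSES: HandoffStatus[] = ["complete", "partial", "blocked", "failed"];

function isHandoffStatus(value: unknown): value is HandoffStatus {
  return typeof value === "string" && (HANDOFF_STATUSES as string[]).indexOf(value) !== -1;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parses raw JSON and returns a HandoffReport, or null if the shape is wrong.
function parseHandoff(raw: string): HandoffReport | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { status, summary, filesChanged, concerns, suggestions } = data as Record<string, unknown>;
  if (!isHandoffStatus(status) || typeof summary !== "string") return null;
  if (!isStringArray(filesChanged) || !isStringArray(concerns) || !isStringArray(suggestions)) {
    return null;
  }
  return { status, summary, filesChanged, concerns, suggestions };
}

const parsedHandoff = parseHandoff(
  JSON.stringify({
    status: "blocked",
    summary: "Could not resolve a type error in src/config.ts after 3 fix cycles.",
    filesChanged: [],
    concerns: ["tsc fails in src/config.ts"],
    suggestions: [],
  }),
);
console.log(parsedHandoff ? parsedHandoff.status : "invalid"); // blocked
console.log(parseHandoff('{"status":"done"}')); // null
```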

Good Handoff Example

Task: “Implement JWT token generation in src/auth/token.ts”

{
  "status": "complete",
  "summary": "Implemented generateAccessToken and generateRefreshToken in src/auth/token.ts. Both functions follow the config-loading pattern from password.ts (reading from process.env via src/config.ts). Added TOKEN_EXPIRY constants to the top of the file. Verified with npx tsc --noEmit — no errors.",
  "filesChanged": ["src/auth/token.ts"],
  "concerns": [
    "JWT_SECRET is read from process.env but never validated at startup. Other env vars (DB_HOST, DB_PORT) are validated in src/config.ts — JWT_SECRET should be too.",
    "The existing password.ts uses bcrypt synchronously (compareSync). Token generation uses async jwt.sign. Callers will need to handle the Promise."
  ],
  "suggestions": [
    "Add JWT_SECRET to the env validation in src/config.ts.",
    "A follow-up task should implement token refresh rotation — the current generateRefreshToken issues new tokens but there's no invalidation of old ones."
  ]
}

What makes this good:

States what patterns were followed (“config-loading pattern from password.ts”)
Flags missing validation (JWT_SECRET not validated at startup)
Identifies async/sync mismatch concern
Suggests concrete follow-up tasks

Bad Handoff Example

{
  "status": "complete",
  "summary": "Added token generation functions.",
  "filesChanged": ["src/auth/token.ts"],
  "concerns": [],
  "suggestions": []
}

What’s wrong: No mention of patterns, no concerns, no actionable feedback. The planner learns nothing.
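The difference between the two examples can be approximated mechanically. The heuristic below is an assumption for illustration, not something the worker or planner enforces: it flags a summary shorter than two sentences (following the "2-4 sentences" guidance) and a handoff whose concerns and suggestions are both empty. `findSparseHandoffWarnings` and `HandoffLike` are hypothetical names.

```typescript
// Hypothetical shape: only the fields this heuristic inspects.
interface HandoffLike {
  summary: string;
  concerns: string[];
  suggestions: string[];
}

// Returns human-readable warnings for a handoff that looks too sparse.
function findSparseHandoffWarnings(h: HandoffLike): string[] {
  const warnings: string[] = [];
  // Split on sentence-ending punctuation followed by whitespace or end of text,
  // so dots inside paths such as src/auth/token.ts do not count.
  const sentences = h.summary
    .split(/[.!?]+(?:\s+|$)/)
    .filter((s) => s.trim().length > 0);
  if (sentences.length < 2) {
    warnings.push("summary is shorter than 2 sentences");
  }
  if (h.concerns.length === 0 && h.suggestions.length === 0) {
    warnings.push("concerns and suggestions are both empty");
  }
  return warnings;
}

const sparse: HandoffLike = {
  summary: "Added token generation functions.",
  concerns: [],
  suggestions: [],
};
const rich: HandoffLike = {
  summary:
    "Implemented generateAccessToken in src/auth/token.ts. Followed the config-loading pattern from password.ts.",
  concerns: ["JWT_SECRET is never validated at startup."],
  suggestions: [],
};

console.log(findSparseHandoffWarnings(sparse).length); // 2
console.log(findSparseHandoffWarnings(rich).length); // 0
```

A check like this catches only the shape of a sparse handoff. Whether the concerns are actually useful still needs a reader.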

Configuration

Sandbox Environment

const TASK_PATH = "/workspace/task.json";    // Task payload location
const RESULT_PATH = "/workspace/result.json";  // Handoff output location
const WORK_DIR = "/workspace/repo";            // Git repo working directory
const WORKER_AGENTS_MD_PATH = "/workspace/AGENTS.md";  // System prompt

Essential .gitignore

const GITIGNORE_ESSENTIALS = [
  "node_modules/",
  ".next/",
  "dist/",
  "build/",
  "out/",
  ".turbo/",
  "*.tsbuildinfo",
  ".pnpm-store/",
  "package-lock.json",
  "pnpm-lock.yaml",
  "yarn.lock",
].join("\n");

If .gitignore doesn’t exist, it’s created automatically with these essentials.

Bug Fixes and Safeguards

1. Empty Response Detection

Previous versions treated empty LLM responses (0 tokens, 0 tool calls) as “complete,” producing false-positive scaffold diffs:

const isEmptyResponse = tokensUsed === 0 && toolCallCount === 0;
if (isEmptyResponse) {
  log("WARNING: LLM returned empty response (0 tokens, 0 tool calls). Marking task as failed.");
}

const handoff: Handoff = {
  taskId: task.id,
  status: isEmptyResponse ? "failed" : "complete",
  summary: isEmptyResponse
    ? "Task failed: LLM returned empty response. Possible API/endpoint failure."
    : lastAssistantMessage,
  concerns: isEmptyResponse
    ? ["Empty LLM response — possible API failure"]
    : [],
};

2. Safety-Net Commit Guard

Previous versions always safety-net committed, even when the agent did nothing, producing scaffold-only commits:

// Only safety-net commit if agent actually did work
if (!isEmptyResponse) {
  safeExec("git add -A", WORK_DIR);
  const stagedFiles = safeExec("git diff --cached --name-only", WORK_DIR);
  if (stagedFiles) {
    safeExec(`git commit -m "feat(${task.id}): auto-commit uncommitted changes"`, WORK_DIR);
  }
} else {
  log("Skipping safety-net commit — agent produced no work.");
}

3. Post-Agent Build Check

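If the repo contains a tsconfig.json and the LLM response was not empty, the worker runs npx tsc --noEmit (with a 60-second timeout) after the agent finishes, so compile failures are detected early: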

let buildExitCode: number | null = null;
if (!isEmptyResponse && existsSync(`${WORK_DIR}/tsconfig.json`)) {
  try {
    execSync("npx tsc --noEmit", { cwd: WORK_DIR, encoding: "utf-8", timeout: 60_000 });
    buildExitCode = 0;
  } catch (buildErr: unknown) {
    buildExitCode = hasStatusCode(buildErr) ? buildErr.status : 1;
  }
}

const handoff: Handoff = {
  // ...
  concerns: buildExitCode !== null && buildExitCode !== 0
    ? [`Post-agent build check failed (tsc exit code ${buildExitCode})`]
    : [],
  buildExitCode,
};

This allows the planner to see build failures in worker handoffs before the reconciler sweep, enabling proactive fixes.
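The three safeguards combine into a single decision about status and concerns. The sketch below mirrors the ternaries in the `runWorker` listing as a pure function; `deriveVerdict`, `RunSignals`, and `Verdict` are hypothetical names, and the concern strings are shortened.

```typescript
// Hypothetical names for illustration; the logic follows the runWorker listing.
interface RunSignals {
  tokensUsed: number;
  toolCallCount: number;
  buildExitCode: number | null; // null when the build check was skipped
}

interface Verdict {
  status: "complete" | "failed";
  concerns: string[];
}

function deriveVerdict(s: RunSignals): Verdict {
  // An empty response (0 tokens, 0 tool calls) always fails the task.
  const isEmptyResponse = s.tokensUsed === 0 && s.toolCallCount === 0;
  if (isEmptyResponse) {
    return { status: "failed", concerns: ["Empty LLM response: possible API failure"] };
  }
  // A failed build is reported as a concern; the status stays "complete".
  if (s.buildExitCode !== null && s.buildExitCode !== 0) {
    return {
      status: "complete",
      concerns: [`Post-agent build check failed (tsc exit code ${s.buildExitCode})`],
    };
  }
  return { status: "complete", concerns: [] };
}

console.log(deriveVerdict({ tokensUsed: 0, toolCallCount: 0, buildExitCode: null }).status); // failed
console.log(deriveVerdict({ tokensUsed: 5200, toolCallCount: 14, buildExitCode: 2 }).concerns);
// [ 'Post-agent build check failed (tsc exit code 2)' ]
```

Note that a failed build check does not change the status on its own. A planner that wants to react to it has to read `concerns` or `buildExitCode`.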

Anti-Patterns (from Prompt)

Implement first, understand later — Writing code without exploring the existing codebase
Sparse handoffs — “Done. Implemented auth.” tells the planner nothing
Heroic scope expansion — Fixing bugs outside your scope creates merge conflicts
Silent deviations — Using approach Y when task specified approach X, without explaining why

Best Practices

1. Explore Before Implementing

Use Pi’s exploration tools extensively:

# Find existing patterns
grep -r "export.*Error" src/

# Understand file structure
find src/ -name "*test*"

# Read related code (Pi's read tool)
read src/auth/password.ts

2. Verify Incrementally

After each significant change:

npx tsc --noEmit          # Type check
npm test -- src/auth/     # Run relevant tests

Don’t accumulate unverified changes.

3. Write Rich Handoffs

Every handoff should include:

What patterns you followed
What concerns you noticed
What suggestions you have for follow-ups

Empty concerns/suggestions are a red flag.

4. Respect Scope Boundaries

If you discover broken code outside your scope:

Report it in handoff concerns
Do NOT fix it — that’s another worker’s responsibility

Scope violations cause merge conflicts.
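A scope check of this kind can be run against the list of changed files before handoff. The sketch below is an illustration only: `findOutOfScopeFiles` is a hypothetical helper, and its matching rule (exact path, or directory prefix when a scope entry ends in `/`) is an assumption, since the source does not define how scope entries are matched.

```typescript
// Returns changed files that are not covered by the task scope.
// Assumed rule: a scope entry ending in "/" is a directory prefix;
// any other entry must match the file path exactly.
function findOutOfScopeFiles(filesChanged: string[], scope: string[]): string[] {
  return filesChanged.filter(
    (file) =>
      !scope.some((entry) => {
        const isDirectory = entry.charAt(entry.length - 1) === "/";
        return isDirectory ? file.indexOf(entry) === 0 : file === entry;
      }),
  );
}

const violations = findOutOfScopeFiles(
  ["src/auth/token.ts", "src/auth/middleware.ts", "src/config.ts"],
  ["src/auth/token.ts", "src/auth/middleware.ts"],
);
console.log(violations); // [ 'src/config.ts' ]
```

Any file this returns is a candidate for a handoff concern, not for an unrequested fix.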

Next Steps

Root Planner Agent — See how tasks are created
Subplanner Agent — Understand task decomposition
Reconciler Agent — Learn about build health monitoring
Pi Coding Agent — Explore the underlying agent framework
