The worker is the only agent that writes code. It receives a task, implements it on an isolated branch in a sandbox environment, and returns a detailed handoff report.

Core Workflow

Workers follow a strict Plan → Execute → Reflect → Verify → Commit cycle:

1. Plan (Before Writing Code)

  • Read the task description and acceptance criteria completely
  • Explore relevant files — read the code in scope, search for patterns
  • Form a concrete approach: what to change, what to create, what to call

2. Execute

  • Implement the solution
  • After each significant change, immediately verify:
    • Compile: npx tsc --noEmit
    • Run relevant tests if they exist
  • Fix before continuing — do not accumulate unverified changes

3. Reflect (After Every Significant Change)

Before moving to full verification, pause and check:
  • Am I still solving the task described in the acceptance criteria?
  • Have I drifted into fixing things outside my scope?
  • Is my approach consistent with the patterns I found during exploration?
If you have drifted, stop and course-correct before writing more code. Scope creep is the #1 worker failure mode.

4. Verify (Multi-Pass)

After implementation is complete, run the full verification cycle:
  • Compile the project
  • Run all tests in scope
  • Check for edge cases the task description may not have mentioned
  • Look for similar patterns elsewhere that your change should also address
If anything fails, fix it and re-verify, for up to 3 full fix cycles. After 3 failed cycles, stop and report the task as “blocked.”
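
The fix-cycle limit can be pictured as a small loop. This is a minimal sketch only: `runVerification` and `applyFix` are hypothetical callbacks standing in for whatever the worker does to compile, test, and repair, and `MAX_FIX_CYCLES` is a name chosen for illustration. None of them come from the worker runner.

```typescript
// Hypothetical sketch of the "up to 3 fix cycles, then blocked" rule.
// runVerification and applyFix are stand-ins, not worker-runner functions.
type VerifyOutcome = { ok: boolean; errors: string[] };
type CycleResult = {
  status: "complete" | "blocked";
  cyclesUsed: number;
  lastErrors: string[];
};

const MAX_FIX_CYCLES = 3;

function verifyWithFixCycles(
  runVerification: () => VerifyOutcome,
  applyFix: (errors: string[]) => void,
): CycleResult {
  // Initial verification pass after implementation is complete.
  let outcome = runVerification();
  let cyclesUsed = 0;

  // Each cycle is one fix attempt followed by a full re-verification.
  while (!outcome.ok && cyclesUsed < MAX_FIX_CYCLES) {
    applyFix(outcome.errors);
    cyclesUsed++;
    outcome = runVerification();
  }

  return {
    status: outcome.ok ? "complete" : "blocked",
    cyclesUsed,
    lastErrors: outcome.errors,
  };
}
```

A "blocked" result carries the last set of errors, which is the kind of detail the handoff should then describe.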

5. Commit and Handoff

  • Commit all work to your branch
  • Write a thorough handoff (see Handoff Protocol)

Implementation

Location: packages/sandbox/src/worker-runner.ts

Sandbox Execution

Each worker runs in an isolated Modal sandbox with full filesystem access. The runner's entry point, runWorker, proceeds through thirteen numbered steps:
export async function runWorker(): Promise<void> {
  const startTime = Date.now();

  // 1. Read task payload from /workspace/task.json
  const raw = readFileSync(TASK_PATH, "utf-8");
  const payload: TaskPayload = JSON.parse(raw);
  const { task, systemPrompt, llmConfig } = payload;

  // 2. Enable distributed tracing
  enableTracing("/workspace");
  let workerSpan: Span | undefined;
  if (payload.trace) {
    const tracer = Tracer.fromPropagated(payload.trace);
    workerSpan = tracer.startSpan("sandbox.worker", {
      taskId: task.id,
      agentId: `sandbox-${task.id}`,
    });
  }

  // 3. Write worker instructions as AGENTS.md in parent dir
  //    (Pi auto-discovers AGENTS.md by walking up from cwd)
  if (systemPrompt) {
    writeFileSync(WORKER_AGENTS_MD_PATH, systemPrompt, "utf-8");
  }

  // 4. Register LLM model with Pi's ModelRegistry
  const authStorage = Reflect.construct(AuthStorage, []) as AuthStorage;
  const modelRegistry = new ModelRegistry(authStorage);
  modelRegistry.registerProvider("glm5", {
    baseUrl: llmConfig.endpoint,
    apiKey: llmConfig.apiKey || "no-key-needed",
    api: "openai-completions",
    models: [{
      id: llmConfig.model,
      name: llmConfig.model,
      reasoning: false,
      input: ["text"],
      cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      contextWindow: 131072,
      maxTokens: llmConfig.maxTokens,
    }],
  });

  const model = modelRegistry.find("glm5", llmConfig.model);
  if (!model) {
    throw new Error(`Model "${llmConfig.model}" not found in registry`);
  }

  const startSha = safeExec("git rev-parse HEAD", WORK_DIR);

  // 5. Create Pi agent session with FULL tool suite
  const { session } = await createAgentSession({
    cwd: WORK_DIR,
    model,
    tools: fullPiTools,  // [read, write, edit, bash, grep, find, ls]
    authStorage,
    modelRegistry,
    sessionManager: SessionManager.inMemory(),
    settingsManager: SettingsManager.inMemory(),
    thinkingLevel: "off",
  });

  let toolCallCount = 0;
  let lastAssistantMessage = "";

  // 6. Subscribe to Pi events to track tool calls
  session.subscribe((event) => {
    if (event.type === "tool_execution_start") {
      toolCallCount++;
    }
    if (event.type === "message_end" && event.message.role === "assistant") {
      // Extract final assistant text
      lastAssistantMessage = extractTextFromContent(event.message.content);
    }
  });

  // 7. Prompt agent with task
  const prompt = buildTaskPrompt(task);
  await session.prompt(prompt);

  const stats = session.getSessionStats();
  const tokensUsed = stats.tokens.total;
  session.dispose();

  // 8. Detect empty LLM responses (bug fix)
  const isEmptyResponse = tokensUsed === 0 && toolCallCount === 0;
  if (isEmptyResponse) {
    log("WARNING: LLM returned empty response (0 tokens, 0 tool calls). Marking as failed.");
  }

  // 9. Ensure .gitignore exists (scaffold safety)
  if (!existsSync(`${WORK_DIR}/.gitignore`)) {
    writeFileSync(`${WORK_DIR}/.gitignore`, GITIGNORE_ESSENTIALS, "utf-8");
  }

  // 10. Safety-net commit (only if agent did work)
  if (!isEmptyResponse) {
    safeExec("git add -A", WORK_DIR);
    const stagedFiles = safeExec("git diff --cached --name-only", WORK_DIR);
    if (stagedFiles) {
      safeExec(`git commit -m "feat(${task.id}): auto-commit uncommitted changes"`, WORK_DIR);
    }
  }

  // 11. Post-agent build check
  let buildExitCode: number | null = null;
  if (!isEmptyResponse && existsSync(`${WORK_DIR}/tsconfig.json`)) {
    try {
      execSync("npx tsc --noEmit", { cwd: WORK_DIR, encoding: "utf-8", timeout: 60_000 });
      buildExitCode = 0;
    } catch (buildErr: unknown) {
      buildExitCode = hasStatusCode(buildErr) ? buildErr.status : 1;
    }
  }

  // 12. Extract diff stats (exclude artifacts)
  const diff = safeExec(`git diff ${startSha} --no-color -- . ':!node_modules'`, WORK_DIR);
  const numstat = safeExec(`git diff ${startSha} --numstat`, WORK_DIR);
  const filesCreatedRaw = safeExec(`git diff ${startSha} --diff-filter=A --name-only`, WORK_DIR);
  const filesChangedRaw = safeExec(`git diff ${startSha} --name-only`, WORK_DIR);

  const filesChanged = filesChangedRaw.split("\n").filter(Boolean).filter((f) => !isArtifact(f));
  const filesCreated = filesCreatedRaw.split("\n").filter(Boolean).filter((f) => !isArtifact(f));

  // Parse numstat for line counts
  let linesAdded = 0;
  let linesRemoved = 0;
  if (numstat) {
    for (const line of numstat.split("\n")) {
      const [addedRaw, removedRaw, filePath] = line.split("\t");
      if (filePath && !isArtifact(filePath)) {
        linesAdded += parseInt(addedRaw, 10) || 0;
        linesRemoved += parseInt(removedRaw, 10) || 0;
      }
    }
  }

  // 13. Build handoff
  const handoff: Handoff = {
    taskId: task.id,
    status: isEmptyResponse ? "failed" : "complete",
    summary: isEmptyResponse
      ? "Task failed: LLM returned empty response (0 tokens, 0 tool calls). Possible API/endpoint failure."
      : lastAssistantMessage || "Task completed (no final message captured).",
    diff,
    filesChanged,
    concerns: isEmptyResponse
      ? ["Empty LLM response — possible API failure or model endpoint issue"]
      : buildExitCode !== null && buildExitCode !== 0
        ? [`Post-agent build check failed (tsc exit code ${buildExitCode})`]
        : [],
    suggestions: isEmptyResponse ? ["Check LLM endpoint connectivity"] : [],
    buildExitCode,
    metrics: {
      linesAdded,
      linesRemoved,
      filesCreated: filesCreated.length,
      filesModified: Math.max(0, filesChanged.length - filesCreated.length),
      tokensUsed,
      toolCallCount,
      durationMs: Date.now() - startTime,
    },
  };

  writeResult(handoff);
}
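
The numstat parsing in step 12 is easier to follow in isolation. The sketch below lifts that loop into a standalone function. `parseNumstat` and `LineCounts` are names chosen for this example, and the artifact check is passed in as a parameter so the block runs on its own.

```typescript
// Sketch: the numstat loop from runWorker as a standalone function.
// parseNumstat and LineCounts are illustrative names, not worker-runner exports.
interface LineCounts {
  linesAdded: number;
  linesRemoved: number;
}

function parseNumstat(
  numstat: string,
  isArtifact: (filePath: string) => boolean,
): LineCounts {
  let linesAdded = 0;
  let linesRemoved = 0;
  for (const line of numstat.split("\n")) {
    // Each line is "<added>\t<removed>\t<path>".
    const [addedRaw = "", removedRaw = "", filePath = ""] = line.split("\t");
    if (filePath && !isArtifact(filePath)) {
      // Binary files are reported as "-": parseInt yields NaN, and `|| 0` maps it to 0.
      linesAdded += parseInt(addedRaw, 10) || 0;
      linesRemoved += parseInt(removedRaw, 10) || 0;
    }
  }
  return { linesAdded, linesRemoved };
}

// Sample `git diff --numstat` output: a source file, a binary file, and an artifact.
const sample = [
  "12\t3\tsrc/auth/token.ts",
  "-\t-\tassets/logo.png",
  "400\t0\tnode_modules/jwt/index.js",
].join("\n");

const counts = parseNumstat(sample, (p) => p.startsWith("node_modules/"));
console.log(counts.linesAdded, counts.linesRemoved); // 12 3
```

Only the source file contributes to the totals: the binary entry counts as zero lines and the artifact is skipped entirely.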

Full Pi Tool Suite

Workers get all 7 Pi tools, not just the limited 4-tool codingTools set:
import {
  codingTools,    // [read, bash, edit, write]
  grepTool,       // Ripgrep-powered content search
  findTool,       // Glob-based file search
  lsTool,         // Directory listing
} from "@mariozechner/pi-coding-agent";

const fullPiTools = [...codingTools, grepTool, findTool, lsTool];
Why this matters:
  • grep enables fast content search (e.g., “find all uses of this function”)
  • find enables file discovery (e.g., “find all test files”)
  • ls enables directory exploration before reading files
Together, these tools let the agent explore a codebase directly instead of routing every search and directory listing through bash.

Task Prompt Construction

export function buildTaskPrompt(task: Task): string {
  const parts: string[] = [
    `## Task: ${task.id}`,
    `**Description:** ${task.description}`,
    `**Scope (files to focus on):** ${task.scope.join(", ")}`,
    `**Acceptance criteria:** ${task.acceptance}`,
    `**Branch:** ${task.branch}`,
    "",
    "Complete this task. Commit your changes when done. Stay focused on the scoped files.",
  ];

  return parts.join("\n");
}
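
To see what the agent actually receives, here is a minimal usage sketch. The `Task` interface is inferred from the fields `buildTaskPrompt` reads and may not match the real type. The function body is repeated so the example runs on its own, and every value in `exampleTask` is hypothetical.

```typescript
// Task shape inferred from the fields buildTaskPrompt reads; the real type may have more.
interface Task {
  id: string;
  description: string;
  scope: string[];
  acceptance: string;
  branch: string;
}

// Same logic as buildTaskPrompt above, repeated so this example is self-contained.
function buildTaskPrompt(task: Task): string {
  return [
    `## Task: ${task.id}`,
    `**Description:** ${task.description}`,
    `**Scope (files to focus on):** ${task.scope.join(", ")}`,
    `**Acceptance criteria:** ${task.acceptance}`,
    `**Branch:** ${task.branch}`,
    "",
    "Complete this task. Commit your changes when done. Stay focused on the scoped files.",
  ].join("\n");
}

// Hypothetical task values, for illustration only.
const exampleTask: Task = {
  id: "task-042",
  description: "Implement JWT token generation",
  scope: ["src/auth/token.ts", "src/auth/middleware.ts"],
  acceptance: "generateAccessToken returns a signed JWT; npx tsc --noEmit passes",
  branch: "worker/task-042",
};

console.log(buildTaskPrompt(exampleTask));
```

The result is a seven-line markdown prompt: five labeled fields, a blank line, and the closing instruction. Multiple scope entries are joined with a comma and a space.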

Artifact Filtering

Build artifacts and lockfiles are excluded from the handoff's file lists and line counts:
const ARTIFACT_PATTERNS = [
  /^node_modules\//,
  /^\.next\//,
  /^dist\//,
  /^build\//,
  /^out\//,
  /^\.turbo\//,
  /\.tsbuildinfo$/,  // matches e.g. tsconfig.tsbuildinfo, mirroring *.tsbuildinfo in .gitignore
  /^package-lock\.json$/,
  /^pnpm-lock\.yaml$/,
  /^yarn\.lock$/,
  /^\.pnpm-store\//,
];

function isArtifact(filePath: string): boolean {
  return ARTIFACT_PATTERNS.some((p) => p.test(filePath));
}
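
A short usage sketch shows how the filter behaves on a list of changed paths. The pattern list here is abbreviated for illustration, and the file names are hypothetical.

```typescript
// Abbreviated pattern list for illustration; the full list is shown above.
const ARTIFACT_PATTERNS: RegExp[] = [
  /^node_modules\//,
  /^dist\//,
  /^pnpm-lock\.yaml$/,
];

function isArtifact(filePath: string): boolean {
  return ARTIFACT_PATTERNS.some((p) => p.test(filePath));
}

// Hypothetical output of `git diff --name-only`, relative to the repo root.
const changed = [
  "src/auth/token.ts",
  "dist/index.js",
  "pnpm-lock.yaml",
  "packages/api/dist/index.js",
];

const reported = changed.filter((f) => !isArtifact(f));
console.log(reported.join(", ")); // src/auth/token.ts, packages/api/dist/index.js
```

The directory patterns are anchored with `^`, so they only match at the repository root. A nested build directory such as `packages/api/dist/` is not filtered, which may matter in a monorepo.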

Prompt Engineering

Location: prompts/worker.md

1. Workflow: Plan → Execute → Verify

### 1. Plan (before writing any code)
- Read the task description and acceptance criteria completely.
- Explore relevant files — read the code in scope, search for patterns, understand how it connects.
- Form a concrete approach: what to change, what to create, what to call.

### 2. Execute
- Implement the solution.
- After each significant change (new function, modified interface, added file), immediately verify:
  - Compile: `npx tsc --noEmit` (or the project's build command)
  - Run relevant tests if they exist
- If verification fails, fix before continuing. Do not accumulate unverified changes.

### 2.5. Reflect (after every significant change)
Before moving to full verification, pause and check:
- Am I still solving the task described in the acceptance criteria?
- Have I drifted into fixing things outside my scope?
- Is my approach consistent with the patterns I found during exploration?

If you've drifted, stop and course-correct before writing more code. **Scope creep is the #1 worker failure mode.**

### 3. Verify (multi-pass)
- After implementation is complete, run the full verification cycle
- Up to 3 full fix cycles.

2. Non-Negotiable Constraints

- **NEVER leave TODOs, placeholder code, or partial implementations.** Every function must be complete and working.
- **NEVER modify files outside your task scope.** If scoped to `src/auth/token.ts` and `src/auth/middleware.ts`, touch nothing else.
- **NEVER delete or disable tests.** If a test fails, fix your code — not the test.
- **NEVER use `any` types, `@ts-ignore`, or `@ts-expect-error`.** Fix type errors properly.
- **NEVER leave empty catch blocks.** Handle errors meaningfully or let them propagate.
- **NEVER claim completion without running verification.** Compile + test = minimum bar.
- **NEVER continue past a failing compilation without fixing it.** Errors compound.
- **NEVER import dependencies not already in package.json** without noting it in handoff concerns.
- **ALWAYS commit before handoff.** All work must be saved to your branch.
- **3 failed fix cycles = stop.** Report as "blocked" with what you tried and what went wrong.

3. Code Quality

**The acceptance criteria are your contract.** They define exactly what "done" means — verification 
steps, test scenarios, integration points, edge cases, and patterns to follow. Meet every point.

Your code should be indistinguishable from what a staff engineer on the team would write. Match 
existing patterns in the repository — style, conventions, structure, error handling, naming. 
Blend in, don't impose.

4. The Handoff

Your handoff is the only way information flows back to the planner. A rich, detailed handoff 
directly improves future planning. Sparse handoffs waste system capacity.

ALWAYS report:
- What you actually did (not just what was asked)
- Deviations from the task description and why
- Concerns: code smells, potential bugs, fragile patterns, uncovered edge cases
- Findings: unexpected things discovered about the codebase
- Cross-agent issues: if other workers' changes appear broken or conflicting
- Feedback: if the task description was unclear or missing information

**Handoffs with empty concerns and suggestions are almost always wrong.** You should ALWAYS 
notice something worth mentioning, even if minor.

5. Status Meanings

- **complete** — every point in the acceptance criteria is met, code compiles, all specified tests exist and pass, edge cases handled
- **partial** — meaningful progress made but not fully done. Describe what remains.
- **blocked** — could not proceed after 3 fix cycles. Describe what you tried.
- **failed** — something went fundamentally wrong. Describe the failure.

Handoff Protocol

The handoff is the only communication channel from the worker back to the planner:
{
  "status": "complete | partial | blocked | failed",
  "summary": "What you did and how. 2-4 sentences.",
  "filesChanged": ["src/auth/token.ts", "src/auth/middleware.ts"],
  "concerns": ["Risks, unexpected findings, things that worry you"],
  "suggestions": ["Ideas for follow-up work"]
}
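
The handoff object built in runWorker carries more fields than the JSON above: `taskId`, `diff`, `buildExitCode`, and `metrics`. The `Handoff` type itself is not shown in this document, so the interface below is a sketch inferred from that object, and `isHandoffStatus` is a hypothetical helper added for illustration.

```typescript
// Shape inferred from the handoff object built in runWorker; the real type may differ.
type HandoffStatus = "complete" | "partial" | "blocked" | "failed";

interface HandoffMetrics {
  linesAdded: number;
  linesRemoved: number;
  filesCreated: number;
  filesModified: number;
  tokensUsed: number;
  toolCallCount: number;
  durationMs: number;
}

interface Handoff {
  taskId: string;
  status: HandoffStatus;
  summary: string;
  diff: string;
  filesChanged: string[];
  concerns: string[];
  suggestions: string[];
  buildExitCode: number | null; // null when no build check ran
  metrics: HandoffMetrics;
}

const HANDOFF_STATUSES: readonly HandoffStatus[] = [
  "complete",
  "partial",
  "blocked",
  "failed",
];

// Hypothetical helper: narrow an untrusted string to a HandoffStatus.
function isHandoffStatus(value: string): value is HandoffStatus {
  return HANDOFF_STATUSES.some((s) => s === value);
}
```

A consumer reading result.json could use a guard like this before trusting the `status` field.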

Good Handoff Example

Task: “Implement JWT token generation in src/auth/token.ts”
{
  "status": "complete",
  "summary": "Implemented generateAccessToken and generateRefreshToken in src/auth/token.ts. Both functions follow the config-loading pattern from password.ts (reading from process.env via src/config.ts). Added TOKEN_EXPIRY constants to the top of the file. Verified with npx tsc --noEmit — no errors.",
  "filesChanged": ["src/auth/token.ts"],
  "concerns": [
    "JWT_SECRET is read from process.env but never validated at startup. Other env vars (DB_HOST, DB_PORT) are validated in src/config.ts — JWT_SECRET should be too.",
    "The existing password.ts uses bcrypt synchronously (compareSync). Token generation uses async jwt.sign. Callers will need to handle the Promise."
  ],
  "suggestions": [
    "Add JWT_SECRET to the env validation in src/config.ts.",
    "A follow-up task should implement token refresh rotation — the current generateRefreshToken issues new tokens but there's no invalidation of old ones."
  ]
}
What makes this good:
  • States what patterns were followed (“config-loading pattern from password.ts”)
  • Flags missing validation (JWT_SECRET not validated at startup)
  • Identifies async/sync mismatch concern
  • Suggests concrete follow-up tasks

Bad Handoff Example

{
  "status": "complete",
  "summary": "Added token generation functions.",
  "filesChanged": ["src/auth/token.ts"],
  "concerns": [],
  "suggestions": []
}
What’s wrong: No mention of patterns, no concerns, no actionable feedback. The planner learns nothing.
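
The symptoms of a sparse handoff are mechanical enough to check automatically. The lint below is a hypothetical sketch, not part of the worker runner. `sparseHandoffWarnings` and `HandoffText` are illustrative names, and the sentence count is a crude heuristic.

```typescript
// Hypothetical lint, not part of the worker runner: flags signs of a sparse handoff.
interface HandoffText {
  summary: string;
  concerns: string[];
  suggestions: string[];
}

function sparseHandoffWarnings(h: HandoffText): string[] {
  const warnings: string[] = [];

  // Crude sentence count: split on end punctuation followed by whitespace,
  // so file names like "token.ts" are not split mid-word.
  const sentences = h.summary
    .split(/[.!?]+\s+/)
    .filter((s) => s.trim().length > 0).length;

  if (sentences < 2) {
    warnings.push("summary is shorter than the suggested 2-4 sentences");
  }
  if (h.concerns.length === 0) {
    warnings.push("concerns is empty");
  }
  if (h.suggestions.length === 0) {
    warnings.push("suggestions is empty");
  }
  return warnings;
}

// The bad handoff example from above triggers all three warnings.
const bad: HandoffText = {
  summary: "Added token generation functions.",
  concerns: [],
  suggestions: [],
};
console.log(sparseHandoffWarnings(bad).length); // 3
```

A check like this can only catch missing content, not shallow content; a handoff can pass it and still tell the planner nothing useful.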

Configuration

Sandbox Environment

const TASK_PATH = "/workspace/task.json";    // Task payload location
const RESULT_PATH = "/workspace/result.json";  // Handoff output location
const WORK_DIR = "/workspace/repo";            // Git repo working directory
const WORKER_AGENTS_MD_PATH = "/workspace/AGENTS.md";  // System prompt

Essential .gitignore

const GITIGNORE_ESSENTIALS = [
  "node_modules/",
  ".next/",
  "dist/",
  "build/",
  "out/",
  ".turbo/",
  "*.tsbuildinfo",
  ".pnpm-store/",
  "package-lock.json",
  "pnpm-lock.yaml",
  "yarn.lock",
].join("\n");
If .gitignore doesn’t exist, it’s created automatically with these essentials.

Bug Fixes and Safeguards

1. Empty Response Detection

Previous versions treated empty LLM responses (0 tokens, 0 tool calls) as “complete,” producing false-positive scaffold diffs:
const isEmptyResponse = tokensUsed === 0 && toolCallCount === 0;
if (isEmptyResponse) {
  log("WARNING: LLM returned empty response (0 tokens, 0 tool calls). Marking task as failed.");
}

const handoff: Handoff = {
  taskId: task.id,
  status: isEmptyResponse ? "failed" : "complete",
  summary: isEmptyResponse
    ? "Task failed: LLM returned empty response. Possible API/endpoint failure."
    : lastAssistantMessage,
  concerns: isEmptyResponse
    ? ["Empty LLM response — possible API failure"]
    : [],
};

2. Safety-Net Commit Guard

Previous versions always made the safety-net commit, even when the agent did nothing, which produced scaffold-only commits:
// Only safety-net commit if agent actually did work
if (!isEmptyResponse) {
  safeExec("git add -A", WORK_DIR);
  const stagedFiles = safeExec("git diff --cached --name-only", WORK_DIR);
  if (stagedFiles) {
    safeExec(`git commit -m "feat(${task.id}): auto-commit uncommitted changes"`, WORK_DIR);
  }
} else {
  log("Skipping safety-net commit — agent produced no work.");
}

3. Post-Agent Build Check

After the agent finishes, the runner executes npx tsc --noEmit to detect compile failures early. The check is skipped when the response was empty or the repo has no tsconfig.json:
let buildExitCode: number | null = null;
if (!isEmptyResponse && existsSync(`${WORK_DIR}/tsconfig.json`)) {
  try {
    execSync("npx tsc --noEmit", { cwd: WORK_DIR, encoding: "utf-8", timeout: 60_000 });
    buildExitCode = 0;
  } catch (buildErr: unknown) {
    buildExitCode = hasStatusCode(buildErr) ? buildErr.status : 1;
  }
}

const handoff: Handoff = {
  // ...
  concerns: buildExitCode !== null && buildExitCode !== 0
    ? [`Post-agent build check failed (tsc exit code ${buildExitCode})`]
    : [],
  buildExitCode,
};
This allows the planner to see build failures in worker handoffs before the reconciler sweep, enabling proactive fixes.
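
The build check relies on a `hasStatusCode` guard that is not shown in this document. The version below is an assumed implementation, written as a TypeScript type predicate. `exitCodeFromError` is a hypothetical wrapper that mirrors the expression used in the catch block.

```typescript
// Assumed implementation of hasStatusCode; the real helper is not shown here.
function hasStatusCode(err: unknown): err is { status: number } {
  return (
    typeof err === "object" &&
    err !== null &&
    "status" in err &&
    typeof (err as { status: unknown }).status === "number"
  );
}

// Hypothetical wrapper mirroring `hasStatusCode(buildErr) ? buildErr.status : 1`.
function exitCodeFromError(err: unknown): number {
  return hasStatusCode(err) ? err.status : 1;
}

// A failed execSync throws an Error carrying the child's exit code in `status`.
console.log(exitCodeFromError({ status: 2 })); // 2

// If the child was killed by a signal (e.g. on timeout), `status` is null.
console.log(exitCodeFromError({ status: null })); // 1

// Any other thrown value also falls back to 1.
console.log(exitCodeFromError(new Error("spawn failed"))); // 1
```

Under this assumption, a tsc run that hits the 60-second timeout is reported with exit code 1, the same as any other error that carries no numeric status.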

Anti-Patterns (from Prompt)

  1. Implement first, understand later — Writing code without exploring the existing codebase
  2. Sparse handoffs — “Done. Implemented auth.” tells the planner nothing
  3. Heroic scope expansion — Fixing bugs outside your scope creates merge conflicts
  4. Silent deviations — Using approach Y when task specified approach X, without explaining why

Best Practices

1. Explore Before Implementing

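Use Pi’s exploration tools extensively. The examples below are shell-style shorthand for tool calls (read stands for Pi’s read tool, not the shell builtin):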
# Find existing patterns
grep -r "export.*Error" src/

# Understand file structure
find src/ -name "*test*"

# Read related code
read src/auth/password.ts

2. Verify Incrementally

After each significant change:
npx tsc --noEmit          # Type check
npm test -- src/auth/     # Run relevant tests
Don’t accumulate unverified changes.

3. Write Rich Handoffs

Every handoff should include:
  • What patterns you followed
  • What concerns you noticed
  • What suggestions you have for follow-ups
Empty concerns/suggestions are a red flag.

4. Respect Scope Boundaries

If you discover broken code outside your scope:
  • Report it in handoff concerns
  • Do NOT fix it — that’s another worker’s responsibility
Scope violations cause merge conflicts.
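
A scope violation can be detected by comparing the changed files against the task scope. The check below is a hypothetical sketch, not part of the worker runner. It assumes scope entries are exact file paths, as in the `src/auth/token.ts` example, and `outOfScopeFiles` is an illustrative name.

```typescript
// Hypothetical check, not part of the worker runner.
// Assumes scope entries are exact file paths rather than globs or directories.
function outOfScopeFiles(filesChanged: string[], scope: string[]): string[] {
  const allowed = new Set(scope);
  return filesChanged.filter((f) => !allowed.has(f));
}

const scope = ["src/auth/token.ts", "src/auth/middleware.ts"];
const filesChanged = ["src/auth/token.ts", "src/config.ts"];

const violations = outOfScopeFiles(filesChanged, scope);
console.log(violations.join(", ")); // src/config.ts
```

A non-empty result is a signal to revert the extra change and describe the issue in the handoff concerns instead.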
