Planning Philosophy
The core principle: plan only what you can confidently specify. Discover the rest.
The planner doesn’t enumerate all tasks for an entire project upfront. It maintains awareness of the full goal set (from SPEC.md and FEATURES.json) but only emits tasks for work it can scope precisely given current knowledge.
Sprint-Based Planning
Each planning iteration is a sprint focused on a specific set of work.
Planning Phases
Projects naturally progress through phases, each with different sprint sizes:

| Phase | Focus | Sprint Size |
|---|---|---|
| Discovery | Understand codebase, read specs, identify architecture | 3-8 foundational tasks |
| Foundation | Core infrastructure, shared utilities, database setup | 5-15 tasks |
| Core Build-out | Primary features, main application logic | 10-30 tasks |
| Integration & Hardening | Wiring systems together, edge cases, bug fixes | 5-20 tasks |
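
The ranges in the table can be kept as data so a sprint can be sanity-checked against its phase. The sketch below is only an illustration: the `Phase` type, `TYPICAL_SPRINT_SIZE`, and `sprintSizeNote` are hypothetical names, and the ranges are treated as typical sizes rather than hard limits.

```typescript
// Hypothetical names; only the numeric ranges come from the table above.
type Phase = "discovery" | "foundation" | "coreBuildOut" | "integrationHardening";

interface SizeRange {
  min: number;
  max: number;
}

const TYPICAL_SPRINT_SIZE: Record<Phase, SizeRange> = {
  discovery: { min: 3, max: 8 },
  foundation: { min: 5, max: 15 },
  coreBuildOut: { min: 10, max: 30 },
  integrationHardening: { min: 5, max: 20 },
};

// Returns a note when a sprint falls outside the typical range, or null when it fits.
function sprintSizeNote(phase: Phase, taskCount: number): string | null {
  const { min, max } = TYPICAL_SPRINT_SIZE[phase];
  if (taskCount < min) {
    return `${taskCount} tasks is below the typical ${min}-${max} for ${phase}`;
  }
  if (taskCount > max) {
    return `${taskCount} tasks is above the typical ${min}-${max} for ${phase}`;
  }
  return null;
}
```

A note from this check is a prompt to re-examine the sprint, not a reason to pad or trim it.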
What Determines Sprint Size
Sprint sizing is driven by confidence, not capacity:
- Confidence: Can you write a complete, self-contained description for each task? If you’re guessing at file paths, patterns, or interfaces, you’re not ready to emit that task.
- Independence: All tasks at the same priority must be fully parallel. If task B needs task A’s output, A must have higher priority or ship in an earlier sprint.
- Stability: If the last sprint surfaced architectural problems, pause feature work. Fix foundations before fanning out.
- Worker Feedback: Handoffs are bug reports from your team. If 3 workers reported the same missing utility, that’s your next task, not more features.
Continuous Conversation Model
The planner operates as a persistent, continuous conversation, not stateless batch calls:
- First message: Full request and initial repo state
- Follow-up messages: Only new handoffs since last response + fresh repo state snapshot
- Conversation history preserved: The planner has memory across iterations
- Scratchpad survives: The planner’s working memory persists even when context is compacted
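
One way to picture the message flow is a small class that remembers how many handoffs it has already forwarded. This is a sketch under assumptions: `PlannerConversation`, `Handoff`, and the message layout are hypothetical, and planner responses and context compaction are left out.

```typescript
// Hypothetical types; the real handoff and message shapes are not specified here.
interface Handoff {
  taskId: string;
  summary: string;
}

class PlannerConversation {
  private readonly request: string;
  private readonly sent: string[] = [];
  private handoffsSeen = 0;

  constructor(request: string) {
    this.request = request;
  }

  // Builds the next message: the full request on the first call,
  // only handoffs not yet forwarded on every later call.
  nextMessage(allHandoffs: Handoff[], repoState: string): string {
    let body: string;
    if (this.sent.length === 0) {
      body = `Request:\n${this.request}`;
    } else {
      const fresh = allHandoffs.slice(this.handoffsSeen);
      this.handoffsSeen = allHandoffs.length;
      const lines = fresh.map((h) => `- ${h.taskId}: ${h.summary}`);
      body = `New handoffs:\n${lines.join("\n")}`;
    }
    // Every message carries a fresh repo state snapshot.
    const message = `${body}\n\nRepo state:\n${repoState}`;
    this.sent.push(message);
    return message;
  }
}
```

Because earlier messages stay in the conversation, the request and old handoffs never need to be resent.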
Scratchpad: Working Memory
Every planning response includes a scratchpad, the planner’s working memory, which gets rewritten completely each iteration:
- Goals & Specs: Full goal set from SPEC.md/FEATURES.json with coverage status
- Current State: What’s built, broken, in progress, and key architectural decisions
- Sprint Reasoning: Why this set of tasks, what’s being deferred, and likely next focus
- Worker Intelligence: Patterns from handoffs, unresolved concerns, recurring issues
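
The four sections map naturally onto a typed record. The sketch below assumes field names and status values that are not defined in this document (`Scratchpad`, `ScratchpadStore`, `"covered" | "partial" | "uncovered"`); it only illustrates that each response replaces the scratchpad wholesale instead of patching it.

```typescript
// Hypothetical shape; the section names mirror the list above, the fields are assumptions.
interface Scratchpad {
  goalsAndSpecs: { goal: string; coverage: "covered" | "partial" | "uncovered" }[];
  currentState: { built: string[]; broken: string[]; inProgress: string[]; decisions: string[] };
  sprintReasoning: { whyTheseTasks: string; deferred: string[]; likelyNextFocus: string };
  workerIntelligence: { patterns: string[]; unresolvedConcerns: string[]; recurringIssues: string[] };
}

function emptyScratchpad(): Scratchpad {
  return {
    goalsAndSpecs: [],
    currentState: { built: [], broken: [], inProgress: [], decisions: [] },
    sprintReasoning: { whyTheseTasks: "", deferred: [], likelyNextFocus: "" },
    workerIntelligence: { patterns: [], unresolvedConcerns: [], recurringIssues: [] },
  };
}

class ScratchpadStore {
  private current: Scratchpad | null = null;

  // The new scratchpad replaces the old one; nothing is merged.
  update(next: Scratchpad): void {
    this.current = next;
  }

  get(): Scratchpad | null {
    return this.current;
  }
}
```

Replacing the whole record means anything the planner wants to keep must be written out again in the next response.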
Exploration Before Planning
Before each sprint, the planner can explore the codebase using read-only tools:
- read: Read file contents by path
- grep: Search file contents with regex
- find: Find files by glob pattern
- ls: List directory contents
- bash: Execute read-only git commands (git log, git diff, git show, etc.)
The planner uses these tools to verify assumptions before emitting tasks, for example by checking that a dependency exists before planning features that use it.
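
As an illustration of the dependency check, the sketch below assumes a Node project whose manifest is package.json and that the text passed in is whatever the read tool returned for that file. The helper name `declaresDependency` is hypothetical.

```typescript
// Hypothetical helper; assumes the project declares dependencies in package.json.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// `packageJsonText` stands in for the output of the read tool on package.json.
function declaresDependency(packageJsonText: string, name: string): boolean {
  const manifest = JSON.parse(packageJsonText) as PackageManifest;
  const declared = [
    ...Object.keys(manifest.dependencies ?? {}),
    ...Object.keys(manifest.devDependencies ?? {}),
  ];
  return declared.includes(name);
}
```

If the check fails, the planner can emit a task that adds the dependency (when SPEC.md allows it) before any task that relies on it.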
Processing Worker Handoffs
Handoffs provide critical feedback that shapes future sprints.
Concern Triage
When handoffs include concerns, the planner classifies each one:

| Classification | Action | Example |
|---|---|---|
| Blocking | Create fix task this sprint | “Type mismatch breaks callers in 3 files” |
| Architectural | Update scratchpad, adjust future tasks | “Auth doesn’t handle token refresh” |
| Informational | Note in scratchpad, no immediate action | “Found dead code in utils.ts” |
The scratchpad MUST track unresolved concerns across iterations. A concern raised in sprint 3 that isn’t addressed by sprint 5 is a planning failure.
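
Tracking concerns across iterations can be sketched as a small record plus an overdue check. The `Concern` shape and function names are hypothetical, and the two-sprint threshold is an assumption read off the sprint 3 to sprint 5 example, not a documented setting.

```typescript
type Classification = "blocking" | "architectural" | "informational";

// Hypothetical record the scratchpad might keep per concern.
interface Concern {
  text: string;
  classification: Classification;
  raisedInSprint: number;
  resolved: boolean;
}

// Maps each classification to the action listed in the table above.
function actionFor(classification: Classification): string {
  switch (classification) {
    case "blocking":
      return "Create fix task this sprint";
    case "architectural":
      return "Update scratchpad, adjust future tasks";
    case "informational":
      return "Note in scratchpad, no immediate action";
  }
}

// Unresolved concerns that have been open for `maxAgeSprints` or more (assumed default: 2).
function overdueConcerns(concerns: Concern[], currentSprint: number, maxAgeSprints = 2): Concern[] {
  return concerns.filter(
    (c) => !c.resolved && currentSprint - c.raisedInSprint >= maxAgeSprints,
  );
}
```

Running the overdue check at the start of each iteration gives the planner an explicit list to address before adding new work.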
Subplanner Decomposition
When a task’s scope is broad (many files, multiple concerns), the system may assign a subplanner to decompose it further.
Binding Constraints: SPEC.md and FEATURES.json
These documents are constraints on planner output, not background context.
SPEC.md defines:
- Allowed dependencies
- File structure
- Technical parameters
- Acceptance tests
- Non-negotiables
FEATURES.json defines:
- What to build
- Feature dependencies
- Completion status
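
Feature dependencies and completion status together determine which features can be planned next. The schema of FEATURES.json is not shown in this document, so the `FeatureEntry` shape below is purely hypothetical; the sketch only illustrates the filtering idea.

```typescript
// Hypothetical entry shape; the real FEATURES.json schema may differ.
interface FeatureEntry {
  id: string;
  description: string;
  dependsOn: string[];
  status: "pending" | "in_progress" | "complete";
}

// Features that are not started and whose dependencies are all complete.
function plannableFeatures(features: FeatureEntry[]): FeatureEntry[] {
  const complete = new Set(
    features.filter((f) => f.status === "complete").map((f) => f.id),
  );
  return features.filter(
    (f) => f.status === "pending" && f.dependsOn.every((dep) => complete.has(dep)),
  );
}
```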
Priority Ordering
Priority expresses ordering within a sprint:
- 1-2: Infrastructure, types, interfaces (foundations)
- 3-5: Core feature implementation
- 6-7: Secondary features, integration
- 8-10: Polish, documentation, nice-to-have
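
A sprint can be checked for ordering mistakes before it is emitted. The sketch below assumes that lower numbers run first (consistent with foundations sitting at 1-2) and that each task lists the tasks it depends on; `SprintTask` and `priorityViolations` are hypothetical names.

```typescript
// Hypothetical task shape; `dependsOn` holds ids of tasks whose output this task needs.
interface SprintTask {
  id: string;
  priority: number; // 1-10, assumed: lower numbers run first
  dependsOn: string[];
}

function priorityViolations(tasks: SprintTask[]): string[] {
  const byId = new Map<string, SprintTask>();
  for (const task of tasks) {
    byId.set(task.id, task);
  }
  const problems: string[] = [];
  for (const task of tasks) {
    if (!Number.isInteger(task.priority) || task.priority < 1 || task.priority > 10) {
      problems.push(`${task.id}: priority ${task.priority} is outside 1-10`);
    }
    for (const depId of task.dependsOn) {
      const dep = byId.get(depId);
      // A dependency missing from this sprint is assumed to have shipped earlier.
      if (dep === undefined) continue;
      if (dep.priority >= task.priority) {
        problems.push(`${task.id} depends on ${dep.id}, which does not run before it`);
      }
    }
  }
  return problems;
}
```

An empty result means every in-sprint dependency runs strictly before its dependent, so tasks sharing a priority can run in parallel.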
Definition of Done
Every task includes a clear definition of done in its acceptance field:
Bad: “Function works correctly. Tests pass.”
Good: “createUser() rejects duplicate emails with DuplicateEmailError. Tests cover: valid creation, duplicate email, missing required fields, invalid email format. tsc --noEmit exits 0.”
Acceptance criteria must specify:
- Verification: Build/type-check commands and expected results
- Integration: What call sites should work, API contracts
- Quality bar: Patterns to follow, edge cases to handle
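
The three required parts can be kept as separate lists and joined into the acceptance text, which makes a missing part easy to spot. This is a sketch under assumptions: `AcceptanceCriteria`, `missingSections`, and `renderAcceptance` are hypothetical names, and the example content is taken from the “Good” criteria above.

```typescript
// Hypothetical structure mirroring the three required parts listed above.
interface AcceptanceCriteria {
  verification: string[];
  integration: string[];
  qualityBar: string[];
}

// Names of the sections that are still empty.
function missingSections(criteria: AcceptanceCriteria): string[] {
  const sections: [string, string[]][] = [
    ["verification", criteria.verification],
    ["integration", criteria.integration],
    ["qualityBar", criteria.qualityBar],
  ];
  return sections.filter(([, items]) => items.length === 0).map(([name]) => name);
}

// Joins all parts into the text stored in the task's acceptance field.
function renderAcceptance(criteria: AcceptanceCriteria): string {
  return [...criteria.integration, ...criteria.qualityBar, ...criteria.verification].join(" ");
}

const createUserCriteria: AcceptanceCriteria = {
  integration: ["createUser() rejects duplicate emails with DuplicateEmailError."],
  qualityBar: [
    "Tests cover: valid creation, duplicate email, missing required fields, invalid email format.",
  ],
  verification: ["tsc --noEmit exits 0."],
};
```

A task whose criteria leave any section empty is a candidate for deferral until the planner can fill it in.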