Unlike /deep-design (which designs and iterates) or /deep-research (which explores the web), /deep-qa takes an artifact as-is and finds what’s wrong with it.
No spec drafting. No redesign. Find and report.
Invocation
--diff mode costs ~10% as much as a full-artifact QA and catches regressions in changed code and its immediate callers.
Artifact types
| Type | Applies to | Required QA categories |
|---|---|---|
| doc | Specs, design docs, RFCs, API docs, architecture docs | completeness, internal_consistency, feasibility, edge_cases |
| code | Source code, system architecture descriptions | correctness, error_handling, security, testability |
| research | Research reports, literature reviews, deep-research outputs | accuracy, citation_validity, logical_consistency, coverage_gaps |
| skill | Claude skills, system prompts, agent specs, tool instructions | behavioral_correctness, instruction_conflicts, injection_resistance, cost_runaway_risk |
If --type is not provided, the artifact type is inferred from content. If the inference is ambiguous, you are asked to confirm before proceeding (skipped under --auto).
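As a rough illustration, the table above can be held as a lookup from artifact type to required categories. The dictionary name and the helper `uncovered_categories` are hypothetical and not part of the skill; only the type and category names come from the table.

```python
# Hypothetical lookup; the keys and category names are copied from the table.
REQUIRED_CATEGORIES = {
    "doc": ["completeness", "internal_consistency", "feasibility", "edge_cases"],
    "code": ["correctness", "error_handling", "security", "testability"],
    "research": ["accuracy", "citation_validity", "logical_consistency", "coverage_gaps"],
    "skill": ["behavioral_correctness", "instruction_conflicts",
              "injection_resistance", "cost_runaway_risk"],
}


def uncovered_categories(artifact_type, covered):
    """Return the required categories not yet covered, in table order."""
    required = REQUIRED_CATEGORIES[artifact_type]
    return [category for category in required if category not in covered]


if __name__ == "__main__":
    # Two of the four code categories have been covered so far.
    print(uncovered_categories("code", {"security", "correctness"}))
```

A list like this is what a coverage check would need in order to decide which categories still require an angle.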
Diff mode
When --diff [ref] is present, the artifact is built from the git diff rather than a full file:
- Runs git diff {ref} to extract changed files
- Finds callers of every added/modified function
- Builds a three-section artifact: the diff, caller context, and pre-existing surrounding code

- All new conditional branches (if/elif/while): are False/None/empty branches safe?
- All changed function signatures: do callers handle the new contract?
- All new resource handles (subprocesses, file handles, locks): are they always released?
- All security-sensitive paths touched: what injection/bypass edge cases were introduced?
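The first extraction step can be sketched as a small parser over unified diff text, such as the output of git diff {ref}. This is an assumption-laden illustration, not the skill's implementation: both function names are hypothetical, and the second helper only sees Python functions whose def line itself was added, so functions with edited bodies would need hunk-header parsing on top.

```python
import re

# Matches an added line that starts a Python function definition.
_ADDED_DEF = re.compile(r"^\+\s*def\s+(\w+)\s*\(")


def changed_files(diff_text):
    """Return paths taken from the '+++ b/<path>' headers of a unified diff."""
    prefix = "+++ b/"
    return [line[len(prefix):] for line in diff_text.splitlines()
            if line.startswith(prefix)]


def functions_with_new_def_line(diff_text):
    """Return names of functions whose 'def' line appears as an added line."""
    names = []
    for line in diff_text.splitlines():
        if line.startswith("+++"):
            continue  # file header, not an added source line
        match = _ADDED_DEF.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    sample = (
        "diff --git a/app.py b/app.py\n"
        "--- a/app.py\n"
        "+++ b/app.py\n"
        "@@ -1,2 +1,4 @@\n"
        "+def load(path, strict=False):\n"
        "+    return open(path)\n"
        " def main():\n"
        "     pass\n"
    )
    print(changed_files(sample), functions_with_new_def_line(sample))
```

The resulting names are the starting point for a caller search, which the sketch leaves out.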
Workflow
Phase 0: Input validation
Reads the artifact (file path or inline content) and writes it to deep-qa-{run_id}/artifact.md. Detects artifact type, warns if the artifact exceeds ~80k tokens (Haiku critics at depth 2+ may only see part of it), and performs a safety check for credentials or PII.
For skill type with a file path: companion files (DIMENSIONS.md, FORMAT.md, STATE.md, SYNTHESIS.md) in the same directory are automatically concatenated into artifact.md.
Shows the pre-run scope declaration and waits for your max_rounds input (skipped under --auto).
Phase 1: Dimension discovery
Selects QA dimensions based on artifact type. Generates 2–4 critique angles per dimension plus 2–3 cross-dimensional angles. Required categories must each have at least one angle (assigned CRITICAL priority if uncovered after round 1). The frontier is capped at 30 angles.
max_rounds recommendation formula:
Phase 2: Initialize state
Creates the run directory:
Writes hard_stop = max_rounds × 2 to state.json. This value is immutable: no extension can exceed it.
Phase 3: QA rounds
A hard stop check fires unconditionally at the start of every round, before the prospective gate:
The prospective gate (skipped under --auto) then shows:
Per round:
- Pop up to 6 highest-priority angles
- Write all data to files and verify each write before spawning agents
- Spawn critic agents in parallel (120s timeout)
- Collect new angles from all completed agents before deduplication
- For each new defect: write to judge-inputs/{defect_id}.md, spawn an independent Haiku severity judge
- Spawn a Haiku subagent to write a cumulative coordinator summary
- Run coverage evaluation: generate CRITICAL-priority angles for any uncovered required category
- Increment round
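The start of a round can be sketched as follows, assuming a simple in-memory frontier. Only hard_stop = max_rounds × 2 and the limit of 6 angles come from the text. The function names, the angle shape (a dict with a numeric priority where 0 is most urgent), and the reading that the hard stop trips once the round number exceeds hard_stop are assumptions made for illustration.

```python
MAX_ANGLES_PER_ROUND = 6  # "Pop up to 6 highest-priority angles"


def compute_hard_stop(max_rounds):
    """hard_stop = max_rounds x 2, computed once at initialization."""
    return max_rounds * 2


def hard_stop_reached(round_number, hard_stop):
    """Assumption: rounds are 1-indexed and round `hard_stop` may still run."""
    return round_number > hard_stop


def pop_angles(frontier, limit=MAX_ANGLES_PER_ROUND):
    """Remove and return up to `limit` angles, most urgent first.

    Assumption: each angle is a dict with a numeric 'priority', 0 = most urgent.
    """
    frontier.sort(key=lambda angle: angle["priority"])
    batch = frontier[:limit]
    del frontier[:limit]
    return batch


if __name__ == "__main__":
    hard_stop = compute_hard_stop(max_rounds=4)
    frontier = [{"id": i, "priority": p} for i, p in enumerate([3, 0, 2, 1, 5, 4, 0, 2])]
    if not hard_stop_reached(round_number=1, hard_stop=hard_stop):
        batch = pop_angles(frontier)
        print(len(batch), len(frontier))
```

Checking the hard stop before anything else in the round mirrors the ordering described above, where the check precedes the prospective gate.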
After each severity judge completes, the critique file’s declared **QA Dimension:** header is cross-checked against the angle’s assigned dimension in state.json. A mismatch is flagged as a potential injection and does not count toward required category coverage.
Phase 4: Fact verification (research type only)
Runs for artifact_type == "research" before final synthesis; skipped entirely for other types.
Spawns a Haiku verification agent to:
- Extract the top N factual claims (N = min(20, total))
- Spot-check citation URLs — is the attributed claim present in the source text?
- For numerical claims: compare exact numbers — semantic similarity is not accepted
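The claim cap and the exact-number rule can be illustrated with a short sketch. The helper names and the regular expression are assumptions; the skill delegates this work to a verification agent, so this only shows what "exact numbers, not semantic similarity" could mean in practice.

```python
import re

# Digits with optional thousands separators and an optional decimal part.
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def claims_to_check(total_claims, cap=20):
    """N = min(20, total)."""
    return min(cap, total_claims)


def numbers_in(text):
    """Return numeric literals as strings with thousands separators removed."""
    return [match.replace(",", "") for match in _NUMBER.findall(text)]


def numeric_claim_supported(claim, source_text):
    """True only if every number in the claim appears exactly in the source."""
    source_numbers = set(numbers_in(source_text))
    claim_numbers = numbers_in(claim)
    return bool(claim_numbers) and all(n in source_numbers for n in claim_numbers)


if __name__ == "__main__":
    source = "The change reduced latency by 42.5 percent across 1200 users."
    print(numeric_claim_supported("Latency dropped 42.5% for 1,200 users", source))
    print(numeric_claim_supported("Latency dropped about 43%", source))
```

A rounded figure such as 43 fails here even though it is close to 42.5, which is the behavior the rule asks for.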
Writes results to deep-qa-{run_id}/verification.md.
Phase 5: Termination check
Evaluated in order; stops at the first true condition:
| Condition | Label |
|---|---|
| User chose N at a gate | User-stopped at round N |
| Coverage plateau (2 rounds no new dimensions + all angles exhausted) | Coverage plateau — frontier saturated |
| max_rounds reached, user chose n | Max Rounds Reached — user stopped |
| max_rounds reached under --auto | Max Rounds Reached |
| Frontier empty + all required categories covered + 2 rounds no new dimensions | Conditions Met |
| Frontier empty before full coverage | Convergence — frontier exhausted before full coverage |
| Hard stop fires | Hard stop at round N |
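The first-true evaluation over the table above can be sketched as an ordered list of condition and label pairs. The labels are copied verbatim from the table; every key of the state dict is a hypothetical name chosen for this example, and the sketch keeps "all angles exhausted" and "frontier empty" as separate flags because the table does not say whether they are the same thing.

```python
def termination_label(s):
    """Return the first matching Phase 5 label, or None to keep going.

    `s` is a hypothetical dict of flags; none of the key names come from the skill.
    """
    n = s["round"]
    plateau = s["rounds_without_new_dimensions"] >= 2
    frontier_empty = s["frontier_size"] == 0
    checks = [
        (s["user_chose_n_at_gate"], f"User-stopped at round {n}"),
        (plateau and s["all_angles_exhausted"], "Coverage plateau — frontier saturated"),
        (s["max_rounds_reached"] and s["user_declined_extension"],
         "Max Rounds Reached — user stopped"),
        (s["max_rounds_reached"] and s["auto"], "Max Rounds Reached"),
        (frontier_empty and s["all_required_covered"] and plateau, "Conditions Met"),
        (frontier_empty and not s["all_required_covered"],
         "Convergence — frontier exhausted before full coverage"),
        (s["hard_stop_fired"], f"Hard stop at round {n}"),
    ]
    for condition, label in checks:
        if condition:
            return label  # table order decides ties
    return None


if __name__ == "__main__":
    state = {
        "round": 3, "rounds_without_new_dimensions": 0, "frontier_size": 3,
        "user_chose_n_at_gate": False, "all_angles_exhausted": False,
        "max_rounds_reached": True, "user_declined_extension": False,
        "auto": True, "all_required_covered": False, "hard_stop_fired": False,
    }
    print(termination_label(state))
```

Keeping the rows in one ordered list makes the precedence explicit and ensures the label always comes from the table.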
Phase 6: Final QA report
A Sonnet subagent writes deep-qa-{run_id}/qa-report.md using the coordinator summary, mini-syntheses, and state.json, not raw critique files.
If the report file is missing or empty after the subagent completes, the skill re-spawns once. If still missing, a minimal emergency report is written directly from state.json.
The report includes: severity-sorted defect registry, disputed defects, coverage assessment, honest caveats, open issues, files examined, and the invocation mode.
Integration with /deep-design and /deep-research
Both skills offer an automatic QA pass after their final output is written. When invoked this way:
- --auto is always set; all interactive gates are skipped
- --type is always set by the parent (doc for /deep-design, research for /deep-research)
- run_id takes the form {parent_run_id}-qa
- The QA report is written to deep-qa-{parent_run_id}-qa/qa-report.md, never into the parent’s output directory
- max_rounds defaults to 4 unless the parent specifies otherwise
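The settings in the list above can be collected into one small helper. The function name, the returned dict shape, and the parent-to-type lookup are hypothetical; the values themselves (forced --auto, the -qa suffix, the report path, and the default of 4 rounds) come from the list.

```python
# Hypothetical lookup from parent skill to the --type it sets.
PARENT_TYPES = {"/deep-design": "doc", "/deep-research": "research"}


def child_qa_invocation(parent_skill, parent_run_id, max_rounds=None):
    """Build the settings a parent skill would pass to an automatic QA pass."""
    run_id = f"{parent_run_id}-qa"
    return {
        "auto": True,  # interactive gates are always skipped
        "type": PARENT_TYPES[parent_skill],
        "run_id": run_id,
        # Written under the QA run's own directory, never the parent's.
        "report_path": f"deep-qa-{run_id}/qa-report.md",
        "max_rounds": 4 if max_rounds is None else max_rounds,
    }


if __name__ == "__main__":
    print(child_qa_invocation("/deep-research", "r42"))
```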
Self-review checklist
Before delivering, verify all of the following:
- State file is valid JSON after every round; generation counter incremented after every state write
- No angle has status in_progress after a round completes
- Every critique file has: Defects, Severity, Scenario, Root Cause, Mini-Synthesis, New Angles
- Every critique file’s declared **QA Dimension:** matches the angle’s dimension in state.json
- No angle explored more than 2 times
- All required dimension categories have ≥1 explored angle per state.json required_categories_covered
- Disputed defects documented in coordinator summary, not silently dropped
- Final report does NOT read raw critique files
- For research type: fact verification ran before synthesis
- Termination label is from the Phase 5 label table, never “no defects remain”
- hard_stop stored in state.json and never modified after initialization
- All pre-spawn file writes verified non-empty before Agent tool call
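A few of the checklist items are mechanical enough to sketch as a validator over the state file contents. The layout assumed here is a guess: an angles list whose entries carry id, status, and times_explored, and required_categories_covered as a mapping from category to a boolean. Only the in_progress status and the required_categories_covered name appear in the checklist.

```python
import json


def self_review_problems(state_text):
    """Return a list of checklist violations found in the state file contents."""
    try:
        state = json.loads(state_text)
    except json.JSONDecodeError:
        return ["state file is not valid JSON"]

    problems = []
    for angle in state.get("angles", []):  # assumed field name
        if angle.get("status") == "in_progress":
            problems.append(f"angle {angle['id']} is still in_progress")
        if angle.get("times_explored", 0) > 2:  # assumed field name
            problems.append(f"angle {angle['id']} explored {angle['times_explored']} times")

    # Assumed shape: {"category_name": true_or_false, ...}
    for category, covered in state.get("required_categories_covered", {}).items():
        if not covered:
            problems.append(f"required category {category} has no explored angle")
    return problems


if __name__ == "__main__":
    sample = json.dumps({
        "angles": [
            {"id": "a1", "status": "in_progress", "times_explored": 1},
            {"id": "a2", "status": "done", "times_explored": 3},
        ],
        "required_categories_covered": {"security": False, "correctness": True},
    })
    for problem in self_review_problems(sample):
        print(problem)
```

An empty result from a check like this covers only the structural items; the remaining checklist entries still need inspection of the critique files and the report.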
Golden rules
1. QA is adversarial
A critic that says “looks good” has failed. Every artifact has defects — find them.
2. Every defect needs a concrete scenario
“This might be unclear” is not a defect. “A reader with context X but not Y will interpret section Z as [wrong meaning], causing [consequence]” is a defect.
3. Classify honestly
Don’t inflate minor defects to critical. Don’t downgrade critical defects to minor. A 100% acceptance rate from a severity judge is evidence of failure.
4. No fixing — only reporting
Suggested remediations are optional guidance. The artifact owner decides how to fix. The skill never modifies the artifact.
5. Critique what's missing
The most dangerous defects are omissions — components referenced but not specified, error paths not defined, assumptions not stated.
6. Independence invariant
The coordinator orchestrates; it does not evaluate. Severity classification is always delegated to independent judge agents.
7. Termination means coverage is saturated, not zero defects
The report is honest about what wasn’t covered. Never write “no defects remain” in a termination label.
8. Never suppress disputed defects
Disputed defects are documented with their rationale, not silently dropped from the report.
9. Artifact type shapes dimensions
Don’t apply code security analysis to a research report. QA dimensions must match the artifact type.
10. Hard stop is unconditional
The hard stop check fires before every round gate and cannot be bypassed, extended, or deferred by any means.