What is Adversarial Review?
A review technique where the reviewer must find issues. No “looks good” allowed. The reviewer adopts a cynical, skeptical stance - assume problems exist and find them. This isn’t about being negative for its own sake. It’s about forcing genuine analysis instead of a cursory glance that rubber-stamps whatever was submitted.

The core rule: You must find issues. Zero findings triggers a halt - re-analyze or explain why nothing is wrong.
Why Standard Reviews Fail
Normal reviews suffer from predictable cognitive biases:

Confirmation Bias
You skim the work, nothing immediately jumps out, so you approve it. Your brain is looking for reasons to confirm “this is fine” rather than actively seeking problems.

Surface-Level Analysis
Without a forcing function, reviewers take the path of least resistance - checking syntax, formatting, obvious errors. Deeper issues like missing edge cases, security vulnerabilities, or architectural problems go unnoticed.

Authority Bias
If the author is senior or respected, reviewers unconsciously defer to their judgment. “They probably thought of this” becomes a reason to skip critical thinking.

Time Pressure
Reviews compete with other work. The easiest way to clear your queue is quick approval. Thorough analysis takes time most people don’t allocate.

How Adversarial Review Works
The “find problems” mandate breaks these patterns:

Forces Thoroughness
You can’t approve until you’ve looked hard enough to find issues. This naturally extends review time and deepens analysis.

Shifts the Question
Instead of “Is anything obviously broken?” you ask “What’s wrong with this?” and “What’s missing?” These questions surface different insights.

Catches Absence
Normal reviews focus on what’s present. Adversarial review asks “What should be here but isn’t?” - error handling, edge cases, validation, documentation, tests.

Improves Signal Quality
Findings are specific and actionable, not vague concerns. “This might have issues” becomes “Line 47: No rate limiting on failed login attempts.”

Information Asymmetry Advantage
Best results come from reviewing the artifact without access to the original reasoning. You evaluate what’s actually there, not what the author intended.

Where It’s Used
Adversarial review appears throughout BMad workflows:
- Code review - Find bugs, security issues, performance problems
- Implementation readiness checks - Validate specs before building
- Spec validation - Find gaps, contradictions, ambiguities
- Architecture review - Surface conflicts, missing decisions
- Test coverage - Identify untested scenarios
- Documentation review - Find unclear explanations, missing info
Example: Before and After
Standard Review
“The authentication implementation looks reasonable. Token-based auth with session management. Approved.”

Problems: Misses security vulnerabilities, doesn’t check edge cases, provides no specific feedback.
Adversarial Review
Issues Found:
- HIGH - login.ts:47 - No rate limiting on failed login attempts (enables brute force attacks)
- HIGH - auth.ts:123 - Session token stored in localStorage (vulnerable to XSS attacks; should use httpOnly cookies)
- HIGH - password.ts:89 - Password validation happens client-side only (can be bypassed)
- MEDIUM - login.ts:52 - No audit logging for failed login attempts (can’t detect attack patterns)
- MEDIUM - session.ts:34 - Session timeout not implemented (sessions never expire)
- MEDIUM - Missing: No account lockout after repeated failures
- LOW - auth.ts:145 - Magic number 3600 should be named constant SESSION_TIMEOUT_SECONDS
- LOW - login.ts:23 - Error messages reveal whether username exists (enables user enumeration)

Result: The first review would have shipped with multiple security vulnerabilities. The second caught eight issues, three of them high severity.
Human Filtering Required
Because the AI is instructed to find problems, it will find problems - even when they don’t actually exist.

Types of False Positives
Misunderstood Context
“HIGH - No error handling for database connection”
Actual: Error handling exists in the connection pool layer; the reviewer didn’t see it.

Nitpicking
“MEDIUM - Variable name usr should be spelled out as user”
Actual: Minor style preference, not a real issue.

Hallucinated Problems
“HIGH - Function doesn’t validate email format”
Actual: Email validation happens at the schema level; the function correctly assumes input is pre-validated.

Over-Engineering
“MEDIUM - Should add caching layer for performance”
Actual: Premature optimization; current performance is fine.
Your Role
You decide what’s real. Review each finding:
- Dismiss - False positive or nitpick
- Fix - Real issue that matters
- Note - Valid point but not worth addressing now
- Investigate - Uncertain, need to verify
Iteration and Diminishing Returns
After addressing findings, consider running adversarial review again:

First Pass
Catches obvious issues, missing pieces, common problems. Highest ROI.

Second Pass
Catches subtler issues that the first review missed or that were introduced by fixes. Still valuable.

Third Pass
Might catch a few more things, but increasingly dominated by false positives and nitpicks.

Fourth+ Pass
Diminishing returns. You’re mostly generating noise at this point.

Best Practices
For Maximum Effectiveness
- Use information asymmetry - Don’t give the reviewer access to original reasoning, design docs, or discussions. Evaluate only what’s in the artifact.
- Review with fresh eyes - Wait a day before adversarial review of your own work. Mental distance helps you spot issues.
- Focus on high-impact areas - Apply adversarial review to security, data handling, public APIs, critical business logic. Less critical for internal utilities.
- Combine with other techniques - Use adversarial review, then apply advanced elicitation (pre-mortem analysis) to findings.
- Document patterns - Track common issues to improve future work and update templates/checklists.

Common Mistakes to Avoid
- Accepting everything - You’ll fix non-issues and waste time. Filter ruthlessly.
- Dismissing everything - Defensiveness prevents learning. Consider each finding honestly.
- Infinite iteration - Know when to stop. Diminishing returns kick in fast.
- Wrong severity levels - The reviewer might mark nitpicks as HIGH. Recalibrate based on actual impact.
- No action tracking - Document real issues and track fixes. Don’t let valid findings get lost.

Integration with Workflows
In implement
Run an adversarial review after implementation, before marking the work complete.
In prd-co-write
Run an adversarial review to validate requirements before finalizing them.
In plan-build
Run an adversarial review to validate architecture decisions before building begins.
Measuring Success
How do you know adversarial review is working? Good indicators:
- Finding 3-8 real issues per review (right range, not too few or too many)
- Mix of severity levels (not all nitpicks, not all critical)
- Issues you genuinely didn’t notice before
- Improved quality in subsequent work (learning from patterns)
- Reduced production bugs (catching issues earlier)
Warning signs:
- Consistently finding 0-1 issues (not being thorough enough)
- Finding 20+ issues (reviewer is nitpicking or hallucinating)
- All findings are LOW severity (missing real problems)
- Same issues appearing repeatedly (not learning from feedback)
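The count-based warning signs above can be codified as a quick sanity check over a pass’s human-accepted findings. A sketch; the thresholds mirror the ranges stated above:

```python
def review_health(severities: list[str]) -> list[str]:
    """Flag warning signs in a single adversarial review pass.

    `severities` holds one entry per real (human-accepted) finding,
    e.g. ["HIGH", "MEDIUM", "LOW"]. Returns warning messages.
    """
    warnings = []
    n = len(severities)
    if n <= 1:
        warnings.append("0-1 issues found: review may not be thorough enough")
    if n >= 20:
        warnings.append("20+ issues found: reviewer may be nitpicking or hallucinating")
    if n and all(s == "LOW" for s in severities):
        warnings.append("all findings LOW: real problems may be missed")
    return warnings
```

An empty result means the pass landed in the healthy range; recurring warnings across passes suggest recalibrating the reviewer.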
Advanced Techniques
Role-Based Review
Review from different perspectives:
- Security reviewer - Find vulnerabilities and attack vectors
- Performance reviewer - Identify bottlenecks and inefficiencies
- Maintainability reviewer - Spot complexity and technical debt
- User experience reviewer - Find usability and accessibility issues
Comparative Review
Review against alternatives:
- Best practices - Does this follow industry standards?
- Similar implementations - How does this compare to existing code?
- Competitor solutions - What are we missing that others have?
Constraint-Based Review
Review assuming specific constraints:
- What if traffic increases 10x overnight?
- What if this needs to support 100 languages?
- What if the database is unavailable?
- What if malicious users attack this endpoint?
Real-World Example
Here’s an adversarial review of an API endpoint implementation:
- HIGH - No input validation on email, password, or name
- HIGH - Password stored in plaintext (should be hashed)
- HIGH - No authentication check (anyone can call this)
- HIGH - No duplicate email check (allows multiple accounts)
- MEDIUM - No error handling for database failures
- MEDIUM - Returns 200 even on failure
- MEDIUM - No audit logging of user creation
- LOW - Response includes only userId (might want user object)
