Overview
MAKER (Massively decomposed Agentic processes with K-voting Error Reduction) achieves high reliability by sampling a worker agent multiple times and using “first-to-ahead-by-k” voting to select the consensus response. This pattern trades compute for accuracy, enabling cheap models to achieve reliability suitable for million-step tasks.
Based on “Solving a Million-Step LLM Task with Zero Errors” (arXiv:2511.09030).
Credit: Lucid Programmer (PR author)
Key Features
- Statistical Consensus: Multiple samples are drawn and voted on to find agreement
- First-to-ahead-by-k: The winner needs a k-vote margin over every alternative
- Red-Flagging: Discard suspicious responses before voting
- Provable Bounds: Mathematical error guarantees based on per-step success rate
- Cost-Effective: Cheap models with voting can replace expensive models
When to Use MAKER
Ideal Use Cases
Long chains of simple steps where rare errors compound:
- ETL Pipelines: 1000s of row transformations - one bad parse = corrupted data
- Code Migration: 1000s of file changes - one syntax error = build fails
- Document Processing: 1000s of pages - one missed field = compliance failure
- Data Validation: Millions of records - one wrong validation = bad data in prod
- Automated Testing: 1000s of assertions - one false positive = wasted debugging
- Cost Optimization: Cheap model + voting replaces expensive model
The Math Behind MAKER
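As a rough sketch of why voting helps, assume each sample is either the correct answer (with per-step probability p) or one single competing wrong answer (with probability 1 - p), and that samples are independent. Under that simplification, first-to-ahead-by-k is a gambler's-ruin race, and the correct answer wins with probability p^k / (p^k + (1 - p)^k). The function names and the value p = 0.9 below are illustrative assumptions, not part of any library API.

```python
def step_success(p: float, k: int) -> float:
    """Probability that the correct answer wins one first-to-ahead-by-k vote.

    Assumes a two-outcome model: each sample is correct with probability p,
    otherwise it is the same single wrong answer.
    """
    q = 1.0 - p
    return p**k / (p**k + q**k)


def task_success(p: float, k: int, steps: int) -> float:
    """Probability that every one of `steps` independent votes is correct."""
    return step_success(p, k) ** steps


if __name__ == "__main__":
    p = 0.9  # illustrative per-step success rate, not a measured value
    for k in (1, 3, 5):
        print(f"k={k}: per-step={step_success(p, k):.6f}, "
              f"1000 steps={task_success(p, k, 1000):.6f}")
```

With k = 1 the vote is just a single sample, so the per-step probability equals p. Larger k pushes the per-step probability toward 1, which is what makes long chains of steps feasible.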
Basic Usage
Configuration Parameters
Name of the MAKER workflow
Name of the worker agent to sample from
k: Voting margin required (first-to-ahead-by-k). A higher k is more reliable but needs more samples. The paper recommends k ≥ 3 for high reliability.
max_samples: Maximum number of samples before falling back to a plurality vote.
match_strategy: How to compare responses for voting:
- exact: Character-for-character match
- normalized: Ignore case/whitespace
- structured: Parse and compare JSON
Custom normalization function (overrides match_strategy)
Discard responses longer than this (characters). Per the paper, overly long responses correlate with errors.
Custom validator function. Return False to red-flag (discard) the response.
How First-to-Ahead-by-k Works
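The voting rule can be sketched in a few lines of standalone Python. This is a minimal illustration, not the library's implementation: the function name, its return shape, and the `sample` callable are assumptions made for the example. It keeps drawing samples until one answer leads every other answer by k votes, and falls back to a plurality vote once `max_samples` is reached.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def first_to_ahead_by_k(
    sample: Callable[[], Hashable], k: int, max_samples: int
) -> Tuple[Hashable, int, bool]:
    """Return (winner, samples_used, converged).

    `sample` is a hypothetical callable that returns one (already
    normalized) worker response each time it is called.
    """
    if k < 1 or max_samples < 1:
        raise ValueError("k and max_samples must both be at least 1")

    votes: Counter = Counter()
    for used in range(1, max_samples + 1):
        votes[sample()] += 1
        ranked = votes.most_common(2)
        leader, leader_votes = ranked[0]
        runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
        # Stop as soon as the leader is k votes ahead of the runner-up.
        if leader_votes - runner_up_votes >= k:
            return leader, used, True

    # No answer reached the margin: fall back to a plurality vote.
    return votes.most_common(1)[0][0], max_samples, False


if __name__ == "__main__":
    scripted = iter(["A", "B", "A", "A", "A"])
    print(first_to_ahead_by_k(lambda: next(scripted), k=3, max_samples=10))
    # -> ('A', 5, True)
```

In the scripted run, the tally after each sample is A:1, then A:1 B:1, A:2 B:1, A:3 B:1, and finally A:4 B:1, at which point A is three votes ahead and wins on the fifth sample.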
Example with k=3:
Match Strategies
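One way to picture the exact, normalized, and structured strategies is as functions that turn a response into a comparison key, so that two responses count as the same vote when their keys are equal. The sketch below is an assumption about how such keys could be built, not the library's code. In particular, "ignore case/whitespace" is interpreted here as lowercasing and collapsing runs of whitespace.

```python
import json


def exact_key(text: str) -> str:
    """exact: the response itself is the key (character-for-character)."""
    return text


def normalized_key(text: str) -> str:
    """normalized: lowercase, trim, and collapse runs of whitespace."""
    return " ".join(text.split()).lower()


def structured_key(text: str) -> str:
    """structured: parse as JSON and re-serialize in a canonical form.

    Raises json.JSONDecodeError if the response is not valid JSON.
    """
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    print(exact_key("Yes") == exact_key("yes"))
    print(normalized_key("  Yes \n") == normalized_key("yes"))
    print(structured_key('{"a": 1, "b": 2}') == structured_key('{"b":2,"a":1}'))
```

Stricter strategies split votes across superficially different responses, so convergence tends to take more samples. Looser strategies converge faster but can merge answers that differ in ways that matter.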
Exact Match
Normalized Match
Structured Match
Custom Match Function
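A custom function is useful when the answer is buried in surrounding text. The sketch below assumes the function receives the raw response as a string and returns a string key. That signature, and the name `last_number_key`, are assumptions for illustration. It treats two responses as the same vote when the last number they mention is equal.

```python
import re


def last_number_key(response: str) -> str:
    """Key a response by the last number it contains.

    Responses with no number fall back to their stripped text, so they
    only match other responses with identical text.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not matches:
        return response.strip()
    # Convert through float so "42", "42.0" and "42.00" share one key.
    return str(float(matches[-1]))


if __name__ == "__main__":
    for text in ("The answer is 42.", "42.0", "Total: 42.00 units"):
        print(repr(text), "->", last_number_key(text))
```

A key function like this should be deterministic and cheap, because it runs on every sample before the vote is counted.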
Red-Flagging
Red-flagging improves the effective success rate by discarding confused responses.
Length-Based Red-Flagging
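A length check can be sketched as a simple filter that runs before any vote is counted. The helper name and the 20-character limit below are illustrative assumptions. In practice the limit would be tuned to the longest legitimate answer the worker is expected to give.

```python
from typing import Iterable, List


def drop_long_responses(responses: Iterable[str], max_length: int) -> List[str]:
    """Keep only responses of at most `max_length` characters."""
    return [r for r in responses if len(r) <= max_length]


if __name__ == "__main__":
    samples = [
        "A",
        "B",
        "It might be A, although B is also plausible if we consider that...",
        "A",
    ]
    # The rambling third response is discarded and never reaches the vote.
    print(drop_long_responses(samples, max_length=20))
```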
Custom Validation
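A validator of this kind could look like the following sketch for a worker that is supposed to reply with a single label. The allowed label set and the function name are hypothetical. The only behavior taken from the description above is that returning False red-flags (discards) the response.

```python
ALLOWED_LABELS = {"A", "B", "C"}  # hypothetical label set for this example


def is_valid_label(response: str) -> bool:
    """Return False to red-flag (discard) the response."""
    return response.strip().upper() in ALLOWED_LABELS


if __name__ == "__main__":
    samples = ["A", " b ", "I think it is A", "D", "C"]
    kept = [s for s in samples if is_valid_label(s)]
    print(kept)
```

Responses that hedge or wander off-format are dropped, so only well-formed answers compete in the vote.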
Accessing Voting Results
Advanced Examples
Data Validation Pipeline
Code Syntax Checker
Structured Data Extraction
Cost vs. Reliability Tradeoff
Higher k
More Reliable
- Higher confidence in consensus
- Better error bounds
- More samples needed
- Higher cost
Lower k
Faster/Cheaper
- Quicker convergence
- Fewer samples on average
- Lower cost
- Less strict consensus
- k=2: Low-stakes, cost-sensitive
- k=3: Standard (good balance)
- k=5: High-stakes, critical accuracy
- k=7+: Mission-critical, zero-error tolerance
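To get a feel for the tradeoff, the sketch below estimates both sides of it under a simplified model: every sample is independently correct with probability p or equal to one single wrong answer, nothing is red-flagged, and there is no max_samples cap. Under those assumptions the vote is a biased random walk that stops at +k or -k, so Wald's identity gives the expected number of samples as k(2P - 1) / (2p - 1), where P is the probability that the correct answer wins. The function names and p = 0.9 are illustrative.

```python
def step_success(p: float, k: int) -> float:
    """Probability the correct answer wins (two-outcome gambler's ruin)."""
    q = 1.0 - p
    return p**k / (p**k + q**k)


def expected_samples(p: float, k: int) -> float:
    """Expected samples until one answer is k votes ahead (no cap)."""
    if p == 0.5:
        return float(k * k)  # unbiased walk between -k and +k
    return k * (2.0 * step_success(p, k) - 1.0) / (2.0 * p - 1.0)


if __name__ == "__main__":
    p = 0.9  # illustrative per-step success rate, not a measured value
    for k in (2, 3, 5, 7):
        print(f"k={k}: error per step={1 - step_success(p, k):.2e}, "
              f"expected samples={expected_samples(p, k):.2f}")
```

In this model the expected sample count grows roughly linearly in k while the per-step error shrinks geometrically, which is why a modest increase in k can buy a large gain in reliability.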
Performance Characteristics
Best Practices
Simple Worker Tasks
MAKER works best with simple, deterministic tasks where there’s a “correct” answer
Red-Flag Aggressively
Discard obvious errors early to improve effective success rate
Appropriate k
Match k to your reliability needs and cost constraints
Monitor Convergence
Track convergence rates to tune k and max_samples
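One possible way to track this is to record, for each vote, how many samples were used and whether the k-vote margin was reached, then summarize over a batch. The `VoteRecord` type and its fields below are hypothetical and do not correspond to the library's result object.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class VoteRecord:
    """Hypothetical per-vote record collected by the caller."""
    samples_used: int
    converged: bool  # False means the vote fell back to plurality


def convergence_report(records: List[VoteRecord]) -> Dict[str, float]:
    """Summarize how often votes converged and how many samples they used."""
    if not records:
        raise ValueError("at least one record is required")
    converged = sum(1 for r in records if r.converged)
    return {
        "convergence_rate": converged / len(records),
        "mean_samples": mean(r.samples_used for r in records),
        "max_samples_seen": float(max(r.samples_used for r in records)),
    }


if __name__ == "__main__":
    batch = [
        VoteRecord(3, True),
        VoteRecord(5, True),
        VoteRecord(10, False),
        VoteRecord(4, True),
    ]
    print(convergence_report(batch))
```

A low convergence rate suggests that max_samples is too small for the chosen k, or that the worker's answers are too varied for the current match strategy.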
Debugging
Enable detailed logging to see voting progress:
Use Cases by Industry
- Finance: Transaction classification, fraud detection flags
- Healthcare: Medical coding, diagnosis categorization
- Legal: Document classification, clause identification
- Manufacturing: Quality control checks, defect classification
- E-commerce: Product categorization, review sentiment
- DevOps: Log analysis, error classification
Comparison with Other Patterns
| Feature | MAKER | Evaluator-Optimizer | Chain | Router |
|---|---|---|---|---|
| Error Reduction | ✅ Statistical | ✅ Feedback-driven | ❌ None | ❌ None |
| Reliability Guarantee | ✅ Mathematical | ❌ Heuristic | ❌ None | ❌ None |
| Task Type | Simple, deterministic | Complex, creative | Any | Any |
| Cost Model | Multiple samples | Multiple iterations | Single pass | Single pass |
| Best For | High-volume, zero-error | Quality content | Pipelines | Routing |
Related Patterns
- Evaluator-Optimizer - Quality through feedback (different approach)
- Parallel - Multiple agents without voting
- Chain - Sequential processing where MAKER can be a step
