Overview
Multi-agent workflows enable complex interactions where multiple agents collaborate, compete, or verify each other's work. This guide covers implementing multi-agent patterns using rLLM's workflow system.

Why Multi-Agent?
Use cases:
- Solver-Judge: One agent solves, another verifies
- Debate: Multiple agents argue different positions
- Collaborative: Agents work together on subtasks
- Ensemble: Multiple solutions, select best one
Benefits:
- Improved accuracy through verification
- More diverse solutions
- Natural curriculum learning
- Better credit assignment
Workflow Architecture
Multi-agent systems use Workflow instead of Agent + Environment; see rllm/workflows/workflow.py:32.
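The real base class lives at rllm/workflows/workflow.py:32. As an illustration of the shape such a class typically takes (the names `Workflow`, `Episode`, and `run` here are assumptions for this sketch, not rLLM's actual API), a workflow owns the entire rollout and can orchestrate any number of agents:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Episode:
    # Hypothetical container for everything one rollout produces.
    task: dict
    trajectories: list = field(default_factory=list)


class Workflow:
    """Illustrative base class: a workflow owns the whole rollout and can
    orchestrate any number of agents, rather than a single agent loop."""

    async def run(self, task: dict) -> Episode:
        raise NotImplementedError


class EchoWorkflow(Workflow):
    # Trivial subclass showing the single override point.
    async def run(self, task: dict) -> Episode:
        episode = Episode(task=task)
        episode.trajectories.append({"role": "solver", "output": task["question"]})
        return episode


episode = asyncio.run(EchoWorkflow().run({"question": "2 + 2 = ?"}))
print(len(episode.trajectories))
```

The key design difference from an Agent + Environment loop is that `run` returns every trajectory the rollout produced, so the trainer can assign rewards per role.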
Solver-Judge Pattern
The most common multi-agent pattern: one agent generates solutions, another verifies them.

Implementation
Generate multiple solutions
Create diverse candidates. From examples/solver_judge/solver_judge_flow.py:29.

Define Judge agent
Evaluates and selects solutions. From examples/solver_judge/solver_judge_flow.py:41.

Create judge prompt
Format solutions for evaluation. From examples/solver_judge/solver_judge_flow.py:71.

Complete Solver-Judge Example
examples/solver_judge/solver_judge_flow.py:1.
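The full example is in examples/solver_judge/solver_judge_flow.py:1. As a self-contained sketch of the pattern's three steps (generate, format for the judge, select), with toy functions standing in for real LLM calls — every name below is illustrative, not rLLM's API:

```python
import asyncio


def _true_answer(question: str) -> int:
    # Toy ground truth for questions like "2 + 3".
    a, _, b = question.split()
    return int(a) + int(b)


async def solve(question: str, seed: int) -> str:
    # Stand-in for an LLM sample; seed 0 is deliberately off by one
    # so the candidates are not all identical.
    return str(_true_answer(question) + (1 if seed == 0 else 0))


def build_judge_prompt(question: str, solutions: list[str]) -> str:
    # Format the candidates the way a judge model would see them.
    numbered = "\n".join(f"Solution {i}: {s}" for i, s in enumerate(solutions))
    return f"Question: {question}\n{numbered}\nReply with the best solution's index."


async def judge(question: str, solutions: list[str]) -> int:
    # Stand-in judge: a real one would be a second LLM reading the
    # prompt above; here we simply pick the most common answer.
    best = max(set(solutions), key=solutions.count)
    return solutions.index(best)


async def solver_judge_flow(question: str, n_solutions: int = 3) -> str:
    # 1) generate candidates concurrently, 2) let the judge select one.
    solutions = list(await asyncio.gather(
        *(solve(question, seed=i) for i in range(n_solutions))
    ))
    return solutions[await judge(question, solutions)]


print(asyncio.run(solver_judge_flow("2 + 3")))  # → 5
```

Note the `asyncio.gather` call: candidates are generated concurrently, which is what makes multiple solutions affordable per task.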
Training Multi-Agent Workflows
Advanced Patterns
Different Models for Different Roles
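A typical split is a cheaper, high-temperature solver paired with a stronger, low-temperature judge. A minimal sketch of per-role configuration (the `RoleConfig` class, field names, and model names are all hypothetical):

```python
from dataclasses import dataclass


@dataclass
class RoleConfig:
    # Illustrative per-role settings; field names are assumptions.
    model: str
    temperature: float


# Cheap, high-temperature solver for diverse candidates;
# stronger, low-temperature judge for reliable selection.
roles = {
    "solver": RoleConfig(model="small-fast-model", temperature=1.0),
    "judge": RoleConfig(model="large-strong-model", temperature=0.0),
}

print(roles["judge"].model)
```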
Multi-Turn Collaboration
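In a collaborative setup, agents alternate turns over a shared transcript, each reading everything produced so far. A toy sketch with a planner and an executor (both functions are stand-ins for LLM calls; all names are illustrative):

```python
def planner(transcript: list[str]) -> str:
    # Toy planner: proposes the next step, numbered by completed turns.
    return f"plan step {len(transcript) // 2 + 1}"


def executor(transcript: list[str]) -> str:
    # Toy executor: acts on the most recent planner message.
    return f"done: {transcript[-1]}"


def collaborate(n_turns: int) -> list[str]:
    # Agents alternate turns; the shared transcript is the only
    # channel between them.
    transcript: list[str] = []
    for _ in range(n_turns):
        transcript.append(planner(transcript))
        transcript.append(executor(transcript))
    return transcript


print(collaborate(2))
```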
Ensemble Voting
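The simplest ensemble rule is majority voting over the agents' final answers. A minimal sketch:

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    # Pick the most common final answer; Counter preserves insertion
    # order, so ties break toward the earliest-seen answer.
    return Counter(answers).most_common(1)[0][0]


print(majority_vote(["42", "41", "42", "42", "7"]))  # → 42
```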
Trajectory Grouping
For advanced advantage computation, group trajectories by role.

Configuration
Enable Workflow Mode
Set Number of Solutions
Adjust Timeouts
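Taken together, the three settings above might look like the following fragment. The key names are hypothetical, not rLLM's actual schema — consult your version's configuration reference:

```python
# Hypothetical trainer settings; key names are illustrative only.
config = {
    "workflow": {
        "enabled": True,     # run a Workflow instead of Agent + Environment
        "n_solutions": 2,    # candidates per task; start small
        "timeout_s": 300,    # per-rollout budget: a solver-judge rollout
                             # makes roughly n_solutions + 1 model calls
    },
}

print(config["workflow"]["timeout_s"])
```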
Best Practices
- Start with 2 solutions: Balance diversity and compute cost
- Use async for parallelism: Generate solutions concurrently
- Assign rewards to all trajectories: even incorrect ones carry learning signal
- Track per-role metrics: Monitor solver and judge performance separately
- Use different prompts: Solver should explore, judge should verify
- Handle parsing errors: Return empty string rather than crashing
- Test components separately: Debug solver and judge independently
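The error-handling practice above can be sketched as an answer extractor that degrades gracefully (the `\boxed{...}` convention and the function name are illustrative):

```python
import re


def extract_answer(completion: str) -> str:
    # Take the last \boxed{...} span; if none is found, return "" so the
    # rollout scores zero reward instead of crashing the whole batch.
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else ""


print(extract_answer(r"Thus the answer is \boxed{42}."))  # → 42
print(extract_answer("no boxed answer here"))             # → ""
```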
Common Issues
Judge Always Selects First Solution
- Improve judge prompt with clearer criteria
- Add few-shot examples to judge prompt
- Increase judge model size (use a stronger model)
- Randomize solution order in prompt
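The last fix can be sketched as a shuffle that records the permutation, so the judge's pick can be mapped back to the solver that produced it (function and variable names are illustrative):

```python
import random


def shuffle_for_judge(solutions: list[str], rng: random.Random):
    # Show the judge a random order so it cannot exploit position bias;
    # return the permutation so its pick maps back to the real solver.
    order = list(range(len(solutions)))
    rng.shuffle(order)
    return [solutions[i] for i in order], order


solutions = ["A", "B", "C"]
shown, order = shuffle_for_judge(solutions, random.Random(0))
judge_pick = 0                       # suppose the judge picks index 0
original_index = order[judge_pick]   # index into the original list
print(shown[judge_pick] == solutions[original_index])  # → True
```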
Solver Solutions Too Similar
- Increase `temperature` in the generation config
- Use `top_p` sampling instead of greedy decoding
- Add diverse few-shot examples
- Modify the prompt to encourage different approaches
Training Not Converging
- Check that rewards are being assigned to all trajectories
- Verify metrics show meaningful differences
- Reduce `n_solutions` if variance is too high
- Ensure the judge is learning from solver improvements
The solver-judge pattern creates a natural curriculum: as solvers improve, the judge learns to distinguish increasingly subtle differences.
Next Steps
- Build custom agents for specialized roles
- Implement reward functions for each agent type
- Setup distributed training to scale multi-agent experiments