AgentExecutionEngine is rLLM’s low-level, high-performance orchestrator for agent-environment interactions. It manages parallel trajectory generation with full asynchronous execution, making it ideal for batch inference and RL training.
Overview
The execution engine handles:
- Parallel execution: Manages multiple agent-environment pairs concurrently
- Asynchronous orchestration: Fully async LLM inference and environment steps
- Token-level tracking: Captures prompt/response tokens and log probabilities for training
- Retry logic: Automatic retry on failures with configurable limits
- Multiple backends: Supports OpenAI API, vLLM (via verl), and Tinker
Architecture
The engine operates with a pool of agent-environment pairs.
Key Components
- Agent-Environment Pairs: Each pair operates independently and asynchronously
- Rollout Engine: Handles LLM inference (OpenAI, verl, or Tinker backend)
- Thread Pool Executor: Manages blocking environment operations
- Task Queue: Distributes tasks across available agent-environment pairs
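The pool-plus-queue pattern described above can be illustrated with a generic asyncio sketch. This is not the engine's actual code, just a minimal model of how a task queue feeds a fixed pool of concurrent workers:

```python
import asyncio

async def run_pair(pair_id: int, queue: asyncio.Queue, results: list) -> None:
    # Each agent-environment pair pulls tasks from the shared queue until empty.
    while True:
        try:
            task = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        # Stand-in for the async agent-environment interaction loop.
        await asyncio.sleep(0)
        results.append((pair_id, task, f"trajectory-for-{task}"))

async def main(n_pairs: int, tasks: list) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for t in tasks:
        queue.put_nowait(t)
    results: list = []
    # n_pairs workers drain the queue concurrently, like n_parallel_agents pairs.
    await asyncio.gather(*(run_pair(i, queue, results) for i in range(n_pairs)))
    return results

results = asyncio.run(main(n_pairs=4, tasks=list(range(10))))
print(len(results))  # → 10: every task produces exactly one trajectory
```

Results arrive in completion order rather than submission order, which is why the real engine tracks which task each trajectory belongs to.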
Initialization
Basic Setup
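The original setup snippet is not reproduced here; a minimal sketch might look like the following. The exact constructor arguments may differ across rLLM versions, and MyAgent, MyEnv, the model name, and the server URL are placeholders:

```python
from transformers import AutoTokenizer

from rllm.engine.agent_execution_engine import AgentExecutionEngine
# Placeholders for your own BaseAgent / BaseEnv subclasses.
from my_project.agents import MyAgent
from my_project.envs import MyEnv

engine = AgentExecutionEngine(
    agent_class=MyAgent,
    env_class=MyEnv,
    agent_args={},                    # forwarded to the MyAgent constructor
    env_args={},                      # merged into MyEnv.from_dict(...)
    engine_name="openai",             # or "verl" / "tinker"
    tokenizer=AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct"),
    rollout_engine_args={"base_url": "http://localhost:8000/v1", "api_key": "EMPTY"},
    sampling_params={"temperature": 0.6, "top_p": 0.95},
    n_parallel_agents=64,
    max_steps=5,
    max_response_length=8192,
    max_prompt_length=1024,
)
```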
Configuration Parameters
Agent and Environment Configuration
- agent_class: Your custom agent class (must inherit from BaseAgent)
- env_class: Your custom environment class (must inherit from BaseEnv)
- agent_args: Dictionary of arguments passed to the agent constructor
- env_args: Dictionary of arguments passed to env_class.from_dict()
Backend Configuration
- engine_name: Rollout backend - "openai" (API), "verl" (vLLM), or "tinker" (Megatron)
- tokenizer: HuggingFace tokenizer for the model
- rollout_engine_args: Backend-specific configuration (base_url, api_key, etc.)
- sampling_params: LLM sampling parameters (temperature, top_p, etc.)
Execution Configuration
- n_parallel_agents: Number of concurrent agent-environment pairs (default: 128)
- max_steps: Maximum steps per trajectory (default: 5)
- max_response_length: Maximum total response tokens per trajectory (default: 8192)
- max_prompt_length: Maximum prompt tokens (default: 1024)
- enforce_max_prompt_length: Whether to apply the max_prompt_length check per step rather than per trajectory (default: False)
Performance Configuration
- trajectory_timeout: Timeout per trajectory in seconds (default: infinity)
- retry_limit: Number of retry attempts on failure (default: 3)
- max_workers: Thread pool size for environment operations (default: 64)
- gamma: Discount factor for MC returns (default: 0.2)
- overlong_filter: Mask out overlong trajectories (default: False)
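For intuition on how gamma is used, discounted Monte Carlo returns weight later rewards by successive powers of the discount factor. This is a generic illustration of the standard computation, not the engine's internal code:

```python
def mc_returns(rewards, gamma=0.2):
    """Compute discounted Monte Carlo returns G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# With a sparse terminal reward of 1.0 and gamma=0.2, the step two moves
# before the end receives gamma**2 = 0.04 worth of credit.
print(mc_returns([0.0, 0.0, 1.0]))
```

A small gamma like the default 0.2 concentrates credit near the rewarding step; gamma close to 1 spreads it across the whole trajectory.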
Usage Patterns
Batch Inference with execute_tasks
The primary method for batch trajectory generation.
Pool Initialization
The engine creates n_parallel_agents agent-environment pairs from agent_class and env_class.
Parallel Execution
Each pair executes its task independently:
- Environment resets with task data via env_class.from_dict({**env_args, **task})
- Agent-environment interaction loop runs until done or max_steps
- Results yield as trajectories complete
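A typical call might look like the following sketch, assuming an already-constructed AgentExecutionEngine instance named engine. Whether execute_tasks must be awaited (as assumed here) may depend on your rLLM version:

```python
import asyncio

# Each task dict is merged into env_args and passed to env_class.from_dict(),
# so task fields override defaults with the same key.
tasks = [{"seed": i} for i in range(256)]

# Runs all tasks across the pool of n_parallel_agents pairs; trajectories
# are collected as they complete.
trajectories = asyncio.run(engine.execute_tasks(tasks))
print(len(trajectories))
```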
Training Mode with trajectory_generator
For RL training, use the generator pattern. Results can be returned in several formats:
- "Text": Returns Trajectory objects (default)
- "Token": Returns tokenized data for training
- "Conversation": Returns chat completion messages
- "Step": Returns individual steps with metadata
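The generator might be consumed like this sketch; the exact trajectory_generator signature and how the output format is selected are assumptions, so check the API reference for your version:

```python
import asyncio

async def collect(engine, tasks):
    trajectories = []
    # The generator yields each trajectory as soon as it finishes, so a
    # training loop can start consuming before the whole batch is done.
    async for traj in engine.trajectory_generator(tasks):
        trajectories.append(traj)  # e.g., accumulate token data for an update
    return trajectories

# trajectories = asyncio.run(collect(engine, tasks))
```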
Trajectory Generation Flow
Here's what happens during a single trajectory generation:
1. The environment resets with the task data
2. The agent produces an action from the current observation via LLM inference
3. The environment steps and returns the next observation, reward, and done flag
4. Steps 2-3 repeat until the environment signals done or max_steps is reached
5. The completed trajectory (tokens, log probabilities, rewards) is returned
Source code: rllm/engine/agent_execution_engine.py:180-429
Advanced Features
Termination Handling
The engine handles multiple termination conditions: the environment signaling done, reaching max_steps, exceeding max_response_length, and hitting trajectory_timeout.
Retry Logic
Failed trajectories are automatically retried, up to retry_limit attempts.
Custom Rollout Engines
The execution engine supports multiple backends:
- OpenAI API
- verl (vLLM)
- Tinker (Megatron)
Token Assembly for Training
When generating trajectories for training, the engine assembles the per-step prompt and response tokens, along with their log probabilities and loss masks, into token sequences suitable for RL updates.
Complete Example
Here's a complete example using the FrozenLake environment.
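The original example code is not reproduced here; a sketch along these lines illustrates the shape of it. The FrozenLakeAgent and FrozenLakeEnv import paths, the env_args keys, and the model/server settings are all assumptions, so adapt them to the classes shipped with your rLLM checkout:

```python
import asyncio

from transformers import AutoTokenizer

from rllm.engine.agent_execution_engine import AgentExecutionEngine
# Import paths below are assumptions based on rLLM's FrozenLake example.
from rllm.agents.frozenlake_agent import FrozenLakeAgent
from rllm.environments.frozenlake import FrozenLakeEnv

engine = AgentExecutionEngine(
    agent_class=FrozenLakeAgent,
    env_class=FrozenLakeEnv,
    agent_args={},
    env_args={"size": 4},             # hypothetical FrozenLake settings
    engine_name="openai",
    tokenizer=AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct"),
    rollout_engine_args={"base_url": "http://localhost:8000/v1", "api_key": "EMPTY"},
    sampling_params={"temperature": 0.6},
    n_parallel_agents=32,
    max_steps=10,
)

# Each task seeds a separate FrozenLake episode.
tasks = [{"seed": i} for i in range(64)]
trajectories = asyncio.run(engine.execute_tasks(tasks))
for traj in trajectories[:3]:
    print(traj.reward)
```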
Comparison with AgentWorkflowEngine
| Feature | AgentExecutionEngine | AgentWorkflowEngine |
|---|---|---|
| Use Case | Simple agent-env interactions | Complex multi-agent workflows |
| Abstraction | Low-level, direct control | High-level, workflow-based |
| Multi-agent | Single agent per trajectory | Multiple agents per episode |
| Flexibility | Limited to agent-env loop | Arbitrary orchestration logic |
| Performance | Slightly faster | Small overhead |
| Recommended For | Training, batch inference | Complex reasoning, tool use |
For most use cases, especially during RL training, AgentExecutionEngine provides the best performance. Use AgentWorkflowEngine when you need complex multi-agent orchestration.
Next Steps
- Workflow Engine: Learn about complex multi-agent workflows
- Training: Use the execution engine for RL training
- Examples: See complete examples
- API Reference: Detailed API documentation