RolloutGatewayMixin
Opt-in mixin that replaces CliAgentEnv’s client-side interception with a server-side gateway path, allowing agents to communicate directly with prime-rl’s rollout gateway.
Overview
When the gateway is active, agents talk directly to prime-rl’s rollout gateway through a Prime Tunnel. The environment only manages the sandbox lifecycle and fetches the trajectory after completion. When inactive, it falls through to CliAgentEnv’s standard interception path.
Key differences from standard CliAgentEnv:
- Agent makes API calls directly to the gateway server (not intercepted by local proxy)
- Environment registers/unregisters rollouts with the gateway
- Trajectory is fetched from the gateway after agent completion
- Requires prime-rl’s rollout gateway to be running
Method Resolution Order (MRO)
The mixin must come before CliAgentEnv in the inheritance chain.
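To illustrate why base-class order matters, here is a minimal sketch using stand-in classes. The two classes below are simplified placeholders, not the real CliAgentEnv or RolloutGatewayMixin, and the return strings are made up for the demonstration.

```python
class CliAgentEnv:
    """Stand-in for the real CliAgentEnv (illustration only)."""

    def rollout(self) -> str:
        return "client-side interception"


class RolloutGatewayMixin:
    """Stand-in for the real mixin (illustration only)."""

    use_gateway = True

    def rollout(self) -> str:
        if self.use_gateway:
            return "server-side gateway"
        # Fall through to the next class in the MRO.
        return super().rollout()


class GatewayEnv(RolloutGatewayMixin, CliAgentEnv):
    """Mixin listed first: its overrides take precedence."""


class WrongOrderEnv(CliAgentEnv, RolloutGatewayMixin):
    """Mixin listed last: CliAgentEnv.rollout is found first, so the mixin is bypassed."""


mro_names = [cls.__name__ for cls in GatewayEnv.__mro__]
gateway_result = GatewayEnv().rollout()
wrong_order_result = WrongOrderEnv().rollout()
```

With the mixin listed first, its rollout runs and can still delegate to CliAgentEnv through super(). With the order reversed, the mixin's override is never reached.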
Usage
Basic Setup
Disabling Gateway Mode
Attributes
use_gateway: Toggle gateway mode. When True, uses the server-side gateway. When False, falls through to CliAgentEnv interception.
Port where the rollout gateway server is listening.
Methods
init_gateway
Called from __init__ when use_gateway=True.
Port for the rollout gateway server.
HTTP timeout for gateway requests (6 hours by default).
- HTTP client with configured timeout
- Tunnel management dict
- Tunnel lock for thread-safe access
- Tunnel monitor task reference
init_interception
Overrides CliAgentEnv.init_interception(). Only calls the parent implementation when use_gateway=False.
register_rollout
Current rollout state.
- Model name
- Sampling parameters
- Max turns
- Max sequence length
POST /v1/rollouts/{rollout_id}/register
unregister_rollout
Current rollout state.
POST /v1/rollouts/{rollout_id}/unregister
fetch_trajectory
Current rollout state. Updated with trajectory data.
- trajectory: List of conversation turns
- prompt: Final prompt messages
- completion: Final completion messages
- is_truncated: Whether any turn was truncated
GET /v1/rollouts/{rollout_id}/trajectory
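The three per-rollout endpoints listed for register_rollout, unregister_rollout, and fetch_trajectory share one path prefix. The sketch below only assembles the method and URL pairs. The helper name gateway_endpoints and the example host are hypothetical, and request payloads are left out because their field names are not given here.

```python
def gateway_endpoints(gateway_url: str, rollout_id: str) -> dict:
    """Return (HTTP method, URL) pairs for the documented rollout endpoints.

    Hypothetical helper: the mixin may assemble these URLs differently.
    """
    base = f"{gateway_url.rstrip('/')}/v1/rollouts/{rollout_id}"
    return {
        "register": ("POST", f"{base}/register"),
        "unregister": ("POST", f"{base}/unregister"),
        "trajectory": ("GET", f"{base}/trajectory"),
    }


# Example host and rollout ID are placeholders.
endpoints = gateway_endpoints("http://localhost:8000/", "rollout_abc123")
```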
build_env_vars
Sets OPENAI_BASE_URL from rollout_base_url in gateway mode.
Current rollout state.
- OPENAI_BASE_URL: Points to gateway rollout endpoint
- OPENAI_MODEL: Model name from state
- OPENAI_TIMEOUT: Set to “600”
- OPENAI_REQUEST_TIMEOUT: Set to “600”
- HTTPX_TIMEOUT: Set to “600”
- Plus any variables from self.environment_vars
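As a rough sketch of the resulting environment, the function below builds the same set of variables from a plain dict. The function name, the state keys "rollout_base_url" and "model", and the merge order (gateway values overriding user-supplied ones) are assumptions made for illustration; the actual method may differ.

```python
from typing import Optional


def build_gateway_env_vars(state: dict, environment_vars: Optional[dict] = None) -> dict:
    """Assemble the sandbox environment for gateway mode (illustrative only)."""
    # Start from user-supplied variables (assumed to be merged first).
    env = dict(environment_vars or {})
    env.update(
        {
            "OPENAI_BASE_URL": state["rollout_base_url"],  # assumed state key
            "OPENAI_MODEL": state["model"],  # assumed state key
            "OPENAI_TIMEOUT": "600",
            "OPENAI_REQUEST_TIMEOUT": "600",
            "HTTPX_TIMEOUT": "600",
        }
    )
    return env


example_state = {
    "rollout_base_url": "https://xxx.prime-tunnel.com/v1/rollouts/rollout_abc123",
    "model": "example-model",
}
env_vars = build_gateway_env_vars(example_state, {"MY_FLAG": "1"})
```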
get_gateway_tunnel_url
Local address for the tunnel. Required when starting first tunnel or when multiple tunnels are active.
Returns the tunnel URL (e.g., "https://xxx.prime-tunnel.com").
Behavior:
- Creates new tunnel if none exists for local_addr
- Reuses existing tunnel if alive
- Restarts dead tunnels automatically
- Starts health monitor on first tunnel creation
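The create, reuse, and restart behavior above can be sketched with a small registry keyed by local address. Everything here is a simplified stand-in: FakeTunnel, TunnelRegistry, and the placeholder URLs are hypothetical, the health monitor is reduced to a flag, and a threading.Lock stands in for whatever lock the mixin actually uses.

```python
import threading
from typing import Optional


class FakeTunnel:
    """Hypothetical stand-in for a Prime Tunnel; only is_running comes from the docs."""

    def __init__(self, local_addr: str) -> None:
        self.local_addr = local_addr
        self.is_running = False
        self.start_count = 0
        self.url = ""

    def start(self) -> None:
        self.start_count += 1
        self.is_running = True
        # Placeholder URL; a real tunnel would report its public address.
        self.url = f"https://tunnel-{self.start_count}.example.invalid"


class TunnelRegistry:
    """Illustrative registry: one tunnel per local address."""

    def __init__(self) -> None:
        self._tunnels = {}
        self._lock = threading.Lock()
        self.monitor_started = False

    def get_url(self, local_addr: Optional[str] = None) -> str:
        with self._lock:
            if local_addr is None:
                # Only unambiguous when exactly one tunnel exists.
                if len(self._tunnels) != 1:
                    raise ValueError("local_addr is required")
                local_addr = next(iter(self._tunnels))
            tunnel = self._tunnels.get(local_addr)
            if tunnel is None:
                # No tunnel for this address yet: create and start one.
                tunnel = FakeTunnel(local_addr)
                self._tunnels[local_addr] = tunnel
                tunnel.start()
                self.monitor_started = True  # monitor starts with the first tunnel
            elif not tunnel.is_running:
                # Dead tunnel: restart it in place.
                tunnel.start()
            return tunnel.url
```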
start_agent
wait_for_agent_completion).
Current rollout state.
- background_job: Background job handle
- agent_start_time: Start timestamp
- agent_completed: Set to False
poll_job_completion
Current rollout state.
Prime Sandbox ID.
Background job handle from sandbox client.
- agent_exit_code: Process exit code
- agent_stdout: Captured stdout
- agent_stderr: Captured stderr
TunnelError if tunnel dies during polling
wait_for_agent_completion
Current rollout state.
- agent_completed: Set to True when done
- agent_timed_out: Set to True if timeout exceeded
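One plausible way to produce these two state updates is to wrap the polling coroutine in asyncio.wait_for. This is a sketch under assumptions, not the mixin's implementation: the helper name wait_with_timeout, the poll_job callable, and the choice to set agent_completed even after a timeout are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_with_timeout(
    state: dict,
    poll_job: Callable[[dict], Awaitable[None]],
    timeout_seconds: float,
) -> None:
    """Wait for the polling coroutine, recording the outcome in state."""
    try:
        await asyncio.wait_for(poll_job(state), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # The agent ran past its deadline.
        state["agent_timed_out"] = True
    finally:
        # Assumption: the flag is set whether or not the wait timed out.
        state["agent_completed"] = True


async def fast_job(state: dict) -> None:
    """Pretend the agent exits immediately with code 0."""
    state["agent_exit_code"] = 0


async def slow_job(state: dict) -> None:
    """Pretend the agent keeps running past the timeout."""
    await asyncio.sleep(1.0)
```

Calling asyncio.run(wait_with_timeout(state, slow_job, 0.01)) on an empty state would leave both agent_timed_out and agent_completed set to True.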
rollout
When use_gateway=True, orchestrates the gateway-based rollout. Otherwise, delegates to the parent CliAgentEnv.rollout().
Rollout input data.
LLM client (base URL used to determine gateway URL).
Model identifier.
Sampling parameters.
- Initialize state
- Register rollout with gateway
- Resolve tunnel local address
- Start or reuse Prime Tunnel
- Create sandbox with OPENAI_BASE_URL pointing to gateway
- Start agent
- Wait for agent completion
- Fetch trajectory from gateway
- Cleanup (unregister, destroy sandbox)
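The ordering of these steps, and the way cleanup follows even when a step fails, can be sketched with a recording stand-in instead of real gateway, tunnel, and sandbox calls. RecordingBackend and gateway_rollout are hypothetical names, and storing the exception in state["error"] for non-cleanup failures is an assumption.

```python
import asyncio
from typing import Optional


class RecordingBackend:
    """Hypothetical stand-in that records each step instead of doing real work."""

    def __init__(self, fail_at: Optional[str] = None) -> None:
        self.calls = []
        self.fail_at = fail_at

    async def step(self, name: str) -> None:
        self.calls.append(name)
        if name == self.fail_at:
            raise RuntimeError(f"{name} failed")


async def gateway_rollout(backend: RecordingBackend) -> dict:
    state = {"error": None}  # initialize state
    registered = False
    sandbox_created = False
    try:
        await backend.step("register_rollout")
        registered = True
        await backend.step("resolve_tunnel_local_addr")
        await backend.step("start_tunnel")
        await backend.step("create_sandbox")
        sandbox_created = True
        await backend.step("start_agent")
        await backend.step("wait_for_agent_completion")
        await backend.step("fetch_trajectory")
    except Exception as exc:
        state["error"] = exc  # assumption: failures are recorded, not re-raised
    finally:
        # Only undo what was actually set up.
        if registered:
            await backend.step("unregister_rollout")
        if sandbox_created:
            await backend.step("destroy_sandbox")
    return state
```

If start_tunnel fails in this sketch, only unregister_rollout runs during cleanup, because no sandbox was created yet.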
teardown_gateway
Decorated with @vf.teardown to run automatically.
Cleans up:
- HTTP client connection
- All active Prime Tunnels
- Tunnel health monitor task
State Keys
Gateway mode adds these state keys:
- Unique identifier for the rollout (format: "rollout_{uuid}").
- Base URL of the gateway server (derived from client base URL).
- Full rollout endpoint URL: {tunnel_url}/v1/rollouts/{rollout_id}.
- Prime Tunnel URL.
- Local address for tunnel connection.
- Prime Tunnel ID for debugging.
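Inherited from CliAgentEnv:
- Prime Sandbox ID.
- background_job: Background job handle.
- agent_completed: Whether agent process finished.
- agent_exit_code: Agent process exit code.
- agent_stdout: Captured stdout.
- agent_stderr: Captured stderr.
- agent_timed_out: Whether agent exceeded timeout.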
Tunnel Health Monitoring
The mixin automatically monitors tunnel health in the background:
- Runs every 30 seconds by default
- Detects dead tunnels via tunnel.is_running
- Automatically restarts dead tunnels
- Logs frpc output for debugging
- Started lazily on first tunnel creation
- Cancelled on teardown
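A background monitor with this shape could look roughly like the sketch below: a periodic asyncio task that restarts any tunnel whose is_running flag is false, and that is cancelled on teardown. FakeTunnel, restart_dead_tunnels, and monitor_tunnels are hypothetical names, and frpc log capture is omitted.

```python
import asyncio


class FakeTunnel:
    """Hypothetical stand-in; only the is_running attribute comes from the docs."""

    def __init__(self) -> None:
        self.is_running = True
        self.restarts = 0

    def start(self) -> None:
        self.is_running = True
        self.restarts += 1


def restart_dead_tunnels(tunnels: dict) -> list:
    """Restart every tunnel that is not running; return the affected addresses."""
    restarted = []
    for addr, tunnel in tunnels.items():
        if not tunnel.is_running:
            tunnel.start()
            restarted.append(addr)
    return restarted


async def monitor_tunnels(tunnels: dict, interval_seconds: float = 30.0) -> None:
    """Check tunnel health forever; stops only when the task is cancelled."""
    while True:
        await asyncio.sleep(interval_seconds)
        restart_dead_tunnels(tunnels)


async def demo() -> dict:
    tunnels = {"127.0.0.1:8000": FakeTunnel()}
    tunnels["127.0.0.1:8000"].is_running = False  # simulate a dead tunnel
    task = asyncio.create_task(monitor_tunnels(tunnels, interval_seconds=0.01))
    await asyncio.sleep(0.1)  # give the monitor time to run at least once
    task.cancel()  # what teardown would do
    try:
        await task
    except asyncio.CancelledError:
        pass
    return tunnels
```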
Error Handling
Tunnel Errors
Gateway Errors
Cleanup Guarantees
The mixin ensures cleanup even on errors:
- Unregister rollout (if registered)
- Destroy sandbox (if created)
- Errors during cleanup are logged but don’t raise
- Any cleanup error is captured in state["error"]
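The "log but don't raise" behavior can be sketched as a loop over cleanup callables. The helper name run_cleanup and the logger name are hypothetical, and keeping an already recorded error instead of overwriting it is an assumption.

```python
import asyncio
import logging

logger = logging.getLogger("gateway_cleanup_sketch")


async def run_cleanup(state: dict, steps: list) -> None:
    """Run each (name, coroutine function) pair; never raise."""
    for name, step in steps:
        try:
            await step()
        except Exception as exc:
            # Log and continue so later cleanup steps still run.
            logger.warning("cleanup step %s failed: %s", name, exc)
            if state.get("error") is None:
                # Assumption: an earlier error is kept rather than overwritten.
                state["error"] = exc


completed = []


async def failing_unregister() -> None:
    raise RuntimeError("unregister failed")


async def destroy_sandbox() -> None:
    completed.append("destroy_sandbox")
```

In this sketch a failing unregister step does not prevent the sandbox from being destroyed, and the failure still ends up in state["error"].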
Logging
The mixin provides detailed structured logging:
- stage=start: Rollout initiated
- stage=register_rollout: Gateway registration
- stage=resolve_tunnel_local_addr: Tunnel address resolution
- stage=start_tunnel: Tunnel creation
- stage=create_sandbox: Sandbox provisioning
- stage=start_agent: Agent launch
- stage=wait_for_agent_completion: Agent monitoring
- stage=fetch_trajectory: Trajectory retrieval
- stage=tunnel_died: Tunnel failure
- stage=agent_completed: Agent exit
- stage=finish: Rollout completion
Advanced Example
When to Use Gateway Mode
Use gateway when:
- Running distributed rollouts with prime-rl’s gateway server
- You need server-side trajectory management
- You want centralized rollout coordination
- You prefer gateway-managed model inference
Use standard interception when:
- Running local rollouts without gateway infrastructure
- You need client-side interception for debugging
- You want a simpler setup without gateway dependencies
See Also
- CliAgentEnv - Parent class with standard interception
- HarborEnv - Harbor benchmark implementation
- MultiTurnEnv - Base multi-turn environment
- State - State type reference