Skip to main content

BrowserEnv

Unified browser automation environment supporting both DOM-based (natural language) and CUA-based (vision + coordinates) control.

Overview

BrowserEnv provides two distinct browser automation modes:
  • DOM mode: Natural language operations via Stagehand SDK (act, observe, extract, navigate)
  • CUA mode: Vision-based primitives (click, scroll, type_text, screenshot)
Both modes integrate with Browserbase for cloud browser management and support local execution.

Installation

Install with browser support:
uv add 'verifiers[browser]'
Or when developing in the verifiers repo:
uv sync --extra browser
See Browser Examples for complete setup and usage.

Inheritance

Environment
└── MultiTurnEnv
    └── ToolEnv
        └── StatefulToolEnv
            └── BrowserEnv

Constructor

BrowserEnv(
    mode: Literal["dom", "cua"] = "dom",
    # Shared config
    project_id: str | None = None,
    browserbase_api_key_var: str = "BROWSERBASE_API_KEY",
    # DOM mode specific
    model_api_key_var: str = "MODEL_API_KEY",
    stagehand_model: str = "openai/gpt-4o-mini",
    proxy_model_to_stagehand: bool = False,
    # CUA mode specific
    use_sandbox: bool = True,
    server_url: str = "http://localhost:3000",
    env: Literal["LOCAL", "BROWSERBASE"] = "BROWSERBASE",
    viewport_width: int = 1024,
    viewport_height: int = 768,
    save_screenshots: bool = True,
    keep_recent_screenshots: int | None = 2,
    proxies: bool = False,
    advanced_stealth: bool = False,
    # CUA sandbox mode specific
    server_port: int = 3000,
    server_ready_timeout: int = 120,
    server_ready_poll_interval: float = 2.0,
    docker_image: str = "node:18-slim",
    cpu_cores: int = 2,
    memory_gb: int = 4,
    disk_size_gb: int = 10,
    sandbox_timeout_minutes: int = 60,
    sandbox_timeout_per_command_seconds: int = 60,
    use_binary: bool = True,
    # Pre-built image configuration
    use_prebuilt_image: bool = True,
    prebuilt_image: str = "deepdream19/cua-server:latest",
    # Error handling
    stop_errors: list[type[Exception]] | None = None,
    **kwargs
)

Parameters

Mode Selection

mode
Literal['dom', 'cua']
default:"dom"
Operating mode:
  • "dom": Natural language browser control via Stagehand SDK
  • "cua": Vision-based control using coordinate primitives

Shared Configuration

project_id
str | None
default:"None"
Browserbase project ID. Required when using Browserbase.
browserbase_api_key_var
str
default:"BROWSERBASE_API_KEY"
Environment variable name for Browserbase API key.

DOM Mode Parameters

model_api_key_var
str
default:"MODEL_API_KEY"
Environment variable name for model API key (OpenAI, Anthropic, etc.).
stagehand_model
str
default:"openai/gpt-4o-mini"
Model used by Stagehand for DOM understanding and action planning.
proxy_model_to_stagehand
bool
default:"False"
Whether to proxy model API calls through Stagehand.

CUA Mode Parameters

use_sandbox
bool
default:"True"
Auto-deploy CUA server to sandbox. If False, connects to server_url.
server_url
str
default:"http://localhost:3000"
CUA server URL when use_sandbox=False.
env
Literal['LOCAL', 'BROWSERBASE']
default:"BROWSERBASE"
Browser execution environment:
  • "BROWSERBASE": Cloud browsers via Browserbase
  • "LOCAL": Local browser execution
viewport_width
int
default:"1024"
Browser viewport width in pixels.
viewport_height
int
default:"768"
Browser viewport height in pixels.
save_screenshots
bool
default:"True"
Save screenshots to disk during execution.
keep_recent_screenshots
int | None
default:"2"
Number of recent screenshots to keep in message context. Set to None to keep all.
proxies
bool
default:"False"
Enable Browserbase proxies for IP rotation.
advanced_stealth
bool
default:"False"
Enable Browserbase Advanced Stealth mode for anti-bot detection.

CUA Sandbox Configuration

server_port
int
default:"3000"
Port for CUA server in sandbox.
server_ready_timeout
int
default:"120"
Timeout in seconds waiting for sandbox server to be ready.
server_ready_poll_interval
float
default:"2.0"
Poll interval in seconds for sandbox server health checks.
docker_image
str
default:"node:18-slim"
Docker image for sandbox (only used when use_prebuilt_image=False).
cpu_cores
int
default:"2"
CPU cores allocated to sandbox.
memory_gb
int
default:"4"
Memory in GB allocated to sandbox.
disk_size_gb
int
default:"10"
Disk size in GB for sandbox.
sandbox_timeout_minutes
int
default:"60"
Sandbox timeout in minutes.
sandbox_timeout_per_command_seconds
int
default:"60"
Per-command timeout in sandbox.
use_binary
bool
default:"True"
Use pre-built SEA binary when use_prebuilt_image=False. If False, installs from npm.

Pre-built Image Configuration

use_prebuilt_image
bool
default:"True"
Use pre-built Docker image for fastest startup. Recommended for production.
prebuilt_image
str
default:"deepdream19/cua-server:latest"
Docker image to use when use_prebuilt_image=True.

Error Handling

stop_errors
list[type[Exception]] | None
default:"None"
Exception types that should trigger cleanup. Defaults to [vf.SandboxError].
**kwargs
Any
Additional arguments passed to StatefulToolEnv.

DOM Mode Tools

def navigate(url: str) -> str
Navigate to a URL.

act

def act(instruction: str) -> str
Perform an action described in natural language (e.g., “click the login button”, “fill in the email field with [email protected]”).

observe

def observe(instruction: str) -> str
Find elements or information matching the instruction (e.g., “find all product cards”, “locate the search bar”).

extract

def extract(instruction: str, schema_json: str) -> str
Extract structured data from the page according to a JSON schema.

CUA Mode Tools

click

def click(x: int, y: int) -> str
Click at coordinates (x, y).

type_text

def type_text(text: str) -> str
Type text at the current cursor position.

scroll

def scroll(direction: str, amount: int = 500) -> str
Scroll the page. Direction can be “up” or “down”.

screenshot

def screenshot() -> str
Capture a screenshot of the current page. Returns path to screenshot file.

Key Methods

setup_state

async def setup_state(
    state: vf.State,
    **kwargs
) -> vf.State
Initialize browser session for this rollout. Delegates to mode-specific implementation (DOM or CUA).

update_tool_args

def update_tool_args(
    tool_name: str,
    tool_args: dict[str, Any],
    messages: vf.Messages,
    state: vf.State,
    **kwargs
) -> dict[str, Any]
Inject session state into tool calls. Delegates to mode-specific implementation.

get_prompt_messages

async def get_prompt_messages(
    state: vf.State
) -> vf.Messages
Get prompt messages. In CUA mode, filters screenshots to keep only recent ones based on keep_recent_screenshots.

cleanup_session

@vf.cleanup
async def cleanup_session(state: vf.State) -> None
Clean up browser session after rollout.

teardown

@vf.teardown
async def teardown() -> None
Clean up environment resources (e.g., sandbox servers in CUA mode).

Example Usage

DOM Mode

import verifiers as vf
from verifiers.envs.integrations.browser_env import BrowserEnv
from datasets import Dataset

def load_environment(
    project_id: str,
    max_turns: int = 10,
):
    dataset = Dataset.from_dict({
        "question": ["What is the headline on primeintellect.ai?"],
        "answer": ["The Open Superintelligence Stack"],
        "start_url": ["https://primeintellect.ai"],
    })
    
    def check_answer(completion: vf.Messages, answer: str) -> float:
        text = str(completion).lower()
        return 1.0 if answer.lower() in text else 0.0
    
    return BrowserEnv(
        mode="dom",
        project_id=project_id,
        dataset=dataset,
        rubric=vf.Rubric(check_answer),
        max_turns=max_turns,
        system_prompt="Use browser tools to find information on websites.",
    )

CUA Mode with Sandbox (Default)

import verifiers as vf
from verifiers.envs.integrations.browser_env import BrowserEnv
from datasets import Dataset

def load_environment(
    project_id: str,
    max_turns: int = 15,
):
    dataset = Dataset.from_dict({
        "question": ["Click the 'Get Started' button"],
        "start_url": ["https://example.com"],
    })
    
    return BrowserEnv(
        mode="cua",
        project_id=project_id,
        dataset=dataset,
        rubric=vf.Rubric(lambda completion: 1.0),
        max_turns=max_turns,
        # CUA sandbox is automatic by default
        use_sandbox=True,
        use_prebuilt_image=True,  # Fastest startup
        system_prompt="Use vision and coordinates to control the browser.",
    )

CUA Mode with Local Server

import verifiers as vf
from verifiers.envs.integrations.browser_env import BrowserEnv
from datasets import Dataset

def load_environment(
    project_id: str,
):
    dataset = Dataset.from_dict({
        "question": ["Find and click the search button"],
        "start_url": ["https://example.com"],
    })
    
    return BrowserEnv(
        mode="cua",
        project_id=project_id,
        use_sandbox=False,  # Use local server
        server_url="http://localhost:3000",
        dataset=dataset,
        rubric=vf.Rubric(lambda completion: 1.0),
        system_prompt="Control browser using screenshots and coordinates.",
    )

CUA Mode Execution Options

CUA mode supports three execution strategies (from fastest to most flexible):
env = BrowserEnv(
    mode="cua",
    use_prebuilt_image=True,  # Default
    prebuilt_image="deepdream19/cua-server:latest",
)
  • Fastest startup (no binary upload or npm install)
  • Uses pre-built deepdream19/cua-server:latest image
  • Best for production and rapid iteration

2. Binary Upload

env = BrowserEnv(
    mode="cua",
    use_prebuilt_image=False,
    use_binary=True,  # Default when use_prebuilt_image=False
)
  • Builds/uploads SEA binary to sandbox
  • Useful for custom server versions
  • Slower startup than pre-built image

3. Local Server

env = BrowserEnv(
    mode="cua",
    use_sandbox=False,
    server_url="http://localhost:3000",
)
  • Connect to manually started CUA server
  • Useful for local development and debugging
  • Requires running npm start in assets/templates/browserbase/cua/

Screenshot Management (CUA Mode)

CUA mode automatically manages screenshots in the message history:
  • save_screenshots=True: Screenshots saved to disk
  • keep_recent_screenshots=2: Only 2 most recent screenshots kept in context
  • Older screenshots filtered out via get_prompt_messages() to reduce token usage
env = BrowserEnv(
    mode="cua",
    save_screenshots=True,
    keep_recent_screenshots=3,  # Keep last 3 screenshots
)

See Also

Build docs developers (and LLMs) love