Browser Automation

Workers can launch and control a headless Chrome browser for web automation, testing, and scraping. The browser tool addresses page elements through the accessibility tree, which gives an LLM short refs to work with.

Configuration

Configure browser behavior in agent.toml:

agent.toml

[browser]
headless = true
evaluate_enabled = false
executable_path = "/usr/bin/chromium"  # Optional

headless

boolean

Run Chrome in headless mode (default: true)

evaluate_enabled

boolean

Allow JavaScript evaluation via the evaluate action (default: false)

executable_path

string

Path to Chrome/Chromium binary. Auto-detected if not set.

JavaScript evaluation is disabled by default for security. Only enable if you trust the worker’s task.
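The defaults above could be modeled roughly as follows. This is a minimal sketch, not the project's actual code: the `BrowserConfig` struct, the `check_evaluate_allowed` helper, and the error text are hypothetical, and only the field names and default values come from the reference above.

```rust
/// Hypothetical in-memory form of the [browser] table in agent.toml.
#[derive(Debug, Clone, PartialEq)]
pub struct BrowserConfig {
    /// Run Chrome without a visible window.
    pub headless: bool,
    /// Allow the `evaluate` action to run JavaScript.
    pub evaluate_enabled: bool,
    /// Explicit Chrome/Chromium binary; `None` means auto-detect.
    pub executable_path: Option<String>,
}

impl Default for BrowserConfig {
    fn default() -> Self {
        Self {
            headless: true,
            evaluate_enabled: false,
            executable_path: None,
        }
    }
}

/// Illustrative gate for the `evaluate` action (error text is made up).
pub fn check_evaluate_allowed(config: &BrowserConfig) -> Result<(), String> {
    if config.evaluate_enabled {
        Ok(())
    } else {
        Err("evaluate is disabled; set evaluate_enabled = true in agent.toml".to_string())
    }
}
```

With the defaults, `check_evaluate_allowed(&BrowserConfig::default())` returns an error, which matches the "disabled by default" behavior described above.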

Browser Actions

The browser tool supports these actions:

launch

Start the browser. Must be called before any other action.

{"action": "launch"}
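The launch-first rule can be pictured as a small state check. The sketch below is only an illustration: `BrowserSession` and `dispatch` are hypothetical names, and treating a repeated `launch` as harmless is an assumption rather than documented behavior.

```rust
/// Hypothetical per-worker session state; only the launch-first rule
/// comes from the documentation.
#[derive(Debug, Default)]
pub struct BrowserSession {
    launched: bool,
}

impl BrowserSession {
    /// Accepts `action` only if the browser is running, or if the
    /// action is `launch` itself.
    pub fn dispatch(&mut self, action: &str) -> Result<(), String> {
        match action {
            "launch" => {
                // Assumption: launching twice is treated as a no-op.
                self.launched = true;
                Ok(())
            }
            "close" if self.launched => {
                self.launched = false;
                Ok(())
            }
            _ if self.launched => Ok(()),
            other => Err(format!("browser not launched; call launch before {other}")),
        }
    }
}
```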

navigate

Go to a URL.

{"action": "navigate", "url": "https://example.com"}

Returns the page title and URL after navigation.

open

Open a new tab.

{"action": "open", "url": "https://example.com"}

Returns the new tab’s target ID.

tabs

List all open tabs.

{"action": "tabs"}

Returns tab metadata (target ID, title, URL, active state).

focus

Switch to a different tab.

{"action": "focus", "target_id": "tab-abc-123"}

close_tab

Close a tab.

{"action": "close_tab", "target_id": "tab-abc-123"}

Omit target_id to close the active tab.

snapshot

Get an accessibility tree with element refs.

{"action": "snapshot"}

Returns up to 200 interactive elements with refs like e1, e2, etc.

act

Interact with an element.

{
  "action": "act",
  "element_ref": "e3",
  "act_kind": "click"
}

See Element Interactions for details.

screenshot

Capture the page or an element.

{"action": "screenshot", "full_page": true}

Saves the image to screenshot_dir under a timestamped filename.

evaluate

Run JavaScript (requires evaluate_enabled = true).

{
  "action": "evaluate",
  "script": "document.title"
}

Returns the script’s result as JSON.

content

Get the page’s HTML.

{"action": "content"}

Returns HTML, truncated to 100KB if needed.
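Truncating HTML by byte count needs care so that a multi-byte character is not cut in half. The sketch below shows one way to do it; `truncate_html` and `MAX_CONTENT_BYTES` are hypothetical names, and reading "100KB" as 100 * 1024 bytes is an assumption.

```rust
/// Assumed limit: the documentation says "100KB"; the exact byte count
/// used by the tool is not stated.
pub const MAX_CONTENT_BYTES: usize = 100 * 1024;

/// Returns the longest prefix of `html` that fits in `max_bytes`
/// without splitting a multi-byte UTF-8 character.
pub fn truncate_html(html: &str, max_bytes: usize) -> &str {
    if html.len() <= max_bytes {
        return html;
    }
    let mut end = max_bytes;
    // Walk back to the nearest character boundary (index 0 always is one).
    while !html.is_char_boundary(end) {
        end -= 1;
    }
    &html[..end]
}
```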

close

Shut down the browser.

{"action": "close"}

Element Interactions

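Elements are addressed by refs (e1, e2, …) from the accessibility tree. The act_kind field selects the interaction: click, type, press_key, hover, scroll_into_view, or focus.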

{
  "action": "act",
  "element_ref": "e5",
  "act_kind": "click"
}

{
  "action": "act",
  "element_ref": "e12",
  "act_kind": "type",
  "text": "[email protected]"
}

{
  "action": "act",
  "act_kind": "press_key",
  "key": "Enter"
}

For press_key, omit element_ref to send the key to the page rather than to a specific element.

{
  "action": "act",
  "element_ref": "e8",
  "act_kind": "hover"
}

{
  "action": "act",
  "element_ref": "e20",
  "act_kind": "scroll_into_view"
}

{
  "action": "act",
  "element_ref": "e3",
  "act_kind": "focus"
}
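A caller might check an act request before sending it. The sketch below is hypothetical: `ActKind` and `check_act` are illustrative names, and the rules it enforces (every kind except press_key needs element_ref, type needs text, press_key needs key) are inferred from the examples above rather than stated by the tool.

```rust
/// Interaction kinds shown in the examples above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ActKind {
    Click,
    Type,
    PressKey,
    Hover,
    ScrollIntoView,
    Focus,
}

impl ActKind {
    /// The `act_kind` string used in the JSON payload.
    pub fn as_str(self) -> &'static str {
        match self {
            ActKind::Click => "click",
            ActKind::Type => "type",
            ActKind::PressKey => "press_key",
            ActKind::Hover => "hover",
            ActKind::ScrollIntoView => "scroll_into_view",
            ActKind::Focus => "focus",
        }
    }
}

/// Hypothetical pre-flight check for an `act` call (rules are assumptions).
pub fn check_act(
    kind: ActKind,
    element_ref: Option<&str>,
    text: Option<&str>,
    key: Option<&str>,
) -> Result<(), String> {
    if kind != ActKind::PressKey && element_ref.is_none() {
        return Err(format!("{} requires element_ref", kind.as_str()));
    }
    if kind == ActKind::Type && text.is_none() {
        return Err("type requires text".to_string());
    }
    if kind == ActKind::PressKey && key.is_none() {
        return Err("press_key requires key".to_string());
    }
    Ok(())
}
```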

Accessibility Tree Snapshot

The snapshot action returns interactive elements:

{
  "success": true,
  "message": "42 interactive element(s) found",
  "title": "Example Page",
  "url": "https://example.com",
  "elements": [
    {
      "ref_id": "e1",
      "role": "button",
      "name": "Sign In",
      "description": "Submit login form"
    },
    {
      "ref_id": "e2",
      "role": "textbox",
      "name": "Email",
      "value": ""
    },
    {
      "ref_id": "e3",
      "role": "link",
      "name": "Forgot password?",
      "description": null
    }
  ]
}

ref_id

string

Short identifier for use in act calls (e.g., e1, e2)

role

string

ARIA role: button, link, textbox, checkbox, etc.

name

string

Accessible name (usually visible text or aria-label)

description

string

Accessible description (aria-description or title)

value

string

Current value for inputs, sliders, etc.

Only interactive roles are included:

src/tools/browser.rs

const INTERACTIVE_ROLES: &[&str] = &[
    "button", "checkbox", "combobox", "link", "listbox", "menu",
    "menubar", "menuitem", "menuitemcheckbox", "menuitemradio",
    "option", "radio", "scrollbar", "searchbox", "slider",
    "spinbutton", "switch", "tab", "textbox", "treeitem",
];

Each snapshot returns at most 200 elements to keep the output manageable.
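The filtering and the 200-element cap could combine roughly as shown below. This is a sketch under stated assumptions: `ElementSummary`, `summarize`, and `MAX_ELEMENTS` are hypothetical names, the input is simplified to (role, name) pairs, and numbering refs sequentially in tree order while keeping the first 200 matches is a guess at the behavior.

```rust
const INTERACTIVE_ROLES: &[&str] = &[
    "button", "checkbox", "combobox", "link", "listbox", "menu",
    "menubar", "menuitem", "menuitemcheckbox", "menuitemradio",
    "option", "radio", "scrollbar", "searchbox", "slider",
    "spinbutton", "switch", "tab", "textbox", "treeitem",
];

/// Cap stated in the documentation.
const MAX_ELEMENTS: usize = 200;

/// Simplified snapshot entry (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct ElementSummary {
    pub ref_id: String,
    pub role: String,
    pub name: String,
}

/// Keeps interactive nodes only, caps the result, and assigns e1, e2, ...
/// Each input node is a (role, name) pair in tree order.
pub fn summarize(nodes: &[(&str, &str)]) -> Vec<ElementSummary> {
    nodes
        .iter()
        .filter(|node| INTERACTIVE_ROLES.contains(&node.0))
        .take(MAX_ELEMENTS)
        .enumerate()
        .map(|(i, node)| ElementSummary {
            ref_id: format!("e{}", i + 1),
            role: node.0.to_string(),
            name: node.1.to_string(),
        })
        .collect()
}
```

Because refs are assigned by position in this sketch, any change to the page can shift them, which is one reason to look elements up by role and name after each snapshot.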

Workflow Example

Launch

{"action": "launch"}

Navigate

{"action": "navigate", "url": "https://example.com/login"}

Snapshot

{"action": "snapshot"}

In the result, find the textbox named “Email”. In this example its ref is e2.

Type Email

{
  "action": "act",
  "element_ref": "e2",
  "act_kind": "type",
  "text": "[email protected]"
}

Type Password

Take another snapshot if needed to find the password textbox. This example assumes its ref is e3:

{
  "action": "act",
  "element_ref": "e3",
  "act_kind": "type",
  "text": "password123"
}

Click Submit

{
  "action": "act",
  "element_ref": "e1",
  "act_kind": "click"
}

Screenshot

{"action": "screenshot"}

Returns the path to the saved screenshot.

Close

{"action": "close"}

Security

URL validation blocks private networks:

Loopback (127.0.0.0/8, ::1)
Private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
Link-local (169.254.0.0/16, fe80::/10)
Cloud metadata (169.254.169.254, metadata.google.internal)

This prevents SSRF attacks. Only http and https schemes are allowed.

src/tools/browser.rs

use url::Url;

fn validate_url(url: &str) -> Result<(), BrowserError> {
    // `?` assumes BrowserError implements From<url::ParseError>.
    let parsed = Url::parse(url)?;
    // Only plain web schemes are allowed.
    match parsed.scheme() {
        "http" | "https" => {}
        other => return Err(BrowserError::new(
            format!("scheme '{other}' is not allowed")
        )),
    }
    // Check for blocked IPs...
    Ok(())
}
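The blocked ranges listed above can be checked with the standard library alone. The following is a minimal sketch, not the tool's actual check: `is_blocked_ip` and `is_blocked_host` are hypothetical names, and the sketch does not resolve DNS names or handle IPv4-mapped IPv6 addresses, which a real SSRF filter would also need to consider.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// True if `ip` falls in one of the blocked ranges listed above.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    // Loopback 127.0.0.0/8, private 10/8, 172.16/12, 192.168/16,
    // link-local 169.254.0.0/16 (which includes 169.254.169.254).
    ip.is_loopback() || ip.is_private() || ip.is_link_local()
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    // fe80::/10 means the top 10 bits equal 1111 1110 10.
    let is_link_local = (ip.segments()[0] & 0xffc0) == 0xfe80;
    ip.is_loopback() || is_link_local
}

/// Hypothetical host-level check: literal IPs plus the metadata hostname.
pub fn is_blocked_host(host: &str) -> bool {
    if host.eq_ignore_ascii_case("metadata.google.internal") {
        return true;
    }
    match host.parse::<IpAddr>() {
        Ok(ip) => is_blocked_ip(ip),
        // Not an IP literal; this sketch does no DNS resolution.
        Err(_) => false,
    }
}
```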

JavaScript evaluation is off by default. Enable only for trusted tasks.

Screenshots

Screenshots are saved to screenshot_dir with timestamped names:

screenshot_20260228_143052_123.png
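The example name appears to follow a date, time, and milliseconds pattern. The sketch below reproduces that pattern from explicit components; `screenshot_filename` is a hypothetical helper, and the field interpretation (and whether the tool uses local time or UTC) is an assumption.

```rust
/// Builds a name in the style of the example above.
/// Assumption: the three groups are YYYYMMDD, HHMMSS, and milliseconds.
pub fn screenshot_filename(
    year: u32,
    month: u32,
    day: u32,
    hour: u32,
    minute: u32,
    second: u32,
    millis: u32,
) -> String {
    format!(
        "screenshot_{year:04}{month:02}{day:02}_{hour:02}{minute:02}{second:02}_{millis:03}.png"
    )
}
```

Zero-padded fields keep the names sortable in chronological order when listed alphabetically.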

Viewport

{"action": "screenshot"}

Captures visible area only.

Full Page

{"action": "screenshot", "full_page": true}

Scrolls and stitches entire page.

Element

{
  "action": "screenshot",
  "element_ref": "e5"
}

Captures just the element’s bounding box.

Screenshot paths are returned in the tool output for reference.

Performance Notes

Browser state persists

The browser stays open across multiple tool calls within a worker. Launch once, reuse.

Snapshots are fast

Accessibility tree extraction takes about 100 ms, so take snapshots liberally to understand the page.

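Element refs expire on navigation

Refs from an earlier snapshot are no longer valid once the page navigates. Take a new snapshot before acting.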

No sandbox escape

The browser runs in a separate process. Even if compromised, it’s isolated from Spacebot.

Debugging

Enable headed mode to watch the browser:

[browser]
headless = false

The browser window opens at 1280x900 and stays visible during worker execution. Logs include CDP errors:

[ERROR] browser: navigation failed: net::ERR_NAME_NOT_RESOLVED

Best Practices

Snapshot Before Interact

Always take a snapshot to discover elements. Don’t guess element refs.

Use Names, Not Positions

Find elements by name/role, not by ref number. Refs change between snapshots.

Handle Missing Elements

Check snapshot results before acting. Elements may not exist (dynamic content, slow load).

Close When Done

Call {"action": "close"} at the end to free resources.
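The "Use Names, Not Positions" and "Handle Missing Elements" practices can be combined in a small lookup helper. This is a sketch only: `Element` mirrors three of the snapshot fields, `find_ref` is a hypothetical helper, and matching names case-insensitively is a design choice made here, not something the tool does.

```rust
/// One entry from a snapshot's `elements` array (subset of fields).
#[derive(Debug, Clone)]
pub struct Element {
    pub ref_id: String,
    pub role: String,
    pub name: String,
}

/// Returns the ref of the first element matching `role` and `name`,
/// or `None` if the snapshot has no such element.
pub fn find_ref<'a>(elements: &'a [Element], role: &str, name: &str) -> Option<&'a str> {
    elements
        .iter()
        .find(|e| e.role == role && e.name.eq_ignore_ascii_case(name))
        .map(|e| e.ref_id.as_str())
}
```

A `None` result is the signal to wait and snapshot again instead of acting on a guessed ref.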
