Workers can launch and control a headless Chrome browser for web automation, testing, and scraping. The browser tool addresses elements through the accessibility tree, which gives LLMs short, stable-per-snapshot refs instead of CSS selectors or coordinates.

Configuration

Configure browser behavior in agent.toml:
agent.toml
[browser]
headless = true
evaluate_enabled = false
executable_path = "/usr/bin/chromium"  # Optional
headless (boolean): Run Chrome in headless mode (default: true).
evaluate_enabled (boolean): Allow JavaScript evaluation via the evaluate action (default: false).
executable_path (string): Path to the Chrome/Chromium binary. Auto-detected if not set.
JavaScript evaluation is disabled by default for security. Only enable if you trust the worker’s task.

Browser Actions

The browser tool supports these actions:
Start the browser. Must be called before any other action.
{"action": "launch"}
Open a new tab.
{"action": "open", "url": "https://example.com"}
Returns the new tab’s target ID.
List all open tabs.
{"action": "tabs"}
Returns tab metadata (target ID, title, URL, active state).
Switch to a different tab.
{"action": "focus", "target_id": "tab-abc-123"}
Close a tab.
{"action": "close_tab", "target_id": "tab-abc-123"}
Omit target_id to close the active tab.
Get an accessibility tree with element refs.
{"action": "snapshot"}
Returns up to 200 interactive elements with refs like e1, e2, etc.
Interact with an element.
{
  "action": "act",
  "element_ref": "e3",
  "act_kind": "click"
}
See Element Interactions for details.
Capture the page or an element.
{"action": "screenshot", "full_page": true}
Saves to screenshot_dir with timestamp filename.
Run JavaScript (requires evaluate_enabled = true).
{
  "action": "evaluate",
  "script": "document.title"
}
Returns the script’s result as JSON.
Get the page’s HTML.
{"action": "content"}
Returns HTML, truncated to 100KB if needed.
Shut down the browser.
{"action": "close"}
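The requests above are plain JSON objects keyed by "action". As a rough illustration of how a caller might build them in a typed way, here is a minimal sketch using only the Rust standard library. The BrowserAction enum, json_escape, and to_json are hypothetical names invented for this example and are not part of the browser tool; only the JSON shapes come from the examples above.

```rust
/// Hypothetical typed view of some browser tool requests.
/// Field names follow the JSON examples; the enum itself is an assumption.
#[allow(dead_code)]
enum BrowserAction {
    Launch,
    Open { url: String },
    Focus { target_id: String },
    CloseTab { target_id: Option<String> },
    Snapshot,
    Screenshot { full_page: bool },
    Close,
}

/// Escape characters that may not appear raw inside a JSON string.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl BrowserAction {
    /// Render the request in the same shape as the documented examples.
    fn to_json(&self) -> String {
        match self {
            BrowserAction::Launch => r#"{"action": "launch"}"#.to_string(),
            BrowserAction::Open { url } => {
                format!(r#"{{"action": "open", "url": "{}"}}"#, json_escape(url))
            }
            BrowserAction::Focus { target_id } => {
                format!(r#"{{"action": "focus", "target_id": "{}"}}"#, json_escape(target_id))
            }
            BrowserAction::CloseTab { target_id: Some(id) } => {
                format!(r#"{{"action": "close_tab", "target_id": "{}"}}"#, json_escape(id))
            }
            // Omitting target_id closes the active tab.
            BrowserAction::CloseTab { target_id: None } => {
                r#"{"action": "close_tab"}"#.to_string()
            }
            BrowserAction::Snapshot => r#"{"action": "snapshot"}"#.to_string(),
            BrowserAction::Screenshot { full_page } => {
                format!(r#"{{"action": "screenshot", "full_page": {}}}"#, full_page)
            }
            BrowserAction::Close => r#"{"action": "close"}"#.to_string(),
        }
    }
}
```

Modeling target_id as an Option keeps the "omit to close the active tab" behavior explicit at the call site instead of relying on an empty string.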

Element Interactions

Elements are addressed by refs (e1, e2, …) from the accessibility tree:
{
  "action": "act",
  "element_ref": "e5",
  "act_kind": "click"
}

Accessibility Tree Snapshot

The snapshot action returns interactive elements:
{
  "success": true,
  "message": "42 interactive element(s) found",
  "title": "Example Page",
  "url": "https://example.com",
  "elements": [
    {
      "ref_id": "e1",
      "role": "button",
      "name": "Sign In",
      "description": "Submit login form"
    },
    {
      "ref_id": "e2",
      "role": "textbox",
      "name": "Email",
      "value": ""
    },
    {
      "ref_id": "e3",
      "role": "link",
      "name": "Forgot password?",
      "description": null
    }
  ]
}
ref_id (string): Short identifier for use in act calls (e.g., e1, e2).
role (string): ARIA role: button, link, textbox, checkbox, etc.
name (string): Accessible name (usually visible text or aria-label).
description (string): Accessible description (aria-description or title).
value (string): Current value for inputs, sliders, etc.
Only interactive roles are included:
src/tools/browser.rs
const INTERACTIVE_ROLES: &[&str] = &[
    "button", "checkbox", "combobox", "link", "listbox", "menu",
    "menubar", "menuitem", "menuitemcheckbox", "menuitemradio",
    "option", "radio", "scrollbar", "searchbox", "slider",
    "spinbutton", "switch", "tab", "textbox", "treeitem",
];
Max 200 elements per snapshot to keep output manageable.
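To make the filtering and the 200-element cap concrete, the following is a minimal sketch of how a snapshot might be reduced to ref-addressed elements. The AxNode and ElementSummary types, the MAX_ELEMENTS constant, and the summarize function are assumptions made for this example; the actual code in src/tools/browser.rs may be organized differently. Only INTERACTIVE_ROLES and the e1, e2, ... ref format come from this page.

```rust
const INTERACTIVE_ROLES: &[&str] = &[
    "button", "checkbox", "combobox", "link", "listbox", "menu",
    "menubar", "menuitem", "menuitemcheckbox", "menuitemradio",
    "option", "radio", "scrollbar", "searchbox", "slider",
    "spinbutton", "switch", "tab", "textbox", "treeitem",
];

/// Assumed name for the documented cap of 200 elements per snapshot.
const MAX_ELEMENTS: usize = 200;

/// Hypothetical raw accessibility node as it might arrive from the browser.
struct AxNode {
    role: String,
    name: String,
}

/// Hypothetical trimmed-down element returned to the caller.
struct ElementSummary {
    ref_id: String,
    role: String,
    name: String,
}

/// Keep only interactive roles, cap the count, and number refs from e1.
fn summarize(nodes: &[AxNode]) -> Vec<ElementSummary> {
    nodes
        .iter()
        .filter(|node| INTERACTIVE_ROLES.contains(&node.role.as_str()))
        .take(MAX_ELEMENTS)
        .enumerate()
        .map(|(index, node)| ElementSummary {
            ref_id: format!("e{}", index + 1),
            role: node.role.clone(),
            name: node.name.clone(),
        })
        .collect()
}
```

In this sketch refs are assigned after filtering, so non-interactive nodes such as headings never consume a ref number.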

Workflow Example

1. Launch
{"action": "launch"}

2. Navigate
{"action": "navigate", "url": "https://example.com/login"}

3. Snapshot
{"action": "snapshot"}
Find textbox with name “Email” → e2

4. Type Email
{
  "action": "act",
  "element_ref": "e2",
  "act_kind": "type",
  "text": "[email protected]"
}

5. Type Password
Take another snapshot if needed, or use the ref of the next textbox (e3 in this example):
{
  "action": "act",
  "element_ref": "e3",
  "act_kind": "type",
  "text": "password123"
}

6. Click Submit
{
  "action": "act",
  "element_ref": "e1",
  "act_kind": "click"
}

7. Screenshot
{"action": "screenshot"}
Returns path to saved screenshot.

8. Close
{"action": "close"}

Security

URL validation blocks private networks:
  • Loopback (127.0.0.0/8, ::1)
  • Private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
  • Link-local (169.254.0.0/16, fe80::/10)
  • Cloud metadata (169.254.169.254, metadata.google.internal)
This prevents SSRF attacks. Only http and https schemes are allowed.
src/tools/browser.rs
fn validate_url(url: &str) -> Result<(), BrowserError> {
    let parsed = Url::parse(url)?;
    match parsed.scheme() {
        "http" | "https" => {}
        other => return Err(BrowserError::new(
            format!("scheme '{other}' is not allowed")
        )),
    }
    // Check for blocked IPs...
}
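The blocked-IP step is elided in the excerpt above. As a hedged illustration of what a check over the listed ranges could look like, here is a standard-library-only sketch. The function names is_blocked_ip and is_blocked_host are hypothetical and are not taken from src/tools/browser.rs. The sketch takes a bare host string (no brackets around IPv6 literals) and does not resolve DNS, so a hostname that resolves to a private address would need an additional check that is not shown here.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Loopback, private, and link-local IPv4 ranges.
/// 169.254.169.254 (cloud metadata) falls inside link-local 169.254.0.0/16.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_loopback() || ip.is_private() || ip.is_link_local()
}

/// Loopback (::1) and link-local (fe80::/10) IPv6 addresses.
fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    // fe80::/10 means the top 10 bits of the first segment equal 0xfe80.
    let is_link_local = (ip.segments()[0] & 0xffc0) == 0xfe80;
    ip.is_loopback() || is_link_local
}

fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}

/// Hypothetical host check: literal IPs plus the one metadata hostname
/// named in the list above. Other hostnames pass through unresolved.
fn is_blocked_host(host: &str) -> bool {
    if host.eq_ignore_ascii_case("metadata.google.internal") {
        return true;
    }
    match host.parse::<IpAddr>() {
        Ok(ip) => is_blocked_ip(ip),
        Err(_) => false,
    }
}
```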
JavaScript evaluation is off by default. Enable only for trusted tasks.

Screenshots

Screenshots are saved to screenshot_dir with timestamped names:
screenshot_20260228_143052_123.png
{"action": "screenshot"}
Without full_page, only the visible area is captured. Pass "full_page": true to capture the full page.
Screenshot paths are returned in the tool output for reference.

Performance Notes

The browser stays open across multiple tool calls within a worker. Launch once, reuse.
Accessibility tree extraction takes about 100 ms, so take snapshots liberally to understand the page.
Element refs are cleared when you navigate to a new page. Take a fresh snapshot after navigation.
The browser runs in a separate process. Even if compromised, it’s isolated from Spacebot.
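The note about refs being cleared on navigation can be pictured as a small lookup table that is emptied whenever the page changes. The sketch below is illustrative only: RefTable, its methods, and the use of u64 node identifiers are assumptions for this example, not the tool's internal design.

```rust
use std::collections::HashMap;

/// Hypothetical table mapping snapshot refs (e1, e2, ...) to node identifiers.
struct RefTable {
    refs: HashMap<String, u64>,
}

impl RefTable {
    fn new() -> Self {
        RefTable { refs: HashMap::new() }
    }

    /// A new snapshot replaces all previous refs, numbering from e1.
    fn load_snapshot(&mut self, node_ids: &[u64]) {
        self.refs.clear();
        for (index, id) in node_ids.iter().enumerate() {
            self.refs.insert(format!("e{}", index + 1), *id);
        }
    }

    /// Navigation invalidates every ref from the previous page.
    fn on_navigate(&mut self) {
        self.refs.clear();
    }

    /// Resolve a ref, or explain that a fresh snapshot is needed.
    fn resolve(&self, element_ref: &str) -> Result<u64, String> {
        self.refs
            .get(element_ref)
            .copied()
            .ok_or_else(|| format!("unknown ref '{}': take a fresh snapshot", element_ref))
    }
}
```

Returning an error for a stale ref, rather than silently acting on an old node, matches the advice to take a fresh snapshot after navigation.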

Debugging

Enable headed mode to watch the browser:
[browser]
headless = false
The browser window opens at 1280x900 and stays visible during worker execution. Logs include CDP errors:
[ERROR] browser: navigation failed: net::ERR_NAME_NOT_RESOLVED

Best Practices

Snapshot Before Interact

Always take a snapshot to discover elements. Don’t guess element refs.

Use Names, Not Positions

Find elements by name/role, not by ref number. Refs change between snapshots.
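As a small illustration of looking elements up by role and name rather than by a remembered ref, here is a minimal sketch over an already-parsed snapshot. The Element struct and the element and find_ref helpers are hypothetical names for this example; the fields mirror ref_id, role, and name from the snapshot output.

```rust
/// Hypothetical parsed form of one entry in the snapshot's "elements" array.
struct Element {
    ref_id: String,
    role: String,
    name: String,
}

/// Convenience constructor used only to keep the example short.
fn element(ref_id: &str, role: &str, name: &str) -> Element {
    Element {
        ref_id: ref_id.to_string(),
        role: role.to_string(),
        name: name.to_string(),
    }
}

/// Return the ref of the first element with the given role and name
/// (name compared ASCII case-insensitively), or None if it is not in
/// the current snapshot.
fn find_ref<'a>(elements: &'a [Element], role: &str, name: &str) -> Option<&'a str> {
    elements
        .iter()
        .find(|e| e.role == role && e.name.eq_ignore_ascii_case(name))
        .map(|e| e.ref_id.as_str())
}
```

Because the lookup returns an Option, a caller has to handle the case where the element is absent, for example by taking another snapshot once dynamic content has loaded.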

Handle Missing Elements

Check snapshot results before acting. Elements may not exist (dynamic content, slow load).

Close When Done

Call {"action": "close"} at the end to free resources.
