The browser tool provides persistent browser automation for web application testing and interaction. It uses Playwright to control a Chrome browser in headless mode.

Key Features

  • Persistent Sessions: Browser remains active across multiple tool calls
  • Multi-Tab Management: Open and manage multiple tabs simultaneously
  • JavaScript Execution: Run custom JavaScript in page context
  • Screenshot Capture: Visual feedback after each action
  • Console Log Access: Retrieve browser console messages
  • PDF Export: Save pages as PDF files

Actions

action
string
required
The action to perform. Available actions:
  • launch - Start browser at a URL
  • goto - Navigate to a URL
  • back - Go back in history
  • forward - Go forward in history
  • close - Close the browser
url
string
Required for launch and goto; optional for new_tab. The URL to navigate to. Must include a protocol (http://, https://, or file://).
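Because the url parameter must include a protocol, it can help to check a URL before passing it to launch or goto. The following is a minimal sketch, not part of the tool itself: has_supported_protocol is a hypothetical helper, and it assumes the three protocols listed above are the only ones accepted, which this page does not state explicitly.

```python
from urllib.parse import urlparse

# Protocols named in the url parameter description (assumed to be exhaustive).
ALLOWED_SCHEMES = {"http", "https", "file"}


def has_supported_protocol(url: str) -> bool:
    """Return True if the URL carries one of the listed protocols."""
    return urlparse(url).scheme.lower() in ALLOWED_SCHEMES
```

A bare host name such as "example.com" has no scheme and is rejected, while "https://example.com" and "file:///tmp/page.html" pass.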

Interaction Actions

action
string
required
Interaction actions:
  • click - Click at coordinates
  • double_click - Double-click at coordinates
  • hover - Hover over coordinates
  • type - Type text in focused field
  • press_key - Press a keyboard key
  • scroll_down - Scroll page down
  • scroll_up - Scroll page up
coordinate
string
Required for click, double_click, and hover. Format: “x,y” (e.g., “432,321”). Must target the center of the element.
text
string
Required for type action. The text to type in the field.
key
string
Required for press_key action. Valid values:
  • Single characters: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’
  • Special keys: ‘Enter’, ‘Escape’, ‘ArrowLeft’, ‘ArrowRight’
  • Modifier keys: ‘Shift’, ‘Control’, ‘Alt’, ‘Meta’
  • Function keys: ‘F1’-‘F12’
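The coordinate parameter expects the center of the target element in “x,y” form. If an element's bounding box is known (for example, estimated from the latest screenshot), the center can be computed as sketched below. center_coordinate is a hypothetical helper, and the sketch assumes the box is expressed in the same pixel space as the screenshot.

```python
def center_coordinate(left: float, top: float, width: float, height: float) -> str:
    """Format the center of a bounding box as the "x,y" string used by click."""
    center_x = round(left + width / 2)
    center_y = round(top + height / 2)
    return f"{center_x},{center_y}"
```

For a box at left=382, top=301 with width=100 and height=40, this yields "432,321".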

Tab Management

action
string
required
Tab management actions:
  • new_tab - Open a new tab
  • switch_tab - Switch to a specific tab
  • close_tab - Close a specific tab
  • list_tabs - List all open tabs
tab_id
string
Required for switch_tab and close_tab. The ID of the tab to operate on (e.g., “tab_1”, “tab_2”).
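To pick a tab_id for switch_tab or close_tab, one option is to search the all_tabs value returned in the response. This sketch assumes all_tabs is a mapping from tab ID to URL, as the Response section describes; find_tab_id is a hypothetical helper name.

```python
from typing import Dict, Optional


def find_tab_id(all_tabs: Dict[str, str], url_fragment: str) -> Optional[str]:
    """Return the ID of the first tab whose URL contains url_fragment, or None."""
    for tab_id, url in all_tabs.items():
        if url_fragment in url:
            return tab_id
    return None
```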

Utility Actions

action
string
required
Utility actions:
  • execute_js - Execute JavaScript code
  • wait - Pause execution
  • save_pdf - Save page as PDF
  • get_console_logs - Retrieve console logs
  • view_source - View page source HTML
js_code
string
Required for execute_js. JavaScript code to execute in page context. The last evaluated expression is returned.
duration
string
Required for wait. Number of seconds to pause (can be fractional, e.g., 0.5).
file_path
string
Required for save_pdf. The file path where the PDF is saved.
clear
boolean
For get_console_logs: whether to clear logs after retrieving. Default is false.
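The per-action requirements described in the parameter sections can be collected into a lookup table and checked before a call is made. This is a sketch only: REQUIRED_PARAMS and missing_params are hypothetical names, and the table simply restates the "Required for ..." notes on this page.

```python
# Required parameters per action, as stated in the parameter descriptions.
REQUIRED_PARAMS = {
    "launch": ["url"],
    "goto": ["url"],
    "click": ["coordinate"],
    "double_click": ["coordinate"],
    "hover": ["coordinate"],
    "type": ["text"],
    "press_key": ["key"],
    "switch_tab": ["tab_id"],
    "close_tab": ["tab_id"],
    "execute_js": ["js_code"],
    "wait": ["duration"],
    "save_pdf": ["file_path"],
}


def missing_params(action: str, **params) -> list:
    """List the required parameters that were not supplied for an action."""
    required = REQUIRED_PARAMS.get(action, [])
    return [name for name in required if params.get(name) is None]
```

Actions without required parameters, such as scroll_down or list_tabs, return an empty list.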

Response

screenshot
string
Base64-encoded PNG of the current page state
url
string
Current page URL
title
string
Current page title
viewport
object
Current browser viewport dimensions
tab_id
string
ID of the current active tab
all_tabs
object
Dictionary of all open tab IDs and their URLs
message
string
Status message about the action performed
js_result
any
Result of JavaScript execution (for execute_js action)
pdf_saved
string
File path of saved PDF (for save_pdf action)
console_logs
array
Array of console messages (for get_console_logs action). Limited to the 200 most recent messages and 50KB in total.
page_source
string
HTML source code (for view_source action). Large pages are truncated to 100KB.
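Since screenshot is a Base64-encoded PNG, it has to be decoded before it can be saved or inspected. The sketch below assumes the response is available as a Python dict and that the screenshot string carries no data-URI prefix; decode_screenshot is a hypothetical helper.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response: dict) -> bytes:
    """Decode the Base64 screenshot field and check that it looks like a PNG."""
    data = base64.b64decode(response["screenshot"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot field does not contain PNG data")
    return data
```

The returned bytes can be written to disk with open(path, "wb").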

Examples

Basic Web Browsing

# Launch browser at URL (creates tab_1)
browser_action(
    action="launch",
    url="https://example.com"
)

# Navigate to different URL
browser_action(
    action="goto",
    url="https://github.com"
)

# Scroll down to see more content
browser_action(
    action="scroll_down"
)

Form Interaction

# Click username field and type
browser_action(
    action="click",
    coordinate="400,200"
)

browser_action(
    action="type",
    text="user@example.com"
)

# Click password field and type
browser_action(
    action="click",
    coordinate="400,250"
)

browser_action(
    action="type",
    text="mypassword123"
)

# Press Enter to submit
browser_action(
    action="press_key",
    key="Enter"
)

JavaScript Execution

# Execute JavaScript to get page stats
browser_action(
    action="execute_js",
    js_code="""
const images = document.querySelectorAll('img');
const links = document.querySelectorAll('a');
({
    images: images.length,
    links: links.length,
    title: document.title
})
"""
)

Multi-Tab Workflow

# Open new tab with different URL
browser_action(
    action="new_tab",
    url="https://another-site.com"
)

# Wait for page load
browser_action(
    action="wait",
    duration=2.5
)

# Switch back to first tab
browser_action(
    action="switch_tab",
    tab_id="tab_1"
)

# Close the second tab when done
browser_action(
    action="close_tab",
    tab_id="tab_2"
)

Console Logs and Source

# Get console logs
browser_action(
    action="get_console_logs"
)

# View page source
browser_action(
    action="view_source"
)

# Save page as PDF
browser_action(
    action="save_pdf",
    file_path="/workspace/page.pdf"
)

Important Notes

Coordinate Accuracy: Click coordinates must be derived from the most recent screenshot. You MUST click on the center of the element, not the edge. Always verify each click against the screenshot returned afterward.
Persistence: The browser remains active and maintains state until explicitly closed with the close action. This allows for multi-step workflows across multiple tool calls.
Resource Management: Always close tabs you no longer need and close the browser when done to free resources.
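One way to make sure the browser is always closed is to wrap a workflow in a context manager. This is a sketch under assumptions: browser_session is a hypothetical helper, and it assumes browser_action is available as a Python callable that accepts keyword arguments, as in the examples above.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(browser_action, url):
    """Launch the browser at url and close it when the block exits."""
    browser_action(action="launch", url=url)
    try:
        yield browser_action
    finally:
        # Runs even if a step inside the block raises.
        browser_action(action="close")
```

Inside a `with browser_session(browser_action, "https://example.com") as act:` block, each step is issued as `act(action=...)`, and the close action is sent on exit.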

JavaScript Execution Best Practices

  • The last evaluated expression is automatically returned - no return statement needed
  • Code runs in browser page context with access to DOM
  • Object literals must be wrapped in parentheses when they are the final expression
  • Use await for async operations
  • Variables from tool context are NOT available
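Because variables from the tool context are not available inside the page, any value the script needs has to be embedded in the js_code string itself. The sketch below does this with json.dumps, which produces a quoted string that is also a valid JavaScript string literal, and wraps the final object literal in parentheses. count_matches_js is a hypothetical helper name.

```python
import json


def count_matches_js(selector: str) -> str:
    """Build js_code that counts the elements matching a CSS selector."""
    # Serialize the Python value so quotes inside it are escaped correctly.
    selector_literal = json.dumps(selector)
    return (
        f"const nodes = document.querySelectorAll({selector_literal});\n"
        # Parentheses make the object literal an expression, not a block.
        f"({{ selector: {selector_literal}, count: nodes.length }})"
    )
```

The resulting string would be passed as js_code to the execute_js action.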

Browser Limitations

  • Runs in headless mode using the Chrome engine
  • Must have at least one tab open at all times
  • Actions affect the currently active tab unless tab_id is specified
  • Browser can operate concurrently with other tools
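Given that at least one tab must stay open, a cleanup step can work out which tabs are safe to close while keeping one. This sketch assumes all_tabs maps tab IDs to URLs, as described in the Response section; closable_tabs is a hypothetical helper.

```python
from typing import Dict, List


def closable_tabs(all_tabs: Dict[str, str], keep: str) -> List[str]:
    """Return every tab ID except the one to keep open."""
    if keep not in all_tabs:
        raise ValueError(f"unknown tab: {keep}")
    return [tab_id for tab_id in all_tabs if tab_id != keep]
```

Each returned ID could then be passed to close_tab, leaving the kept tab open.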
