Key Features
- Persistent Sessions: Browser remains active across multiple tool calls
- Multi-Tab Management: Open and manage multiple tabs simultaneously
- JavaScript Execution: Run custom JavaScript in page context
- Screenshot Capture: Visual feedback after each action
- Console Log Access: Retrieve browser console messages
- PDF Export: Save pages as PDF files
Actions
Navigation Actions
The action to perform. Available actions:
- launch - Start browser at a URL
- goto - Navigate to a URL
- back - Go back in history
- forward - Go forward in history
- close - Close the browser
Required for launch and goto, and optional for new_tab. The URL to navigate to. Must include a protocol (http://, https://, file://).
Interaction Actions
Interaction actions:
- click - Click at coordinates
- double_click - Double-click at coordinates
- hover - Hover over coordinates
- type - Type text in focused field
- press_key - Press a keyboard key
- scroll_down - Scroll page down
- scroll_up - Scroll page up
Required for click, double_click, and hover. Format: “x,y” (e.g., “432,321”). Must target the center of the element.
Required for the type action. The text to type in the field.
Required for the press_key action. Valid values:
- Single characters: ‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’
- Special keys: ‘Enter’, ‘Escape’, ‘ArrowLeft’, ‘ArrowRight’
- Modifier keys: ‘Shift’, ‘Control’, ‘Alt’, ‘Meta’
- Function keys: ‘F1’-‘F12’
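The key groups listed above can be checked before a press_key call is sent. The following is a minimal sketch that assumes the list is exhaustive, which the text does not confirm; the helper name `is_listed_key` is hypothetical and not part of the tool.

```python
import re

# Key names taken from the list above; the tool may accept more than these.
SPECIAL_KEYS = {"Enter", "Escape", "ArrowLeft", "ArrowRight"}
MODIFIER_KEYS = {"Shift", "Control", "Alt", "Meta"}
SINGLE_CHAR = re.compile(r"[A-Za-z0-9]")
FUNCTION_KEY = re.compile(r"F(?:[1-9]|1[0-2])")  # F1 through F12


def is_listed_key(key: str) -> bool:
    """Return True if the key matches one of the documented value groups."""
    return (
        key in SPECIAL_KEYS
        or key in MODIFIER_KEYS
        or SINGLE_CHAR.fullmatch(key) is not None
        or FUNCTION_KEY.fullmatch(key) is not None
    )
```

A check like this is case-sensitive by design, since the documented names are capitalized.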
Tab Management
Tab management actions:
- new_tab - Open a new tab
- switch_tab - Switch to a specific tab
- close_tab - Close a specific tab
- list_tabs - List all open tabs
Required for switch_tab and close_tab. The ID of the tab to operate on (e.g., “tab_1”, “tab_2”).
Utility Actions
Utility actions:
- execute_js - Execute JavaScript code
- wait - Pause execution
- save_pdf - Save page as PDF
- get_console_logs - Retrieve console logs
- view_source - View page source HTML
Required for execute_js. JavaScript code to execute in page context. The last evaluated expression is returned.
Required for wait. Number of seconds to pause (can be fractional, e.g., 0.5).
Required for save_pdf. The file path where the PDF is saved.
For get_console_logs: whether to clear logs after retrieving. Default is false.
Response
Base64-encoded PNG screenshot of the current page state
Current page URL
Current page title
Current browser viewport dimensions
ID of the current active tab
Dictionary of all open tab IDs and their URLs
Status message about the action performed
Result of JavaScript execution (for execute_js action)
File path of saved PDF (for save_pdf action)
Array of console messages (for get_console_logs action). Limited to the 200 most recent logs and 50KB in total.
HTML source code (for view_source action). Large pages are truncated to 100KB.
Examples
Basic Web Browsing
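A basic browsing session might look like the following sketch, which models each tool call as a plain dictionary. The field names `action` and `url` and the helpers `has_protocol` and `build_call` are assumptions for illustration, not the tool's confirmed schema. The protocol check mirrors the requirement that URLs include http://, https://, or file://.

```python
from urllib.parse import urlparse

# Protocols the documentation says a URL must start with.
ALLOWED_SCHEMES = {"http", "https", "file"}


def has_protocol(url: str) -> bool:
    """Return True if the URL carries one of the documented protocols."""
    return urlparse(url).scheme in ALLOWED_SCHEMES


def build_call(action: str, **params) -> dict:
    """Assemble a hypothetical tool-call payload (field names are assumed)."""
    if "url" in params and not has_protocol(params["url"]):
        raise ValueError(f"URL must include a protocol: {params['url']!r}")
    return {"action": action, **params}


# Launch, navigate, step back in history, then close the session explicitly.
session = [
    build_call("launch", url="https://example.com"),
    build_call("goto", url="https://example.com/docs"),
    build_call("back"),
    build_call("close"),
]
```

Ending with close matters because the browser otherwise stays alive between tool calls.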
Form Interaction
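Filling in a form typically means clicking a field, typing, and pressing a key. The sketch below computes the center of an element's bounding box, since coordinates must target the center of elements, and formats it as the documented “x,y” string. The bounding-box numbers and the field names `coordinate`, `text`, and `key` are assumptions for illustration.

```python
def center_of(left: float, top: float, width: float, height: float) -> str:
    """Format the center of a bounding box as an "x,y" coordinate string."""
    x = round(left + width / 2)
    y = round(top + height / 2)
    return f"{x},{y}"


# Hypothetical bounding box of a search field, e.g. read off a screenshot.
field = center_of(left=332, top=301, width=200, height=40)

# Click the field to focus it, type into it, then submit with Enter.
form_steps = [
    {"action": "click", "coordinate": field},
    {"action": "type", "text": "browser automation"},
    {"action": "press_key", "key": "Enter"},
]
```

The click comes first because type writes into whichever field currently has focus.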
JavaScript Execution
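The snippets below illustrate the execute_js rules described in this document: the last evaluated expression is returned, object literals need parentheses when they come last, and await is available for async work. The field name `javascript` and the helper `as_final_expression` are hypothetical. The helper is a simple heuristic: it wraps any snippet that starts with `{` and ends with `}`, because JavaScript parses a leading brace as a block statement rather than an object literal.

```python
def as_final_expression(js: str) -> str:
    """Wrap a bare object literal in parentheses so it parses as an expression."""
    stripped = js.strip()
    if stripped.startswith("{") and stripped.endswith("}"):
        return f"({stripped})"
    return stripped


js_calls = [
    # A plain expression: its value is returned without a return statement.
    {"action": "execute_js", "javascript": "document.title"},
    # An object literal as the final expression, wrapped in parentheses.
    {
        "action": "execute_js",
        "javascript": as_final_expression(
            "{ title: document.title, links: document.links.length }"
        ),
    },
    # An async operation awaited before the final expression.
    {"action": "execute_js", "javascript": "await document.fonts.ready; document.title"},
]
```

Because the heuristic cannot tell an object literal from a genuine block statement, it is only suitable for snippets known to be single object literals.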
Multi-Tab Workflow
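A multi-tab workflow could be sketched as follows. The field name `tab_id` appears in this document; `action`, `url`, and the helper `tab_call` are assumptions for illustration. The tab IDs reuse the documented “tab_1” and “tab_2” examples, and which ID a new tab actually receives should be read from the tool's response rather than assumed.

```python
from typing import Optional

# Actions the documentation lists as requiring a tab ID.
NEEDS_TAB_ID = {"switch_tab", "close_tab"}


def tab_call(action: str, tab_id: Optional[str] = None, url: Optional[str] = None) -> dict:
    """Build a hypothetical tab-management payload, enforcing the tab ID rule."""
    if action in NEEDS_TAB_ID and tab_id is None:
        raise ValueError(f"{action} requires a tab_id")
    call = {"action": action}
    if tab_id is not None:
        call["tab_id"] = tab_id
    if url is not None:
        call["url"] = url
    return call


# Open a second tab, inspect the open tabs, return to the first, close the second.
workflow = [
    tab_call("new_tab", url="https://example.com/pricing"),
    tab_call("list_tabs"),
    tab_call("switch_tab", tab_id="tab_1"),
    tab_call("close_tab", tab_id="tab_2"),
]
```

Closing the second tab still leaves one tab open, which respects the rule that at least one tab must remain.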
Console Logs and Source
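One way to use these two actions for debugging is to drain old console messages, perform an action, and then read only what was logged afterwards. The sketch below assumes the field names `action`, `url`, and `clear_logs`; the helpers `console_logs_call` and `view_source_call` are hypothetical. Clearing defaults to off, matching the documented default of false.

```python
def console_logs_call(clear: bool = False) -> dict:
    """Build a get_console_logs payload; clearing is off unless requested."""
    return {"action": "get_console_logs", "clear_logs": clear}


def view_source_call() -> dict:
    """Build a view_source payload, which takes no extra parameters here."""
    return {"action": "view_source"}


debug_steps = [
    console_logs_call(clear=True),  # retrieve earlier messages and clear them
    {"action": "goto", "url": "https://example.com/checkout"},
    console_logs_call(),            # messages logged since the clear
    view_source_call(),             # HTML of the page that produced them
]
```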
Important Notes
Persistence: The browser remains active and maintains state until explicitly closed with the close action. This allows for multi-step workflows across multiple tool calls.
JavaScript Execution Best Practices
- The last evaluated expression is automatically returned - no return statement needed
- Code runs in browser page context with access to DOM
- Object literals must be wrapped in parentheses when they are the final expression
- Use await for async operations
- Variables from tool context are NOT available
Browser Limitations
- Runs in headless mode using the Chrome engine
- Must have at least one tab open at all times
- Actions affect the currently active tab unless tab_id is specified
- Browser can operate concurrently with other tools