
Overview

Browser Use supports multiple LLM providers through a unified BaseChatModel interface. Each provider offers different models with varying capabilities, speeds, and costs. The agent automatically configures optimal settings based on your chosen model.

BaseChatModel Interface

All LLM providers implement the BaseChatModel protocol (line 18 in llm/base.py):
from typing import Protocol, TypeVar

T = TypeVar('T')  # structured-output type

class BaseChatModel(Protocol):
    model: str  # Model identifier
    provider: str  # Provider name

    async def ainvoke(
        self,
        messages: list[BaseMessage],
        output_format: type[T] | None = None,
        **kwargs
    ) -> ChatInvokeCompletion[T | str]:
        """Call the LLM with messages and optional structured output."""
        ...
The unified interface means you can switch providers without changing your agent code: just swap the LLM instance.
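Because BaseChatModel is a structural Protocol, any object with matching attributes and an ainvoke coroutine can stand in as a provider. A minimal sketch of the duck-typed contract (the StubChat class here is hypothetical, for illustration only):

```python
import asyncio
from typing import Protocol


class BaseChatModel(Protocol):
    """Trimmed copy of the protocol above: attributes plus ainvoke."""
    model: str
    provider: str

    async def ainvoke(self, messages: list, **kwargs): ...


class StubChat:
    """Hypothetical provider: satisfies the protocol by shape alone."""

    def __init__(self, model: str):
        self.model = model
        self.provider = 'stub'

    async def ainvoke(self, messages: list, **kwargs) -> str:
        return f'echo: {messages[-1]}'


# Structural match: no inheritance from BaseChatModel needed
llm: BaseChatModel = StubChat('stub-1')
print(asyncio.run(llm.ainvoke(['hello'])))  # echo: hello
```

Because conformance is checked by shape rather than inheritance, swapping ChatOpenAI for ChatAnthropic (or any conforming class) leaves the rest of the agent code untouched.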
The ChatBrowserUse model is specifically optimized for browser automation tasks:
from browser_use import Agent, ChatBrowserUse

agent = Agent(
    task="Your browser automation task",
    llm=ChatBrowserUse(),
)
Why ChatBrowserUse?
  • Fastest: 3-5x faster task completion
  • Cheapest: Lowest token cost per task
  • Most accurate: Built specifically for Browser Use
  • Free credits: Get $10 to start at cloud.browser-use.com
Setup:
# Add to .env
BROWSER_USE_API_KEY=your_key_here
Model Selection:
# Default (recommended)
llm = ChatBrowserUse()

# Specific model
llm = ChatBrowserUse(model='browser-use/gpt-4.1-mini')

Supported Providers

OpenAI

A wide range of models, including the GPT-4.1 family and o3 reasoning models:
from browser_use import ChatOpenAI

# GPT-4.1 Mini (recommended for cost/performance)
llm = ChatOpenAI(model='gpt-4.1-mini')

# GPT-4.1 (most capable)
llm = ChatOpenAI(model='gpt-4.1')

# O3 Mini (reasoning model)
llm = ChatOpenAI(model='o3-mini')
Setup:
# .env
OPENAI_API_KEY=sk-...
Model Options:
  • gpt-4.1-mini - Fast, cost-effective
  • gpt-4.1 - Most capable, slower
  • o3-mini - Advanced reasoning
  • gpt-3.5-turbo - Legacy, cheap
Auto-Configuration:
  • O3 models: 90s LLM timeout
  • Default: 75s timeout

Anthropic

Claude models with excellent reasoning:
from browser_use import ChatAnthropic

# Claude Sonnet 4 (recommended)
llm = ChatAnthropic(model='claude-sonnet-4-0')

# Claude Opus 4 (most capable)
llm = ChatAnthropic(model='claude-opus-4-0')

# With temperature
llm = ChatAnthropic(
    model='claude-sonnet-4-0',
    temperature=0.0,  # Deterministic output
)
Setup:
# .env
ANTHROPIC_API_KEY=sk-ant-...
Model Options:
  • claude-sonnet-4-0 - Best balance, coordinate clicking
  • claude-opus-4-0 - Most capable, coordinate clicking
  • claude-sonnet-3-5 - Previous generation
Auto-Configuration:
  • Screenshot size: 1400x850 (auto-optimized)
  • LLM timeout: 90s
  • Coordinate clicking: Enabled for Sonnet 4 & Opus 4
Claude models excel at visual understanding and complex multi-step reasoning.

Google

Gemini models with fast inference:
from browser_use import ChatGoogle

# Gemini 2.0 Flash (fastest)
llm = ChatGoogle(model='gemini-flash-latest')

# Gemini 3 Pro (experimental, coordinate clicking)
llm = ChatGoogle(model='gemini-3-pro-exp')

# With safety settings
llm = ChatGoogle(
    model='gemini-flash-latest',
    safety_settings={
        'HARM_CATEGORY_HARASSMENT': 'BLOCK_NONE',
    }
)
Setup:
# .env
GOOGLE_API_KEY=AIza...
# Get free key: https://aistudio.google.com/app/apikey
Model Options:
  • gemini-flash-latest - Fast, cost-effective
  • gemini-3-pro-exp - Experimental, coordinate clicking
  • gemini-pro-latest - Stable, capable
Auto-Configuration:
  • Gemini 3 Pro: 90s timeout, coordinate clicking
  • Other Gemini: 75s timeout

DeepSeek

Cost-effective models with good performance:
from browser_use import ChatDeepSeek

llm = ChatDeepSeek(model='deepseek-chat')

# With custom endpoint
llm = ChatDeepSeek(
    model='deepseek-chat',
    base_url='https://api.deepseek.com',
)
Setup:
# .env
DEEPSEEK_API_KEY=sk-...
Auto-Configuration:
  • LLM timeout: 90s
  • Vision: Disabled (not yet supported)
DeepSeek models don’t support vision yet. The agent automatically sets use_vision=False.

Groq

Ultra-fast inference on Groq's LPU hardware:
from browser_use import ChatGroq

llm = ChatGroq(model='llama-3.3-70b-versatile')
Setup:
# .env
GROQ_API_KEY=gsk_...
Auto-Configuration:
  • LLM timeout: 30s (fast inference)

XAI (Grok)

XAI’s Grok models:
from browser_use.llm.xai import ChatXAI

# Grok 2 (vision supported)
llm = ChatXAI(model='grok-2-latest')

# Grok 4 (vision supported)
llm = ChatXAI(model='grok-4-latest')
Setup:
# .env
XAI_API_KEY=xai-...
Vision Support:
  • ✅ Grok 2, Grok 4
  • ❌ Grok 3, Grok Code (auto-disabled)

AWS Bedrock

Access foundation models through AWS Bedrock:
from browser_use import ChatBedrock

llm = ChatBedrock(
    model='anthropic.claude-sonnet-4-0',
    region='us-west-2',
)
Setup:
# .env
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=us-west-2

Azure OpenAI

OpenAI models through Azure:
from browser_use import ChatAzureOpenAI

llm = ChatAzureOpenAI(
    model='gpt-4.1',
    deployment_name='gpt-4-deployment',
    api_version='2024-02-15-preview',
)
Setup:
# .env
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com

Ollama

Local models with Ollama:
from browser_use import ChatOllama

llm = ChatOllama(
    model='llama3.1',
    base_url='http://localhost:11434',
)
Setup:
# Install Ollama and pull model
ollama pull llama3.1
Local models may require more patience and fine-tuning for browser automation tasks.

Model Configuration

Temperature

Control randomness in model outputs:
# Deterministic (recommended for automation)
llm = ChatAnthropic(model='claude-sonnet-4-0', temperature=0.0)

# More creative (for content generation)
llm = ChatOpenAI(model='gpt-4.1', temperature=0.7)
Recommendations:
  • 0.0: Deterministic, repeatable (best for automation)
  • 0.3-0.5: Slight variation, still focused
  • 0.7-1.0: Creative, diverse (not recommended for tasks)

Timeout Configuration

The agent automatically sets timeouts based on model:
# Override auto-detection
agent = Agent(
    task="Your task",
    llm=llm,
    llm_timeout=120,  # 2 minutes for slow models
)
Auto-Detected Timeouts:
  • Gemini 3 Pro: 90s
  • Groq: 30s (fast inference)
  • O3, Claude, DeepSeek: 90s
  • Default: 75s
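The table above can be mirrored as a small lookup function, purely for illustration (the real selection logic lives inside the Agent and may differ):

```python
def auto_llm_timeout(provider: str, model: str) -> int:
    """Illustrative mirror of the auto-detected timeout table; not the real agent code."""
    if provider == 'groq':
        return 30  # fast LPU inference
    if model.startswith('o3') or 'claude' in model or provider == 'deepseek':
        return 90  # reasoning or slower models
    if 'gemini-3-pro' in model:
        return 90
    return 75  # default


print(auto_llm_timeout('groq', 'llama-3.3-70b-versatile'))  # 30
print(auto_llm_timeout('openai', 'gpt-4.1-mini'))           # 75
```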

Vision Configuration

Control how the agent uses vision:
agent = Agent(
    task="Visual task",
    llm=llm,
    use_vision=True,  # Always include screenshots
    # use_vision=False,  # Never include screenshots
    # use_vision='auto',  # Include screenshot tool, use when requested
    vision_detail_level='high',  # 'low', 'high', 'auto'
)
Vision Modes:
  • True: Always include screenshots in every step
  • False: Never include screenshots
  • 'auto': Include screenshot tool, agent requests when needed
Detail Levels:
  • 'high': Full resolution (slower, more accurate)
  • 'low': Lower resolution (faster, less detail)
  • 'auto': Model decides based on content

Screenshot Optimization

Resize screenshots for faster processing:
agent = Agent(
    task="Visual task",
    llm=llm,
    llm_screenshot_size=(1400, 850),  # Resize before sending to LLM
)
Auto-Configuration:
  • Claude Sonnet models: (1400, 850) automatically set
  • Other models: Original viewport size
Screenshot resizing reduces token costs and speeds up inference. Coordinates from the LLM are automatically scaled back to original size.
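The scaling back to viewport coordinates is simple proportional math; a sketch (the helper name is hypothetical, not a library function):

```python
def scale_back(
    x: int, y: int,
    llm_size: tuple[int, int],
    viewport: tuple[int, int],
) -> tuple[int, int]:
    """Map a click on the resized screenshot back to viewport pixels."""
    return (
        round(x * viewport[0] / llm_size[0]),
        round(y * viewport[1] / llm_size[1]),
    )


# A click at the center of a 1400x850 screenshot lands at the
# center of a 1920x1080 viewport:
print(scale_back(700, 425, (1400, 850), (1920, 1080)))  # (960, 540)
```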

Structured Output

All providers support structured output through Pydantic models:
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    rating: float
    in_stock: bool

agent = Agent(
    task="Extract product information",
    llm=llm,
    output_model_schema=ProductInfo,
)

history = await agent.run()

# Access structured output
product: ProductInfo = history.structured_output
print(f"{product.name}: ${product.price}")

Fallback LLM

Configure a backup model if the primary fails:
agent = Agent(
    task="Critical task",
    llm=ChatAnthropic(model='claude-sonnet-4-0'),
    fallback_llm=ChatOpenAI(model='gpt-4.1-mini'),
)
When Fallback Activates:
  • Rate limit errors
  • API connection failures
  • Model-specific errors
The agent automatically switches to fallback and continues execution.
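The switching behavior can be sketched with stub models (illustrative only; the real agent also retries and classifies errors before switching):

```python
import asyncio


async def invoke_with_fallback(primary, fallback, messages):
    """Try the primary model; on any error, retry once with the fallback."""
    try:
        return await primary.ainvoke(messages)
    except Exception:
        return await fallback.ainvoke(messages)


class AlwaysFails:
    async def ainvoke(self, messages):
        raise RuntimeError('rate limited')


class AlwaysWorks:
    async def ainvoke(self, messages):
        return 'done'


print(asyncio.run(invoke_with_fallback(AlwaysFails(), AlwaysWorks(), [])))  # done
```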

Cost Tracking

Track API costs across providers:
agent = Agent(
    task="Your task",
    llm=llm,
    calculate_cost=True,
)

history = await agent.run()

if history.usage:
    print(f"Input tokens: {history.usage.input_tokens}")
    print(f"Output tokens: {history.usage.output_tokens}")
    print(f"Total cost: ${history.usage.total_cost:.4f}")

Model Selection Guide

Fast & Cheap
  • ChatBrowserUse (recommended)
  • ChatOpenAI('gpt-4.1-mini')
  • ChatGoogle('gemini-flash-latest')
Most Capable
  • ChatAnthropic('claude-opus-4-0')
  • ChatOpenAI('gpt-4.1')
  • ChatBrowserUse()
Visual Tasks
  • ChatAnthropic('claude-sonnet-4-0') - Auto-optimized
  • ChatBrowserUse()
  • ChatOpenAI('gpt-4.1')
Local/Private
  • ChatOllama('llama3.1')
  • ChatOllama('mistral')

Advanced Features

Coordinate Clicking

Automatically enabled for models that support it (line 321 in agent/service.py):
# Auto-enabled for:
# - claude-sonnet-4
# - claude-opus-4
# - gemini-3-pro
# - browser-use/* models

agent = Agent(
    task="Click task",
    llm=ChatAnthropic(model='claude-sonnet-4-0'),
)
# Coordinate clicking is now available
With coordinate clicking, the agent can click by pixel coordinates instead of element indices.

Flash Mode

Some models work better with simplified prompts:
# Browser Use models automatically enable flash mode
llm = ChatBrowserUse()

agent = Agent(
    task="Quick task",
    llm=llm,
    # flash_mode=True automatically set
)
Flash Mode Effects:
  • Disables evaluation and next_goal fields
  • Disables thinking field
  • Disables planning
  • Faster execution

Page Extraction LLM

Use a different model for content extraction:
agent = Agent(
    task="Complex navigation and extraction",
    llm=ChatAnthropic(model='claude-sonnet-4-0'),  # Main agent
    page_extraction_llm=ChatOpenAI(model='gpt-4.1-mini'),  # Faster for extraction
)
Using a smaller model for extraction can significantly reduce costs without affecting accuracy for text extraction tasks.

Judge LLM

Validate task completion with a separate model:
agent = Agent(
    task="Complete checkout process",
    llm=ChatBrowserUse(),
    use_judge=True,
    judge_llm=ChatOpenAI(model='gpt-4.1'),  # More capable judge
    ground_truth="Order confirmation page with order number",
)

Complete Example

from browser_use import Agent, Browser, ChatBrowserUse, ChatOpenAI
from browser_use.agent.views import MessageCompactionSettings
from pydantic import BaseModel
import asyncio

class ProductList(BaseModel):
    """Structured output for products."""
    products: list[dict[str, str | float]]
    total_found: int

async def main():
    # Configure browser
    browser = Browser(
        headless=False,
        window_size={'width': 1920, 'height': 1080},
    )
    
    # Configure agent with multiple models
    agent = Agent(
        task="""
        Go to Amazon and search for "wireless mouse".
        Extract the top 10 products with name, price, and rating.
        """,
        
        # Primary LLM - optimized for browser tasks
        llm=ChatBrowserUse(),
        
        # Extraction LLM - fast model for content extraction
        page_extraction_llm=ChatOpenAI(model='gpt-4.1-mini'),
        
        # Fallback if primary fails
        fallback_llm=ChatOpenAI(model='gpt-4.1-mini'),
        
        # Browser instance
        browser=browser,
        
        # Structured output
        output_model_schema=ProductList,
        
        # Vision settings
        use_vision=True,
        llm_screenshot_size=(1400, 850),
        
        # Performance
        max_actions_per_step=3,
        message_compaction=MessageCompactionSettings(enabled=True),
        
        # Cost tracking
        calculate_cost=True,
    )
    
    # Run agent
    history = await agent.run(max_steps=50)
    
    # Results
    print(f"✓ Completed in {history.number_of_steps()} steps")
    
    if history.structured_output:
        products: ProductList = history.structured_output
        print(f"✓ Found {products.total_found} products")
        for p in products.products[:3]:
            print(f"  - {p['name']}: ${p['price']}")
    
    if history.usage:
        print(f"✓ Cost: ${history.usage.total_cost:.4f}")
        print(f"✓ Tokens: {history.usage.total_tokens:,}")

if __name__ == "__main__":
    asyncio.run(main())

Troubleshooting

Rate limit errors
Solutions:
  • Use fallback LLM: fallback_llm=ChatOpenAI(...)
  • Add retry logic with exponential backoff
  • Switch to ChatBrowserUse for higher limits
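A retry wrapper with exponential backoff might look like this (a hypothetical helper, not part of the library):

```python
import asyncio
import random


async def invoke_with_retry(llm, messages, retries: int = 4, base_delay: float = 1.0):
    """Retry llm.ainvoke with exponential backoff and a little jitter."""
    for attempt in range(retries):
        try:
            return await llm.ainvoke(messages)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the original error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            await asyncio.sleep(delay)
```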
Slow responses or timeouts
Fix:
  • Increase timeout: llm_timeout=120
  • Use faster model: ChatBrowserUse, ChatGroq
  • Enable flash_mode for simpler prompts
Vision not working
Check:
  • Model supports vision (not DeepSeek, Grok 3)
  • use_vision is enabled
  • Screenshots are being captured: include_screenshot=True
High costs
Optimize:
  • Use ChatBrowserUse (lowest cost per task)
  • Use smaller extraction LLM: page_extraction_llm
  • Enable message compaction
  • Reduce screenshot size: llm_screenshot_size
  • Set max_steps limit

Next Steps

  • Supported Models - Complete list of available models
  • Agent Configuration - Learn about agent settings
  • Cost Optimization - Reduce API costs
  • Structured Output - Extract structured data
