Skip to main content

Overview

The Web Automation Agent is a powerful tool that uses the browser-use library to perform tasks in a web browser based on natural language instructions. Powered by large language models from Nebius Token Factory, it can navigate websites, interact with elements, and extract information automatically.

Features

Natural Language Control

Control browser with plain English instructions

Automated Navigation

Navigate websites and interact with elements

Data Extraction

Extract and process information from web pages

AI-Powered

Uses Nebius AI for intelligent task execution

Prerequisites

1

Python 3.7+

Ensure Python 3.7 or higher is installed
2

Nebius AI API Key

Get your API key from Nebius Token Factory

Installation

1

Clone the Repository

git clone https://github.com/Arindam200/awesome-ai-apps
cd simple_ai_agents/browser_agent
2

Install Dependencies

Using uv:
pip install uv
uv sync
Or using pip:
pip install -r requirements.txt
3

Configure Environment

Create a .env file:
NEBIUS_API_KEY="your_nebius_api_key_here"

Implementation

The browser automation agent is built using the browser-use library with Nebius AI:
import asyncio
import os
from dotenv import load_dotenv
from browser_use.llm import ChatOpenAI
from browser_use import Agent

# Load environment variables
load_dotenv()

api_key = os.getenv('NEBIUS_API_KEY')

if not api_key:
    raise ValueError('NEBIUS_API_KEY is not set')

async def run_search():
    agent = Agent(
        task=(
            "Go to flipkart.com, search for laptop, "
            "sort by best rating, and give me the price "
            "of the first result in markdown"
        ),
        llm=ChatOpenAI(
            base_url='https://api.tokenfactory.nebius.com/v1',
            model='Qwen/Qwen3-235B-A22B-Instruct-2507',
            api_key=api_key,
        ),
        use_vision=False,
    )
    await agent.run()

if __name__ == '__main__':
    asyncio.run(run_search())

How It Works

The agent operates in several phases:
1

Task Understanding

The LLM analyzes the natural language instruction and breaks it down into actionable steps
2

Browser Control

The agent launches a browser instance and navigates to the target website
3

Element Interaction

Interacts with page elements (search boxes, buttons, dropdowns) to complete the task
4

Data Extraction

Extracts the requested information from the final page state
5

Output Formatting

Presents the results in the requested format (e.g., Markdown)

Usage

1

Run the Agent

uv run main.py
Or with Python:
python main.py
2

Observe Browser

The agent will launch a browser window and perform the specified task automatically
3

View Results

Final output will be printed to the console in the requested format

Example Tasks

Here are example tasks you can perform with the browser automation agent:
Agent(
    task="Go to amazon.com, search for 'wireless headphones', "
         "sort by customer reviews, and extract the top 3 product names and prices",
    llm=ChatOpenAI(...),
)

Configuration Options

Customize the agent’s behavior with these options:

Agent Parameters

ParameterDescriptionDefault
taskNatural language task descriptionRequired
llmLanguage model configurationRequired
use_visionEnable visual page understandingFalse
timeoutMaximum execution time (seconds)60
headlessRun browser in headless modeFalse

LLM Configuration

llm=ChatOpenAI(
    base_url='https://api.tokenfactory.nebius.com/v1',
    model='Qwen/Qwen3-235B-A22B-Instruct-2507',  # Model selection
    api_key=api_key,
    temperature=0.1,  # Lower = more deterministic
)
The agent uses an OpenAI-compatible API interface with Nebius Token Factory for model access.

Available Models

Nebius Token Factory provides access to various models:

Qwen3-235B

Large ModelBest for complex reasoning and multi-step tasksModel: Qwen/Qwen3-235B-A22B-Instruct-2507

Qwen3-30B

Fast ModelFaster execution for simpler tasksModel: Qwen/Qwen3-30B-A3B

Advanced Features

Vision-Enabled Tasks

Enable visual understanding for complex page layouts:
agent = Agent(
    task="Find and click the blue 'Login' button on the page",
    llm=ChatOpenAI(...),
    use_vision=True,  # Enable visual page understanding
)
Vision mode requires additional processing time and may increase API costs.

Headless Mode

Run browser without GUI for production environments:
agent = Agent(
    task="...",
    llm=ChatOpenAI(...),
    headless=True,  # No visible browser window
)

Custom Timeouts

Set maximum execution time for tasks:
agent = Agent(
    task="...",
    llm=ChatOpenAI(...),
    timeout=120,  # 2 minutes
)

Error Handling

The agent includes built-in error handling:
Error: NEBIUS_API_KEY is not setSolution: Ensure your .env file contains a valid API key:
NEBIUS_API_KEY=your_actual_key_here
Possible Causes:
  • Missing browser dependencies
  • Port conflicts
Solution: Ensure Chrome/Chromium is installed and ports are available
Cause: Task exceeded maximum execution timeSolution: Increase timeout or simplify the task
Cause: Page structure changed or element doesn’t existSolution: Verify the website structure and update task description

Best Practices

Clear Instructions

Provide detailed, step-by-step task descriptions for better results

Verify Selectors

Check that page elements are stable and accessible

Handle Waits

Account for page load times in your task description

Error Recovery

Include retry logic for unreliable operations

Use Cases

  • Product price monitoring
  • Inventory checking
  • Competitive analysis
  • Automated shopping

Limitations

Important Considerations:
  • Respect website terms of service and robots.txt
  • Be mindful of rate limits and server load
  • Some websites may block automated access
  • Dynamic content may require vision mode
  • Complex CAPTCHAs cannot be automatically solved

Troubleshooting

1

Verify API Key

Ensure your Nebius API key is valid and has sufficient credits
2

Check Dependencies

Verify all required packages are installed:
uv sync
3

Test Browser

Ensure browser launches correctly in non-headless mode first
4

Review Task

Simplify complex tasks into smaller, testable steps

Security Considerations

Security Best Practices:
  • Never hardcode API keys in source code
  • Use environment variables for sensitive data
  • Avoid entering real credentials in automated forms
  • Review and sanitize extracted data
  • Monitor API usage and costs

Next Steps

browser-use Docs

Explore advanced browser-use features and capabilities

Nebius Models

Try different models for various use cases

Vision Mode

Experiment with visual page understanding

Production Deploy

Scale your automation with headless mode

Build docs developers (and LLMs) love