Web Automation Agent - Awesome AI Apps

Overview

The Web Automation Agent is a powerful tool that uses the browser-use library to perform tasks in a web browser based on natural language instructions. Powered by large language models from Nebius Token Factory, it can navigate websites, interact with elements, and extract information automatically.

Features

Natural Language Control

Control browser with plain English instructions

Automated Navigation

Navigate websites and interact with elements

Data Extraction

Extract and process information from web pages

AI-Powered

Uses Nebius AI for intelligent task execution

Prerequisites

Python 3.7+

Ensure Python 3.7 or higher is installed

Nebius AI API Key

Get your API key from Nebius Token Factory

Installation

Clone the Repository

git clone https://github.com/Arindam200/awesome-ai-apps
cd simple_ai_agents/browser_agent

Install Dependencies

Using uv:

pip install uv
uv sync

Or using pip:

pip install -r requirements.txt

Configure Environment

Create a .env file:

NEBIUS_API_KEY="your_nebius_api_key_here"

Implementation

The browser automation agent is built using the browser-use library with Nebius AI:

import asyncio
import os
from dotenv import load_dotenv
from browser_use.llm import ChatOpenAI
from browser_use import Agent

# Load environment variables
load_dotenv()

api_key = os.getenv('NEBIUS_API_KEY')

if not api_key:
    raise ValueError('NEBIUS_API_KEY is not set')

async def run_search():
    agent = Agent(
        task=(
            "Go to flipkart.com, search for laptop, "
            "sort by best rating, and give me the price "
            "of the first result in markdown"
        ),
        llm=ChatOpenAI(
            base_url='https://api.tokenfactory.nebius.com/v1',
            model='Qwen/Qwen3-235B-A22B-Instruct-2507',
            api_key=api_key,
        ),
        use_vision=False,
    )
    await agent.run()

if __name__ == '__main__':
    asyncio.run(run_search())

How It Works

The agent operates in several phases:

Task Understanding

The LLM analyzes the natural language instruction and breaks it down into actionable steps

Browser Control

The agent launches a browser instance and navigates to the target website

Element Interaction

Interacts with page elements (search boxes, buttons, dropdowns) to complete the task

Data Extraction

Extracts the requested information from the final page state

Output Formatting

Presents the results in the requested format (e.g., Markdown)

Usage

Run the Agent

uv run main.py

Or with Python:

python main.py

Observe Browser

The agent will launch a browser window and perform the specified task automatically

View Results

Final output will be printed to the console in the requested format

Example Tasks

Here are example tasks you can perform with the browser automation agent:

Agent(
    task="Go to amazon.com, search for 'wireless headphones', "
         "sort by customer reviews, and extract the top 3 product names and prices",
    llm=ChatOpenAI(...),
)

Configuration Options

Customize the agent’s behavior with these options:

Agent Parameters

Parameter	Description	Default
`task`	Natural language task description	Required
`llm`	Language model configuration	Required
`use_vision`	Enable visual page understanding	`False`
`timeout`	Maximum execution time (seconds)	`60`
`headless`	Run browser in headless mode	`False`

LLM Configuration

llm=ChatOpenAI(
    base_url='https://api.tokenfactory.nebius.com/v1',
    model='Qwen/Qwen3-235B-A22B-Instruct-2507',  # Model selection
    api_key=api_key,
    temperature=0.1,  # Lower = more deterministic
)

The agent uses an OpenAI-compatible API interface with Nebius Token Factory for model access.

Available Models

Nebius Token Factory provides access to various models:

Qwen3-235B

Large ModelBest for complex reasoning and multi-step tasksModel: Qwen/Qwen3-235B-A22B-Instruct-2507

Qwen3-30B

Fast ModelFaster execution for simpler tasksModel: Qwen/Qwen3-30B-A3B

Advanced Features

Vision-Enabled Tasks

Enable visual understanding for complex page layouts:

agent = Agent(
    task="Find and click the blue 'Login' button on the page",
    llm=ChatOpenAI(...),
    use_vision=True,  # Enable visual page understanding
)

Vision mode requires additional processing time and may increase API costs.

Headless Mode

Run browser without GUI for production environments:

agent = Agent(
    task="...",
    llm=ChatOpenAI(...),
    headless=True,  # No visible browser window
)

Custom Timeouts

Set maximum execution time for tasks:

agent = Agent(
    task="...",
    llm=ChatOpenAI(...),
    timeout=120,  # 2 minutes
)

Error Handling

The agent includes built-in error handling:

API Key Missing

Error: NEBIUS_API_KEY is not setSolution: Ensure your .env file contains a valid API key:

NEBIUS_API_KEY=your_actual_key_here

Browser Launch Failed

Possible Causes:

Missing browser dependencies
Port conflicts

Solution: Ensure Chrome/Chromium is installed and ports are available

Task Timeout

Cause: Task exceeded maximum execution timeSolution: Increase timeout or simplify the task

Element Not Found

Cause: Page structure changed or element doesn’t existSolution: Verify the website structure and update task description

Best Practices

Clear Instructions

Provide detailed, step-by-step task descriptions for better results

Verify Selectors

Check that page elements are stable and accessible

Handle Waits

Account for page load times in your task description

Error Recovery

Include retry logic for unreliable operations

Use Cases

E-commerce
Data Collection
Testing
Automation

Product price monitoring
Inventory checking
Competitive analysis
Automated shopping

Limitations

Important Considerations:

Respect website terms of service and robots.txt
Be mindful of rate limits and server load
Some websites may block automated access
Dynamic content may require vision mode
Complex CAPTCHAs cannot be automatically solved

Troubleshooting

Verify API Key

Ensure your Nebius API key is valid and has sufficient credits

Check Dependencies

Verify all required packages are installed:

uv sync

Test Browser

Ensure browser launches correctly in non-headless mode first

Review Task

Simplify complex tasks into smaller, testable steps

Security Considerations

Security Best Practices:

Never hardcode API keys in source code
Use environment variables for sensitive data
Avoid entering real credentials in automated forms
Review and sanitize extracted data
Monitor API usage and costs

Next Steps

browser-use Docs

Explore advanced browser-use features and capabilities

Nebius Models

Try different models for various use cases

Vision Mode

Experiment with visual page understanding

Production Deploy

Scale your automation with headless mode

Starter Agents

Simple Agents

MCP Agents

Memory Agents

RAG Applications

Advanced Agents

​Overview

​Features

Natural Language Control

Automated Navigation

Data Extraction

AI-Powered

​Prerequisites

​Installation

​Implementation

​How It Works

​Usage

​Example Tasks

​Configuration Options

​Agent Parameters

​LLM Configuration

​Available Models

Qwen3-235B

Qwen3-30B

​Advanced Features

​Vision-Enabled Tasks

​Headless Mode

​Custom Timeouts

​Error Handling

​Best Practices

Clear Instructions

Verify Selectors

Handle Waits

Error Recovery

​Use Cases

​Limitations

​Troubleshooting

​Security Considerations

​Next Steps

browser-use Docs

Nebius Models

Vision Mode

Production Deploy

Build docs developers (and LLMs) love

Overview

Features

Prerequisites

Installation

Implementation

How It Works

Usage

Example Tasks

Configuration Options

Agent Parameters

LLM Configuration

Available Models

Advanced Features

Vision-Enabled Tasks

Headless Mode

Custom Timeouts

Error Handling

Best Practices

Use Cases

Limitations

Troubleshooting

Security Considerations

Next Steps