DecipherIt uses the official Bright Data MCP Server for advanced web scraping capabilities that bypass geo-restrictions and bot detection.

Overview

The Bright Data MCP Server provides:
  • Real-time Web Access - Access up-to-date information directly from the web
  • Bypass Geo-restrictions - Access content regardless of location constraints
  • Web Unlocker Technology - Navigate websites with advanced bot detection protection
  • Browser Control - Optional remote browser automation capabilities
  • Seamless Integration - Works with all MCP-compatible AI assistants

Prerequisites

  • Node.js 20+ and pnpm installed
  • Bright Data account (sign up - new users get free credits)

Environment Variables

Backend Configuration

Add these environment variables to your backend .env file:
backend/.env
BRIGHT_DATA_API_TOKEN=your_bright_data_api_token
BRIGHT_DATA_BROWSER_AUTH=your_bright_data_browser_auth
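Since the integration reads both variables at startup, it can help to fail fast when either is absent. A minimal sketch, assuming nothing beyond the two variable names above (`check_bright_data_env` is an illustrative helper, not part of DecipherIt):

```python
import os

# The two variables the backend expects, per the .env example above.
REQUIRED_VARS = ("BRIGHT_DATA_API_TOKEN", "BRIGHT_DATA_BROWSER_AUTH")

def check_bright_data_env() -> dict:
    """Return the Bright Data credentials, raising early if any are missing."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing Bright Data environment variables: {', '.join(missing)}"
        )
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Calling this once at backend startup turns a confusing mid-run MCP connection failure into an immediate, readable error.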
1. Get Your API Token
   • Sign up at brightdata.com
   • Navigate to your user settings page
   • Copy your API token
   • Add it to your .env file as BRIGHT_DATA_API_TOKEN

2. Configure Browser Auth
   • In the Bright Data control panel, navigate to Web Unlocker settings
   • Get your browser authentication credentials
   • Add them to your .env file as BRIGHT_DATA_BROWSER_AUTH

3. Web Unlocker Zone (Optional)
   By default, DecipherIt creates a Web Unlocker zone automatically using your API token. For advanced use cases:
   • Create a custom Web Unlocker zone in your Bright Data control panel
   • This provides more control over proxy settings and usage limits

Integration Implementation

The Bright Data MCP Server is integrated using CrewAI’s MCPServerAdapter:
backend/agents/topic_research_agent.py
from mcp import StdioServerParameters
from crewai_tools import MCPServerAdapter
import os

server_params = StdioServerParameters(
    command="pnpm",
    args=["dlx", "@brightdata/mcp"],
    env={
        "API_TOKEN": os.environ["BRIGHT_DATA_API_TOKEN"],
        "BROWSER_AUTH": os.environ["BRIGHT_DATA_BROWSER_AUTH"]
    },
)

# Initialize within agent workflow
async def run_research_crew(topic: str):
    with MCPServerAdapter(server_params) as tools:
        # Tools are now available for agents
        web_scraping_tools = [tool for tool in tools if tool.name in ["scrape_as_markdown"]]
        search_tools = [tool for tool in tools if tool.name in ["search_engine"]]
    
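The per-name filtering above can be factored into a small reusable helper. This is a sketch, not DecipherIt's actual code: `pick_tools` is an illustrative name, and the stub `Tool` class merely stands in for the objects MCPServerAdapter yields, which expose at least a `.name` attribute.

```python
from dataclasses import dataclass

@dataclass
class Tool:
    """Stand-in for an MCP tool object; real tools expose at least .name."""
    name: str

def pick_tools(tools, names):
    """Return the subset of tools whose .name is in names, preserving order."""
    wanted = set(names)
    return [tool for tool in tools if tool.name in wanted]
```

With this, the two comprehensions become `pick_tools(tools, ["scrape_as_markdown"])` and `pick_tools(tools, ["search_engine"])`, which keeps the selection logic in one place if more tools are added later.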

Available Tools

DecipherIt leverages two key tools from the Bright Data MCP server:

search_engine

Search the web for relevant information and discover sources:
backend/config/topic_research/tasks.py
# Used by Link Collector Agent
link_collector_task = Task(
    description="""Using the search query provided, collect relevant links using the search engine tool.

    Follow these steps:
    1. Use the search_engine tool with parameters:
       - engine: "google"
       - query: the provided search query
    2. Select 10 of the most relevant and authoritative links
    3. Focus on high-quality sources
    """,
    agent=web_scraping_link_collector,
    tools=web_scraping_link_collector_tools
)
    

scrape_as_markdown

Extract web content and convert it to clean, structured Markdown:
backend/config/topic_research/tasks.py
# Used by Web Scraper Agent
web_scraping_task = Task(
    description="""Extract raw content from the URL:

    1. Use scrape_as_markdown to capture ALL raw text
    2. Return the raw text as a string
    3. Preserve ALL text exactly as it appears
    """,
    agent=web_scraper,
    tools=web_scraping_tools
)
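Because scrape_as_markdown returns raw Markdown, a light normalization pass before handing text to downstream agents can reduce noise without altering content. This is a hypothetical post-processing step, not part of the tool or of DecipherIt's pipeline:

```python
import re

def normalize_markdown(text: str) -> str:
    """Strip trailing whitespace per line and collapse 3+ blank lines to one."""
    lines = [line.rstrip() for line in text.splitlines()]
    joined = "\n".join(lines)
    # Runs of three or more newlines become a single blank line.
    return re.sub(r"\n{3,}", "\n\n", joined).strip()
```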
    

Multi-Agent Workflow

DecipherIt uses parallel execution for efficient scraping:
backend/agents/topic_research_agent.py
import asyncio

# Execute multiple scraping tasks in parallel
web_scraping_tasks = []
for link in links:
    web_scraping_tasks.append(
        web_scraping_crew.kickoff_async(inputs={
            "url": link.url,
            "current_time": current_time,
        })
    )

# Gather results concurrently
web_scraping_results = await asyncio.gather(*web_scraping_tasks)

# Process scraped data
for link, result in zip(links, web_scraping_results):
    scraped_data.append({
        "url": link.url,
        "page_title": link.title,
        "content": result.raw
    })
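The fan-out-then-gather pattern above can be sketched end to end with a stub in place of `kickoff_async`; the semaphore bound is an illustrative addition for keeping concurrent scrapes under control, not something the snippet above does:

```python
import asyncio

async def fake_kickoff(url: str) -> str:
    """Stub for web_scraping_crew.kickoff_async: pretend to scrape one URL."""
    await asyncio.sleep(0)
    return f"content of {url}"

async def scrape_all(urls, max_concurrent: int = 5):
    """Run one scraping task per URL, at most max_concurrent at a time."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with sem:
            return await fake_kickoff(url)

    # gather preserves input order, so results line up with urls for zip().
    return await asyncio.gather(*(bounded(u) for u in urls))
```

Because `asyncio.gather` returns results in input order, zipping `links` with the results (as in the real code) pairs each URL with its own content.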
    

AI Agents Using Bright Data

Several specialized agents use Bright Data tools:

Web Scraping Planner

Role: Web Scraping Strategy Expert
Goal: Design optimal web scraping plans with targeted search queries to comprehensively gather relevant information.
Capabilities: Creates strategic search patterns that ensure comprehensive coverage while avoiding redundancy.

Link Collector Agent

Role: Link Discovery Specialist
Tools: search_engine
Goal: Discover and curate the most comprehensive and relevant collection of web sources.
Capabilities: Uses Bright Data’s search engine to find authoritative sources globally, bypassing geo-restrictions.

Web Scraper Agent

Role: Web Scraping Engineer
Tools: scrape_as_markdown
Goal: Navigate complex websites and extract targeted information while maintaining data integrity.
Capabilities: Uses Bright Data’s Web Unlocker to extract clean, structured content from discovered URLs.

Security Best Practices

Important: Always treat scraped web content as untrusted data.
DecipherIt automatically implements security measures:
• Data Validation - Filters and validates all web data before processing
• Structured Extraction - Uses structured data extraction rather than raw text
• Rate Limiting - Implements rate limiting and error handling
• Error Recovery - Gracefully handles scraping failures with retries
backend/agents/topic_research_agent.py
# Automatic retry configuration
web_scraping_task = Task(
    description=TOPIC_RESEARCH_TASK_CONFIGS["web_scraping"]["description"],
    expected_output=TOPIC_RESEARCH_TASK_CONFIGS["web_scraping"]["expected_output"],
    agent=web_scraper,
    max_retries=5  # Automatic retry on failure
)
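The effect of max_retries can be mimicked for any flaky call with a small retry loop. This sketch is illustrative only: `retry_call` and its exponential backoff schedule are assumptions, not CrewAI's internal retry mechanism.

```python
import time

def retry_call(fn, attempts: int = 5, base_delay: float = 0.5):
    """Call fn(), retrying up to `attempts` times with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            # 0.5s, 1s, 2s, 4s, ... between successive attempts.
            time.sleep(base_delay * (2 ** attempt))
```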
    

Monitoring and Logging

DecipherIt logs all scraping operations for debugging:
backend/agents/topic_research_agent.py
from loguru import logger

logger.info(f"Running web scraping crew for {len(links)} links")

# Crew logs to file
web_scraping_crew = Crew(
    agents=[web_scraper],
    tasks=[web_scraping_task],
    verbose=True,
    output_log_file=f"logs/web_scraping_crew_{current_time}.log"
)
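Since each crew run writes to logs/web_scraping_crew_&lt;timestamp&gt;.log, a quick failure scan can be scripted against that directory. This is a sketch under assumptions: `find_failures` is a hypothetical helper, and matching on an "error" keyword assumes log lines mention it, which is not a documented log format.

```python
from pathlib import Path

def find_failures(log_dir: str, keyword: str = "error"):
    """Return (filename, line) pairs for log lines mentioning the keyword."""
    hits = []
    # Match the naming pattern the Crew config above uses for its log files.
    for path in sorted(Path(log_dir).glob("web_scraping_crew_*.log")):
        for line in path.read_text().splitlines():
            if keyword.lower() in line.lower():
                hits.append((path.name, line))
    return hits
```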
    

Troubleshooting

Connection Issues

If you encounter connection errors:
1. Verify your BRIGHT_DATA_API_TOKEN is correct
2. Check that BRIGHT_DATA_BROWSER_AUTH is properly configured
3. Ensure pnpm is installed and accessible: pnpm --version

Rate Limiting

The system includes built-in rate limiting:
backend/agents/topic_research_agent.py
web_scraping_crew = Crew(
    agents=[web_scraper],
    tasks=[web_scraping_task],
    max_rpm=20  # Maximum requests per minute
)
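max_rpm=20 caps the crew at 20 requests per minute, i.e. one request every 3 seconds on average. The same idea can be sketched standalone; this `RateLimiter` is illustrative and is not CrewAI's internal implementation:

```python
import time

class RateLimiter:
    """Enforce a maximum number of calls per minute by spacing them out."""

    def __init__(self, max_rpm: int):
        # max_rpm=20 -> at least 3 seconds between calls.
        self.min_interval = 60.0 / max_rpm
        self.last_call = 0.0

    def wait(self) -> None:
        """Block just long enough to stay within the per-minute budget."""
        now = time.monotonic()
        elapsed = now - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()
```

Calling `limiter.wait()` immediately before each outbound request spreads load evenly instead of bursting, which is usually gentler on both the proxy and the target site.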
    

Scraping Failures

If specific URLs fail to scrape:
• Check the logs in logs/web_scraping_crew_*.log
• Verify the URL is accessible
• The system will retry up to 5 times automatically
