DecipherIt leverages the official Bright Data MCP Server - a Model Context Protocol server that gives agents full web access capabilities.
The Bright Data MCP Server enables real-time web access, bypasses geo-restrictions, and overcomes bot detection - capabilities essential for thorough automated research.
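For context, wiring CrewAI to the Bright Data MCP Server over stdio typically looks like the sketch below. The package name `@brightdata/mcp` and the `API_TOKEN` environment variable follow Bright Data's public MCP setup instructions, but treat the exact values as assumptions to verify against your own account configuration.

```python
# Configuration sketch - the package name and env vars are assumptions;
# check Bright Data's MCP documentation for your account's exact values.
from mcp import StdioServerParameters
from crewai_tools import MCPServerAdapter

server_params = StdioServerParameters(
    command="npx",
    args=["@brightdata/mcp"],
    env={"API_TOKEN": "<your-bright-data-api-token>"},
)

# Inside the research entrypoint, the adapter exposes the server's tools:
# with MCPServerAdapter(server_params) as tools:
#     ...  # filter tools by name and hand them to the agents below
```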
Used by the Link Collector agent to discover relevant sources:
backend/agents/topic_research_agent.py
# Filter tools for link collection
web_scraping_link_collector_tools = [
    tool for tool in tools if tool.name in ["search_engine"]
]

# Create Link Collector agent with search tool
web_scraping_link_collector = Agent(
    role=AGENT_CONFIGS["web_scraping_link_collector"]["role"],
    goal=AGENT_CONFIGS["web_scraping_link_collector"]["goal"],
    backstory=AGENT_CONFIGS["web_scraping_link_collector"]["backstory"],
    verbose=True,
    tools=web_scraping_link_collector_tools,
    llm=llm,
)

# Task configuration for link collection
link_collector_task = Task(
    description="""Using the search query - "{search_query}" provided, collect relevant links
    using the search_engine tool. Follow these steps precisely:

    1. Use the search_engine tool with parameters:
       - engine: "google"
       - query: the provided search query - "{search_query}"

    2. From the search results:
       - Review and analyze each result's relevance
       - Select 10 of the most relevant and authoritative links
       - Focus on high-quality sources

    3. Format the output as a JSON object with links array
    """,
    expected_output="A JSON object containing array of relevant links",
    agent=web_scraping_link_collector,
    max_retries=5,
    output_pydantic=WebScrapingLinkCollectorTaskResult,
)
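The task's expected output is a JSON object containing a links array, validated by the `WebScrapingLinkCollectorTaskResult` model (not shown here). The shape below is therefore illustrative only; the `url` and `title` fields are assumptions based on how the links are consumed later in the pipeline:

```python
import json

# Hypothetical example of the JSON the Link Collector task is expected to emit.
# Field names ("links", "url", "title") are assumptions, not taken from the source.
raw_output = """
{
  "links": [
    {"url": "https://example.com/article", "title": "Example Article"},
    {"url": "https://example.org/guide", "title": "Example Guide"}
  ]
}
"""

result = json.loads(raw_output)
urls = [link["url"] for link in result["links"]]
print(urls)  # ['https://example.com/article', 'https://example.org/guide']
```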
Used by the Web Scraper agent to extract clean content:
backend/agents/topic_research_agent.py
# Filter tools for web scraping
web_scraping_tools = [
    tool for tool in tools if tool.name in ["scrape_as_markdown"]
]

# Create Web Scraper agent
web_scraper = Agent(
    role=AGENT_CONFIGS["web_scraper"]["role"],
    goal=AGENT_CONFIGS["web_scraper"]["goal"],
    backstory=AGENT_CONFIGS["web_scraper"]["backstory"],
    verbose=True,
    tools=web_scraping_tools,
    llm=llm,
    max_iter=50,
)

# Task configuration for web scraping
web_scraping_task = Task(
    description="""STRICTLY FOLLOW THESE INSTRUCTIONS TO EXTRACT RAW CONTENT:

    1. Extract the content:
       - Use scrape_as_markdown to capture ALL raw text from {url}
    2. Return the raw text as a string

    CRITICAL REQUIREMENTS:
    - Extract and preserve ALL text exactly as it appears
    - Do NOT summarize or modify any content
    - Do NOT skip any text content
    - Include complete URL and page title
    - If page fails to load, return error status in output

    Current time: {current_time}""",
    expected_output="Complete raw text content from the URL, no modifications",
    agent=web_scraper,
    max_retries=5,
)
The scraper is instructed to preserve ALL content without modification to ensure data integrity.
DecipherIt executes multiple web scraping tasks in parallel for optimal performance:
backend/agents/topic_research_agent.py
import asyncio

async def run_research_crew(topic: str):
    with MCPServerAdapter(server_params) as tools:
        # ... (agent setup)

        # Create parallel tasks for link collection
        link_collector_tasks = []
        for search_query in search_queries:
            link_collector_tasks.append(
                web_scraping_link_collector_crew.kickoff_async(
                    inputs={
                        "topic": topic,
                        "search_query": search_query,
                        "current_time": current_time,
                    }
                )
            )

        # Execute all link collection tasks in parallel
        link_collector_results = await asyncio.gather(*link_collector_tasks)

        # Process results and collect unique links
        links = []
        for result in link_collector_results:
            result_links = result["links"]
            for link in result_links:
                if link.url not in [l.url for l in links]:
                    links.append(link)

        logger.info(f"Unique Links Collected: {links}")

        # Create parallel tasks for web scraping
        web_scraping_tasks = []
        for link in links:
            web_scraping_tasks.append(
                web_scraping_crew.kickoff_async(
                    inputs={
                        "topic": topic,
                        "url": link.url,
                        "current_time": current_time,
                    }
                )
            )

        # Execute all web scraping tasks in parallel
        web_scraping_results = await asyncio.gather(*web_scraping_tasks)

        # Process results and collect scraped data
        scraped_data = []
        for link, result in zip(links, web_scraping_results):
            scraped_data.append({
                "url": link.url,
                "page_title": link.title,
                "content": result.raw,
            })

        return scraped_data
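One design note on the deduplication step above: checking `link.url not in [l.url for l in links]` rebuilds the URL list on every iteration, which is O(n²). For larger link sets, a seen-set preserves the same first-occurrence order in O(n). A minimal stdlib sketch, where the `Link` dataclass is a stand-in for whatever link objects the crew actually returns:

```python
from dataclasses import dataclass

@dataclass
class Link:
    # Stand-in for the link objects produced by the Link Collector results.
    url: str
    title: str

def dedupe_links(all_links):
    """Keep the first occurrence of each URL, preserving order."""
    seen = set()
    unique = []
    for link in all_links:
        if link.url not in seen:
            seen.add(link.url)
            unique.append(link)
    return unique

links = dedupe_links([
    Link("https://a.example", "A"),
    Link("https://b.example", "B"),
    Link("https://a.example", "A (duplicate)"),
])
print([l.url for l in links])  # ['https://a.example', 'https://b.example']
```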