
Multi-Agent Architecture

DecipherIt employs a sophisticated multi-crew architecture powered by CrewAI, where specialized AI agents work together to accomplish complex research tasks.
Each crew is designed for a specific research workflow, with agents collaborating to produce high-quality outputs.

Agent Crews Overview

DecipherIt implements six specialized crews:

Planning Crew

Strategizes optimal web scraping approaches and search queries

Link Collection Crew

Discovers relevant sources using Bright Data search

Web Scraping Crew

Extracts clean content from discovered URLs

Research & Content Crew

Analyzes sources and creates comprehensive summaries

Chat Crew

Handles interactive Q&A with context awareness

Mindmap Crew

Creates hierarchical visual representations

Topic Research Agent Configuration

Agent Definitions

Agents are configured with specific roles, goals, and backstories to optimize their performance:
backend/config/topic_research/agents.py
AGENT_CONFIGS = {
    "web_scraping_planner": {
        "role": "{topic} Web Scraping Strategy Expert",
        "goal": "Design an optimal web scraping plan with targeted search queries to comprehensively gather all relevant information about {topic}",
        "backstory": """You are a distinguished web scraping strategist with extensive experience in planning large-scale web data collection projects. Your expertise lies in breaking down complex topics into precise, targeted search queries that ensure comprehensive coverage."""
    },
    "web_scraping_link_collector": {
        "role": "{topic} Link Discovery Specialist",
        "goal": "Discover and curate the most comprehensive and relevant collection of web sources about {topic}",
        "backstory": """You are an elite web research specialist with unparalleled expertise in discovering high-quality information sources. Your background includes years of experience in advanced search techniques and source evaluation."""
    },
    "web_scraper": {
        "role": "{topic} Expert Web Scraping Engineer",
        "goal": "Navigate through complex websites, extract targeted information about {topic}, and compile comprehensive datasets",
        "backstory": """You are an elite web scraping engineer with unparalleled expertise in automated data extraction and web navigation."""
    },
    "researcher": {
        "role": "{topic} Senior Research Analyst & Knowledge Synthesizer",
        "goal": "Conduct exhaustive analysis of multi-source data about {topic}, uncovering hidden patterns and producing comprehensive research insights",
        "backstory": """You are an elite research analyst with decades of experience in knowledge synthesis and pattern recognition across complex datasets."""
    },
    "content_writer": {
        "role": "{topic} Senior Content Strategist & Research Synthesizer",
        "goal": "Transform extensive research findings about {topic} into meticulously structured, deeply informative content",
        "backstory": """You are an elite content strategist with extensive experience in research synthesis and long-form content creation."""
    }
}
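CrewAI fills the `{topic}` placeholders in each role, goal, and backstory from the inputs passed at kickoff. The sketch below illustrates that substitution with a plain string replacement; the `interpolate` helper is purely illustrative, not part of CrewAI's API:

```python
# Hypothetical illustration of how {placeholders} in an agent config get
# resolved from kickoff inputs. CrewAI does this internally; this sketch
# just shows the effect.
config = {
    "role": "{topic} Web Scraping Strategy Expert",
    "goal": "Design an optimal web scraping plan for {topic}",
}
inputs = {"topic": "Quantum Computing"}

def interpolate(text: str, inputs: dict) -> str:
    # Replace each {key} marker with its input value
    for key, value in inputs.items():
        text = text.replace("{" + key + "}", value)
    return text

resolved = {k: interpolate(v, inputs) for k, v in config.items()}
print(resolved["role"])  # Quantum Computing Web Scraping Strategy Expert
```

Because the same configs are reused for every research topic, a single dictionary of role templates serves the whole pipeline.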

Task Configuration

Tasks define what each agent should accomplish:
backend/config/topic_research/tasks.py
TASK_CONFIGS = {
    "planner": {
        "description": """Generate 3 unique search queries for the topic \"{topic}\".
        
        Your task:
        1. Create 3 different search queries to research this topic
        2. Keep the queries simple and clear
        3. Format output as a JSON object
        
        Output format required:
        {
            "search_queries": [
                "query1",
                "query2",
                "query3"
            ]
        }""",
        "expected_output": "A JSON object containing 3 unique search queries"
    },
    "link_collector": {
        "description": """Using the search query provided, collect relevant links using the search_engine tool.
        
        Follow these steps:
        1. Use the search_engine tool with engine: \"google\" and the provided query
        2. Select 10 of the most relevant and authoritative links
        3. Format as JSON: {\"links\": [{\"url\": ..., \"title\": ...}]}""",
        "expected_output": "A JSON object with array of relevant, high-quality links"
    }
}
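The planner task's output contract is the JSON shape shown in its description. A minimal sanity check of that contract looks like this; in the actual pipeline the validation is handled by the `output_pydantic` model (`WebScrapingPlannerTaskResult`) rather than by hand:

```python
import json

# Sample planner output matching the required format; q1-q3 are placeholder
# queries, not real output.
raw = '{"search_queries": ["q1", "q2", "q3"]}'

parsed = json.loads(raw)
queries = parsed["search_queries"]

# The task demands exactly 3 unique queries
assert isinstance(queries, list)
assert len(queries) == 3 and len(set(queries)) == 3
print(queries)
```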

Topic Research Workflow

Here’s the core of the topic research agent implementation:
backend/agents/topic_research_agent.py
import asyncio
import logging
import os

from crewai import Agent, Crew, Task, Process
from crewai_tools import MCPServerAdapter
from mcp import StdioServerParameters

logger = logging.getLogger(__name__)

# llm, current_time, WebScrapingPlannerTaskResult, faq_result, and the helper
# crews (web_scraping_link_collector_crew, web_scraping_crew,
# research_content_crew) are defined elsewhere in the module; this listing
# is abridged.

server_params = StdioServerParameters(
    command="pnpm",
    args=["dlx", "@brightdata/mcp"],
    env={
        "API_TOKEN": os.environ["BRIGHT_DATA_API_TOKEN"],
        "BROWSER_AUTH": os.environ["BRIGHT_DATA_BROWSER_AUTH"]
    },
)

async def run_research_crew(topic: str):
    logger.info(f"Running topic research crew for topic: {topic}")
    
    with MCPServerAdapter(server_params) as tools:
        # Filter tools for specific agents
        web_scraping_link_collector_tools = [
            tool for tool in tools if tool.name in ["search_engine"]
        ]
        
        web_scraping_tools = [
            tool for tool in tools if tool.name in ["scrape_as_markdown"]
        ]
        
        # Create planning crew
        web_scraping_planner = Agent(
            role=AGENT_CONFIGS["web_scraping_planner"]["role"],
            goal=AGENT_CONFIGS["web_scraping_planner"]["goal"],
            backstory=AGENT_CONFIGS["web_scraping_planner"]["backstory"],
            verbose=True,
            llm=llm,
        )
        
        planner_task = Task(
            description=TASK_CONFIGS["planner"]["description"],
            expected_output=TASK_CONFIGS["planner"]["expected_output"],
            agent=web_scraping_planner,
            max_retries=5,
            output_pydantic=WebScrapingPlannerTaskResult
        )
        
        planning_crew = Crew(
            agents=[web_scraping_planner],
            tasks=[planner_task],
            verbose=True,
            process=Process.sequential,
            max_rpm=20
        )
        
        # Execute planning crew
        planning_crew_result = await planning_crew.kickoff_async(
            inputs={"topic": topic, "current_time": current_time}
        )
        
        search_queries = planning_crew_result["search_queries"]
        
        # Create parallel tasks for link collection
        link_collector_tasks = []
        for search_query in search_queries:
            link_collector_tasks.append(
                web_scraping_link_collector_crew.kickoff_async(
                    inputs={"topic": topic, "search_query": search_query}
                )
            )
        
        # Execute all tasks in parallel
        link_collector_results = await asyncio.gather(*link_collector_tasks)
        
        # Process and deduplicate links
        links = []
        for result in link_collector_results:
            result_links = result["links"]
            for link in result_links:
                if link.url not in [l.url for l in links]:
                    links.append(link)
        
        # Parallel web scraping
        web_scraping_tasks = [
            web_scraping_crew.kickoff_async(
                inputs={"topic": topic, "url": link.url}
            )
            for link in links
        ]
        
        web_scraping_results = await asyncio.gather(*web_scraping_tasks)
        
        # Collect scraped data
        scraped_data = [
            {"url": link.url, "page_title": link.title, "content": result.raw}
            for link, result in zip(links, web_scraping_results)
        ]
        
        # Final research and content creation
        research_content_crew_result = await research_content_crew.kickoff_async(
            inputs={"topic": topic, "scraped_data": scraped_data}
        )
        
        return {
            "blog_post": research_content_crew_result["blog_post"],
            "title": research_content_crew_result["title"],
            "links": [link.model_dump() for link in links],
            "scraped_data": scraped_data,
            "faq": [faq.model_dump() for faq in faq_result]
        }
The workflow uses asyncio.gather() to execute multiple web scraping tasks in parallel for optimal performance.
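When fanning out dozens of scrape tasks, an unbounded `asyncio.gather()` can overwhelm downstream rate limits (the `max_rpm=20` crew setting throttles LLM calls, not the scraping fan-out). A common refinement, sketched here with a stand-in `scrape()` coroutine in place of `crew.kickoff_async(...)`, is to cap concurrency with a semaphore:

```python
import asyncio

# Stand-in for a per-URL crew.kickoff_async(...) call
async def scrape(url: str) -> str:
    await asyncio.sleep(0)
    return f"content of {url}"

async def scrape_all(urls, limit: int = 5):
    # At most `limit` scrapes run concurrently; the rest queue up
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return await scrape(url)

    return await asyncio.gather(*(bounded(u) for u in urls))

results = asyncio.run(scrape_all(["https://a.example", "https://b.example"]))
print(results)
```

Raising or lowering `limit` trades throughput against the risk of hitting Bright Data or LLM provider rate limits.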

Chat Agent

The chat agent provides interactive Q&A capabilities:
backend/agents/chat_agent.py
from typing import List

from crewai import Agent, Crew, Task, Process
from services.qdrant_service import qdrant_service

# llm, ChatMessage, and the notebook record are defined elsewhere in the module.

async def get_relevant_sources(notebook_id: str, query: str):
    results = await qdrant_service.search(query, notebook_id)

    output = ""
    for result in results:
        source_info = "Source: Provided Text"
        if result.get('url'):
            page_title = result.get('page_title', '')
            source_info = f"Source: {page_title} ({result['url']})"
        output += f"Content: {result['content_chunk']}\n{source_info}\n---\n"

    return output

def get_decipher_crew():
    decipher_agent = Agent(
        role="Decipher",
        goal="Analyze and decode questions to provide precise answers",
        backstory="""You're Decipher, an analytical assistant specialized in
        breaking down complex queries and providing clear, accurate responses.""",
        verbose=True,
        llm=llm,
    )

    answer_question_task = Task(
        description="""Answer based on relevant sources and chat context.

        Chat History:
        {chat_history}

        User Question:
        {question}

        Relevant Sources:
        {relevant_sources}""",
        expected_output="""A markdown-formatted response with answer and sources.""",
        agent=decipher_agent,
    )

    return Crew(
        agents=[decipher_agent],
        tasks=[answer_question_task],
        process=Process.sequential,
        verbose=True
    )

async def run_chat_agent(notebook_id: str, messages: List[ChatMessage]):
    # Build chat history from the recent messages, excluding the latest question
    chat_history = "\n".join(
        f"{message.role}: {message.content}"
        for message in messages[-10:-1]
    )

    # Rewrite the question for better vector search
    search_query = messages[-1].content
    if chat_history:
        search_query = llm.call(f"""Rewrite this question with context:

Chat History: {chat_history}
Current Question: {messages[-1].content}

Output only the rewritten question.""")

    # Get relevant sources from the vector database
    relevant_sources = await get_relevant_sources(notebook_id, search_query)

    # Run the crew
    crew = get_decipher_crew()
    crew_result = await crew.kickoff_async(
        inputs={
            "question": messages[-1].content,
            "chat_history": chat_history,
            "relevant_sources": relevant_sources,
            "topic": notebook["title"]
        }
    )

    return crew_result.raw
The chat agent rewrites questions using chat history context to improve vector search accuracy.
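The context string that `get_relevant_sources` hands to the crew interleaves each content chunk with its attribution. The sketch below reproduces that formatting on a fabricated result record (real records come from Qdrant, and the field values here are invented for illustration):

```python
# Fabricated Qdrant result record for illustration only
result = {
    "content_chunk": "Qubits can exist in superposition.",
    "url": "https://example.com/qc",
    "page_title": "Quantum Basics",
}

# Same attribution logic as get_relevant_sources: fall back to
# "Provided Text" when the chunk has no source URL
source_info = "Source: Provided Text"
if result.get("url"):
    source_info = f"Source: {result.get('page_title', '')} ({result['url']})"

block = f"Content: {result['content_chunk']}\n{source_info}\n---\n"
print(block)
```

Keeping the URL and page title next to each chunk is what lets the agent cite sources in its markdown answer.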

Mindmap Agent

The mindmap agent creates hierarchical visualizations:
backend/agents/mindmap_agent.py
def get_mindmap_crew():
    content_analyzer = Agent(
        role="Research Content Analyst",
        goal="Analyze research content to identify main themes and hierarchical relationships up to 5 levels deep",
        backstory="""You are an expert content analyst who excels at breaking down
        complex research into logical hierarchical structures.""",
        llm=llm,
        verbose=True
    )

    mindmap_creator = Agent(
        role="Mindmap Specialist",
        goal="Create the final mindmap structure as a nested dictionary",
        backstory="""You are a mindmap specialist who transforms analyzed content
        into well-organized nested dictionary formats.""",
        llm=llm,
        verbose=True
    )

    analyze_content_task = Task(
        description="""Analyze research content and identify hierarchical themes:
        - Level 1: ONE main central topic
        - Level 2: Primary categories (3-6 major categories)
        - Level 3: Secondary subtopics (2-5 per category)
        - Level 4: Detailed aspects (if content warrants it)
        - Level 5: Specific details (only if highly detailed)

        Research Content:
        """,
        expected_output="A hierarchical breakdown with appropriate depth (2-5 levels)",
        agent=content_analyzer
    )

    create_mindmap_task = Task(
        description="""Create mindmap structure with hierarchical nodes.

        Structure must have:
        - Root node: id="root", text, display={"block": true}, nodes array
        - Child nodes: unique id, text, nodes array
        - Empty nodes arrays [] for leaf nodes
        """,
        expected_output="A hierarchical node structure representing the mindmap",
        output_pydantic=SimpleMindmapStructure,
        agent=mindmap_creator,
        context=[analyze_content_task]
    )

    return Crew(
        agents=[content_analyzer, mindmap_creator],
        tasks=[analyze_content_task, create_mindmap_task],
        process=Process.sequential,
        verbose=True,
    )
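The nested-node shape the task description asks for can be written out concretely. The example below is an assumed instance (the topic and node ids are invented; `SimpleMindmapStructure` is presumed to mirror these fields), with a small helper showing how the empty `nodes` arrays make the tree easy to traverse:

```python
# Illustrative mindmap matching the task's required shape; topic and ids
# are made up for this example.
mindmap = {
    "id": "root",
    "text": "Quantum Computing",
    "display": {"block": True},
    "nodes": [
        {"id": "hardware", "text": "Hardware", "nodes": [
            {"id": "superconducting", "text": "Superconducting qubits", "nodes": []},
        ]},
        {"id": "algorithms", "text": "Algorithms", "nodes": []},
    ],
}

def depth(node: dict) -> int:
    # Leaf nodes carry empty `nodes` arrays, so the recursion bottoms out there
    return 1 + max((depth(n) for n in node["nodes"]), default=0)

print(depth(mindmap))  # 3
```

The 2-5 level bound from the analysis task translates directly into a `depth` between 2 and 5 on this structure.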

Agent Best Practices

  • Use descriptive roles that guide agent behavior
  • Set specific, measurable goals
  • Provide detailed backstories for context
  • Configure appropriate max_retries for reliability

Next Steps

Web Scraping

Learn about Bright Data MCP Server integration

Vector Search

Understand Qdrant embeddings and search
