Overview

Multi-source research lets you conduct deep analysis using your own content sources. Upload documents, paste URLs, or add text directly to create a comprehensive research notebook from materials you provide.
Unlike topic-based research that searches the web, multi-source research analyzes only the content you provide, making it perfect for document analysis, literature reviews, and proprietary content research.

Supported Source Types

DecipherIt accepts three types of sources:

URLs

Web pages, articles, blog posts, documentation sites, and any publicly accessible URL.

Documents

Upload PDFs, Word documents, PowerPoint presentations, Excel files, images, and more.

Text Content

Paste any text content directly: notes, quotes, research findings, or custom content.

How It Works

Step 1: Source Collection

Add up to 20 sources in any combination of URLs, documents, and text content.

Source Processing:
  • URLs are scraped using Bright Data’s scrape_as_markdown tool
  • Documents are converted to markdown by the MarkItDown service
  • Text content is used directly as provided
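The per-type routing above can be sketched as a simple dispatcher. Note that `scrape_url` and `convert_file` here are hypothetical stand-ins for the real scraper and MarkItDown converter, not DecipherIt's actual API:

```python
# Illustrative sketch of per-type source routing.
# scrape_url and convert_file are hypothetical placeholders.
def scrape_url(url: str) -> str:
    return f"(markdown scraped from {url})"

def convert_file(path: str) -> str:
    return f"(markdown converted from {path})"

def process_source(source: dict) -> str:
    """Return markdown text for one source, dispatching on its type."""
    if source["type"] == "URL":
        return scrape_url(source["value"])
    if source["type"] == "FILE":
        return convert_file(source["filePath"])
    return source["value"]  # TEXT: pasted content is used as-is
```

Whatever the source type, the output of this step is markdown text, so the downstream analysis stages can treat all sources uniformly.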
Step 2: Content Extraction

The Web Scraper agent extracts content from URLs in parallel, while MarkItDown converts uploaded files to text.

Parallel Processing:
  • All URLs are scraped simultaneously
  • Documents are converted asynchronously
  • Text content is prepared for analysis
Step 3: Research Analysis

The Researcher agent synthesizes all sources (scraped URLs, converted files, and pasted text) into a comprehensive analysis.

Cross-Source Analysis:
  • Identifies themes across all source types
  • Cross-references insights between sources
  • Integrates file content seamlessly
  • Notes patterns and relationships
Step 4: Content Generation

The Content Writer creates an engaging summary with citations from all source types.

Citation Types:
  • URL citations: [Source Title](url)
  • File citations: references to file names
  • Text citations: attributed to “Provided Text”
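A minimal sketch of how the three citation conventions above could be rendered; the `format_citation` helper is illustrative, not DecipherIt's actual implementation:

```python
# Hypothetical helper rendering one citation per source type,
# following the conventions listed above.
def format_citation(source: dict) -> str:
    """Return a citation string for a URL, FILE, or TEXT source."""
    if source["type"] == "URL":
        title = source.get("title", source["value"])
        return f"[{title}]({source['value']})"   # markdown link citation
    if source["type"] == "FILE":
        return source["filename"]                # file citations use the file name
    return "Provided Text"                       # pasted text is attributed generically
```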

Adding Sources

  1. Open the New Notebook dialog
  2. Select the Sources tab
  3. Choose URL from the source type dropdown
  4. Paste the URL (e.g., https://example.com/article)
  5. Click Add
URLs must be publicly accessible. Authentication-protected or paywalled content may not be fully extracted.
Supported URL Types:
  • Articles and blog posts
  • Documentation sites
  • Research papers (HTML format)
  • News articles
  • GitHub repositories (rendered pages)
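Because only publicly accessible URLs can be fully extracted, a quick pre-check before adding a URL can save a failed scrape. This sketch uses a plain HTTP HEAD request and is not part of DecipherIt itself:

```python
# Hypothetical pre-check: does a URL answer without authentication?
import urllib.error
import urllib.request

def is_publicly_accessible(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL responds with a non-error status without auth."""
    try:
        req = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "accessibility-check"}
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError, TimeoutError):
        # DNS failure, HTTP error, malformed URL, or timeout
        return False
```

A 200–399 response is a good sign, though some paywalled pages return 200 and still hide their content behind a login, so the check is a heuristic, not a guarantee.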

Source Management

Viewing Added Sources

The Create Notebook dialog shows all added sources:
  • Source counter: Shows X/20 sources added
  • Visual indicators: Color-coded badges (URL=blue, Text=green, File=purple)
  • Preview: Truncated display for long URLs or text
  • Remove button: Click X to remove any source

Source Limits

You can add up to 20 sources per notebook in any combination of URLs, documents, and text.
Implementation:
  • Location: client/components/notebook/create-notebook-dialog.tsx:58-131
  • Sources stored with metadata (type, filename, file path)
  • Validation ensures source limit compliance
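The limit check could look like the following minimal sketch; `MAX_SOURCES` and `validate_sources` are illustrative names, not the actual implementation in the dialog component:

```python
# Hypothetical validation sketch for the per-notebook source limit.
MAX_SOURCES = 20  # cap applies to URLs, files, and text in any mix

def validate_sources(sources: list[dict]) -> None:
    """Raise ValueError if the source list breaks the notebook's constraints."""
    if len(sources) > MAX_SOURCES:
        raise ValueError(
            f"At most {MAX_SOURCES} sources per notebook; got {len(sources)}"
        )
    for s in sources:
        if s.get("type") not in {"URL", "FILE", "TEXT"}:
            raise ValueError(f"Unknown source type: {s.get('type')!r}")
        if not s.get("value"):
            raise ValueError("Every source needs a non-empty value")
```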

Technical Implementation

Source Processing Architecture

// Source types
type Source = {
  type: "URL" | "TEXT" | "FILE";
  value: string;           // URL or text content
  filename?: string;       // For file uploads
  filePath?: string;       // R2 storage path
}

// Backend processing
const sourcesToSave = sources.map(source => {
  if (source.type === "URL") {
    return {
      sourceType: "URL",
      sourceUrl: source.value,
    };
  } else if (source.type === "FILE") {
    return {
      sourceType: "UPLOAD",
      sourceUrl: source.value,
      filePath: source.filePath,
      filename: source.filename,
    };
  } else {
    return {
      sourceType: "MANUAL",
      content: source.value,
    };
  }
});
Source: client/components/notebook/create-notebook-dialog.tsx:165-184

Multi-Source Research Agent

The sources research crew processes all source types:
async def run_sources_research_crew(sources: List[ResearchSource]):
    # Extract URLs from sources
    links = [WebLink(url=source.source_url, title=source.source_url)
             for source in sources
             if source.source_type == "URL"]
    
    # Scrape all URLs in parallel
    web_scraping_tasks = [
        web_scraping_crew.kickoff_async(inputs={
            "url": link.url,
            "current_time": current_time,
        })
        for link in links
    ]
    web_scraping_results = await asyncio.gather(*web_scraping_tasks)
    
    # Get textual content from MANUAL sources
    textual_content = ""
    for source in sources:
        if source.source_type == "MANUAL":
            textual_content += f"\n---\n- {source.source_content}\n---\n"
    
    # Convert uploaded files to markdown
    if any(source.source_type == "UPLOAD" for source in sources):
        markdown_files = await markdown_converter.convert_urls_to_markdown(
            [source.source_url for source in sources if source.source_type == "UPLOAD"]
        )
Source: backend/agents/sources_research_agent.py:21-196

MarkItDown Integration

Documents are converted to markdown for analysis:
# Convert uploaded files to markdown
markdown_files = await markdown_converter.convert_urls_to_markdown(
    [source.source_url for source in sources if source.source_type == "UPLOAD"]
)
for url, markdown_content in markdown_files.items():
    file_content += f"\n---\n- File: {url}\n---\n{markdown_content}\n---\n"
    file_data.append({
        "file_name": url,
        "content": markdown_content
    })
Source: backend/agents/sources_research_agent.py:158-167

Research Output

Multi-source research generates:

Integrated Summary

Comprehensive analysis synthesizing insights from all source types with proper attribution.

Source References

Complete list of all sources (URLs, file names, and text snippets) used in the analysis.

Cross-Source FAQs

10 questions answered using information from across all your sources.

Vector Database

All content chunked and embedded for semantic search in the Chat feature.
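Chunking for embedding can be sketched as a fixed-size splitter with overlap. The real pipeline's chunk size, overlap, and embedding model are not specified in this page, so treat the numbers below as placeholders:

```python
# Illustrative chunker; sizes are placeholders, not DecipherIt's actual settings.
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping an overlap window
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, which matters for semantic search over long documents.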

Use Cases

Upload multiple documents (PDFs, Word docs) to:
  • Compare findings across papers
  • Synthesize research literature
  • Extract key themes from reports
  • Summarize meeting notes
Combine URLs from multiple sources to:
  • Compare different perspectives
  • Analyze news coverage
  • Research competitors
  • Gather documentation
Combine all source types to:
  • Add context to documents with text notes
  • Supplement URLs with your observations
  • Create comprehensive research from diverse sources
  • Build knowledge bases from multiple formats

Best Practices

Source Quality

  • Use authoritative, credible sources
  • Ensure documents are text-based (not scanned images)
  • Verify URLs are accessible
  • Provide context with text sources

Source Diversity

  • Mix source types for richer analysis
  • Include primary and secondary sources
  • Add your notes as text sources
  • Use file uploads for proprietary content

Limitations

  • Maximum 20 sources per notebook
  • URL extraction limited to publicly accessible content
  • File conversion quality depends on document format
  • Processing time increases with source count (typical: 2-5 minutes)
  • Scanned PDFs may require OCR (quality varies)

Deep Research

Automated web research on any topic

Interactive Q&A

Ask questions about your sources
