
Overview

Multi-Graphs are specialized graph variants that process multiple sources (URLs or local files) in parallel and merge the results into a single comprehensive answer, making them ideal for aggregating information across sources.

Available Multi-Graphs


SmartScraperMultiGraph

Scrapes a list of URLs and generates a merged answer from all sources.

Class Signature

class SmartScraperMultiGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        source: List[str],
        config: dict,
        schema: Optional[Type[BaseModel]] = None,
    )

Parameters

prompt (str, required): The extraction prompt applied to every URL.
source (List[str], required): The list of URLs to scrape.
config (dict, required): Configuration dictionary with llm settings.
schema (Type[BaseModel], optional, default None): Pydantic schema for structured output.

Usage Example

from scrapegraphai.graphs import SmartScraperMultiGraph

config = {
    "llm": {
        "model": "openai/gpt-4o",
        "api_key": "your-api-key"
    }
}

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
]

multi_scraper = SmartScraperMultiGraph(
    prompt="Extract all product information",
    source=urls,
    config=config
)

result = multi_scraper.run()
print(result)  # Merged results from all 3 URLs

Workflow

GraphIteratorNode → MergeAnswersNode
  1. GraphIteratorNode: Runs SmartScraperGraph on each URL in parallel
  2. MergeAnswersNode: Merges all results into a single answer
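This fan-out/fan-in flow can be sketched in plain Python. The sketch is purely illustrative: `scrape` and `merge_answers` below are stand-ins for the real nodes, not scrapegraphai APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def scrape(url: str) -> dict:
    # Stand-in for running SmartScraperGraph on a single URL
    return {"url": url, "data": f"content from {url}"}

def merge_answers(results: list) -> dict:
    # Stand-in for MergeAnswersNode: combine per-URL results
    return {
        "sources": [r["url"] for r in results],
        "data": [r["data"] for r in results],
    }

urls = ["https://example.com/page1", "https://example.com/page2"]

# Fan out: scrape each URL concurrently (GraphIteratorNode's role)
with ThreadPoolExecutor() as pool:
    results = list(pool.map(scrape, urls))

# Fan in: merge everything into one answer (MergeAnswersNode's role)
answer = merge_answers(results)
```

`pool.map` preserves input order, so the merged answer lists sources in the same order as the input URLs.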

CSVScraperMultiGraph

Processes multiple CSV files and generates a merged answer.

Class Signature

class CSVScraperMultiGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        source: List[str],
        config: dict,
        schema: Optional[Type[BaseModel]] = None,
    )

Usage Example

from scrapegraphai.graphs import CSVScraperMultiGraph

config = {
    "llm": {"model": "openai/gpt-4o"}
}

csv_files = [
    "sales_2023.csv",
    "sales_2024.csv",
    "sales_2025.csv"
]

multi_csv = CSVScraperMultiGraph(
    prompt="Compare total revenue across all years",
    source=csv_files,
    config=config
)

result = multi_csv.run()

JSONScraperMultiGraph

Processes multiple JSON files and generates a merged answer.

Class Signature

class JSONScraperMultiGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        source: List[str],
        config: dict,
        schema: Optional[Type[BaseModel]] = None,
    )

Usage Example

from scrapegraphai.graphs import JSONScraperMultiGraph

config = {
    "llm": {"model": "openai/gpt-4o"}
}

json_files = [
    "users_2023.json",
    "users_2024.json",
    "users_2025.json"
]

multi_json = JSONScraperMultiGraph(
    prompt="Count total users and analyze growth trends",
    source=json_files,
    config=config
)

result = multi_json.run()

XMLScraperMultiGraph

Processes multiple XML files and generates a merged answer.

Class Signature

class XMLScraperMultiGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        source: List[str],
        config: dict,
        schema: Optional[Type[BaseModel]] = None,
    )

Usage Example

from scrapegraphai.graphs import XMLScraperMultiGraph

config = {
    "llm": {"model": "openai/gpt-4o"}
}

xml_files = [
    "config_prod.xml",
    "config_staging.xml",
    "config_dev.xml"
]

multi_xml = XMLScraperMultiGraph(
    prompt="Compare configuration differences across environments",
    source=xml_files,
    config=config
)

result = multi_xml.run()

DocumentScraperMultiGraph

Processes multiple Markdown files and generates a merged answer.

Class Signature

class DocumentScraperMultiGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        source: List[str],
        config: dict,
        schema: Optional[Type[BaseModel]] = None,
    )

Usage Example

from scrapegraphai.graphs import DocumentScraperMultiGraph

config = {
    "llm": {"model": "openai/gpt-4o"}
}

md_files = [
    "docs/intro.md",
    "docs/installation.md",
    "docs/quickstart.md"
]

multi_doc = DocumentScraperMultiGraph(
    prompt="Create a comprehensive getting started guide",
    source=md_files,
    config=config
)

result = multi_doc.run()

ScriptCreatorMultiGraph

Generates web scraping scripts for multiple URLs and merges them.

Class Signature

class ScriptCreatorMultiGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        source: List[str],
        config: dict,
        schema: Optional[Type[BaseModel]] = None,
    )

Usage Example

from scrapegraphai.graphs import ScriptCreatorMultiGraph

config = {
    "llm": {"model": "openai/gpt-4o"},
    "library": "beautifulsoup"
}

urls = [
    "https://example.com/products",
    "https://example.com/reviews",
    "https://example.com/pricing"
]

script_multi = ScriptCreatorMultiGraph(
    prompt="Generate scraping scripts for all pages",
    source=urls,
    config=config
)

merged_script = script_multi.run()
print(merged_script)  # Combined script for all URLs

Workflow

GraphIteratorNode → MergeGeneratedScriptsNode

Common Features

All Multi-Graphs share these characteristics:

Parallel Processing

Multiple sources are processed in parallel for efficiency:
# Processes all URLs simultaneously
urls = ["url1", "url2", "url3", "url4", "url5"]
multi_scraper = SmartScraperMultiGraph(
    prompt="Extract data",
    source=urls,
    config=config
)
# Much faster than sequential processing

Merged Output

Results from all sources are intelligently merged:
result = multi_scraper.run()
# Single comprehensive answer combining all sources
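In the library, the merge itself is performed by an LLM inside MergeAnswersNode. The sketch below only illustrates the fan-in shape with a hypothetical `merge_results` helper: per-source results that share a key are combined under that key.

```python
def merge_results(per_source):
    # Hypothetical helper: combine per-source dicts key by key
    merged = {}
    for result in per_source:
        for key, value in result.items():
            if isinstance(value, list):
                merged.setdefault(key, []).extend(value)
            else:
                merged.setdefault(key, []).append(value)
    return merged

merged = merge_results([
    {"products": ["A", "B"], "count": 2},
    {"products": ["C"], "count": 1},
])
# merged == {"products": ["A", "B", "C"], "count": [2, 1]}
```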

Structured Output

Support for Pydantic schemas across all sources:
from pydantic import BaseModel
from typing import List

class Product(BaseModel):
    name: str
    price: float
    source_url: str

class Products(BaseModel):
    products: List[Product]
    total_count: int

multi_scraper = SmartScraperMultiGraph(
    prompt="Extract all products",
    source=urls,
    config=config,
    schema=Products
)

result = multi_scraper.run()

Advanced Usage

Competitive Analysis

from pydantic import BaseModel
from typing import List

class CompetitorInfo(BaseModel):
    company: str
    pricing: str
    features: List[str]
    unique_selling_points: List[str]

class MarketAnalysis(BaseModel):
    competitors: List[CompetitorInfo]
    market_insights: str
    recommendations: str

competitor_urls = [
    "https://competitor1.com/pricing",
    "https://competitor2.com/features",
    "https://competitor3.com/products"
]

config = {
    "llm": {"model": "openai/gpt-4o"},
    "additional_info": "Focus on pricing, features, and competitive advantages"
}

multi_scraper = SmartScraperMultiGraph(
    prompt="Analyze competitor offerings and provide market insights",
    source=competitor_urls,
    config=config,
    schema=MarketAnalysis
)

analysis = multi_scraper.run()

News Aggregation

from pydantic import BaseModel
from typing import List

class Article(BaseModel):
    title: str
    summary: str
    source: str
    key_points: List[str]

class NewsDigest(BaseModel):
    articles: List[Article]
    overall_summary: str
    common_themes: List[str]

news_urls = [
    "https://news1.com/article1",
    "https://news2.com/article2",
    "https://news3.com/article3"
]

multi_scraper = SmartScraperMultiGraph(
    prompt="Create a news digest with summaries and common themes",
    source=news_urls,
    config=config,
    schema=NewsDigest
)

digest = multi_scraper.run()

Multi-Site Product Comparison

from pydantic import BaseModel
from typing import List, Optional

class ProductListing(BaseModel):
    product_name: str
    site: str
    price: float
    availability: str
    rating: Optional[float] = None
    shipping: Optional[str] = None

class PriceComparison(BaseModel):
    listings: List[ProductListing]
    best_deal: str
    price_range: str
    recommendation: str

product_urls = [
    "https://site1.com/product/12345",
    "https://site2.com/product/67890",
    "https://site3.com/product/11111"
]

multi_scraper = SmartScraperMultiGraph(
    prompt="Compare the same product across different sites and recommend the best deal",
    source=product_urls,
    config=config,
    schema=PriceComparison
)

comparison = multi_scraper.run()

Performance Considerations

Number of Sources

# Recommended: 3-10 sources for balanced performance
urls = ["url1", "url2", "url3", "url4", "url5"]

# Large scale: 10-50 sources (slower but comprehensive)
many_urls = [f"https://example.com/page{i}" for i in range(50)]

# Consider breaking very large batches into smaller groups.
# `chunks` is a small helper (not part of scrapegraphai):
def chunks(seq, size):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

batch_results = []
for batch in chunks(many_urls, 10):
    batch_results.append(
        SmartScraperMultiGraph(
            prompt="Extract data",
            source=batch,
            config=config
        ).run()
    )

Timeout Configuration

config = {
    "llm": {"model": "openai/gpt-4o"},
    "timeout": 600,  # 10 minutes for multiple sources
    "verbose": True  # Monitor progress
}

Error Handling

try:
    result = multi_scraper.run()
    
    if result == "No answer found.":
        print("No merged answer could be produced from the sources")
        
        # Check individual results in state
        final_state = multi_scraper.get_state()
        individual_results = final_state.get("results", [])
        
        print(f"Successfully processed {len(individual_results)} sources")
        for i, res in enumerate(individual_results, 1):
            print(f"Source {i}: {res}")
    else:
        print(f"Success: {result}")
        
except Exception as e:
    print(f"Error during multi-scraping: {e}")

Comparison: Multi vs Single Graphs

| Feature    | Multi-Graph             | Single Graph       |
| ---------- | ----------------------- | ------------------ |
| Sources    | Multiple                | Single             |
| Processing | Parallel                | Sequential         |
| Output     | Merged                  | Single             |
| Use Case   | Aggregation             | Focused extraction |
| Complexity | Higher                  | Lower              |
| Cost       | Higher (more API calls) | Lower              |
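One way to make the cost row concrete: if we assume one extraction call per source plus one merge call (an assumption about the call pattern, not a guarantee; actual counts depend on the model and node configuration), a multi-graph over N sources costs roughly N + 1 LLM calls versus 1 for a single graph.

```python
def estimated_llm_calls(num_sources: int) -> int:
    # Assumption: one extraction call per source, plus one merge call
    return num_sources + 1

calls_multi = estimated_llm_calls(5)   # 5 sources -> roughly 6 calls
calls_single = 1                       # one focused extraction call
```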

When to Use Multi-Graphs

Use Multi-Graphs when:
  • You need to aggregate data from multiple sources
  • Sources contain complementary information
  • You want comparative analysis
  • You need comprehensive coverage
Use Single Graphs when:
  • You have a single source
  • Sources are independent
  • You need fine-grained control
  • Cost optimization is a priority
