The SmartScraperMultiGraph allows you to scrape multiple webpages in a single operation, perfect for gathering data from related pages or different sections of a website.

Overview

This example demonstrates how to:
  • Scrape multiple URLs with one graph instance
  • Process results from different sources
  • Aggregate data from multiple pages
  • Handle different page structures

Complete Example

Here’s a working example that scrapes information from multiple portfolio pages:
import json
import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperMultiGraph

load_dotenv()

# Define the configuration for the graph
openai_key = os.getenv("OPENAI_APIKEY")

graph_config = {
    "llm": {
        "api_key": openai_key,
        "model": "openai/gpt-4o",
    },
    "verbose": True,
    "headless": False,
}

# Create the SmartScraperMultiGraph instance and run it
multiple_search_graph = SmartScraperMultiGraph(
    prompt="Who is Marco Perini?",
    source=[
        "https://perinim.github.io/",
        "https://perinim.github.io/cv/"
    ],
    schema=None,
    config=graph_config,
)

result = multiple_search_graph.run()
print(json.dumps(result, indent=4))

Step-by-Step Breakdown

1. Import dependencies

import json
import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperMultiGraph

load_dotenv()
Import SmartScraperMultiGraph for multi-page scraping capabilities.
2. Configure the graph

graph_config = {
    "llm": {
        "api_key": os.getenv("OPENAI_APIKEY"),
        "model": "openai/gpt-4o",
    },
    "verbose": True,
    "headless": False,
}
Use GPT-4o for better understanding across multiple page contexts.
3. Define multiple sources

multiple_search_graph = SmartScraperMultiGraph(
    prompt="Who is Marco Perini?",
    source=[
        "https://perinim.github.io/",
        "https://perinim.github.io/cv/"
    ],
    schema=None,
    config=graph_config,
)
Pass a list of URLs as the source parameter. The graph will scrape all pages and aggregate the results.
4. Run and process results

result = multiple_search_graph.run()
print(json.dumps(result, indent=4))
Results from all pages are combined into a single response.

Multi-Graph Variants

from scrapegraphai.graphs import SmartScraperMultiGraph

# Scrapes all pages and aggregates results
graph = SmartScraperMultiGraph(
    prompt="Extract contact information",
    source=[
        "https://example.com/about",
        "https://example.com/contact",
        "https://example.com/team"
    ],
    config=graph_config,
)
Standard multi-page scraping with result aggregation.

Expected Output

The results are organized by source URL; a typical response looks like this:
[
    {
        "source": "https://perinim.github.io/",
        "data": {
            "name": "Marco Perini",
            "role": "Software Engineer",
            "bio": "Passionate developer with experience in..."
        }
    },
    {
        "source": "https://perinim.github.io/cv/",
        "data": {
            "experience": [
                {
                    "company": "Tech Corp",
                    "position": "Senior Developer",
                    "duration": "2020-Present"
                }
            ],
            "skills": ["Python", "JavaScript", "AI"]
        }
    }
]
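If the graph returns per-page results shaped like the sample above (a list of dicts with "source" and "data" keys — this shape is illustrative), you can index them by URL for downstream processing. A minimal sketch:

```python
# Sketch: index per-page results by source URL.
# Assumes the illustrative output shape shown above
# (a list of {"source": ..., "data": ...} dicts).

def index_by_source(results):
    """Map each source URL to its extracted data."""
    return {page["source"]: page["data"] for page in results}

sample = [
    {"source": "https://perinim.github.io/", "data": {"name": "Marco Perini"}},
    {"source": "https://perinim.github.io/cv/", "data": {"skills": ["Python"]}},
]

by_url = index_by_source(sample)
print(by_url["https://perinim.github.io/"]["name"])  # Marco Perini
```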

Common Use Cases

Product Catalogs

Scrape multiple product pages to build a complete catalog

News Aggregation

Collect articles from different sections or categories

Competitor Analysis

Gather data from multiple competitor websites

Portfolio Scraping

Extract information from various profile or portfolio pages

Performance Considerations

  • Parallel Processing: Pages are scraped concurrently for better performance.
  • Token Usage: Multi-page scraping consumes more tokens. Consider using the Lite variant for simple tasks.
  • Rate Limiting: Be mindful of rate limits when scraping many pages. Add delays if needed.
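One simple way to stay under rate limits is to split your source list into small batches and pause between them. This is a sketch using only the standard library; the batch size, URL pattern, and delay are illustrative:

```python
import time

def batched(urls, batch_size):
    """Yield successive batches of URLs."""
    for i in range(0, len(urls), batch_size):
        yield urls[i:i + batch_size]

urls = [f"https://example.com/page/{n}" for n in range(1, 8)]
delay_seconds = 0  # set to a few seconds in real use

batches = list(batched(urls, batch_size=3))
for batch in batches:
    # In real use, run one SmartScraperMultiGraph per batch here,
    # passing `batch` as the `source` list.
    time.sleep(delay_seconds)

print(len(batches))  # 3
```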

Advanced Example: Multi-Page with Schema

from typing import List
from pydantic import BaseModel, Field

class ContactInfo(BaseModel):
    name: str = Field(description="Person's name")
    email: str = Field(description="Email address")
    role: str = Field(description="Job title or role")

class TeamMembers(BaseModel):
    members: List[ContactInfo]

multi_graph = SmartScraperMultiGraph(
    prompt="Extract all team member information",
    source=[
        "https://company.com/team/engineering",
        "https://company.com/team/design",
        "https://company.com/team/marketing"
    ],
    schema=TeamMembers,
    config=graph_config,
)

result = multi_graph.run()
This combines multi-page scraping with schema validation for structured output.
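When each page yields its own `TeamMembers`-style result, you may want one flat list of members across all pages. A minimal sketch — plain dicts stand in for the validated models, and the result shape is assumed:

```python
def merge_members(page_results):
    """Flatten per-page {"members": [...]} results into one list."""
    all_members = []
    for page in page_results:
        all_members.extend(page.get("members", []))
    return all_members

# Hypothetical per-page results matching the TeamMembers shape
pages = [
    {"members": [{"name": "Ada", "email": "ada@company.com", "role": "Engineer"}]},
    {"members": [{"name": "Lin", "email": "lin@company.com", "role": "Designer"}]},
]

team = merge_members(pages)
print(len(team))  # 2
```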

Handling Different Page Structures

The AI automatically adapts to different page layouts:
multi_graph = SmartScraperMultiGraph(
    prompt="Extract pricing information",
    source=[
        "https://competitor1.com/pricing",  # Table layout
        "https://competitor2.com/plans",    # Card layout
        "https://competitor3.com/pricing"   # List layout
    ],
    config=graph_config,
)
The graph understands your prompt and extracts relevant data regardless of HTML structure.
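Even so, the extracted field names may still differ page to page (e.g. `price` on one site, `monthly_price` on another). A small normalization pass keeps downstream code simple; all key names here are hypothetical:

```python
def normalize_pricing(record):
    """Map layout-specific keys onto one common shape (keys are illustrative)."""
    return {
        "plan": record.get("plan") or record.get("tier") or "unknown",
        "price": record.get("price") or record.get("monthly_price"),
    }

raw = [
    {"plan": "Pro", "price": "$20"},           # e.g. from a table layout
    {"tier": "Team", "monthly_price": "$45"},  # e.g. from a card layout
]

normalized = [normalize_pricing(r) for r in raw]
print(normalized[1])  # {'plan': 'Team', 'price': '$45'}
```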

Error Handling

try:
    result = multiple_search_graph.run()
    
    # Check for partial failures (assumes results come back
    # as a list of per-page dicts that may carry an "error" key)
    for page_result in result:
        if "error" in page_result:
            print(f"Failed to scrape {page_result['source']}: {page_result['error']}")
        else:
            print(f"Successfully scraped {page_result['source']}")
            
except Exception as e:
    print(f"Scraping failed: {e}")

Next Steps

Custom Schemas

Add structure to your multi-page results

Local Documents

Process multiple local files

Tips for Multi-Page Scraping

  1. Group related pages: Scrape pages with similar content together for better context
  2. Use specific prompts: Be clear about what information should be extracted from all pages
  3. Monitor performance: Use get_execution_info() to track time and token usage
  4. Handle failures gracefully: Some pages might fail; ensure your code handles partial results
  5. Consider pagination: For paginated content, generate URLs programmatically
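For the pagination tip above, the source list can be built programmatically instead of by hand; the URL pattern here is illustrative:

```python
def paginated_urls(base, pages):
    """Build a list of page URLs from a template (pattern is illustrative)."""
    return [f"{base}?page={n}" for n in range(1, pages + 1)]

sources = paginated_urls("https://example.com/articles", 3)
print(sources)
```

Pass `sources` as the `source` parameter of `SmartScraperMultiGraph`, just like a hand-written list.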
