Overview

SearchGraph is a scraping pipeline that answers a given prompt using the internet: it automatically finds relevant URLs, scrapes them, and merges the extracted results into a comprehensive answer.

Class Signature

class SearchGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        config: dict,
        schema: Optional[Type[BaseModel]] = None
    )

Constructor Parameters

prompt (str, required)
The user prompt to search the internet. It is used both for searching and for extracting information from the pages found.

config (dict, required)
Configuration parameters for the graph. Must include:
  • llm: LLM configuration (e.g., {"model": "openai/gpt-4o"})
Optional parameters:
  • max_results (int): Maximum number of search results to scrape (default: 3)
  • search_engine (str): Search engine to use ("google", "bing", or "duckduckgo"; default: "duckduckgo")
  • serper_api_key (str): API key for Serper.dev (required for Google search)
  • verbose (bool): Enable detailed logging
  • headless (bool): Run the browser in headless mode
  • Other parameters inherited from SmartScraperGraph

schema (Type[BaseModel], default: None)
Optional Pydantic model defining the expected output structure.
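Putting the required llm block together with the optional parameters above, a full configuration might look like the sketch below. The API keys are placeholders, and the values shown are only one reasonable combination:

```python
# Example config exercising the parameters listed above.
# API keys are placeholders; max_results of 3 matches the documented default.
graph_config = {
    "llm": {
        "model": "openai/gpt-4o",             # required LLM configuration
        "api_key": "your-openai-api-key",
    },
    "max_results": 3,                          # scrape at most 3 search results
    "search_engine": "google",                 # "google", "bing", or "duckduckgo"
    "serper_api_key": "your-serper-api-key",   # needed for Google via Serper.dev
    "verbose": True,                           # detailed logging
    "headless": True,                          # run the browser headless
}
```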

Attributes

prompt (str)
The user's search and extraction prompt.

config (dict)
Configuration dictionary for the graph.

schema (Type[BaseModel])
Optional output schema for structured data extraction.

llm_model (object)
The configured language model instance.

max_results (int)
Maximum number of URLs to scrape from search results.

considered_urls (List[str])
List of URLs that were considered during the search.

Methods

run()

Executes the internet search and scraping process.
def run(self) -> str

Returns (str): The merged answer from all scraped sources, or "No answer found." if extraction fails.

get_considered_urls()

Returns the list of URLs that were considered during the search.
def get_considered_urls(self) -> List[str]

Returns (List[str]): The URLs that were found and scraped during the search process.

Basic Usage

from scrapegraphai.graphs import SearchGraph

graph_config = {
    "llm": {
        "model": "openai/gpt-4o",
        "api_key": "your-api-key"
    },
    "max_results": 5
}

search_graph = SearchGraph(
    prompt="What is Chioggia famous for?",
    config=graph_config
)

result = search_graph.run()
print(result)

# Get the URLs that were scraped
urls = search_graph.get_considered_urls()
print(f"Scraped {len(urls)} URLs:")
for url in urls:
    print(f"  - {url}")

Structured Output with Schema

from pydantic import BaseModel, Field
from typing import List

class Restaurant(BaseModel):
    name: str = Field(description="Restaurant name")
    cuisine: str = Field(description="Type of cuisine")
    rating: float = Field(description="Average rating")
    location: str = Field(description="Address or area")

class RestaurantList(BaseModel):
    restaurants: List[Restaurant]

search_graph = SearchGraph(
    prompt="Find the best Italian restaurants in San Francisco",
    config=graph_config,
    schema=RestaurantList
)

result = search_graph.run()
print(result)

Search Engine Configuration

Using DuckDuckGo (Default)

config = {
    "llm": {"model": "openai/gpt-4o"},
    "search_engine": "duckduckgo",  # No API key needed
    "max_results": 5
}

search_graph = SearchGraph(
    prompt="Latest AI news",
    config=config
)

Using Google via Serper.dev

config = {
    "llm": {"model": "openai/gpt-4o"},
    "search_engine": "google",
    "serper_api_key": "your-serper-api-key",
    "max_results": 10
}

search_graph = SearchGraph(
    prompt="Best Python web scraping libraries 2024",
    config=config
)

Using Bing

config = {
    "llm": {"model": "openai/gpt-4o"},
    "search_engine": "bing",
    "max_results": 5
}

search_graph = SearchGraph(
    prompt="Machine learning tutorials",
    config=config
)

Advanced Usage

Controlling Number of Results

# Scrape fewer URLs for faster results
config = {
    "llm": {"model": "openai/gpt-4o"},
    "max_results": 3  # Only scrape top 3 results
}

# Scrape more URLs for comprehensive coverage
config = {
    "llm": {"model": "openai/gpt-4o"},
    "max_results": 10  # Scrape top 10 results
}

With Browser State

config = {
    "llm": {"model": "openai/gpt-4o"},
    "storage_state": "./auth_state.json",  # Use authenticated session
    "max_results": 5
}

search_graph = SearchGraph(
    prompt="My private GitHub repositories",
    config=config
)

Graph Workflow

The SearchGraph uses the following node pipeline:
SearchInternetNode → GraphIteratorNode → MergeAnswersNode
  1. SearchInternetNode: Searches the internet for relevant URLs
  2. GraphIteratorNode: Runs SmartScraperGraph on each found URL
  3. MergeAnswersNode: Merges all extracted information into a single answer
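Conceptually, the three-node pipeline behaves like the sketch below. The helper functions are simplified stand-ins for the real nodes (which call a search engine and an LLM), not the library's internals:

```python
from typing import List

def search_internet(prompt: str, max_results: int) -> List[str]:
    # Stand-in for SearchInternetNode: return up to max_results relevant URLs.
    candidate_urls = ["https://example.com/a", "https://example.com/b",
                      "https://example.com/c", "https://example.com/d"]
    return candidate_urls[:max_results]

def scrape_one(prompt: str, url: str) -> str:
    # Stand-in for one SmartScraperGraph run inside GraphIteratorNode.
    return f"answer extracted from {url}"

def merge_answers(prompt: str, partial_answers: List[str]) -> str:
    # Stand-in for MergeAnswersNode: combine per-URL answers into one.
    return " | ".join(partial_answers) if partial_answers else "No answer found."

def run_pipeline(prompt: str, max_results: int = 3) -> str:
    urls = search_internet(prompt, max_results)            # 1. search
    partials = [scrape_one(prompt, url) for url in urls]   # 2. scrape each URL
    return merge_answers(prompt, partials)                 # 3. merge

print(run_pipeline("What is Chioggia famous for?", max_results=2))
```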

Accessing Search Results

result = search_graph.run()

# Get the merged answer
print("Answer:", result)

# Get all considered URLs
urls = search_graph.get_considered_urls()
print(f"\nScraped {len(urls)} sources:")
for i, url in enumerate(urls, 1):
    print(f"{i}. {url}")

# Access full state
final_state = search_graph.get_state()
print("\nRaw results from each URL:")
for i, res in enumerate(final_state.get("results", []), 1):
    print(f"\nResult {i}:")
    print(res)

Execution Information

result = search_graph.run()

# Get detailed execution metrics
exec_info = search_graph.get_execution_info()
for node_info in exec_info:
    print(f"Node: {node_info['node_name']}")
    print(f"  Time: {node_info['exec_time']:.2f}s")
    print(f"  Tokens: {node_info['total_tokens']}")
    print(f"  Cost: ${node_info['total_cost_USD']:.4f}")
    print()

Comparison with SmartScraperGraph

Feature     SearchGraph                     SmartScraperGraph
Input       Prompt only                     Prompt + source URL
Search      Automatic                       Manual
Sources     Multiple (search results)       Single URL
Output      Merged from multiple sources    Single source
Use case    Research, aggregation           Specific page scraping

Use Cases

  1. Market Research: Gather information from multiple sources
  2. News Aggregation: Collect latest news on a topic
  3. Product Comparison: Compare products across different websites
  4. Academic Research: Find and summarize research on a topic
  5. Competitive Analysis: Gather competitor information

Example: Market Research

from pydantic import BaseModel, Field
from typing import List

class CompanyInfo(BaseModel):
    name: str
    headquarters: str
    employees: str
    revenue: str
    products: List[str]

config = {
    "llm": {"model": "openai/gpt-4o"},
    "max_results": 5,
    "search_engine": "google",
    "serper_api_key": "your-key"
}

search_graph = SearchGraph(
    prompt="Information about Tesla Inc: headquarters, employees, revenue, and main products",
    config=config,
    schema=CompanyInfo
)

result = search_graph.run()
print(result)

Error Handling

try:
    result = search_graph.run()
    
    if result == "No answer found.":
        print("No relevant information found")
        urls = search_graph.get_considered_urls()
        print(f"Searched {len(urls)} URLs")
    else:
        print(f"Success: {result}")
        
except Exception as e:
    print(f"Error during search: {e}")

Performance Considerations

  1. max_results: More results mean broader coverage, but slower runs and higher cost
  2. search_engine: Google (via Serper) tends to be more accurate but requires an API key
  3. LLM model: Trade speed and cost (e.g., gpt-3.5-turbo) against accuracy (e.g., gpt-4o)
  4. Parallel execution: Multiple URLs are scraped in parallel for efficiency
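As a rough illustration of these trade-offs, a hypothetical helper (not part of the library) could pick config values depending on whether you prioritize speed or coverage:

```python
def make_search_config(priority: str = "balanced") -> dict:
    # Hypothetical helper illustrating the trade-offs above; not part of scrapegraphai.
    presets = {
        # fast: fewer URLs, cheaper model -> quicker, less comprehensive
        "fast": {"model": "openai/gpt-3.5-turbo", "max_results": 3},
        # balanced: mid-size fan-out with the more accurate model
        "balanced": {"model": "openai/gpt-4o", "max_results": 5},
        # thorough: wide fan-out for comprehensive (slower, pricier) coverage
        "thorough": {"model": "openai/gpt-4o", "max_results": 10},
    }
    preset = presets[priority]
    return {"llm": {"model": preset["model"]}, "max_results": preset["max_results"]}
```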
