Overview
SearchGraph is a scraping pipeline that searches the internet for answers to a given prompt. It automatically searches for relevant URLs, scrapes them, and merges the results into a comprehensive answer.
Class Signature
Constructor Parameters
The user prompt to search the internet. This will be used both for searching and for extracting information from found pages.
Configuration parameters for the graph. Must include:
- llm: LLM configuration (e.g., {"model": "openai/gpt-4o"})

It may also include:
- max_results (int): Maximum number of search results to scrape (default: 3)
- search_engine (str): Search engine to use ("google", "bing", "duckduckgo")
- serper_api_key (str): API key for Serper.dev (for Google search)
- verbose (bool): Enable detailed logging
- headless (bool): Run browser in headless mode
- Other parameters inherited from SmartScraperGraph
Optional Pydantic model defining the expected output structure.
Attributes
The user’s search and extraction prompt.
Configuration dictionary for the graph.
Optional output schema for structured data extraction.
The configured language model instance.
Maximum number of URLs to scrape from search results.
List of URLs that were considered during the search.
Methods
run()
Executes the web scraping and searching process.
Returns: The merged answer from all scraped sources, or “No answer found.” if extraction fails.
get_considered_urls()
Returns the list of URLs that were considered during the search.
Returns: A list of URLs that were found and scraped during the search process.
Basic Usage
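A minimal sketch of a basic run, assuming the package is installed as `scrapegraphai` and exposes `SearchGraph` under `scrapegraphai.graphs`. The keyword names `prompt` and `config`, the `api_key` entry, and the `OPENAI_API_KEY` environment variable are assumptions not stated on this page; the helper `run_search` and the example prompt are illustrative only.

```python
import os

# Configuration keys follow the Constructor Parameters section.
graph_config = {
    "llm": {
        "model": "openai/gpt-4o",
        # Assumption: the provider key is supplied here from the environment.
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
    },
    "max_results": 3,   # documented default
    "verbose": False,
    "headless": True,
}


def run_search(prompt: str, config: dict):
    """Hypothetical helper: build a SearchGraph and return its merged answer."""
    # Imported lazily so the config can be inspected without the library.
    from scrapegraphai.graphs import SearchGraph

    graph = SearchGraph(prompt=prompt, config=config)
    return graph.run()
```

Calling `run_search("What are the main features of Python 3.12?", graph_config)` performs live searches and LLM requests, so it needs network access and a valid key.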
Structured Output with Schema
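One way to request structured output is to pass a Pydantic model as the schema. The sketch below assumes the constructor accepts it through a `schema` keyword; the `Feature` and `FeatureList` models and the helper `run_structured_search` are hypothetical names chosen for illustration.

```python
from typing import List

from pydantic import BaseModel, Field


class Feature(BaseModel):
    """One extracted item (illustrative shape, not a library type)."""
    name: str = Field(description="Short name of the feature")
    summary: str = Field(description="One-sentence description")


class FeatureList(BaseModel):
    """Top-level container the extraction is expected to fill."""
    features: List[Feature]


def run_structured_search(prompt: str, config: dict):
    """Hypothetical helper: run a search constrained by FeatureList."""
    # Imported lazily so the models can be used without the library.
    from scrapegraphai.graphs import SearchGraph

    # Assumption: the schema is passed as the `schema` keyword argument.
    graph = SearchGraph(prompt=prompt, config=config, schema=FeatureList)
    return graph.run()
```

Keeping the schema small and giving each field a description tends to make the expected structure easier for the model to follow.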
Search Engine Configuration
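The engine-specific subsections below differ only in a few config keys. As a rough sketch, a small helper can derive each variant from one base config. The helper name `engine_config`, the `SERPER_API_KEY` environment variable, and the choice of `"google"` as the value that pairs with `serper_api_key` are assumptions based on the parameter list above, not confirmed library behavior.

```python
import os
from typing import Optional

# Engine names as listed under Constructor Parameters.
SUPPORTED_ENGINES = ("duckduckgo", "google", "bing")


def engine_config(
    base: dict,
    engine: str = "duckduckgo",
    serper_api_key: Optional[str] = None,
) -> dict:
    """Return a copy of `base` configured for the given search engine."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported search engine: {engine!r}")

    config = dict(base)  # shallow copy; nested dicts such as "llm" are shared
    config["search_engine"] = engine

    if engine == "google":
        # Assumption: Google search goes through Serper.dev and needs its key.
        key = serper_api_key or os.environ.get("SERPER_API_KEY")
        if not key:
            raise ValueError("Google search requires a Serper.dev API key")
        config["serper_api_key"] = key
    return config
```

For example, `engine_config(base)` yields a DuckDuckGo config with no extra keys, while `engine_config(base, "google", serper_api_key="...")` adds the Serper.dev key.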
Using DuckDuckGo (Default)
Using Google via Serper.dev
Using Bing
Advanced Usage
Controlling Number of Results
With Browser State
Graph Workflow
The SearchGraph uses the following node pipeline:
- SearchInternetNode: Searches the internet for relevant URLs
- GraphIteratorNode: Runs SmartScraperGraph on each found URL
- MergeAnswersNode: Merges all extracted information into a single answer
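The three stages above can be sketched as a plain function pipeline. This is an illustration of the data flow only, not the library's implementation: `search_pipeline` and the `fake_*` callables are hypothetical stand-ins for the real nodes.

```python
from typing import Callable, List


def search_pipeline(
    prompt: str,
    search: Callable[[str], List[str]],
    scrape: Callable[[str, str], str],
    merge: Callable[[str, List[str]], str],
    max_results: int = 3,
) -> str:
    # Stage 1 (role of SearchInternetNode): find candidate URLs for the prompt.
    urls = search(prompt)[:max_results]
    # Stage 2 (role of GraphIteratorNode): extract an answer from each URL.
    answers = [scrape(prompt, url) for url in urls]
    # Stage 3 (role of MergeAnswersNode): combine per-URL answers into one.
    return merge(prompt, answers)


# Stand-in callables so the flow can be run without network access.
def fake_search(prompt: str) -> List[str]:
    return [f"https://example.com/{i}" for i in range(5)]


def fake_scrape(prompt: str, url: str) -> str:
    return f"answer from {url}"


def fake_merge(prompt: str, answers: List[str]) -> str:
    return " | ".join(answers)


print(search_pipeline("demo", fake_search, fake_scrape, fake_merge, max_results=2))
# prints: answer from https://example.com/0 | answer from https://example.com/1
```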
Accessing Search Results
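After `run()` completes, `get_considered_urls()` reports which pages fed the answer. A small sketch, assuming `graph` is an already constructed SearchGraph; the helper name `answer_with_sources` and the shape of the returned dictionary are illustrative choices.

```python
def answer_with_sources(graph) -> dict:
    """Run the graph, then pair its answer with the URLs it considered."""
    answer = graph.run()
    # Queried after run(), since the URLs are collected during the search.
    sources = list(graph.get_considered_urls())
    return {"answer": answer, "sources": sources}
```

Keeping the sources next to the answer makes it easier to audit where a merged result came from.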
Execution Information
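The sketch below shows one way to inspect per-node timings after a run. It assumes the graph exposes a `get_execution_info()` method returning a list of dictionaries with `node_name` and `exec_time` keys; neither the method nor the key names are confirmed on this page, and `slowest_nodes` and `timing_report` are hypothetical helpers.

```python
from typing import Dict, List


def slowest_nodes(exec_info: List[Dict], top: int = 3) -> List[Dict]:
    """Return the `top` entries with the largest exec_time, slowest first."""
    return sorted(exec_info, key=lambda e: e.get("exec_time", 0.0), reverse=True)[:top]


def timing_report(graph, top: int = 3) -> List[str]:
    """Format the slowest entries as 'name: seconds' strings."""
    # Assumption: get_execution_info() exists and returns per-node dicts.
    info = graph.get_execution_info()
    return [
        f"{entry.get('node_name', '?')}: {entry.get('exec_time', 0.0):.2f}s"
        for entry in slowest_nodes(info, top)
    ]
```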
Comparison with SmartScraperGraph
| Feature | SearchGraph | SmartScraperGraph |
|---|---|---|
| Input | Prompt only | Prompt + Source URL |
| Search | Automatic | Manual |
| Sources | Multiple (search results) | Single URL |
| Output | Merged from multiple sources | Single source |
| Use Case | Research, aggregation | Specific page scraping |
Use Cases
- Market Research: Gather information from multiple sources
- News Aggregation: Collect latest news on a topic
- Product Comparison: Compare products across different websites
- Academic Research: Find and summarize research on a topic
- Competitive Analysis: Gather competitor information
Example: Market Research
Error Handling
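A defensive wrapper might look like the sketch below. It relies on the documented behavior that `run()` returns “No answer found.” when extraction fails; which exception types the library raises is not stated here, so the broad `except Exception` is an assumption, and `safe_run` is a hypothetical helper name.

```python
from typing import Any, Optional, Tuple

# Sentinel documented under run().
NO_ANSWER = "No answer found."


def safe_run(graph) -> Tuple[Optional[Any], Optional[str]]:
    """Return (result, None) on success or (None, error message) on failure."""
    try:
        result = graph.run()
    except Exception as exc:  # assumption: specific exception types are unknown
        return None, f"search failed: {exc}"
    if result == NO_ANSWER:
        return None, "no answer could be extracted"
    return result, None
```

Returning an explicit error message instead of raising lets callers decide whether to retry with a different prompt, engine, or `max_results` value.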
Performance Considerations
- max_results: More results give a more comprehensive answer but make runs slower and more expensive
- search_engine: Google (via Serper) is more accurate but requires an API key
- LLM model: Faster models (e.g., gpt-3.5-turbo) trade some accuracy for speed compared with more accurate ones (e.g., gpt-4o)
- Parallel execution: Multiple URLs are scraped in parallel for efficiency
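To illustrate why parallel scraping helps, here is a generic thread-pool sketch. How the library schedules its own work is not described on this page, so this shows the general technique rather than SearchGraph internals; `scrape_all` and `scrape_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def scrape_all(
    urls: List[str],
    scrape_one: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Apply scrape_one to every URL concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(scrape_one, urls))
```

Because each page fetch mostly waits on the network, total time with this approach is closer to the slowest single page than to the sum of all pages.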
Related Graphs
- SmartScraperGraph - Scrape a specific URL
- DepthSearchGraph - Deep crawl with internal links
- OmniSearchGraph - Search with image analysis
