
Overview

The SearchInternetNode generates search queries based on user input and searches the internet for relevant information. It uses an LLM to create optimized search queries, then retrieves results from configured search engines.

Class Signature

class SearchInternetNode(BaseNode):
    def __init__(
        self,
        input: str,
        output: List[str],
        node_config: Optional[dict] = None,
        node_name: str = "SearchInternet",
    )
Source: scrapegraphai/nodes/search_internet_node.py:16

Parameters

input
str
required
Boolean expression defining the input keys needed from the state. Typically "user_prompt"
output
List[str]
required
List of output keys to be updated in the state. Typically ["search_results"] or ["urls"]
node_config
Optional[dict]
default:None
Configuration dictionary with the following options (all shown in the usage examples below):
  • llm_model: Language model instance used to generate the search query
  • search_engine: Search backend ("duckduckgo", "google", "bing", or "serper")
  • max_results: Maximum number of search results to return
  • serper_api_key: API key, required when search_engine is "serper"
  • loader_kwargs: Extra loader arguments (e.g. proxy settings)
  • verbose: Enable verbose logging
node_name
str
default:"SearchInternet"
The unique identifier name for the node

State Keys

Input State

user_prompt
str
The user’s query or question that needs internet search

Output State

search_results
List[Dict]
List of search results, each containing:
  • title: Page title
  • url: Page URL
  • snippet: Page description/snippet
  • content: Full page content (if fetched)

Methods

execute(state: dict) -> dict

Generates a search query from user input and searches the internet for relevant information.
def execute(self, state: dict) -> dict:
    """
    Generates an optimized search query from the user's prompt, executes the
    search on the configured engine, and updates the state with the results.
    
    Args:
        state (dict): The current state of the graph.
    
    Returns:
        dict: The updated state with search results.
    
    Raises:
        ValueError: If zero results found for the search query.
    """
Source: scrapegraphai/nodes/search_internet_node.py:60

Processing Steps:
  1. Extract user prompt from state
  2. Generate optimized search query using LLM
  3. Execute search on configured search engine
  4. Return search results in state
Returns: Updated state dictionary with search results
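The four processing steps above can be sketched as follows. This is a simplified, illustrative version, not the library's actual code; the stand-in LLM and search functions are stubs.

```python
# Simplified sketch of execute()'s four processing steps (illustrative only).

def generate_search_query(llm, user_prompt):
    # Step 2: ask the LLM for an optimized query.
    return llm(f"Rewrite as a concise web search query: {user_prompt}")

def run_search(state, llm, search_engine):
    # Step 1: extract the user prompt from the state.
    user_prompt = state["user_prompt"]
    # Step 2: generate the optimized query with the LLM.
    query = generate_search_query(llm, user_prompt)
    # Step 3: execute the query on the configured search engine.
    results = search_engine(query)
    if not results:
        raise ValueError("Zero results found for the search query.")
    # Step 4: write the results back into the state.
    state["search_results"] = results
    return state

# Stubs standing in for a real LLM and search backend:
fake_llm = lambda prompt: "quantum computing developments 2024"
fake_search = lambda query: [
    {"title": "Example", "url": "https://example.com", "snippet": "..."}
]

state = run_search(
    {"user_prompt": "What's new in quantum computing?"}, fake_llm, fake_search
)
print(state["search_results"][0]["url"])  # https://example.com
```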

Usage Examples

from scrapegraphai.nodes import SearchInternetNode
from langchain_openai import ChatOpenAI

# Create search node
search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "duckduckgo",
        "max_results": 5,
        "verbose": True
    }
)

# Execute node
state = {
    "user_prompt": "What are the latest developments in quantum computing?"
}
updated_state = search_node.execute(state)

print("Search results:", updated_state["search_results"])
# Output: [{"title": "...", "url": "...", "snippet": "..."}, ...]
Using Google Search

search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "google",
        "max_results": 10,
        "verbose": False
    }
)

state = {
    "user_prompt": "Best practices for Python web scraping"
}
updated_state = search_node.execute(state)

Using Serper API

search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "serper",
        "serper_api_key": "your_serper_api_key_here",
        "max_results": 5,
        "verbose": True
    }
)

state = {
    "user_prompt": "Latest AI research papers 2024"
}
updated_state = search_node.execute(state)

Search with Proxy

search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "duckduckgo",
        "max_results": 3,
        "loader_kwargs": {
            "proxy": "http://proxy.example.com:8080"
        }
    }
)

state = {
    "user_prompt": "Current weather in Tokyo"
}
updated_state = search_node.execute(state)

Using Ollama for Query Generation

from langchain_community.chat_models import ChatOllama

search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOllama(model="llama3"),
        "search_engine": "duckduckgo",
        "max_results": 5
    }
)

state = {
    "user_prompt": "Explain machine learning algorithms"
}
updated_state = search_node.execute(state)

Multiple Search Results

search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "duckduckgo",
        "max_results": 20,  # Get more results
        "verbose": True
    }
)

state = {
    "user_prompt": "Top Python frameworks for 2024"
}
updated_state = search_node.execute(state)

# Process results
for i, result in enumerate(updated_state["search_results"], 1):
    print(f"{i}. {result['title']}")
    print(f"   URL: {result['url']}")
    print(f"   Snippet: {result['snippet'][:100]}...\n")

Search Query Generation

The node uses an LLM to transform user prompts into optimized search queries:

Query Optimization Process

  1. User Prompt Analysis: LLM analyzes the user’s question
  2. Keyword Extraction: Identifies key terms and concepts
  3. Query Formulation: Creates optimized search query
  4. Result Parsing: Returns comma-separated search terms
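The prompt-and-parse round trip above might look like the sketch below. The actual prompt template is internal to the library and may differ; this shows only the general shape.

```python
# Hedged sketch of query generation: build a prompt, then parse the LLM's
# comma-separated answer into a single query string.
PROMPT_TEMPLATE = (
    "Given the following user question, produce an optimized web search query.\n"
    "Answer only with the query, as comma-separated search terms.\n"
    "Question: {question}"
)

def build_prompt(question):
    return PROMPT_TEMPLATE.format(question=question)

def parse_query(llm_output):
    # Turn the comma-separated terms into a single query string.
    return " ".join(term.strip() for term in llm_output.split(","))

prompt = build_prompt("What's the weather like?")
query = parse_query("current, weather, forecast")
print(query)  # current weather forecast
```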

Example Transformations

User Prompt                      Generated Search Query
"What's the weather like?"       "current weather forecast"
"I need Python tutorials"        "Python programming tutorials beginner"
"Latest news about AI"           "artificial intelligence news 2024"
"How to bake bread?"             "bread baking recipe instructions"

Supported Search Engines

DuckDuckGo (Default)

{
    "search_engine": "duckduckgo",
    # No API key required
    # Privacy-focused
    # Rate-limited
}
Pros:
  • No API key required
  • Privacy-focused
  • Simple to use
Cons:
  • Rate limiting
  • Fewer results
  • No advanced features
Google

{
    "search_engine": "google",
    # Requires Google Custom Search API
    # Best quality results
}
Pros:
  • High-quality results
  • Comprehensive coverage
  • Advanced ranking
Cons:
  • Requires API key
  • API usage costs
  • Complex setup
Bing

{
    "search_engine": "bing",
    # Requires Bing Search API key
    # Good international coverage
}
Pros:
  • Good result quality
  • International support
  • Reasonable pricing
Cons:
  • Requires API key
  • API usage costs

Serper

{
    "search_engine": "serper",
    "serper_api_key": "your_key",
    # Developer-friendly API
    # Good performance
}
Pros:
  • Easy API integration
  • Fast responses
  • Good documentation
Cons:
  • Requires subscription
  • Monthly usage limits

Search Result Structure

Each search result contains:
{
    "title": "Page Title",
    "url": "https://example.com/page",
    "snippet": "Brief description of the page content...",
    "content": "Full page text content (if fetched)",
    "position": 1,  # Result ranking
    "metadata": {   # Additional metadata
        "domain": "example.com",
        "date": "2024-01-15",
        # Other engine-specific fields
    }
}
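A typical consumer of this structure filters or normalizes results before use. The sketch below drops results from unwanted domains and trims snippets for display; the helper name is hypothetical, not part of the library.

```python
# Illustrative post-processing of search results (helper name is hypothetical).
from urllib.parse import urlparse

def filter_results(results, blocked_domains=frozenset({"spam.example.net"})):
    kept = []
    for r in results:
        # Drop results whose domain is on the block list.
        if urlparse(r["url"]).netloc in blocked_domains:
            continue
        # Trim snippets to a fixed length for display.
        kept.append({**r, "snippet": r["snippet"][:100]})
    return kept

results = [
    {"title": "Good", "url": "https://example.com/page", "snippet": "A useful page."},
    {"title": "Spam", "url": "https://spam.example.net/x", "snippet": "..."},
]
clean = filter_results(results)
print([r["title"] for r in clean])  # ['Good']
```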

Error Handling

Zero Results Error

# Raises ValueError: "Zero results found for the search query."
This occurs when:
  • Search query is too specific
  • Search engine rate limits exceeded
  • Network connectivity issues
  • Invalid API credentials

Handling Errors Gracefully

try:
    updated_state = search_node.execute(state)
except ValueError as e:
    if "Zero results" in str(e):
        # Handle no results case
        print("No results found. Try a different query.")
    else:
        raise

Best Practices

  1. Choose appropriate search engine
    • Use DuckDuckGo for simple, no-auth searches
    • Use Serper for production applications
    • Use Google for highest quality results
  2. Optimize max_results
    • Start with 3-5 results for speed
    • Increase to 10-20 for comprehensive coverage
    • Consider API costs and rate limits
  3. Use descriptive user prompts
    • Clear prompts generate better search queries
    • Include specific keywords and context
  4. Handle rate limits
    • Implement retry logic
    • Use proxies if needed
    • Cache results when possible
  5. Configure timeouts
    • Set appropriate timeouts for search operations
    • Handle timeout errors gracefully
  6. Enable verbose mode for debugging
    • Monitor query generation
    • Track search engine responses

Integration Patterns

Search + Fetch + Generate

# 1. Search for relevant URLs
search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={...}
)

# 2. Fetch content from top result
fetch_node = FetchNode(
    input="url",
    output=["document"],
    node_config={...}
)

# 3. Generate answer from fetched content
generate_node = GenerateAnswerNode(
    input="user_prompt & document",
    output=["answer"],
    node_config={...}
)
Multi-Engine Search

# Search multiple engines in parallel
search_ddg = SearchInternetNode(
    input="user_prompt",
    output=["ddg_results"],
    node_config={"search_engine": "duckduckgo", ...}
)

search_serper = SearchInternetNode(
    input="user_prompt",
    output=["serper_results"],
    node_config={"search_engine": "serper", ...}
)

# Merge results from both sources

Performance Considerations

  • Query generation: ~1-2 seconds with GPT-4
  • Search execution: ~1-3 seconds depending on engine
  • Total latency: ~2-5 seconds per search
  • Rate limits: Vary by search engine
  • API costs: Consider usage-based pricing
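To check these latency figures in your own environment, time a node run directly. The stub node below stands in for SearchInternetNode so the snippet runs standalone.

```python
# Time a single node execution with a high-resolution timer.
import time

def timed_execute(node, state):
    start = time.perf_counter()
    result = node.execute(state)
    elapsed = time.perf_counter() - start
    return result, elapsed

class _StubNode:
    def execute(self, state):
        state["search_results"] = []
        return state

result, elapsed = timed_execute(_StubNode(), {"user_prompt": "test"})
print(f"search took {elapsed:.3f}s")
```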
