Overview
The SearchInternetNode generates search queries based on user input and searches the internet for relevant information. It uses an LLM to create optimized search queries, then retrieves results from configured search engines.
Class Signature
class SearchInternetNode(BaseNode):
    def __init__(
        self,
        input: str,
        output: List[str],
        node_config: Optional[dict] = None,
        node_name: str = "SearchInternet",
    )
Source: scrapegraphai/nodes/search_internet_node.py:16
Parameters
input (str): Boolean expression defining the input keys needed from the state. Typically "user_prompt".
output (List[str]): List of output keys to be updated in the state. Typically ["search_results"] or ["urls"].
node_config (Optional[dict], default None): Configuration dictionary with the following options:
llm_model: Language model instance for generating search queries (ChatOpenAI, ChatOllama, etc.)
search_engine: Search engine to use. Options: "duckduckgo", "google", "bing", "serper"
max_results: Maximum number of search results to return
serper_api_key: API key for the Serper search engine (required if search_engine="serper")
verbose: Whether to show print statements during execution
loader_kwargs: Additional loader configuration, including proxy settings: {"proxy": "http://proxy.example.com:8080"}
node_name (str, default "SearchInternet"): The unique identifier name for the node
State Keys
Input State
user_prompt (str): The user's query or question that needs internet search
Output State
search_results (List[dict]): List of search results, each containing:
title: Page title
url: Page URL
snippet: Page description/snippet
content: Full page content (if fetched)
Methods
execute(state: dict) -> dict
Generates a search query from user input and searches the internet for relevant information.
def execute(self, state: dict) -> dict:
    """
    Generates a search query from the user's input, runs it against the
    configured search engine, and updates the state with the results.

    Args:
        state (dict): The current state of the graph.

    Returns:
        dict: The updated state with search results.

    Raises:
        ValueError: If zero results are found for the search query.
    """
Source: scrapegraphai/nodes/search_internet_node.py:60
Processing Steps:
Extract user prompt from state
Generate optimized search query using LLM
Execute search on configured search engine
Return search results in state
Returns: Updated state dictionary with search results
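The processing steps above can be sketched as follows. The helper functions are illustrative stand-ins, not the node's actual internals:

```python
def generate_query(prompt: str) -> str:
    # Stand-in for the LLM call that rewrites the prompt as a search query.
    return prompt.strip().rstrip("?")

def run_search(query: str, max_results: int = 5) -> list:
    # Stand-in for the configured search engine (DuckDuckGo, Serper, ...).
    return [{"title": "Example", "url": "https://example.com", "snippet": "..."}][:max_results]

def execute(state: dict) -> dict:
    user_prompt = state["user_prompt"]          # 1. extract user prompt from state
    search_query = generate_query(user_prompt)  # 2. generate optimized query
    results = run_search(search_query)          # 3. execute the search
    if not results:
        raise ValueError("Zero results found for the search query.")
    state["search_results"] = results           # 4. return results in state
    return state
```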
Usage Examples
Basic Internet Search
from scrapegraphai.nodes import SearchInternetNode
from langchain_openai import ChatOpenAI

# Create search node
search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "duckduckgo",
        "max_results": 5,
        "verbose": True
    }
)

# Execute node
state = {
    "user_prompt": "What are the latest developments in quantum computing?"
}
updated_state = search_node.execute(state)
print("Search results:", updated_state["search_results"])
# Output: [{"title": "...", "url": "...", "snippet": "..."}, ...]
Using Google Search
search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "google",
        "max_results": 10,
        "verbose": False
    }
)

state = {
    "user_prompt": "Best practices for Python web scraping"
}
updated_state = search_node.execute(state)
Using Serper API
search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "serper",
        "serper_api_key": "your_serper_api_key_here",
        "max_results": 5,
        "verbose": True
    }
)

state = {
    "user_prompt": "Latest AI research papers 2024"
}
updated_state = search_node.execute(state)
Search with Proxy
search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "duckduckgo",
        "max_results": 3,
        "loader_kwargs": {
            "proxy": "http://proxy.example.com:8080"
        }
    }
)

state = {
    "user_prompt": "Current weather in Tokyo"
}
updated_state = search_node.execute(state)
Using Ollama for Query Generation
from langchain_community.chat_models import ChatOllama

search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOllama(model="llama3"),
        "search_engine": "duckduckgo",
        "max_results": 5
    }
)

state = {
    "user_prompt": "Explain machine learning algorithms"
}
updated_state = search_node.execute(state)
Multiple Search Results
search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={
        "llm_model": ChatOpenAI(model="gpt-4"),
        "search_engine": "duckduckgo",
        "max_results": 20,  # Get more results
        "verbose": True
    }
)

state = {
    "user_prompt": "Top Python frameworks for 2024"
}
updated_state = search_node.execute(state)

# Process results
for i, result in enumerate(updated_state["search_results"], 1):
    print(f"{i}. {result['title']}")
    print(f"   URL: {result['url']}")
    print(f"   Snippet: {result['snippet'][:100]}...\n")
Search Query Generation
The node uses an LLM to transform user prompts into optimized search queries:
Query Optimization Process
User Prompt Analysis: the LLM analyzes the user's question
Keyword Extraction: identifies key terms and concepts
Query Formulation: creates an optimized search query
Result Parsing: returns comma-separated search terms
| User Prompt | Generated Search Query |
| --- | --- |
| "What's the weather like?" | "current weather forecast" |
| "I need Python tutorials" | "Python programming tutorials beginner" |
| "Latest news about AI" | "artificial intelligence news 2024" |
| "How to bake bread?" | "bread baking recipe instructions" |
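The rewriting step can be approximated with a simple prompt template. This template is a hypothetical illustration, not the node's actual prompt:

```python
# Hypothetical prompt template for the query-generation step.
QUERY_PROMPT = (
    "Rewrite the following user question as a concise web search query. "
    "Return only the query, with no extra text.\n\n"
    "Question: {question}"
)

def build_query_prompt(question: str) -> str:
    # The rendered prompt would be sent to the configured llm_model.
    return QUERY_PROMPT.format(question=question)
```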
Supported Search Engines
DuckDuckGo (Default)
{
    "search_engine": "duckduckgo",
    # No API key required
    # Privacy-focused
    # Rate-limited
}
Pros:
No API key required
Privacy-focused
Simple to use
Cons:
Rate limiting
Fewer results
No advanced features
Google Search
{
    "search_engine": "google",
    # Requires Google Custom Search API
    # Best quality results
}
Pros:
High-quality results
Comprehensive coverage
Advanced ranking
Cons:
Requires API key
API usage costs
Complex setup
Bing Search
{
    "search_engine": "bing",
    # Requires Bing Search API key
    # Good international coverage
}
Pros:
Good result quality
International support
Reasonable pricing
Cons:
Requires API key
API usage costs
Serper
{
    "search_engine": "serper",
    "serper_api_key": "your_key",
    # Developer-friendly API
    # Good performance
}
Pros:
Easy API integration
Fast responses
Good documentation
Cons:
Requires subscription
Monthly usage limits
Search Result Structure
Each search result contains:
{
    "title": "Page Title",
    "url": "https://example.com/page",
    "snippet": "Brief description of the page content...",
    "content": "Full page text content (if fetched)",
    "position": 1,  # Result ranking
    "metadata": {   # Additional metadata
        "domain": "example.com",
        "date": "2024-01-15",
        # Other engine-specific fields
    }
}
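Because fields such as content, position, and metadata are engine-specific and may be absent, it is safest to read them defensively. A minimal helper, assuming only the result dictionary shape shown above:

```python
def format_result(result: dict) -> str:
    # Only title/url/snippet are commonly present; read fields with
    # .get() so missing engine-specific keys never raise KeyError.
    title = result.get("title", "(untitled)")
    url = result.get("url", "")
    snippet = result.get("snippet", "")[:100]
    return f"{title}\n  {url}\n  {snippet}"
```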
Error Handling
Zero Results Error
# Raises ValueError: "Zero results found for the search query."
This occurs when:
Search query is too specific
Search engine rate limits exceeded
Network connectivity issues
Invalid API credentials
Handling Errors Gracefully
try:
    updated_state = search_node.execute(state)
except ValueError as e:
    if "Zero results" in str(e):
        # Handle no results case
        print("No results found. Try a different query.")
    else:
        raise
Best Practices
Choose appropriate search engine
Use DuckDuckGo for simple, no-auth searches
Use Serper for production applications
Use Google for highest quality results
Optimize max_results
Start with 3-5 results for speed
Increase to 10-20 for comprehensive coverage
Consider API costs and rate limits
Use descriptive user prompts
Clear prompts generate better search queries
Include specific keywords and context
Handle rate limits
Implement retry logic
Use proxies if needed
Cache results when possible
Configure timeouts
Set appropriate timeouts for search operations
Handle timeout errors gracefully
Enable verbose mode for debugging
Monitor query generation
Track search engine responses
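The rate-limit advice above can be sketched as a small retry wrapper. The wrapper and its backoff parameters are illustrative, not part of the library:

```python
import time

def search_with_retry(node, state, attempts=3, base_delay=1.0):
    # Retry node.execute() when it raises the "Zero results" ValueError,
    # which can also be triggered by transient rate limiting; back off
    # exponentially between tries and re-raise on the final attempt.
    for attempt in range(attempts):
        try:
            return node.execute(state)
        except ValueError as e:
            if "Zero results" not in str(e) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```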
Integration Patterns
Search + Fetch + Generate
# 1. Search for relevant URLs
search_node = SearchInternetNode(
    input="user_prompt",
    output=["search_results"],
    node_config={...}
)

# 2. Fetch content from top result
fetch_node = FetchNode(
    input="url",
    output=["document"],
    node_config={...}
)

# 3. Generate answer from fetched content
generate_node = GenerateAnswerNode(
    input="user_prompt & document",
    output=["answer"],
    node_config={...}
)
Multi-Source Search
# Search multiple engines in parallel
search_ddg = SearchInternetNode(
    input="user_prompt",
    output=["ddg_results"],
    node_config={"search_engine": "duckduckgo", ...}
)

search_serper = SearchInternetNode(
    input="user_prompt",
    output=["serper_results"],
    node_config={"search_engine": "serper", ...}
)

# Merge results from both sources
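The merge step for this pattern might de-duplicate by URL while preserving order. A minimal sketch, assuming each result is a dictionary with a "url" key as described above:

```python
def merge_results(*result_lists):
    # Merge results from multiple engines, dropping entries whose URL
    # was already seen so each page appears only once.
    seen, merged = set(), []
    for results in result_lists:
        for r in results:
            if r["url"] not in seen:
                seen.add(r["url"])
                merged.append(r)
    return merged
```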
Performance Considerations
Query generation: ~1-2 seconds with GPT-4
Search execution: ~1-3 seconds depending on engine
Total latency: ~2-5 seconds per search
Rate limits: vary by search engine
API costs: consider usage-based pricing