Introduction
Nodes are the fundamental building blocks of ScrapeGraphAI graphs. Each node performs a specific operation in the scraping workflow, such as fetching content, parsing HTML, or generating answers using LLMs.
BaseNode Class
All nodes in ScrapeGraphAI inherit from the abstract BaseNode class, which provides the core functionality for node execution and state management.
Class Signature
Attributes
- node_name - The unique identifier name for the node
- node_type - Type of the node; must be "node" or "conditional_node"
- input - Boolean expression defining the input keys needed from the state. Supports AND (&) and OR (|) operators
- output - List of output keys to be updated in the state
- min_input_len - Minimum required number of input keys
- node_config - Additional configuration for the node
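A minimal stand-in for the class described above, assuming the constructor takes one parameter per listed attribute. This is a sketch, not the library's source: `SketchBaseNode` is a hypothetical name, and the parameter names, defaults, and the validation step are assumptions that mirror the attribute descriptions.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class SketchBaseNode(ABC):
    """Hypothetical stand-in that mirrors the attributes described above."""

    VALID_TYPES = ("node", "conditional_node")

    def __init__(
        self,
        node_name: str,
        node_type: str,
        input: str,
        output: List[str],
        min_input_len: int = 1,              # assumed default
        node_config: Optional[dict] = None,  # assumed default
    ):
        # The text says the type must be one of two values, so reject anything else.
        if node_type not in self.VALID_TYPES:
            raise ValueError(
                f"node_type must be one of {self.VALID_TYPES}, got {node_type!r}"
            )
        self.node_name = node_name
        self.node_type = node_type
        self.input = input
        self.output = output
        self.min_input_len = min_input_len
        self.node_config = node_config or {}

    @abstractmethod
    def execute(self, state: dict) -> dict:
        """Run the node's logic and return the updated state."""
```

Because `execute()` is abstract, the class cannot be instantiated directly; only subclasses that implement it can.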
Node Lifecycle
1. Initialization
Nodes are initialized with their configuration parameters.
2. Execution
Nodes implement the abstract execute() method to perform their logic.
3. Input Processing
Nodes use get_input_keys() to extract the required data from the state.
4. State Updates
Nodes update the state with their output.
Input Expressions
The input parameter supports Boolean expressions for flexible state key matching:
- Simple input: "url" - requires the url key in state
- AND operator: "url & user_prompt" - requires both keys
- OR operator: "url | local_dir" - requires at least one key
- Complex expressions: "(url | local_dir) & user_prompt" - supports parentheses for grouping
Creating Custom Nodes
To create a custom node, extend the BaseNode class and implement the execute() method.
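As a rough illustration of that pattern, the sketch below subclasses a tiny stand-in base class so that it runs without the library installed. `MiniBaseNode`, `WordCountNode`, and the state keys `doc` and `word_count` are all hypothetical; in a real project the base class would come from ScrapeGraphAI, and its constructor may take more arguments than shown here.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class MiniBaseNode(ABC):
    """Hypothetical stand-in for the library's base class (assumption)."""

    def __init__(self, node_name: str, input: str, output: List[str]):
        self.node_name = node_name
        self.input = input
        self.output = output

    @abstractmethod
    def execute(self, state: dict) -> dict:
        ...


class WordCountNode(MiniBaseNode):
    """Illustrative custom node: counts the words stored under its input key."""

    def __init__(self, input: str = "doc", output: Optional[List[str]] = None):
        super().__init__("WordCount", input, output or ["word_count"])

    def execute(self, state: dict) -> dict:
        # Input processing: this sketch treats `input` as a single key name.
        if self.input not in state:
            raise KeyError(f"{self.node_name}: missing state key {self.input!r}")
        text = state[self.input]

        # State update: write the result under the first output key.
        state[self.output[0]] = len(text.split())
        return state


state = WordCountNode().execute({"doc": "nodes are building blocks"})
print(state["word_count"])  # 4
```

The node reads only the key named by `input` and writes only the key named by `output`, which keeps its effect on the shared state easy to trace.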
Node Types
ScrapeGraphAI provides two main node types.
Standard Nodes
Standard nodes (node_type="node") perform operations and always proceed to the next node in the graph:
- FetchNode - Fetches content from URLs or files
- ParseNode - Parses and chunks HTML content
- GenerateAnswerNode - Generates answers using LLMs
- SearchNode - Searches the internet for information
- ReasoningNode - Refines prompts with reasoning
Conditional Nodes
Conditional nodes (node_type="conditional_node") implement branching logic based on state conditions:
- ConditionalNode - Directs flow based on state key presence or custom conditions
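The branching idea can be sketched as a node whose result is the name of the next node to run rather than an updated state. How ScrapeGraphAI's ConditionalNode is configured and what it returns are not stated here, so `KeyPresenceRouter`, its parameter names, and its return value are assumptions made for illustration.

```python
class KeyPresenceRouter:
    """Hypothetical conditional node: picks the next node by checking one state key."""

    node_type = "conditional_node"

    def __init__(self, key_name: str, true_node_name: str, false_node_name: str):
        self.key_name = key_name
        self.true_node_name = true_node_name
        self.false_node_name = false_node_name

    def execute(self, state: dict) -> str:
        # A missing key and an empty (falsy) value are both treated as absent.
        if state.get(self.key_name):
            return self.true_node_name
        return self.false_node_name


router = KeyPresenceRouter("answer", "finish", "retry_search")
print(router.execute({"answer": "42"}))  # finish
print(router.execute({}))                # retry_search
```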
Utility Methods
update_config()
Updates node configuration and attributes.
get_input_keys()
Determines the necessary state keys based on the input specification.
Best Practices
- Use descriptive node names - Makes debugging easier
- Validate input data - Check for required keys and data types
- Handle errors gracefully - Use try-except blocks for robustness
- Log important information - Use self.logger for debugging
- Keep nodes focused - Each node should have a single responsibility
- Document state changes - Clearly specify input and output keys
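The practices above might look like this when combined in one small node. It is a sketch, not library code: `ParseJsonNode` and the state keys `raw_json` and `parsed` are hypothetical, and a standard-library logger stands in for the node's own self.logger.

```python
import json
import logging


class ParseJsonNode:
    """Illustrative node with a single responsibility: decode JSON text.

    State in:  "raw_json" (str)
    State out: "parsed" (decoded value, or None if decoding failed)
    """

    def __init__(self, node_name: str = "ParseJson"):
        self.node_name = node_name  # descriptive name, reused in every log line
        self.logger = logging.getLogger(node_name)  # stand-in logger (assumption)

    def execute(self, state: dict) -> dict:
        self.logger.info("Executing %s", self.node_name)

        # Validate input: the key must exist and hold a string.
        raw = state.get("raw_json")
        if not isinstance(raw, str):
            raise TypeError(
                f"{self.node_name}: 'raw_json' must be a str, "
                f"got {type(raw).__name__}"
            )

        # Handle errors gracefully: bad JSON is logged and recorded as None.
        try:
            state["parsed"] = json.loads(raw)
        except json.JSONDecodeError as exc:
            self.logger.warning("%s: invalid JSON (%s)", self.node_name, exc)
            state["parsed"] = None
        return state
```

Whether a failure should be recorded in the state, as here, or re-raised depends on how the surrounding graph is expected to react; the sketch picks one option only to show the try-except shape.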
