Overview
By default, Graph Node stores a complete version history for all entities, allowing time-travel queries at any historical block. While powerful, this creates significant storage overhead and can degrade performance. Pruning removes entity versions older than a specified block, making it impossible to query the deployment at blocks prior to the pruning point. Queries at blocks after the pruning point are unaffected and perform better due to the reduced data volume.

Benefits
- Reduced storage: Often reduces deployment data size dramatically
- Faster queries: Less data to scan improves query performance considerably
- Better indexing speed: Smaller tables and indexes speed up block processing
- Lower infrastructure costs: Decreased storage and compute requirements
Tradeoffs

The main tradeoff is losing time travel: once history is removed, queries (and grafts) that target blocks before the pruning point fail.
How Pruning Works
History Retention
Pruning is controlled by the history_blocks parameter, which specifies how many blocks of history to retain. The actual pruning block b is calculated as:

b = latest_block - history_blocks
After pruning, the deployment's earliest_block is updated to the pruning block, and Graph Node returns an error for any query attempting to time-travel before this block.
The value of history_blocks must be greater than ETHEREUM_REORG_THRESHOLD to ensure that chain reorganizations never conflict with pruning operations.

Accessing Earliest Block
You can retrieve the earliest_block through the Index Node status API:
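A query along these lines against the index node's GraphQL endpoint returns the earliest and latest indexed blocks; the deployment hash below is a placeholder:

```graphql
{
  indexingStatuses(subgraphs: ["QmYourDeploymentHash"]) {
    subgraph
    chains {
      earliestBlock { number }
      latestBlock { number }
    }
  }
}
```

Queries before `earliestBlock.number` will fail once the deployment is pruned.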
Using Graphman Prune
Initial Pruning
Start pruning a deployment with graphman prune <deployment> --history <blocks>, where:

- <deployment> - Deployment identifier (name, IPFS hash, or namespace)
- --history <blocks> - Number of blocks of history to retain

This command:

- Performs an initial prune of the deployment
- Sets the deployment's history_blocks configuration
- Enables automatic repruning as more history accumulates
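As a concrete sketch (the deployment hash is a placeholder, and the --config path assumes a typical Graph Node install):

```shell
# Initial prune: retains ~2 weeks of mainnet history and enables repruning
graphman --config /etc/graph-node/config.toml \
  prune QmYourDeploymentHash --history 100000
```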
One-Time Pruning
To prune once without enabling automatic repruning, pass the --once flag to graphman prune.

Disabling Automatic Pruning

To stop automatic repruning after initial setup, set history_blocks to a very large value; this effectively disables repruning.
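A one-time prune under the same assumptions as above (placeholder deployment hash and config path):

```shell
# Prune once; automatic repruning is not enabled
graphman --config /etc/graph-node/config.toml \
  prune QmYourDeploymentHash --history 100000 --once
```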
Automatic Repruning
How It Works
After initial pruning, Graph Node automatically reprunes the deployment when it accumulates more than history_blocks * GRAPH_STORE_HISTORY_SLACK_FACTOR blocks of history. For example, with GRAPH_STORE_HISTORY_SLACK_FACTOR=1.5 and history_blocks=10000, a reprune triggers when history exceeds 10000 * 1.5 = 15000 blocks.
Reprune Frequency
Repruning therefore occurs every history_blocks * (GRAPH_STORE_HISTORY_SLACK_FACTOR - 1) blocks; with the values above, that is 10000 * (1.5 - 1) = 5000 blocks between reprunes.
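The reprune arithmetic above can be sketched as follows; these helpers are illustrative, not Graph Node internals:

```python
def reprune_trigger(history_blocks: int, slack_factor: float) -> float:
    """History size (in blocks) at which an automatic reprune starts."""
    return history_blocks * slack_factor


def reprune_interval(history_blocks: int, slack_factor: float) -> float:
    """Blocks indexed between successive reprunes."""
    return history_blocks * (slack_factor - 1)


print(reprune_trigger(10_000, 1.5))   # 15000.0
print(reprune_interval(10_000, 1.5))  # 5000.0
```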
Configuration
Set the slack factor with the GRAPH_STORE_HISTORY_SLACK_FACTOR environment variable. Set it high enough that repruning occurs relatively infrequently, to avoid excessive database work. Values between 1.3 and 2.0 are typical.
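For example, as an environment variable for the Graph Node process (value chosen from the typical range above):

```shell
export GRAPH_STORE_HISTORY_SLACK_FACTOR=1.5
```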
Pruning Strategies
Graph Node uses two different strategies to remove unneeded data:

Delete Strategy

How it works: Deletes rows (old entity versions) from existing tables.

When used: When the estimated fraction of data to remove is between DELETE_THRESHOLD and REBUILD_THRESHOLD.
Advantages:
- Does not block indexing
- Simpler operation
- Lower peak resource usage
Disadvantages:

- Tables remain fragmented
- Storage is not immediately reclaimed
- May be slower for large deletions
Rebuild Strategy
How it works: Copies the data that should be kept into new tables, then replaces the existing tables with these smaller tables.

When used: When the estimated fraction of data to remove exceeds REBUILD_THRESHOLD.
Advantages:
- Much smaller final tables
- Storage immediately reclaimed
- Better performance for large deletions
- Tables are defragmented
Disadvantages:

- Temporarily blocks indexing while copying non-final entities
- Higher peak resource usage
- More complex operation
Strategy Selection Thresholds
Configure the thresholds with environment variables (values between 0 and 1):

- If estimated removal > REBUILD_THRESHOLD: rebuild the table
- If estimated removal is between DELETE_THRESHOLD and REBUILD_THRESHOLD: delete old versions
- If estimated removal < DELETE_THRESHOLD: skip (not worth processing)
Graph Node automatically analyzes tables to estimate removal fractions, using Postgres statistics for accuracy.
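The selection rule above can be sketched like this; the default threshold values are illustrative placeholders, not Graph Node's actual defaults:

```python
def choose_strategy(removal_fraction: float,
                    delete_threshold: float = 0.05,
                    rebuild_threshold: float = 0.5) -> str:
    """Pick a pruning strategy from the estimated fraction of rows to remove."""
    if removal_fraction > rebuild_threshold:
        return "rebuild"  # copy surviving rows into fresh, smaller tables
    if removal_fraction >= delete_threshold:
        return "delete"   # delete old entity versions in place
    return "skip"         # too little data to remove to be worth it


print(choose_strategy(0.8))   # rebuild
print(choose_strategy(0.2))   # delete
print(choose_strategy(0.01))  # skip
```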
Batching and Performance
Batch Target Duration
To avoid very long-running transactions, pruning operations are broken into batches, sized to hit a target duration (GRAPH_STORE_BATCH_TARGET_DURATION).

Parallel Operation
In most cases, pruning runs in parallel with indexing without blocking it.

Exception: When using the rebuild strategy, indexing is blocked while copying non-final entities from the existing table to the new table. This is typically brief but depends on the number of non-final entities.

Monitoring Pruning Operations
Initial Prune
The graphman prune command prints a progress report to the console.
Ongoing Repruning
For automatic reprune operations, watch the Graph Node logs.

Metrics
While Graph Node doesn't expose dedicated pruning metrics, you can monitor general database metrics such as table and index sizes.

Use Cases and Recommendations
Time-Series Data
Scenario: Subgraph tracks daily statistics (e.g., daily trading volume, daily active users).

Recommendation: Aggressive pruning is usually safe here: each period is stored as its own entity, so little of value lives in old entity versions.

Event Logs
Scenario: Subgraph indexes event logs that are never updated.

Recommendation: Entities that are written once and never updated keep a single version, so pruning removes little data, though it remains safe to enable.

Lifetime Statistics
Scenario: Entities track lifetime statistics (e.g., total token transfers, all-time high price).

Consideration: Pruning limits how far back you can generate time-series from these statistics.

Recommendation: Set history_blocks based on the oldest historical query your application needs. For example, 100,000 blocks provides roughly two weeks of Ethereum mainnet history.
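A back-of-the-envelope check of the "~2 weeks" figure, assuming the ~12-second post-merge Ethereum mainnet block time:

```python
SECONDS_PER_BLOCK = 12  # approximate Ethereum mainnet block time


def retention_days(history_blocks: int) -> float:
    """Approximate wall-clock history retained by a history_blocks setting."""
    return history_blocks * SECONDS_PER_BLOCK / 86_400


print(round(retention_days(100_000), 1))  # ~13.9 days, i.e. about two weeks
```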
High-Churn Entities
Scenario: Entities update frequently (e.g., token prices, user balances).

Recommendation: Aggressive pruning (a low history_blocks) provides significant storage and performance benefits.
Archived Deployments
Scenario: Old deployment versions no longer receiving queries.

Recommendation: Prune with --once if you plan to remove the deployment entirely soon.
Best Practices
Start with conservative history_blocks
Begin with a higher value (e.g., 50,000 blocks) and reduce it based on your actual query patterns. Monitor query errors to find the right balance.
Consider reorg threshold
Always set history_blocks significantly higher than ETHEREUM_REORG_THRESHOLD to prevent conflicts. Add at least 1000 blocks as a safety margin.

Monitor initial prune duration
The first prune can take considerable time for large deployments. Run it during low-traffic periods if possible.
Test in staging first
Prune a copy of your deployment in a staging environment to estimate duration and verify query compatibility.
Document your history_blocks choice
Record why you chose a specific value so future maintainers understand the tradeoff.
Plan for grafting
If you might need to graft from this deployment, ensure history_blocks is large enough to cover your grafting point.

Tune rebuild threshold
For deployments with many tables, adjust REBUILD_THRESHOLD to balance performance gains against indexing interruptions.

Troubleshooting
Queries return “block not found” errors
Cause: The query requests a block before earliest_block.
Solution:
- Check the deployment's earliest_block via the status API
- Modify queries to respect this limit
- Consider increasing history_blocks if this is too restrictive
Pruning takes too long
Causes and solutions:

Large tables with high removal percentage

Symptom: Rebuild operations taking hours.

Solution: This is expected for very large deployments. Consider:
- Running initial prune during maintenance windows
- Raising REBUILD_THRESHOLD so the delete strategy is used instead
- Increasing GRAPH_STORE_BATCH_TARGET_DURATION for fewer, larger batches
Frequent repruning

Symptom: Reprune operations running too often.

Solution: Increase GRAPH_STORE_HISTORY_SLACK_FACTOR. This reduces reprune frequency but allows more history to accumulate.

Outdated table statistics
Symptom: Pruning analyzes tables repeatedly or chooses suboptimal strategies.

Solution: Manually update Postgres statistics:
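One way to refresh the statistics, assuming direct SQL access to the shard database; the schema name sgd42 and table swap below are placeholders for your deployment's namespace and entity tables:

```sql
-- Recompute planner statistics for a hypothetical deployment table
ANALYZE sgd42.swap;

-- Or analyze everything in the current database (slower)
ANALYZE;
```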
Indexing blocked during rebuild
Symptom: Deployment stops indexing during prune operations.

Expected behavior: This occurs when copying non-final entities during table rebuilds.

Solutions:

- Reduce rebuild frequency: Increase GRAPH_STORE_HISTORY_SLACK_FACTOR
- Use delete strategy: Raise REBUILD_THRESHOLD so deletions are favored over rebuilds
- Smaller batches: Reduce GRAPH_STORE_BATCH_TARGET_DURATION for shorter blocking periods
Storage not reclaimed after pruning
Cause: The delete strategy marks space as reusable within Postgres but does not return it to the operating system.

Solutions: The freed space is reused by subsequent writes; to actually shrink tables on disk, rely on a prune that uses the rebuild strategy, or run a Postgres VACUUM FULL during a maintenance window.

Cannot graft from pruned deployment

Cause: The target graft block is before the deployment's earliest_block.
Solution:
- Increase history_blocks before pruning if grafting is planned
- Use a different source deployment that hasn't been pruned
- Reindex the source deployment from scratch if necessary
Configuration Reference
Environment Variables

- GRAPH_STORE_HISTORY_SLACK_FACTOR - Multiplier of history_blocks that triggers automatic repruning
- GRAPH_STORE_BATCH_TARGET_DURATION - Target duration of each pruning batch
- REBUILD_THRESHOLD - Estimated removal fraction above which tables are rebuilt
- DELETE_THRESHOLD - Estimated removal fraction below which tables are skipped
- ETHEREUM_REORG_THRESHOLD - Lower bound that history_blocks must exceed

Graphman Commands

- graphman prune <deployment> --history <blocks> - Prune and enable automatic repruning
- graphman prune <deployment> --history <blocks> --once - Prune once without enabling repruning
Additional Resources
- Graphman CLI - Complete command reference
- Maintenance Operations - Deployment management
- Monitoring - Track deployment performance
- Configuration - Graph Node configuration

