Overview
Graph Node supports syncing offchain data sources in subgraphs, such as IPFS files. The implementation provides reusable components and data structures that simplify adding new kinds of offchain data sources.
For subgraph developer documentation on using offchain data sources, refer to the official subgraph documentation. This page focuses on implementation details for Graph Node developers.
Implementation Architecture
Core Components
The offchain data source implementation consists of several reusable components designed to make adding new data source kinds straightforward.
Component Locations
- Data Structures: graph crate, data_source/offchain.rs
- Monitoring Logic: OffchainMonitor in subgraph/context.rs
- Polling Helper: PollingMonitor generic component
- IPFS Implementation: IpfsService (reference implementation)
Data Source Representation
Offchain data sources are represented by data structures in the graph crate at data_source/offchain.rs. These structures handle:
- Parsing from the subgraph manifest
- Creation as dynamic data sources
- Source type enumeration via enum Source
- Kind registration in const OFFCHAIN_KINDS
Adding a new file-based data source kind typically only requires:
- A new enum Source variant
- Adding the kind to const OFFCHAIN_KINDS
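The two steps above can be sketched as follows. This is a simplified stand-in, not the actual definitions in data_source/offchain.rs: the payload types, the from_kind helper, the Example variant, and the "file/example" kind are all hypothetical and exist only to show where a new kind would plug in.

```rust
// Simplified stand-in for the definitions in data_source/offchain.rs.
// `Example`, "file/example", and `from_kind` are hypothetical.

/// Kinds that are treated as offchain data sources.
pub const OFFCHAIN_KINDS: &[&str] = &["file/ipfs", "file/example"];

/// Where the content of an offchain data source comes from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Source {
    /// Existing kind: a file addressed by an IPFS path.
    Ipfs(String),
    /// Hypothetical new kind: a file addressed by an opaque identifier.
    Example(String),
}

impl Source {
    /// Build a `Source` from a kind string and an address.
    /// Returns `None` for kinds that are not registered as offchain.
    pub fn from_kind(kind: &str, address: &str) -> Option<Source> {
        if !OFFCHAIN_KINDS.contains(&kind) {
            return None;
        }
        match kind {
            "file/ipfs" => Some(Source::Ipfs(address.to_string())),
            "file/example" => Some(Source::Example(address.to_string())),
            _ => None,
        }
    }
}
```

Keeping the kind list and the enum side by side makes it harder to register a kind without also giving it a variant.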
OffchainMonitor
The OffchainMonitor is responsible for tracking and fetching offchain data. It is currently located in subgraph/context.rs.
Key Operations
fn add_source
Called when an offchain data source is created from a template. This function registers the source for monitoring.
Expectation: A background task will monitor the source for relevant events (e.g., a file becoming available).
fn ready_offchain_events
Called periodically by the subgraph runner to process events from monitored sources.
For file data sources, the event is the file content becoming available.
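The interplay between the two operations can be illustrated with a small, std-only sketch. It is not the real OffchainMonitor: the MonitorSketch and OffchainEvent types, the fetch parameter, and the use of a plain thread with channels are assumptions made to keep the example self-contained. Only the names add_source and ready_offchain_events come from the description above.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical event type: the content of a file that became available.
#[derive(Debug, PartialEq, Eq)]
pub struct OffchainEvent {
    pub source_id: String,
    pub data: Vec<u8>,
}

/// Simplified monitor; not the real `OffchainMonitor` from subgraph/context.rs.
pub struct MonitorSketch {
    requests: Sender<String>,
    events: Receiver<OffchainEvent>,
}

impl MonitorSketch {
    /// Spawn the background worker. `fetch` stands in for the real fetching logic.
    pub fn new(fetch: fn(&str) -> Vec<u8>) -> Self {
        let (req_tx, req_rx) = channel::<String>();
        let (ev_tx, ev_rx) = channel::<OffchainEvent>();
        thread::spawn(move || {
            // The loop ends once the monitor (and its request sender) is dropped.
            for source_id in req_rx {
                let data = fetch(&source_id);
                if ev_tx.send(OffchainEvent { source_id, data }).is_err() {
                    break;
                }
            }
        });
        MonitorSketch { requests: req_tx, events: ev_rx }
    }

    /// Register a source; the background worker takes it from here.
    pub fn add_source(&self, source_id: &str) {
        // Sending fails only if the worker has exited; ignored in this sketch.
        let _ = self.requests.send(source_id.to_string());
    }

    /// Return every event that is ready right now, without blocking.
    pub fn ready_offchain_events(&self) -> Vec<OffchainEvent> {
        self.events.try_iter().collect()
    }
}
```

The point of the split is that add_source returns immediately, while the caller collects results later by calling ready_offchain_events as often as it likes; a call that finds nothing ready simply returns an empty list.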
Adding New Data Source Kinds
File-Based Data Sources
For file-based data sources, most existing code can be reused.
Using PollingMonitor
For data sources that rely on polling to check availability, use the generic PollingMonitor component:
PollingMonitor Implementation
- Implement the polling logic as a tower service
- The IpfsService serves as a reference implementation
- Focus only on the polling and fetching logic
- The PollingMonitor handles the monitoring infrastructure
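The division of labour described in the list above can be sketched without any external crates. This example does not use the tower Service trait or the real PollingMonitor: PollService, PollingSketch, and AvailableAfter are hypothetical names, and the synchronous poll_round loop is an assumption chosen for brevity. It only shows the idea that the service answers "is this object available yet?" while the generic part owns the queue and the retries.

```rust
use std::collections::VecDeque;

/// Stand-in for the service the monitor calls. The real code uses a `tower`
/// service; this std-only trait is a simplified substitute.
pub trait PollService {
    /// Return `Some(content)` once the object is available, `None` otherwise.
    fn poll_once(&mut self, id: &str) -> Option<Vec<u8>>;
}

/// Generic polling queue: objects that are not yet available are retried later.
pub struct PollingSketch<S: PollService> {
    service: S,
    queue: VecDeque<String>,
}

impl<S: PollService> PollingSketch<S> {
    pub fn new(service: S) -> Self {
        PollingSketch { service, queue: VecDeque::new() }
    }

    /// Start monitoring an object.
    pub fn monitor(&mut self, id: &str) {
        self.queue.push_back(id.to_string());
    }

    /// Poll each queued object once and return the ones that were found.
    pub fn poll_round(&mut self) -> Vec<(String, Vec<u8>)> {
        let mut found = Vec::new();
        for _ in 0..self.queue.len() {
            if let Some(id) = self.queue.pop_front() {
                match self.service.poll_once(&id) {
                    Some(data) => found.push((id, data)),
                    // Not available yet: put it back for the next round.
                    None => self.queue.push_back(id),
                }
            }
        }
        found
    }

    /// Number of objects still waiting to become available.
    pub fn pending(&self) -> usize {
        self.queue.len()
    }
}

/// Hypothetical service that reports an object as available from the
/// `threshold`-th call onwards.
pub struct AvailableAfter {
    pub calls: u32,
    pub threshold: u32,
}

impl PollService for AvailableAfter {
    fn poll_once(&mut self, id: &str) -> Option<Vec<u8>> {
        self.calls += 1;
        if self.calls >= self.threshold {
            Some(id.as_bytes().to_vec())
        } else {
            None
        }
    }
}
```

A new data source kind would only need its own equivalent of AvailableAfter; the queueing and retry behaviour stays in the generic component.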
Testing
Integration Testing
Automated testing for offchain data sources can be tricky and should be discussed case by case. The file_data_sources test in runner_tests.rs serves as a starting point for writing integration tests with offchain data sources.
Current Limitations
Dynamic Only
Offchain data sources currently can only exist as dynamic data sources, instantiated from templates. They cannot be configured as static data sources in the manifest.
Impact: All offchain data sources must be created at runtime from a template.
One-Shot Assumption
Some parts of the implementation assume offchain data sources are ‘one shot’: only a single trigger is handled per data source instance.
Works Well For: Files (file is found, handled, done)
Consideration: More complex offchain data sources (e.g., continuous streams) will require additional planning and architectural changes.
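To make the one-shot assumption concrete, here is a minimal sketch of what "a single trigger per instance" means in practice. OneShotSources and its methods are hypothetical and do not correspond to any type in Graph Node; the sketch only models the behaviour described above, where a source is finished as soon as its first trigger has been handled.

```rust
use std::collections::HashSet;

/// Hypothetical registry of data source instances that still await a trigger.
pub struct OneShotSources {
    active: HashSet<String>,
}

impl OneShotSources {
    pub fn new() -> Self {
        OneShotSources { active: HashSet::new() }
    }

    /// Register a newly created data source instance.
    pub fn add(&mut self, id: &str) {
        self.active.insert(id.to_string());
    }

    /// Handle a trigger for `id`. Returns `true` if the trigger was handled,
    /// and `false` if the source is unknown or has already been handled.
    pub fn handle_trigger(&mut self, id: &str) -> bool {
        // Removing the entry is what makes the source one-shot.
        self.active.remove(id)
    }
}
```

A continuous source would break this model, because the second trigger for the same instance would be rejected.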
Proof of Indexing (PoI)
Entities from offchain data sources do not currently influence the PoI. Causality region IDs are not deterministic.
Impact:
- Offchain data cannot be verified through PoI
- May affect dispute resolution
- Limits trustless verification guarantees
Reference Implementation: IPFS
The initially supported data source kind is file/ipfs, which serves as a reference implementation.
Best Practices
For New Data Source Implementations
- Reuse existing components: Start with PollingMonitor for polling-based sources
- Study IPFS implementation: Use IpfsService as a template
- Consider timing: Plan for async availability and delays
- Test thoroughly: Write integration tests early in development
- Document limitations: Be clear about one-shot vs. continuous behavior
Architecture Considerations
When adding support for non-file data sources (e.g., APIs, message queues), consider:
- Event multiplicity (multiple triggers vs. one-shot)
- Determinism requirements for PoI
- Resource management and cleanup
- Error handling and retry logic

