Offchain Data Sources

Overview

Graph Node supports syncing offchain data sources in subgraphs, such as IPFS files. The implementation provides reusable components and data structures that simplify adding new kinds of offchain data sources.

For subgraph developer documentation on using offchain data sources, refer to the official subgraph documentation. This page focuses on implementation details for Graph Node developers.

Implementation Architecture

Core Components

The offchain data source implementation consists of several reusable components designed to make adding new data source kinds straightforward.

Component Locations

Data Structures: graph crate, data_source/offchain.rs
Monitoring Logic: OffchainMonitor in subgraph/context.rs
Polling Helper: PollingMonitor generic component
IPFS Implementation: IpfsService (reference implementation)

Data Source Representation

Offchain data sources are represented by data structures in the graph crate at data_source/offchain.rs. These structures handle:

Parsing from the subgraph manifest
Creation as dynamic data sources
Source type enumeration via enum Source
Kind registration in const OFFCHAIN_KINDS

Adding a new file-based data source kind typically only requires:

A new enum Source variant
Adding the kind to const OFFCHAIN_KINDS

OffchainMonitor

The OffchainMonitor is responsible for tracking and fetching offchain data. It currently lives in subgraph/context.rs.

Key Operations

fn add_source

Called when an offchain data source is created from a template. This function registers the source for monitoring.

fn add_source(/* parameters */)

Expectation: A background task will monitor the source for relevant events (e.g., file becoming available).

fn ready_offchain_events

Called periodically by the subgraph runner to process events from monitored sources.

fn ready_offchain_events(/* parameters */)

For file data sources, the event is the file content becoming available.
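The interplay between the two operations can be sketched as follows. This is a simplified model using only the standard library, not the actual OffchainMonitor: the names SketchMonitor, FileSource, FileEvent, and pending_count are hypothetical, and the real function signatures are not shown on this page. The sketch assumes a background task reports found files over a channel, and that the runner drains that channel without blocking.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};

/// Hypothetical stand-in for a file data source (e.g. identified by a CID).
#[derive(Debug, Clone, PartialEq)]
pub struct FileSource(pub String);

/// Hypothetical event: the content of a monitored file became available.
#[derive(Debug, PartialEq)]
pub struct FileEvent {
    pub source: FileSource,
    pub content: Vec<u8>,
}

/// Simplified monitor. `pending` lists the sources still being watched;
/// `events` receives whatever the background task finds.
pub struct SketchMonitor {
    pending: VecDeque<FileSource>,
    events: Receiver<FileEvent>,
}

impl SketchMonitor {
    /// Returns the monitor plus the sender a background task would use.
    pub fn new() -> (Self, Sender<FileEvent>) {
        let (tx, rx) = channel();
        let monitor = SketchMonitor {
            pending: VecDeque::new(),
            events: rx,
        };
        (monitor, tx)
    }

    /// Registers a source created from a template.
    pub fn add_source(&mut self, source: FileSource) {
        self.pending.push_back(source);
    }

    /// Drains every event that is ready right now, without blocking.
    pub fn ready_offchain_events(&mut self) -> Vec<FileEvent> {
        let mut ready = Vec::new();
        loop {
            match self.events.try_recv() {
                Ok(event) => {
                    // The source has produced its event; stop tracking it.
                    self.pending.retain(|s| s != &event.source);
                    ready.push(event);
                }
                Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
            }
        }
        ready
    }

    /// Number of sources that have not produced an event yet.
    pub fn pending_count(&self) -> usize {
        self.pending.len()
    }
}
```

The non-blocking drain matters because the runner calls this periodically between other work; a blocking receive would stall indexing until a file appeared.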

Adding New Data Source Kinds

File-Based Data Sources

For file-based data sources, most existing code can be reused:

// In graph/data_source/offchain.rs
enum Source {
    Ipfs(IpfsSource),
    YourNewKind(YourSource), // Add new variant
}

const OFFCHAIN_KINDS: &[&str] = &[
    "file/ipfs",
    "file/yournewkind", // Add new kind
];
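To illustrate how the kind list and the enum fit together, here is a minimal sketch, not Graph Node's actual parsing code. The function source_from_manifest, the String payloads, and the error messages are assumptions made for illustration; only Source, OFFCHAIN_KINDS, and the kind strings come from the snippet above.

```rust
/// Kinds accepted as offchain data sources (mirrors the snippet above).
const OFFCHAIN_KINDS: &[&str] = &["file/ipfs", "file/yournewkind"];

/// Simplified source enum; the String payloads are placeholders for the
/// real per-kind source types.
#[derive(Debug, PartialEq)]
enum Source {
    Ipfs(String),
    YourNewKind(String),
}

/// Hypothetical helper: maps a manifest `kind` string and a file
/// identifier to a `Source`, rejecting kinds that are not registered.
fn source_from_manifest(kind: &str, file: &str) -> Result<Source, String> {
    if !OFFCHAIN_KINDS.contains(&kind) {
        return Err(format!("unsupported offchain kind: {}", kind));
    }
    match kind {
        "file/ipfs" => Ok(Source::Ipfs(file.to_string())),
        "file/yournewkind" => Ok(Source::YourNewKind(file.to_string())),
        // Reached only if a kind was registered without a matching variant.
        other => Err(format!("kind {} has no Source variant", other)),
    }
}
```

The final match arm shows why both changes are needed together: registering a kind without adding a variant leaves it unusable.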

Using PollingMonitor

For data sources that rely on polling to check availability, use the generic PollingMonitor component.

PollingMonitor Implementation

Implement the polling logic as a tower service
The IpfsService serves as a reference implementation
Focus only on the polling and fetching logic
The PollingMonitor handles the monitoring infrastructure
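The division of labor described above can be sketched with the standard library alone. This is a synchronous simplification, not the real PollingMonitor or the tower Service trait: PollService, PollingQueue, AppearsAfter, and their methods are hypothetical names. The service only answers "is this available yet?", while the queue owns the bookkeeping of what is still pending.

```rust
use std::collections::VecDeque;

/// Hypothetical, synchronous stand-in for the polling service.
/// `Ok(Some(_))` means the item was found, `Ok(None)` means not yet.
pub trait PollService {
    type Request;
    type Response;
    fn poll_once(&mut self, req: &Self::Request) -> Result<Option<Self::Response>, String>;
}

/// Simplified monitoring infrastructure: keeps unresolved requests queued.
pub struct PollingQueue<S: PollService> {
    service: S,
    queue: VecDeque<S::Request>,
}

impl<S: PollService> PollingQueue<S> {
    pub fn new(service: S) -> Self {
        PollingQueue { service, queue: VecDeque::new() }
    }

    /// Starts monitoring a request.
    pub fn monitor(&mut self, req: S::Request) {
        self.queue.push_back(req);
    }

    /// Polls each queued request once. Found items are returned;
    /// missing or failed ones go back to the end of the queue.
    pub fn tick(&mut self) -> Vec<(S::Request, S::Response)> {
        let mut found = Vec::new();
        for _ in 0..self.queue.len() {
            let req = match self.queue.pop_front() {
                Some(req) => req,
                None => break,
            };
            match self.service.poll_once(&req) {
                Ok(Some(resp)) => found.push((req, resp)),
                Ok(None) | Err(_) => self.queue.push_back(req),
            }
        }
        found
    }

    /// Number of requests still waiting for a result.
    pub fn pending(&self) -> usize {
        self.queue.len()
    }
}

/// Toy service: content "appears" once enough attempts have been made.
pub struct AppearsAfter {
    pub attempts_needed: u32,
    pub attempts: u32,
}

impl PollService for AppearsAfter {
    type Request = String;
    type Response = Vec<u8>;

    fn poll_once(&mut self, req: &String) -> Result<Option<Vec<u8>>, String> {
        self.attempts += 1;
        if self.attempts >= self.attempts_needed {
            Ok(Some(req.as_bytes().to_vec()))
        } else {
            Ok(None)
        }
    }
}
```

A new data source kind would, under this model, supply only its own equivalent of AppearsAfter; the queueing logic stays shared.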

Testing

Integration Testing

Automated testing for offchain data sources can be tricky and should be discussed case-by-case.

The file_data_sources test in runner_tests.rs serves as a starting point for writing integration tests with offchain data sources.

// In runner_tests.rs
#[tokio::test]
async fn file_data_sources() {
    // Test setup
    // 1. Create subgraph with file data source template
    // 2. Upload file to IPFS/storage
    // 3. Trigger data source creation
    // 4. Verify data was processed correctly
}

Current Limitations

Dynamic Only

Offchain data sources currently can only exist as dynamic data sources, instantiated from templates. They cannot be configured as static data sources in the manifest.

Impact: All offchain data sources must be created at runtime from a template.

One-Shot Assumption

Some parts of the implementation assume offchain data sources are ‘one shot’ - only a single trigger is handled per data source instance.

Works Well For: Files (file is found, handled, done)

Consideration: More complex offchain data sources (e.g., continuous streams) will require additional planning and architectural changes.
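As a rough illustration of what the one-shot assumption means in practice, consider the sketch below. It is not taken from Graph Node; OneShotSource, Handled, and handle_trigger are hypothetical names. The point is that once a single trigger has been handled, the instance is finished and any later trigger is ignored, which is exactly what a continuous stream would not tolerate.

```rust
/// Outcome of offering a trigger to a one-shot data source.
#[derive(Debug, PartialEq)]
enum Handled {
    /// The trigger was processed; the source is now finished.
    Processed,
    /// The source had already handled its single trigger.
    AlreadyDone,
}

/// Hypothetical one-shot data source: it accepts exactly one trigger.
struct OneShotSource {
    done: bool,
}

impl OneShotSource {
    fn new() -> Self {
        OneShotSource { done: false }
    }

    /// Processes the first trigger and rejects every later one.
    fn handle_trigger(&mut self, _content: &[u8]) -> Handled {
        if self.done {
            return Handled::AlreadyDone;
        }
        self.done = true;
        Handled::Processed
    }
}
```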

Proof of Indexing (PoI)

Entities from offchain data sources do not currently influence the PoI. Causality region IDs are not deterministic.

Impact:

Offchain data cannot be verified through PoI
May affect dispute resolution
Limits trustless verification guarantees

Reference Implementation: IPFS

The initially supported data source kind is file/ipfs, which serves as a reference implementation.

// Example structure (simplified)
struct IpfsService {
    client: IpfsClient,
    // ... other fields
}

impl Service<IpfsRequest> for IpfsService {
    type Response = FileContent;
    // Implementation details
}

Best Practices

For New Data Source Implementations

Reuse existing components: Start with PollingMonitor for polling-based sources
Study IPFS implementation: Use IpfsService as a template
Consider timing: Plan for async availability and delays
Test thoroughly: Write integration tests early in development
Document limitations: Be clear about one-shot vs. continuous behavior

Architecture Considerations

When adding support for non-file data sources (e.g., APIs, message queues), consider:

Event multiplicity (multiple triggers vs. one-shot)
Determinism requirements for PoI
Resource management and cleanup
Error handling and retry logic
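For the retry consideration, one common option is capped exponential backoff between attempts. The sketch below is a generic illustration of that technique and makes no claim about how Graph Node schedules retries; backoff_delay and its parameters are hypothetical.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// never exceeding `max`. Overflow is treated as "use the maximum".
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a base of one second and a cap of one minute, the delays grow as 1s, 2s, 4s, and so on, and then stay at 60s however many attempts follow.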
