
Overview

Graph Node supports syncing offchain data sources in subgraphs, such as IPFS files. The implementation provides reusable components and data structures that simplify adding new kinds of offchain data sources.
For subgraph developer documentation on using offchain data sources, refer to the official subgraph documentation. This page focuses on implementation details for Graph Node developers.

Implementation Architecture

Core Components

The offchain data source implementation consists of several reusable components designed to make adding new data source kinds straightforward.
  • Data Structures: graph crate, data_source/offchain.rs
  • Monitoring Logic: OffchainMonitor in subgraph/context.rs
  • Polling Helper: PollingMonitor generic component
  • IPFS Implementation: IpfsService (reference implementation)

Data Source Representation

Offchain data sources are represented by data structures in the graph crate at data_source/offchain.rs. These structures handle:
  • Parsing from the subgraph manifest
  • Creation as dynamic data sources
  • Source type enumeration via enum Source
  • Kind registration in const OFFCHAIN_KINDS
Adding a new file-based data source kind typically only requires:
  1. A new enum Source variant
  2. Adding the kind to const OFFCHAIN_KINDS

OffchainMonitor

The OffchainMonitor is responsible for tracking and fetching offchain data. It currently lives in subgraph/context.rs.

Key Operations

add_source: Called when an offchain data source is created from a template. This function registers the source for monitoring.
fn add_source(/* parameters */)
Expectation: A background task will monitor the source for relevant events (e.g., a file becoming available).
ready_offchain_events: Called periodically by the subgraph runner to process events from monitored sources.
fn ready_offchain_events(/* parameters */)
For file data sources, the event is the file content becoming available.
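The interplay between the two operations can be sketched with a plain channel. This is a minimal, synchronous model using only the standard library, not the actual OffchainMonitor: the names MonitorSketch, SourceId, OffchainEvent and event_sender are assumptions made for illustration, and the real parameter lists are left out on purpose.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical identifier for a monitored source (e.g. a file path or CID).
#[derive(Debug, Clone, PartialEq)]
struct SourceId(String);

/// Hypothetical event: the content for `source` became available.
#[derive(Debug, PartialEq)]
struct OffchainEvent {
    source: SourceId,
    data: Vec<u8>,
}

/// Simplified model of the monitor; not the real `OffchainMonitor`.
struct MonitorSketch {
    /// Sources registered via `add_source` that have not produced an event yet.
    pending: Vec<SourceId>,
    /// Kept so that background tasks can be handed a clone.
    tx: Sender<OffchainEvent>,
    rx: Receiver<OffchainEvent>,
}

impl MonitorSketch {
    fn new() -> Self {
        let (tx, rx) = channel();
        Self { pending: Vec::new(), tx, rx }
    }

    /// Registers a source; a background task is expected to watch it.
    fn add_source(&mut self, source: SourceId) {
        self.pending.push(source);
    }

    /// Sending half that a (hypothetical) background task would use.
    fn event_sender(&self) -> Sender<OffchainEvent> {
        self.tx.clone()
    }

    /// Drains every event that is ready right now, without blocking.
    fn ready_offchain_events(&mut self) -> Vec<OffchainEvent> {
        let events: Vec<OffchainEvent> = self.rx.try_iter().collect();
        // A source that produced an event is no longer pending.
        self.pending
            .retain(|s| !events.iter().any(|e| &e.source == s));
        events
    }
}
```

In this model the runner would call ready_offchain_events between blocks; an empty result simply means no monitored file has become available yet.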

Adding New Data Source Kinds

File-Based Data Sources

For file-based data sources, most existing code can be reused:
// In graph/data_source/offchain.rs
enum Source {
    Ipfs(IpfsSource),
    YourNewKind(YourSource), // Add new variant
}

const OFFCHAIN_KINDS: &[&str] = &[
    "file/ipfs",
    "file/yournewkind", // Add new kind
];
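To show why both pieces are needed, here is a hedged sketch of how a kind string from the manifest might be checked against the registered kinds and then mapped to a variant. The function source_from_manifest and the String payload are assumptions for illustration; the actual parsing code in the graph crate may be organized differently.

```rust
/// Registered offchain kinds (mirrors the constant described above).
const OFFCHAIN_KINDS: &[&str] = &["file/ipfs"];

/// Simplified source enum; the payload type is a placeholder.
#[derive(Debug, PartialEq)]
enum Source {
    Ipfs(String),
}

/// Hypothetical helper: validate the kind, then build the matching variant.
fn source_from_manifest(kind: &str, file: &str) -> Result<Source, String> {
    if !OFFCHAIN_KINDS.contains(&kind) {
        return Err(format!("unsupported offchain kind: {}", kind));
    }
    match kind {
        "file/ipfs" => Ok(Source::Ipfs(file.to_string())),
        // Reached only if a kind was registered without adding a variant.
        other => Err(format!("kind {} has no Source variant", other)),
    }
}
```

A kind that is listed in OFFCHAIN_KINDS but has no matching variant ends up in the final match arm, which is the mistake the two-step checklist is meant to prevent.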

Using PollingMonitor

For data sources that rely on polling to check availability, use the generic PollingMonitor component:
  1. Implement the polling logic as a tower service
  2. The IpfsService serves as a reference implementation
  3. Focus only on the polling and fetching logic
  4. The PollingMonitor handles the monitoring infrastructure
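The division of labor in the steps above can be illustrated with a simplified, synchronous model. This sketch does not use tower or the real PollingMonitor API: the PollService trait, PollingSketch and AppearsAfter are hypothetical stand-ins that only show which part is specific to a data source kind (the fetch logic) and which part is generic (the re-queueing loop).

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the polling service a new kind would implement.
trait PollService {
    /// `Ok(Some(bytes))` when found, `Ok(None)` when not yet available.
    fn fetch(&mut self, id: &str) -> Result<Option<Vec<u8>>, String>;
}

/// Generic monitoring loop; knows nothing about where the data comes from.
struct PollingSketch<S: PollService> {
    service: S,
    queue: VecDeque<String>,
}

impl<S: PollService> PollingSketch<S> {
    fn new(service: S) -> Self {
        Self { service, queue: VecDeque::new() }
    }

    /// Start monitoring an identifier.
    fn monitor(&mut self, id: &str) {
        self.queue.push_back(id.to_string());
    }

    /// One pass over the queue: found items are returned, while items that
    /// are missing or failed are pushed back to be retried on a later pass.
    fn poll_once(&mut self) -> Vec<(String, Vec<u8>)> {
        let mut found = Vec::new();
        for _ in 0..self.queue.len() {
            if let Some(id) = self.queue.pop_front() {
                match self.service.fetch(&id) {
                    Ok(Some(bytes)) => found.push((id, bytes)),
                    Ok(None) | Err(_) => self.queue.push_back(id),
                }
            }
        }
        found
    }
}

/// Example service: content "appears" after a fixed number of attempts.
struct AppearsAfter {
    attempts_left: u32,
}

impl PollService for AppearsAfter {
    fn fetch(&mut self, id: &str) -> Result<Option<Vec<u8>>, String> {
        if self.attempts_left == 0 {
            Ok(Some(format!("content of {}", id).into_bytes()))
        } else {
            self.attempts_left -= 1;
            Ok(None)
        }
    }
}
```

Only AppearsAfter would change for a different storage backend; the loop in PollingSketch stays the same, which is the reuse the generic component is meant to provide.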

Testing

Integration Testing

Automated testing for offchain data sources can be tricky and should be discussed case-by-case.
The file_data_sources test in runner_tests.rs serves as a starting point for writing integration tests with offchain data sources.
// In runner_tests.rs
#[tokio::test]
async fn file_data_sources() {
    // Test setup
    // 1. Create subgraph with file data source template
    // 2. Upload file to IPFS/storage
    // 3. Trigger data source creation
    // 4. Verify data was processed correctly
}

Current Limitations

Offchain data sources currently can only exist as dynamic data sources, instantiated from templates. They cannot be configured as static data sources in the manifest.
Impact: All offchain data sources must be created at runtime from a template.
Some parts of the implementation assume offchain data sources are ‘one shot’: only a single trigger is handled per data source instance.
Works Well For: Files (file is found, handled, done)
Consideration: More complex offchain data sources (e.g., continuous streams) will require additional planning and architectural changes.
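As a rough illustration of the one-shot assumption, the following sketch marks a data source as done after its first trigger and ignores anything that arrives later. OneShotSource and its fields are hypothetical and do not reflect how Graph Node actually tracks completion.

```rust
/// Hypothetical model of a one-shot data source instance.
#[derive(Debug, Default)]
struct OneShotSource {
    /// Set after the first trigger has been handled.
    done: bool,
    /// Number of triggers actually processed (never exceeds 1).
    handled: u32,
}

impl OneShotSource {
    /// Returns true if the trigger was processed, false if it was ignored.
    fn handle_trigger(&mut self, _content: &[u8]) -> bool {
        if self.done {
            return false;
        }
        self.handled += 1;
        self.done = true;
        true
    }
}
```

A continuous stream would need to handle many triggers per instance, which is exactly what a model like this rules out.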
Entities from offchain data sources do not currently influence the PoI. Causality region IDs are not deterministic.
Impact:
  • Offchain data cannot be verified through PoI
  • May affect dispute resolution
  • Limits trustless verification guarantees

Reference Implementation: IPFS

The initially supported data source kind is file/ipfs, which serves as a reference implementation.
// Example structure (simplified)
struct IpfsService {
    client: IpfsClient,
    // ... other fields
}

impl Service<IpfsRequest> for IpfsService {
    type Response = FileContent;
    // Implementation details
}

Best Practices

For New Data Source Implementations

  1. Reuse existing components: Start with PollingMonitor for polling-based sources
  2. Study IPFS implementation: Use IpfsService as a template
  3. Consider timing: Plan for async availability and delays
  4. Test thoroughly: Write integration tests early in development
  5. Document limitations: Be clear about one-shot vs. continuous behavior

Architecture Considerations

When adding support for non-file data sources (e.g., APIs, message queues), consider:
  • Event multiplicity (multiple triggers vs. one-shot)
  • Determinism requirements for PoI
  • Resource management and cleanup
  • Error handling and retry logic
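For the retry point in particular, one common option is capped exponential backoff between attempts. The helper below is a generic sketch of that technique under assumed parameters; the name backoff_delay is hypothetical, and the sketch does not describe the retry policy Graph Node actually uses.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.saturating_pow(attempt);
    // If the multiplication overflows, fall back to the cap.
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}
```

With a base of one second and a cap of one minute, the delays grow as 1s, 2s, 4s, 8s and so on until they reach the cap.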
