Schema Generation

Overview

Graph Node converts GraphQL schemas into PostgreSQL table structures following a systematic set of rules. Understanding this transformation is essential for optimizing subgraph performance and troubleshooting storage issues.

Core Schema Generation Rules

Namespace Isolation

Each subgraph’s data is stored in a dedicated PostgreSQL namespace named sgdNNNN. The mapping between namespace names and deployment IDs is maintained in the deployment_schemas table.

Entity Type Mapping

Each entity type in the GraphQL schema becomes a table with a structure that mirrors the type’s field declarations.

Enum Types

Enums in the GraphQL schema are created as PostgreSQL enum types and used directly in table definitions.

Interface Handling

Interfaces are not stored in the database. Only concrete types that implement interfaces are stored as tables.

Table Structure

Mutable Entities (Standard @entity)

type Account @entity {
  id: ID!
  owner: Bytes!
  balance: BigInt!
  created: Int!
}

System Columns

vid: Internal version identifier used to uniquely identify specific entity versions
block_range: Enables time-travel queries by tracking which blocks each entity version is valid for

The block_range column enables querying historical state at any block height. See the time-travel queries documentation for details.

Immutable Entities

Entities declared with @entity(immutable: true) use a simpler structure:

type Transfer @entity(immutable: true) {
  id: ID!
  from: Bytes!
  to: Bytes!
  value: BigInt!
}

Optimizations for Immutable Entities

Block Range Simplification: Since immutable entities never change, the upper bound of block_range is always infinite. Instead of storing this, we use:

block$: Single integer column (not a range)
Visibility check: Simplified to block$ <= B instead of range containment

Unique Constraint: Since each entity has only one version, we add UNIQUE(id)Index Type: Can use simple BTree indexes instead of expensive GiST indexes

Timeseries Entities

Timeseries entities use the same structure as immutable entities:

type PricePoint @entity(timeseries: true) {
  id: Int8!
  timestamp: Timestamp!
  price: BigDecimal!
  volume: BigInt!
}

The only difference from immutable entities is that timeseries entities must have a timestamp attribute.

Aggregation Entities

Aggregations are represented by multiple tables, one per interval:

type TradeStats @aggregation(intervals: ["hour", "day"]) {
  id: Int8!
  timestamp: Timestamp!
  volume: BigInt! @aggregate(fn: "sum", arg: "volume")
  trades: Int8! @aggregate(fn: "count")
}

Aggregation tables are never updated, only appended to. They do not support mutable entities.

Type Mapping

ID Column Types

Supported ID Types

ID: Stored as text (alias for String for historical reasons)
String: Stored as text
Bytes: Stored as bytea

Primitive Types

GraphQL Type    → PostgreSQL Type
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
String          → text
Int             → int4
Int8            → int8
BigInt          → numeric
BigDecimal      → numeric
Bytes           → bytea
Boolean         → boolean
Timestamp       → timestamp

Reference Types

Entity References

When an attribute references another entity, the column type matches the referenced entity’s id type.No Foreign Keys: Graph Node does not use foreign key constraints. This allows:

Storing entities that reference entities created later
Avoiding constraint checking overhead during indexing

Trade-offs:

Foreign key violations only detected at query time
References may return no results if target entity doesn’t exist

type Account @entity {
  id: ID!
  owner: Bytes!
}

type Token @entity {
  id: ID!
  holder: Account!  # References Account.id
}

Enum Types

enum Status {
  PENDING
  ACTIVE
  COMPLETED
}

type Order @entity {
  id: ID!
  status: Status!
}

Array Types

List types in GraphQL become PostgreSQL array types:

type Entity @entity {
  id: ID!
  tags: [String!]!
  values: [BigInt!]!
}

Nested arrays like [[String]] are not allowed in GraphQL schemas. Arrays only contain primitive types or entity references.

Indexing Strategy

Graph Node builds indexes extensively due to not knowing query patterns ahead of time. This leads to significant overindexing, but both reducing overindexing and supporting custom indexes are open development areas.

Indexes for Mutable Entities

Exclusion Index

Purpose: Ensures that different versions of the same entity have disjoint block ranges.

CREATE EXCLUDE INDEX ON sgd42.entity 
USING gist (id WITH =, block_range WITH &&)

This prevents overlapping versions of the same entity ID.

BRIN Index

Purpose: Speeds up operations with good data locality (e.g., reversion).

CREATE INDEX ON sgd42.entity 
USING brin (
    lower(block_range), 
    COALESCE(upper(block_range), 2147483647), 
    vid
)

Particularly effective for tables where entities are never updated or deleted.

Indexes for Immutable/Timeseries Entities

CREATE UNIQUE INDEX ON sgd42.transfer (id)

Attribute Indexes

Graph Node creates indexes for each attribute with a naming convention:

Index Naming: attr_N_M

N: Number of the entity type in the GraphQL schema
M: Number of the attribute within that type

Example: attr_3_1 for the second attribute of the fourth entity type (zero-indexed for attributes, but entity types start from 0)

-- BTree index for primitive attributes
CREATE INDEX attr_2_3 ON sgd42.account 
USING btree (balance)

String Attribute Indexes

Large string attributes require special handling:

String Prefix Indexing

Problem: PostgreSQL limits individual index entry sizes.Solution: Index only the prefix of string attributes.

CREATE INDEX attr_1_2 ON sgd42.entity 
USING btree (left(name, STRING_PREFIX_SIZE))

Query Generation: Automatically adds prefix clauses to queries:

-- User query looking for name = 'Hamming'
SELECT * FROM entity 
WHERE left(name, STRING_PREFIX_SIZE) = 'Hamming'
  AND name = 'Hamming'

The prefix clause allows index usage while the full comparison ensures correctness.

Known Issues and Limitations

Array Performance

Issue: Storing arrays as PostgreSQL array attributes can have catastrophically bad performance.Condition: Arrays with unbounded or large sizes.Impact: Slow queries and storage bloat.Mitigation: Keep array sizes small and bounded.

Overindexing

Issue: Extensive indexing leads to:

Large storage requirements for indexes
Slower write operations

Status: Open issue - reducing overindexing is an active development area.Mitigation: Manual index optimization (see Custom Indexes below).

Index Usability for Sorting

Issue: BTree indexes on single attributes aren’t always usable for sorting.Reason: Graph Node adds id to all ORDER BY clauses for deterministic ordering.

-- User query
ORDER BY name

-- Actual SQL
ORDER BY name, id

A single-column index on name cannot efficiently support this multi-column sort.Future: PostgreSQL 13’s incremental sorting may help.

Custom Indexes

Issue: No built-in support for custom indexes makes it hard to transfer manually created indexes between subgraph versions.Convention: Manually created indexes should start with manual_ prefix.

CREATE INDEX manual_account_owner_balance 
ON sgd42.account (owner, balance)

Status: Custom index support is an open development area.

Best Practices

Schema Design

# Use immutable for entities that never change
type Transfer @entity(immutable: true) {
  id: ID!
  from: Bytes!
  to: Bytes!
  value: BigInt!
}

# Benefits:
# - Simpler storage structure
# - More efficient indexes
# - Better query performance

Performance Optimization

Use immutable entities for data that never changes
Avoid large arrays in entity fields
Consider manual indexes for common query patterns (prefix with manual_)
Monitor index usage and remove unused indexes
Use appropriate ID types (Bytes for addresses, String for human-readable IDs)

Contributing

Implementation Details

Extending Graph Node

Overview

Core Schema Generation Rules

Table Structure

Mutable Entities (Standard @entity)

Immutable Entities

Timeseries Entities

Aggregation Entities

Type Mapping

ID Column Types

Primitive Types

Reference Types

Enum Types

Array Types

Indexing Strategy

Indexes for Mutable Entities

Indexes for Immutable/Timeseries Entities

Attribute Indexes

String Attribute Indexes

Known Issues and Limitations

Best Practices

Schema Design

Performance Optimization

Build docs developers (and LLMs) love

Contributing

Implementation Details

Extending Graph Node

​Overview

​Core Schema Generation Rules

​Table Structure

​Mutable Entities (Standard @entity)

​Immutable Entities

​Timeseries Entities

​Aggregation Entities

​Type Mapping

​ID Column Types

​Primitive Types

​Reference Types

​Enum Types

​Array Types

​Indexing Strategy

​Indexes for Mutable Entities

​Indexes for Immutable/Timeseries Entities

​Attribute Indexes

​String Attribute Indexes

​Known Issues and Limitations

​Best Practices

​Schema Design

​Performance Optimization

Build docs developers (and LLMs) love

Overview

Core Schema Generation Rules

Table Structure

Mutable Entities (Standard @entity)

Immutable Entities

Timeseries Entities

Aggregation Entities

Type Mapping

ID Column Types

Primitive Types

Reference Types

Enum Types

Array Types

Indexing Strategy

Indexes for Mutable Entities

Indexes for Immutable/Timeseries Entities

Attribute Indexes

String Attribute Indexes

Known Issues and Limitations

Best Practices

Schema Design

Performance Optimization