When a graph-node installation grows beyond what a single Postgres instance can handle, you can scale the system horizontally by adding more Postgres instances. This is called sharding, and each Postgres instance is called a shard.

Overview

The resulting graph-node system uses all Postgres instances together, essentially forming a distributed database. Sharding relies heavily on the fact that in almost all cases:
  • Traffic for a single subgraph can be handled by a single Postgres instance
  • Load can be distributed by storing different subgraphs in different shards
Sharding requires using a configuration file rather than environment variables. It cannot be configured through CLI options alone.

The Primary Shard

In a sharded setup, one shard is special and is called the primary. The primary is used to store:
  • System-wide metadata
  • Mapping of subgraph names to IPFS hashes
  • Directory of all subgraphs and the shards in which each is stored
  • List of configured chains
  • Metadata that rarely changes
Frequently changing metadata (such as subgraph head pointers) is stored in the individual shards, not the primary.

Setting Up Sharding

Configuration File Setup

Add additional [store.<name>] entries to your configuration file:
[store]
[store.primary]
connection = "postgresql://graph:password@primary-host:5432/graph"
pool_size = 10

[store.shard1]
connection = "postgresql://graph:password@shard1-host:5432/graph"
pool_size = 10

[store.shard2]
connection = "postgresql://graph:password@shard2-host:5432/graph"
pool_size = 10
See Multiple Databases for detailed configuration options.

Prerequisites: Inter-Shard Communication

Sharding uses the postgres_fdw extension for inter-shard communication. Graph Node automatically sets up foreign servers and foreign tables.
Before setting up sharding, ensure:
  1. All shards can communicate with each other over the network
  2. Firewall rules allow traffic from each shard to every other shard
  3. Authentication (pg_hba.conf) allows connections from all other shards
  4. Connection strings are in the format postgres://USER:PASSWORD@HOST[:PORT]/DB (Graph Node must parse these components)
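As a preflight check on point 4, the components Graph Node must parse can be validated with a short script. This is a hypothetical helper (not part of Graph Node) that only sketches the format above using Python's urllib.parse:

```python
from urllib.parse import urlparse

def check_connection_string(conn: str) -> dict:
    """Split a shard connection string into the components Graph Node
    must be able to parse: USER, PASSWORD, HOST, optional PORT, DB."""
    url = urlparse(conn)
    if url.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported scheme: {url.scheme!r}")
    missing = [name for name, value in [
        ("USER", url.username),
        ("PASSWORD", url.password),
        ("HOST", url.hostname),
        ("DB", url.path.lstrip("/")),
    ] if not value]
    if missing:
        raise ValueError(f"connection string is missing: {', '.join(missing)}")
    return {
        "user": url.username,
        "password": url.password,
        "host": url.hostname,
        "port": url.port or 5432,  # PORT is optional; Postgres defaults to 5432
        "db": url.path.lstrip("/"),
    }

print(check_connection_string("postgres://graph:secret@shard1-host:5432/graph"))
```

Running a check like this against each connection string in the configuration file before starting Graph Node catches malformed URLs early.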

Initialization

When a new shard is added to the configuration file:
  1. Graph Node initializes the database schema during startup
  2. Foreign data wrappers are automatically configured
  3. Metadata synchronization begins

Verifying Inter-Shard Connectivity

After schema initialization, manually verify connectivity:
-- Run on any shard to verify connection to primary
SELECT count(*) FROM primary_public.chains;

-- Run on primary to verify connection to a shard
SELECT count(*) FROM shard_shard1_subgraphs.subgraph;
The query result itself doesn’t matter; if the query succeeds, connectivity works. A failure indicates a network or authentication problem.

Metadata Synchronization

With multiple shards, Graph Node periodically copies certain metadata from the primary to all other shards:
  • The copied metadata is needed to respond to queries, since each query must look up which shard stores the subgraph’s data
  • Without local copies, every query would have to consult the primary for that lookup
  • With the copies in place, shards can keep answering queries even when the primary is down

Sharding Strategies

There are many ways to split data across shards. One recommended setup:

Small Primary

Minimal primary shard storing only system metadata

Low-Traffic Shards

Multiple shards for low-traffic subgraphs with many subgraphs per shard

High-Traffic Shards

One or few shards for high-traffic subgraphs with few subgraphs per shard

Block Cache Shards

Dedicated shards storing only blockchain block caches

Strategy: By Traffic Level

[deployment]
# High-traffic VIP subgraphs
[[deployment.rule]]
match = { name = "(vip|production)/.*" }
shard = "vip"
indexers = [ "index_node_vip_0" ]

# Medium-traffic subgraphs
[[deployment.rule]]
match = { name = "medium/.*" }
shard = "shard1"
indexers = [ "index_node_1" ]

# Low-traffic community subgraphs
[[deployment.rule]]
shards = [ "shard2", "shard3" ]  # Auto-distribute
indexers = [ "index_node_community_0", "index_node_community_1" ]

Strategy: By Blockchain Network

Store different networks in different shards:
[chains.mainnet]
shard = "ethereum_shard"
provider = [ { label = "mainnet", url = "http://eth:8545", features = [] } ]

[chains.polygon]
shard = "polygon_shard"
provider = [ { label = "polygon", url = "http://polygon:8545", features = [] } ]

[chains.arbitrum]
shard = "arbitrum_shard"
provider = [ { label = "arbitrum", url = "http://arbitrum:8545", features = [] } ]

[deployment]
[[deployment.rule]]
match = { network = "mainnet" }
shard = "ethereum_shard"
indexers = [ "index_node_eth_0" ]

[[deployment.rule]]
match = { network = "polygon" }
shard = "polygon_shard"
indexers = [ "index_node_polygon_0" ]

[[deployment.rule]]
match = { network = "arbitrum" }
shard = "arbitrum_shard"
indexers = [ "index_node_arbitrum_0" ]

Strategy: Dedicated Block Cache Shards

Separate block cache storage from subgraph data:
[store.primary]
connection = "postgresql://graph:${PGPASSWORD}@primary:5432/graph"
pool_size = 10

[store.blocks]
connection = "postgresql://graph:${PGPASSWORD}@blocks-db:5432/graph"
pool_size = 20

[store.subgraphs]
connection = "postgresql://graph:${PGPASSWORD}@subgraphs-db:5432/graph"
pool_size = 30

[chains.mainnet]
shard = "blocks"  # Block cache in dedicated shard
provider = [ { label = "mainnet", url = "http://eth:8545", features = [] } ]

[deployment]
[[deployment.rule]]
shard = "subgraphs"  # Subgraph data in separate shard
indexers = [ "index_node_0" ]

Copying and Moving Between Shards

Besides deployment rules for new subgraphs, you can copy and move existing subgraphs between shards.

Creating a Copy

Start copying a subgraph from one shard to another:
graphman copy create <source-deployment> <dest-shard>
A deployment is identified by its IPFS hash. Multiple copies of the same deployment can exist across shards, but only one copy per shard. Only one copy is marked as active for query responses.
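The placement rules in the paragraph above (at most one copy per shard, exactly one active copy) can be sketched with a toy model. This is illustrative only; the data layout, the helper, and the deployment hash are invented for the example and are not Graph Node code:

```python
def create_copy(copies: dict, deployment: str, dest_shard: str) -> None:
    """Record a new copy of `deployment` in `dest_shard`, enforcing
    one copy per shard; the first copy recorded is the active one."""
    shards = copies.setdefault(deployment, {})
    if dest_shard in shards:
        raise ValueError(f"{deployment} already has a copy in {dest_shard}")
    shards[dest_shard] = {"active": not shards}  # first copy is active

copies: dict = {}
create_copy(copies, "QmExampleDeploymentHash", "primary")
create_copy(copies, "QmExampleDeploymentHash", "shard1")
print(copies)
```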

Copy Behavior

By default, graphman copy create:
  1. Copies source subgraph data up to the copy initiation point
  2. Starts indexing independently from the source
  3. Both source and destination continue indexing

Copy with Activation

Automatically activate the copy when it catches up:
graphman copy create --activate <source-deployment> <dest-shard>
This:
  1. Copies data from source
  2. Indexes independently until caught up to chain head
  3. Marks copy as active when synced
  4. Routes queries to the new copy

Copy with Replacement

Replace the source with the copy:
graphman copy create --replace <source-deployment> <dest-shard>
This:
  1. Copies data from source
  2. Indexes independently until caught up
  3. Marks copy as active
  4. Marks source as unused
  5. Source is deleted ~8 hours after copy syncs (default reaper configuration)
Copying large deployments can take a very long time (hours to days depending on size). Monitor progress regularly.

Monitoring Copy Progress

Check copy operation status:
graphman copy stats <deployment-id>
This shows:
  • Copy phase (copying data, building indexes, counting entities)
  • Progress percentage
  • Estimated completion time

Listing Copy Operations

graphman copy list
Shows all active and pending copy operations.
The number of active copy operations is limited to 5 per source/destination shard pair to limit load on the shards.

Copy Process Details

During copying:
  1. Namespace Creation: Creates temporary sgdNNN namespace in destination matching source’s identifier
  2. Table Mapping: Maps all tables from source into destination shard via foreign data wrapper
  3. Data Transfer: Copies data in batches
  4. Index Building: Rebuilds indexes on destination (can be slow)
  5. Entity Counting: Counts all entities to update entity_count (can be very slow with minimal output)
  6. Cleanup: Automatically deletes temporary namespace when complete

Deleting Inactive Copies

Make non-active copies eligible for deletion:
  1. Unassign the copy from all indexing nodes
  2. The unused deployment reaper will delete it automatically (~8 hours by default)

Namespaces

Sharding creates Postgres schemas (namespaces) for inter-shard data access:
  • primary_public: Maps important tables from primary into each shard
  • shard_<name>_subgraphs: Maps important tables from each shard into every other shard
These enable queries and operations across shards.
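As a small illustration of the naming convention (the helper below is hypothetical, not Graph Node code):

```python
# The primary's shared tables appear in every shard under a fixed schema name
PRIMARY_NAMESPACE = "primary_public"

def shard_metadata_namespace(shard: str) -> str:
    """Schema through which other shards see this shard's subgraph metadata."""
    return f"shard_{shard}_subgraphs"

print(shard_metadata_namespace("shard1"))
```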

Rebuilding Mappings

If foreign data wrapper mappings become corrupted or out of sync:
graphman database remap
This recreates all foreign server definitions and table mappings.
The mapping setup code is in ForeignServer::map_primary and ForeignServer::map_metadata in the connection_pool.rs file.

Best Practices

  • Begin with a single database and add sharding only when you hit resource limits. The original database becomes the primary shard, and existing data can remain there.
  • Ensure all shards can communicate before configuring sharding. Test network connectivity and authentication between all shard pairs.
  • Set up deployment rules to automatically route new subgraphs to appropriate shards. This is simpler than manually moving subgraphs later.
  • When copying subgraphs between shards, actively monitor progress. Copy operations can fail silently or take much longer than expected.
  • Consider storing blockchain block caches in dedicated shards separate from subgraph data. This isolates block ingestion load.
  • Use the shards array (instead of a single shard) in deployment rules to automatically distribute subgraphs to the least-loaded shard.
  • Keep the primary shard focused on metadata; route most subgraph data to other shards.
  • Test the sharding setup in a non-production environment first. Verify inter-shard connectivity and deployment placement.

Removing a Shard

When a shard is no longer needed:
  1. Ensure no references: Verify no deployment is stored in the shard and no chain is stored in it
  2. Remove from config: Delete the shard declaration from the configuration file
  3. Restart nodes: Restart all Graph Node instances with the updated configuration
Removing a shard leaves behind:
  • Foreign tables in shard_<name>_subgraphs on other shards
  • User mapping and foreign server definitions
These don’t hamper operation but can be manually removed via DROP commands in psql if desired.
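A sketch of that manual cleanup, assuming the removed shard was named shard2. Schema and server names are placeholders, and the foreign server name in particular may differ between installations, so list the servers first:

```sql
-- On each remaining shard: find the foreign server for the removed shard
SELECT srvname FROM pg_foreign_server;

-- Drop the leftovers; CASCADE also removes the user mapping and any
-- foreign tables that depend on the schema or server
DROP SCHEMA IF EXISTS shard_shard2_subgraphs CASCADE;
DROP SERVER IF EXISTS shard2 CASCADE;
```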

Troubleshooting

If a copy operation appears stuck:
# Check copy status
graphman copy stats <deployment-id>

# Check graph-node logs for errors
tail -f /var/log/graph-node.log | grep -i copy

# Check database activity
# Connect to destination shard and check active queries
SELECT pid, state, query FROM pg_stat_activity WHERE query LIKE '%sgd%';
Long pauses during entity counting or index building are normal.
If you see FDW-related errors:
# Rebuild all foreign data wrapper mappings
graphman database remap
Check network connectivity:
# From primary shard
psql "postgresql://graph:password@shard1-host:5432/graph" -c "SELECT 1"
Verify pg_hba.conf allows connections:
# Add to pg_hba.conf on each shard
host    graph    graph    <other-shard-ip>/32    md5
Check firewall rules:
# Test TCP connection from one shard to another
nc -zv <shard-host> 5432
Test deployment placement:
graphman --config config.toml config place <subgraph-name> <network>
Review your deployment rules: rules are evaluated in order, and the first match wins.
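First-match-wins evaluation can be sketched as follows. This is a simplified model, not Graph Node's implementation: the rule set and the place helper are invented, and the real matcher also supports criteria such as network:

```python
import re

# Simplified model of deployment-rule evaluation: rules are checked in
# order, the first match wins, and a rule without `match` is a catch-all.
RULES = [
    {"match": r"(vip|production)/.*", "shard": "vip"},
    {"match": r"medium/.*", "shard": "shard1"},
    {"shard": "shard2"},  # catch-all, like the last rule in the examples above
]

def place(name: str, rules=RULES) -> str:
    for rule in rules:
        pattern = rule.get("match")
        if pattern is None or re.fullmatch(pattern, name):
            return rule["shard"]
    raise LookupError(f"no deployment rule matches {name!r}")

print(place("vip/exchange"), place("someone/app"))
```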

Complete Example

# Store configuration
[store]
# Primary: metadata only
[store.primary]
connection = "postgresql://graph:${PGPASSWORD}@primary-db.internal:5432/graph"
pool_size = 10

# VIP shard: high-traffic production subgraphs
[store.vip]
connection = "postgresql://graph:${PGPASSWORD}@vip-db.internal:5432/graph"
pool_size = 50

[store.vip.replicas.repl1]
connection = "postgresql://graph:${PGPASSWORD}@vip-repl1.internal:5432/graph"
weight = 1

# Community shard 1: many low-traffic subgraphs
[store.community1]
connection = "postgresql://graph:${PGPASSWORD}@community1-db.internal:5432/graph"
pool_size = 25

# Community shard 2: many low-traffic subgraphs
[store.community2]
connection = "postgresql://graph:${PGPASSWORD}@community2-db.internal:5432/graph"
pool_size = 25

# Blocks shard: blockchain block caches
[store.blocks]
connection = "postgresql://graph:${PGPASSWORD}@blocks-db.internal:5432/graph"
pool_size = 20

# Chain configuration
[chains]
ingestor = "block_ingestor_node"

[chains.mainnet]
shard = "blocks"  # Block cache in dedicated shard
amp = "ethereum-mainnet"
provider = [
  { label = "mainnet-archive", url = "http://eth-archive:8545", features = ["archive", "traces"] },
  { label = "mainnet-regular", url = "http://eth-regular:8545", features = [] }
]

[chains.polygon]
shard = "blocks"
amp = "polygon-mainnet"
provider = [
  { label = "polygon", url = "http://polygon:8545", features = [] }
]

[chains.arbitrum]
shard = "blocks"
amp = "arbitrum-one"
provider = [
  { label = "arbitrum", url = "http://arbitrum:8545", features = [] }
]

# Deployment rules
[deployment]

# High-traffic VIP subgraphs -> VIP shard
[[deployment.rule]]
match = { name = "(acme-corp|vip|production)/.*" }
shard = "vip"
indexers = [ "index_node_vip_0", "index_node_vip_1" ]

# All other subgraphs -> community shards (auto-distribute)
[[deployment.rule]]
shards = [ "community1", "community2" ]  # System picks least-loaded
indexers = [
  "index_node_community_0",
  "index_node_community_1",
  "index_node_community_2",
  "index_node_community_3"
]
Migration Steps:
# 1. Start with configuration above
# 2. Restart all graph-node instances
# 3. New subgraphs automatically route to correct shards

# 4. Optionally move existing high-traffic subgraph to VIP shard
graphman copy create --replace <deployment-ipfs-hash> vip

# 5. Monitor copy progress
graphman copy stats <deployment-ipfs-hash>

# 6. Verify deployment location
graphman info <deployment-ipfs-hash>
