Skip to main content
This interface is extremely experimental. There is no guarantee that this interface will ever be brought to production use. It exists solely to evaluate the utility of direct SQL access.The interface is only available when the environment variable GRAPH_ENABLE_SQL_QUERIES is set to true.

Overview

The SQL interface allows issuing direct SQL queries against subgraph data by posting JSON requests to the /subgraphs/sql endpoint. The server responds with query results in JSON format.

Request Format

Post a JSON document to /subgraphs/sql with the following keys:
ParameterTypeRequiredDescription
deploymentstringYesIPFS hash of the deployment to query
querystringYesSQL query to execute
modestringYesEither info (metadata only) or data (full results)

Mode Options

  • info: Returns only metadata about the query execution (no actual data)
  • data: Returns the full query result set

Table and Column Naming

Entity Types

Table and attribute names for @entity types are snake-cased from their GraphQL schema form.
type SomeDailyStuff @entity {
  id: ID!
  value: BigInt!
  timestamp: Int!
}

Aggregation Types

For @aggregation types, use the function syntax: <aggregation>(<interval>)
type MyStats @aggregation(intervals: ["hour", "day"]) {
  id: Int8!
  total: BigInt!
}

SQL Capabilities

The interface supports fairly arbitrary SQL, including:
  • Aggregations (COUNT, SUM, AVG, etc.)
  • Most PostgreSQL built-in functions
  • Complex joins and subqueries
  • Window functions
  • CTEs (Common Table Expressions)
The broad SQL support makes it easy to issue queries that take a very long time. Use GRAPH_SQL_STATEMENT_TIMEOUT to set query timeouts.

Example Usage

Request

{
  "query": "select number, hash, parent_hash, timestamp from block order by number desc limit 2",
  "deployment": "QmSoMeThInG",
  "mode": "data"
}

Response

{
  "data": [
    {
      "hash": "\\x5f91e535ee4d328725b869dd96f4c42059e3f2728dfc452c32e5597b28ce68d6",
      "number": 5000,
      "parent_hash": "\\x82e95c1ee3a98cd0646225b5ae6afc0b0229367b992df97aeb669c898657a4bb",
      "timestamp": "2015-07-30T20:07:44+00:00"
    },
    {
      "hash": "\\x82e95c1ee3a98cd0646225b5ae6afc0b0229367b992df97aeb669c898657a4bb",
      "number": 4999,
      "parent_hash": "\\x875c9a0f8215258c3b17fd5af5127541121cca1f594515aae4fbe5a7fbef8389",
      "timestamp": "2015-07-30T20:07:36+00:00"
    }
  ]
}
Binary data uses PostgreSQL’s hex string notation (e.g., \x5f91e535...).

Current Limitations

Limitation: Bind variables and query parameters are not supported.Impact: All queries must use literal SQL values, which can be a security concern.Mitigation: Ensure proper input sanitization if exposing this interface.
Configuration: GRAPH_SQL_STATEMENT_TIMEOUT environment variableDefault: UnlimitedRecommendation: Always set a timeout in production-like environments to prevent resource exhaustion.
Current Behavior: Queries always execute at the subgraph head block.Future Enhancement: Could easily add a parameter to specify a historical block number for time-travel queries.
Current State: The interface exposes the raw SQL schema, including implementation details.Hidden Columns: System columns like vid and block_range are made inaccessible, but the schema structure is otherwise exposed.Consideration: This may expose more implementation details than desired for a production API.
Limitation: Cannot join across different subgraphs.Reason: Would require additional plumbing to hide sharding effects.Future Work: Adding cross-subgraph joins is possible but requires architectural work.
Current Format: JSONIssue: Inefficient for large result setsFuture Improvement: Consider alternative formats like:
  • CSV
  • Apache Arrow
  • Protocol Buffers
  • MessagePack

Security Considerations

While significant effort has been made to ensure this interface is safe (particularly preventing writes), there is no guarantee it works without bugs. Use with caution.

Safety Measures

  • Read-only: Interface is designed to prevent write operations
  • System column protection: Columns like vid and block_range are inaccessible
  • Schema isolation: Queries are scoped to the specified deployment

Risk Factors

  1. Query Complexity: Easy to issue expensive queries that consume significant resources
  2. Resource Exhaustion: Without proper timeouts, long-running queries can impact node performance
  3. Schema Exposure: Raw schema access may reveal implementation details
  4. No Query Validation: Arbitrary SQL means potential for unexpected behavior
# Enable the interface
GRAPH_ENABLE_SQL_QUERIES=true

# Set a reasonable timeout (in milliseconds)
GRAPH_SQL_STATEMENT_TIMEOUT=30000

Use Cases

Analytics and Exploration

  • Ad-hoc data analysis
  • Complex aggregations not easily expressed in GraphQL
  • Data export for external tools
  • Debugging and development

Advanced Queries

SELECT 
  address,
  balance,
  LAG(balance) OVER (PARTITION BY address ORDER BY timestamp) as previous_balance
FROM account_balance
ORDER BY timestamp DESC;

Future Improvements

  • Bind variable support
  • Historical block queries
  • Better response formats
  • Query result streaming
  • Query plan analysis
  • Cross-subgraph joins
  • Write operations (extremely careful consideration needed)
  • Query optimization hints
  • Result caching
  • Rate limiting and quotas

Build docs developers (and LLMs) love