This document provides an overview of the BigQuery public dataset that allows users to query historical on-chain data from the NEAR Protocol. It includes setup instructions, example queries, and information about the available data structures.

NEAR Public Lakehouse

The NEAR Public Lakehouse makes indexed blockchain data available to anyone who wants to understand on-chain activity. This includes:

Users

Create queries to track NEAR assets, monitor transactions, or analyze on-chain events at a massive scale.

Researchers

Use indexed data for data science tasks, such as analyzing on-chain activity, identifying trends, or feeding AI/ML pipelines for predictive analysis.

Startups

Use NEAR’s indexed data for deep insights into user engagement, smart contract utilization, and token and NFT adoption.

Key Benefits

Instant Insights

Historical on-chain data queried at scale. No infrastructure needed.

Cost-Effective

Eliminate the need to store and process bulk NEAR protocol data. Query as little or as much as you need.

Easy to Use

No prior blockchain experience required. Just bring SQL knowledge to unlock insights.

Getting Started

1. Login to Google Cloud — Log in to your Google Cloud account.
2. Open NEAR Dataset — Open the NEAR dataset page in the BigQuery console.
3. View Dataset — Click the VIEW DATASET button to access the tables.
4. Create Query — Click the + button to create a new tab and write your query.
5. Run Query — Click the RUN button and check the Query results below the query window.
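
The steps above are enough to run a first query. The sketch below previews a few transactions from a single day; it only uses the transactions table and columns that appear in the examples later on this page:

```sql
-- Preview a few transactions from one day (partition filter keeps costs low)
SELECT
  block_date,
  transaction_hash,
  signer_account_id,
  receiver_account_id,
  status
FROM `bigquery-public-data.crypto_near_mainnet_us.transactions`
WHERE block_date = '2024-01-01'
LIMIT 10;
```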
The NEAR Public Lakehouse repository contains the source code for ingesting NEAR Protocol data stored as JSON files in AWS S3 by NEAR Lake Indexer.

Example Queries

Daily Unique Signers and Accounts

How many unique signers and accounts have interacted with my smart contract per day?
SELECT
  ra.block_date collected_for_day,
  COUNT(DISTINCT t.signer_account_id) as total_signers,
  COUNT(DISTINCT ra.receipt_predecessor_account_id) as total_accounts
FROM `bigquery-public-data.crypto_near_mainnet_us.receipt_actions` ra
  JOIN `bigquery-public-data.crypto_near_mainnet_us.receipt_origin_transaction` ro 
    ON ro.receipt_id = ra.receipt_id
  JOIN `bigquery-public-data.crypto_near_mainnet_us.transactions` t 
    ON ro.originated_from_transaction_hash = t.transaction_hash
WHERE ra.action_kind = 'FUNCTION_CALL'
  AND ra.receipt_receiver_account_id = 'social.near' -- change to your contract
GROUP BY 1
ORDER BY 1 DESC;

Top Token Holders

This query approximates top holders by summing tokens received through ft_transfer calls. It does not subtract outgoing transfers or account for other methods (e.g. ft_transfer_call), so treat the results as top recipients rather than exact balances.
SELECT
  JSON_VALUE(args_json, '$.receiver_id') AS receiver_account_id,
  SUM(CAST(JSON_VALUE(args_json, '$.amount') AS NUMERIC)) AS total_received
FROM `bigquery-public-data.crypto_near_mainnet_us.receipt_actions`
WHERE receipt_receiver_account_id = 'token.near' -- change to your token contract
  AND action_kind = 'FUNCTION_CALL'
  AND method_name = 'ft_transfer'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100;

Transaction Volume by Date

SELECT
  block_date,
  COUNT(*) as transaction_count
FROM `bigquery-public-data.crypto_near_mainnet_us.transactions`
WHERE block_date >= '2024-01-01'
GROUP BY 1
ORDER BY 1 DESC;
Always specify block_date in your WHERE clause to limit the data scanned and reduce costs.

How Much Does It Cost?

NEAR pays for the storage and doesn’t charge you to use the public dataset. To learn more about BigQuery public datasets, check this page.
You can check how much data will be scanned before running a query in the BigQuery console UI. Since BigQuery uses a columnar data structure and partitions, it’s recommended to select only the columns and partitions (block_date) needed to avoid unnecessary query costs.
(Screenshot: BigQuery query cost estimation in the console UI)

Cost Optimization Tips

Always filter by block_date to reduce the amount of data scanned:
WHERE block_date >= '2024-01-01' AND block_date < '2024-02-01'
Only select the columns you need instead of using SELECT *:
SELECT block_date, transaction_hash, signer_account_id
FROM `bigquery-public-data.crypto_near_mainnet_us.transactions`
When exploring data, use LIMIT to cap the number of rows returned:
SELECT * FROM table LIMIT 100
Note that in BigQuery, LIMIT alone does not reduce the amount of data scanned or billed; only column selection and partition filters do, so combine LIMIT with both.
Use the query validator to see estimated costs before running expensive queries.

Architecture

The data is loaded in a streaming fashion using Databricks Autoloader into raw/bronze tables, then transformed with Databricks Delta Live Tables streaming jobs into cleaned/enriched silver tables. The silver tables are also copied into the GCP BigQuery Public Dataset.
(Diagram: NEAR Public Lakehouse architecture)
Learn more about the Databricks Medallion Architecture.

Available Data

The data that NEAR currently provides was inspired by NEAR Indexer for Explorer.
NEAR plans to improve the data available in the NEAR Public Lakehouse, making it easier to consume by denormalizing some tables.

Tables Overview

blocks — A structure that represents an entire block in the NEAR blockchain. A block is the main entity in the NEAR Protocol blockchain; blocks are produced in NEAR Protocol every second. Key columns: block_hash, block_height, block_timestamp, prev_block_hash, author_account_id
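As an illustration, a query like the following counts blocks authored per validator on a single day (assuming blocks is partitioned by block_date like the transactions table):

```sql
-- Blocks authored per validator on a single day
-- (assumes blocks is partitioned by block_date like the other tables)
SELECT
  author_account_id,
  COUNT(*) AS blocks_authored
FROM `bigquery-public-data.crypto_near_mainnet_us.blocks`
WHERE block_date = '2024-01-01'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
```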
chunks — A structure that represents a chunk in the NEAR blockchain. A chunk is the part of a block produced by a shard; the collection of chunks forms the NEAR Protocol block. A chunk contains all the structures that make up the block: transactions, receipts, and the chunk header. Key columns: chunk_hash, shard_id, signature, gas_limit, gas_used
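For example, a sketch of per-shard gas utilization using the gas columns above (assuming chunks also carries a block_date partition column):

```sql
-- Average gas utilization per shard for one day
-- (assumes chunks carries a block_date partition column like transactions)
SELECT
  shard_id,
  AVG(SAFE_DIVIDE(gas_used, gas_limit)) AS avg_gas_utilization
FROM `bigquery-public-data.crypto_near_mainnet_us.chunks`
WHERE block_date = '2024-01-01'
GROUP BY 1
ORDER BY 1;
```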
transactions — A transaction is the main way for a user to interact with the blockchain. A transaction contains a signer account ID, a receiver account ID, and actions. Key columns: transaction_hash, signer_account_id, receiver_account_id, status, block_timestamp
An execution outcome is the result of executing a transaction or receipt. Executing a transaction always results in a receipt. Key columns: receipt_id, executed_in_block_hash, status, gas_burnt, tokens_burnt
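A hedged sketch aggregating burnt gas and tokens per day from this table (the table name execution_outcomes and its block_date column are assumptions based on the naming of the other tables):

```sql
-- Daily gas and tokens burnt
-- (the table name execution_outcomes and its block_date column are assumptions
-- based on the naming of the other tables in this dataset)
SELECT
  block_date,
  SUM(gas_burnt) AS total_gas_burnt,
  SUM(tokens_burnt) AS total_tokens_burnt
FROM `bigquery-public-data.crypto_near_mainnet_us.execution_outcomes`
WHERE block_date >= '2024-01-01'
GROUP BY 1
ORDER BY 1 DESC;
```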
All cross-contract communication in NEAR happens through receipts. Receipts are stateful in the sense that they serve not only as messages between accounts but can also be stored in account storage to await DataReceipts. Each receipt has a predecessor_id (who sent it) and a receiver_id (the current account). Key columns: receipt_id, predecessor_account_id, receiver_account_id, receipt_kind
receipt_origin_transaction — Tracks the transaction that originated each receipt, linking receipts back to their originating transaction for tracing transaction flows. Key columns: receipt_id, originated_from_transaction_hash
receipt_actions — An action receipt represents a request to apply actions on the receiver_id side. It can be derived from a transaction execution or from processing another ACTION receipt. Action kinds: ADD_KEY, CREATE_ACCOUNT, DELEGATE_ACTION, DELETE_ACCOUNT, DELETE_KEY, DEPLOY_CONTRACT, FUNCTION_CALL, STAKE, TRANSFER. Key columns: receipt_id, action_kind, method_name, args_json, deposit, gas
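For instance, the action kinds above can be tallied for a single contract, using only the receipt_actions table and columns already shown in the earlier examples:

```sql
-- Count actions by kind for one contract over one month
SELECT
  action_kind,
  COUNT(*) AS action_count
FROM `bigquery-public-data.crypto_near_mainnet_us.receipt_actions`
WHERE block_date >= '2024-01-01'
  AND block_date < '2024-02-01'
  AND receipt_receiver_account_id = 'social.near' -- change to your contract
GROUP BY 1
ORDER BY 2 DESC;
```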
A convenience view that joins receipt details, the transaction that originated the receipt, and the receipt execution outcome. Recommended for most queries, as it provides a complete picture of receipt execution.
Each account has an associated state where it stores its metadata and all contract-related data (the contract’s code and storage). This table tracks changes to account state over time. Key columns: account_id, block_timestamp, caused_by_receipt_id, update_reason
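A sketch of querying one account’s recent state changes (the table name account_changes is an assumption; only the key columns listed above are used):

```sql
-- Recent state changes for a single account
-- (the table name account_changes is an assumption; adjust to the actual name
-- shown in the BigQuery console, and add a block_date filter if present)
SELECT
  block_timestamp,
  update_reason,
  caused_by_receipt_id
FROM `bigquery-public-data.crypto_near_mainnet_us.account_changes`
WHERE account_id = 'social.near' -- change to your account
ORDER BY block_timestamp DESC
LIMIT 50;
```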

Additional Information

Skipped Blocks: The NEAR blockchain can contain skipped blocks, e.g. block 57730443. In these cases, the block for the chunk data can be found using the prev_block_hash column:
SELECT * 
FROM chunks c 
JOIN blocks b ON c.chunk.header.prev_block_hash = b.header.prev_hash

References

Protocol Basics

Learn about NEAR Protocol fundamentals

Data Flow

Understand how data flows in NEAR

Protocol Spec

Read the complete protocol specification

Use Cases

  • Track transaction volume over time
  • Analyze user behavior patterns
  • Monitor smart contract usage
  • Identify trending dApps

Get Started with BigQuery

Open the NEAR BigQuery Public Dataset and start querying
