NEAR Public Lakehouse
Blockchain data indexing in NEAR Public Lakehouse is for anyone wanting to understand blockchain data. This includes:Users
Create queries to track NEAR assets, monitor transactions, or analyze on-chain events at a massive scale.
Researchers
Use indexed data for data science tasks, including on-chain activities, identifying trends, or feeding AI/ML pipelines for predictive analysis.
Startups
Use NEAR’s indexed data for deep insights on user engagement, smart contract utilization, or insights across tokens and NFT adoption.
Key Benefits
Instant Insights
Historical on-chain data queried at scale. No infrastructure needed.
Cost-Effective
Eliminate the need to store and process bulk NEAR protocol data. Query as little or as much as you need.
Easy to Use
No prior blockchain experience required. Just bring SQL knowledge to unlock insights.
Getting Started
Login to Google Cloud
Login into your Google Cloud Account.
Open NEAR Dataset
Open the NEAR Protocol BigQuery Public Dataset.
The NEAR Public Lakehouse repository contains the source code for ingesting NEAR Protocol data stored as JSON files in AWS S3 by NEAR Lake Indexer.
Example Queries
Daily Unique Signers and Accounts
How many unique signers and accounts have interacted with my smart contract per day?Top Token Holders
Transaction Volume by Date
How Much It Costs?
- Storage Costs
- Query Costs
NEAR pays for the storage and doesn’t charge you to use the public dataset.To learn more about BigQuery public datasets check this page.
Cost Optimization Tips
Use partition filters
Use partition filters
Always filter by
block_date to reduce the amount of data scanned:Select specific columns
Select specific columns
Only select the columns you need instead of using
SELECT *:Use LIMIT for exploration
Use LIMIT for exploration
When exploring data, use LIMIT to reduce scanned data:
Preview before running
Preview before running
Use the query validator to see estimated costs before running expensive queries.
Architecture
The data is loaded in a streaming fashion using Databricks Autoloader into raw/bronze tables, and transformed with Databricks Delta Live Tables streaming jobs into cleaned/enriched/silver tables. The silver tables are also copied into the GCP BigQuery Public Dataset.
Learn more about the Databricks Medallion Architecture.
Available Data
The current data that NEAR is providing was inspired by NEAR Indexer for Explorer.NEAR plans to improve the data available in the NEAR Public Lakehouse making it easier to consume by denormalizing some tables.
Tables Overview
blocks
blocks
A structure that represents an entire block in the NEAR blockchain.Block is the main entity in NEAR Protocol blockchain. Blocks are produced in NEAR Protocol every second.Key columns:
block_hash, block_height, block_timestamp, prev_block_hash, author_account_idchunks
chunks
A structure that represents a chunk in the NEAR blockchain.Chunk of a Block is a part of a Block from a Shard. The collection of Chunks of the Block forms the NEAR Protocol Block. Chunk contains all the structures that make the Block: Transactions, Receipts, and Chunk Header.Key columns:
chunk_hash, shard_id, signature, gas_limit, gas_usedtransactions
transactions
Transaction is the main way of interaction between a user and a blockchain.Transaction contains: Signer account ID, Receiver account ID, and Actions.Key columns:
transaction_hash, signer_account_id, receiver_account_id, status, block_timestampexecution_outcomes
execution_outcomes
Execution outcome is the result of execution of Transaction or Receipt.In the result of the Transaction execution will always be a Receipt.Key columns:
receipt_id, executed_in_block_hash, status, gas_burnt, tokens_burntreceipt_details
receipt_details
All cross-contract communication in NEAR happens through Receipts.Receipts are stateful in a sense that they serve not only as messages between accounts but also can be stored in the account storage to await DataReceipts. Each receipt has a
predecessor_id (who sent it) and receiver_id (the current account).Key columns: receipt_id, predecessor_account_id, receiver_account_id, receipt_kindreceipt_origin
receipt_origin
Tracks the transaction that originated the receipt.Links receipts back to their originating transaction for tracing transaction flows.Key columns:
receipt_id, originated_from_transaction_hashreceipt_actions
receipt_actions
Action Receipt represents a request to apply actions on the
receiver_id side.It could be derived as a result of a Transaction execution or another ACTION Receipt processing.Action kinds: ADD_KEY, CREATE_ACCOUNT, DELEGATE_ACTION, DELETE_ACCOUNT, DELETE_KEY, DEPLOY_CONTRACT, FUNCTION_CALL, STAKE, TRANSFERKey columns: receipt_id, action_kind, method_name, args_json, deposit, gasreceipts (view)
receipts (view)
A convenient view that joins receipt details, the transaction that originated the receipt, and the receipt execution outcome.Recommended for most queries as it provides a complete picture of receipt execution.
account_changes
account_changes
Each account has an associated state where it stores its metadata and all the contract-related data (contract’s code + storage).Tracks changes to account state over time.Key columns:
account_id, block_timestamp, caused_by_receipt_id, update_reasonAdditional Information
References
Protocol Basics
Learn about NEAR Protocol fundamentals
Data Flow
Understand how data flows in NEAR
Protocol Spec
Read the complete protocol specification
Use Cases
- Analytics
- Research
- Business Intelligence
- Track transaction volume over time
- Analyze user behavior patterns
- Monitor smart contract usage
- Identify trending dApps
Get Started with BigQuery
Open the NEAR BigQuery Public Dataset and start querying