Test Data

Overview

Chainbench fetches real blockchain data before starting tests to use as parameters for RPC method calls. This ensures that benchmarks use realistic data patterns and valid addresses, transaction hashes, and block numbers from the target blockchain.

How Test Data Works

Before spawning users, Chainbench:

Connects to the blockchain node
Fetches the chain ID to identify the network
Determines the block range to fetch based on configuration
Retrieves blocks with transaction data
Extracts useful information (addresses, tx hashes, block hashes, etc.)
Stores this data in memory for use during the test
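The steps above can be sketched as one function. This is a minimal illustration, not Chainbench's code. The names `fetch_test_data`, `FetchedData`, and `FakeNode` are hypothetical, and the RPC transport is abstracted as a callable. `FakeNode` stands in for a real endpoint so the sketch runs offline. The method names `eth_chainId`, `eth_blockNumber`, and `eth_getBlockByNumber` are standard Ethereum JSON-RPC methods. Blocks are sampled at random from the range, as in the workflow example at the end of this page. Address extraction is left out for brevity.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable

# A JSON-RPC caller: (method, params) -> result
RpcCall = Callable[[str, list], Any]


@dataclass
class FetchedData:
    chain_id: int
    block_numbers: list[int] = field(default_factory=list)
    block_hashes: list[str] = field(default_factory=list)
    tx_hashes: list[str] = field(default_factory=list)


def fetch_test_data(rpc: RpcCall, start_block: int, num_blocks: int,
                    rng: random.Random) -> FetchedData:
    # Steps 1-2: identify the network (JSON-RPC returns the chain ID as hex)
    data = FetchedData(chain_id=int(rpc("eth_chainId", []), 16))
    # Step 3: the range runs from start_block to the latest block
    latest = int(rpc("eth_blockNumber", []), 16)
    available = range(start_block, latest + 1)
    numbers = sorted(rng.sample(available, k=min(num_blocks, len(available))))
    for number in numbers:
        # Step 4: True asks for full transaction objects instead of bare hashes
        block = rpc("eth_getBlockByNumber", [hex(number), True])
        # Step 5: keep the fields later used as call parameters
        data.block_numbers.append(int(block["number"], 16))
        data.block_hashes.append(block["hash"])
        data.tx_hashes.extend(tx["hash"] for tx in block["transactions"])
    # Step 6: the caller holds this object in memory for the whole test
    return data


class FakeNode:
    """Offline stand-in for a node: chain ID 1, one transaction per block."""

    def __init__(self, latest: int) -> None:
        self.latest = latest

    def __call__(self, method: str, params: list) -> Any:
        if method == "eth_chainId":
            return "0x1"
        if method == "eth_blockNumber":
            return hex(self.latest)
        if method == "eth_getBlockByNumber":
            n = int(params[0], 16)
            return {"number": hex(n), "hash": f"0xblock{n}",
                    "transactions": [{"hash": f"0xtx{n}"}]}
        raise ValueError(f"unsupported method: {method}")
```

For instance, `fetch_test_data(FakeNode(latest=50), start_block=20, num_blocks=10, rng=random.Random(0))` returns ten sorted block numbers between 20 and 50. Each block contributes one transaction hash.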

Test data is fetched once before the test starts and shared across all workers. This improves consistency and makes benchmarks more comparable across runs.

Test Data Sizes

You can control how much test data is generated using the --size flag:

Size	Blocks	Use Case
XS	10	Quick validation tests
S	100	Default - suitable for most tests
M	1,000	More data variety
L	10,000	Extensive testing
XL	100,000	Maximum variety (longer generation time)
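The size presets amount to a lookup from a size label to a block count. The sketch below is minimal: `SIZE_BLOCKS` and `blocks_for_size` are hypothetical names, not Chainbench internals. The counts are the ones in the table.

```python
# Block counts per --size preset, taken from the table
SIZE_BLOCKS = {"XS": 10, "S": 100, "M": 1_000, "L": 10_000, "XL": 100_000}


def blocks_for_size(size: str = "S") -> int:
    """Return the number of blocks to fetch for a --size value (default S)."""
    try:
        return SIZE_BLOCKS[size]
    except KeyError:
        expected = ", ".join(SIZE_BLOCKS)
        raise ValueError(f"unknown size {size!r}; expected one of {expected}") from None
```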

Usage Example

chainbench start --profile evm.light \
  --users 50 \
  --size M \
  --target https://node-url \
  --headless \
  --autoquit

Larger data sizes result in longer test data generation times but provide more variety in test parameters. For most benchmarks, size S (default) is sufficient.

Block Range Configuration

Chainbench determines which blocks to fetch based on several factors:

Default Behavior

For each supported network, Chainbench has a configured starting block to avoid fetching from genesis:

# Example from chainbench/test_data/evm.py
DATA = {
    1: {  # Ethereum Mainnet
        "name": "ethereum-mainnet",
        "start_block": 10000000,  # Start from block 10M
        "contract_addresses": [...]
    },
    56: {  # BSC Mainnet
        "name": "bsc-mainnet",
        "start_block": 20000000,  # Start from block 20M
        "contract_addresses": [...]
    },
}

The end block defaults to the latest block on the chain.

Custom Block Range

You can specify a custom block range:

chainbench start --profile evm.light \
  --start-block 15000000 \
  --end-block 15001000 \
  --size S \
  --target https://node-url

Block Range Selection Logic

The test data system selects blocks using this logic:

If --use-latest-blocks is set, ignore any custom range and use the latest blocks
Otherwise, if --start-block and --end-block are provided, use those values
Otherwise, use the network’s default start block as the start and the latest block as the end
Validate the resulting range and check that it falls within the node’s available history
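This selection logic can be written as one function. The sketch is minimal and its parameter names are hypothetical. `earliest_available` stands for the oldest block the node can serve, which Chainbench may determine differently. If only one of `--start-block`/`--end-block` is given, the sketch falls back to the network default. This page does not describe what Chainbench does in that case.

```python
from typing import Optional


def select_block_range(
    default_start: int,
    latest: int,
    num_blocks: int,
    start_block: Optional[int] = None,
    end_block: Optional[int] = None,
    use_latest_blocks: bool = False,
    earliest_available: int = 0,
) -> tuple[int, int]:
    """Return an inclusive (start, end) block range to sample test data from."""
    if use_latest_blocks:
        # Latest-blocks mode overrides any custom range: take the newest N blocks
        start, end = max(latest - num_blocks + 1, 0), latest
    elif start_block is not None and end_block is not None:
        # An explicit custom range
        start, end = start_block, end_block
    else:
        # Network default start block through the latest block
        start, end = default_start, latest
    # Validation: a non-empty range inside the node's available history
    if start > end:
        raise ValueError(f"start block {start} is after end block {end}")
    if start < earliest_available or end > latest:
        raise ValueError(
            f"range {start}-{end} is outside available history {earliest_available}-{latest}"
        )
    return start, end
```

With the Ethereum Mainnet defaults, `select_block_range(10_000_000, 18_500_000, 100)` yields `(10000000, 18500000)`. Adding `use_latest_blocks=True` narrows this to `(18499901, 18500000)`.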

Using Latest Blocks

For nodes with limited history (e.g., running in fast sync mode), use the --use-latest-blocks flag:

chainbench start --profile evm.light \
  --users 50 \
  --size S \
  --use-latest-blocks \
  --target https://node-url \
  --headless \
  --autoquit

With --use-latest-blocks:

Chainbench fetches the latest N blocks, where N is the block count for the chosen --size
A background process continuously updates the test data with new blocks
All test data references stay within the node’s available history
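Latest-blocks mode behaves like a sliding window over the chain tip. The sketch below is minimal and uses a hypothetical `LatestBlocksWindow` class. The background refresh is modelled as a method called with each new block, not as a real background process.

```python
import random
from collections import deque


class LatestBlocksWindow:
    """Keep only the newest `size` block numbers seen so far."""

    def __init__(self, size: int) -> None:
        # A deque with maxlen drops the oldest entry when a new one is appended
        self.blocks: deque[int] = deque(maxlen=size)

    def add_block(self, number: int) -> None:
        """Record a new chain-tip block; stale or repeated numbers are ignored."""
        if not self.blocks or number > self.blocks[-1]:
            self.blocks.append(number)

    def random_block(self, rng: random.Random) -> int:
        # Every candidate is among the newest blocks, so it stays within pruned history
        return rng.choice(self.blocks)


window = LatestBlocksWindow(size=10)
for n in range(1, 26):  # simulate 25 new blocks arriving
    window.add_block(n)
print(list(window.blocks))  # [16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
```

Keeping the window size at or below the node's retained history (for example, 10 blocks for XS against a node that keeps 128) means every sampled block is still servable.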

If your node only keeps the last 128 blocks, use --use-latest-blocks --size XS to ensure all data references are valid.

Dynamic Parameters

Chainbench uses parameter factories to generate realistic RPC call parameters from the test data.

Available Data

For EVM chains, test data includes:

Block numbers: Random block numbers from the fetched range
Block hashes: Hashes of fetched blocks
Transaction hashes: Hashes from transactions in fetched blocks
Addresses: Both from and to addresses from transactions (up to 100 per block)
Contract addresses: Known contract addresses for the network
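The sketch below extracts these fields from a block fetched with full transaction objects. The field names `number`, `hash`, `transactions`, `from`, and `to` follow the standard Ethereum JSON-RPC block and transaction objects. `extract_block_data` is a hypothetical helper. It applies the 100-addresses-per-block cap from the list. De-duplicating addresses is an assumption of the sketch. Contract-creation transactions have a null `to`, so those entries are skipped.

```python
from typing import Any


def extract_block_data(block: dict[str, Any], max_addresses: int = 100) -> dict[str, Any]:
    """Pull block number/hash, tx hashes and up to `max_addresses` unique addresses."""
    tx_hashes: list[str] = []
    addresses: list[str] = []
    seen: set[str] = set()
    for tx in block["transactions"]:
        tx_hashes.append(tx["hash"])
        for addr in (tx.get("from"), tx.get("to")):
            # "to" is null (None) for contract-creation transactions
            if addr is None or addr in seen or len(addresses) >= max_addresses:
                continue
            seen.add(addr)
            addresses.append(addr)
    return {
        "block_number": int(block["number"], 16),
        "block_hash": block["hash"],
        "tx_hashes": tx_hashes,
        "addresses": addresses,
    }
```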

Parameter Factories

Profiles call these factories as methods on the user instance, which is why the examples use self:

from chainbench.util.rng import get_rng

# Random block number with full transaction details
params = self._block_params_factory()
# Returns: ["0x1a2b3c", True]

# Random transaction hash
params = self._transaction_by_hash_params_factory(get_rng())
# Returns: ["0xabc123..."]

# Random address with latest block tag
params = self._get_balance_params_factory(get_rng())
# Returns: ["0x742d35Cc...", "latest"]

Parameter factories use a seeded random number generator (get_rng()) to ensure consistency across runs and workers.
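One way to get a function-specific seed is to derive it from a base seed plus the factory's name. Each factory then gets its own independent but reproducible stream. This is a minimal sketch. `rng_for`, `pick_block_number`, and `BASE_SEED` are hypothetical, and the sketch is not necessarily how `get_rng()` derives its seeds. It uses `hashlib` instead of the built-in `hash()`, because Python randomizes string hashing per process. That randomization would break reproducibility across workers.

```python
import hashlib
import random

BASE_SEED = 42  # hypothetical global seed shared by every worker


def rng_for(name: str, base_seed: int = BASE_SEED) -> random.Random:
    """Return a reproducible RNG whose seed depends on base_seed and name."""
    digest = hashlib.sha256(f"{base_seed}:{name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def pick_block_number(rng: random.Random, block_numbers: list[int]) -> str:
    # Same RNG state + same shared test data -> same choice, returned as a hex quantity
    return hex(rng.choice(block_numbers))
```

Two workers that call `rng_for("eth_getBlockByNumber")` draw identical sequences. `rng_for("eth_getBalance")` produces a different sequence.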

Reference URL

You can fetch test data from a different node than the one being tested:

chainbench start --profile evm.light \
  --target https://node-to-test \
  --ref-url https://reference-node \
  --users 50

This is useful when:

Testing a new node that doesn’t have historical data yet
The node under test has limited archive access
You want consistent test data across multiple test runs
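The sketch below shows how the two URLs split the work: test data comes from `--ref-url` when it is set, and load always goes to `--target`. `resolve_urls`, `chain_id_request`, and `check_same_chain` are hypothetical helpers. The chain ID comparison is an illustrative sanity check. This page does not say Chainbench performs it. The request body follows the JSON-RPC 2.0 format.

```python
import json
from typing import Optional


def resolve_urls(target: str, ref_url: Optional[str] = None) -> tuple[str, str]:
    """Return (data_url, load_url): fetch test data from ref_url if set, else from target."""
    return (ref_url or target), target


def chain_id_request(request_id: int = 1) -> str:
    # JSON-RPC 2.0 request body for eth_chainId
    return json.dumps({"jsonrpc": "2.0", "method": "eth_chainId", "params": [], "id": request_id})


def check_same_chain(ref_chain_id: str, target_chain_id: str) -> None:
    # Data from another network would reference blocks and txs the target does not have
    if int(ref_chain_id, 16) != int(target_chain_id, 16):
        raise ValueError(f"chain ID mismatch: ref={ref_chain_id} target={target_chain_id}")
```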

Network-Specific Data

Chainbench includes predefined contract addresses for popular tokens on each network:

# Example: Ethereum Mainnet contracts
contract_addresses = [
    "0xdAC17F958D2ee523a2206206994597C13D831ec7",  # USDT
    "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48",  # USDC
    "0x2260FAC5E5542a773Aa44fBCfeDf7C193bc2C599",  # WBTC
    # ...
]

These are used for eth_call and other contract-related operations.
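For example, an `eth_call` against one of these token contracts can read ERC-20 state. The sketch below is minimal and assumes two read-only calls: `totalSupply()` (selector `0x18160ddd`) and `balanceOf(address)` (selector `0x70a08231`). `eth_call_params` and `balance_of_params` are hypothetical helpers, and Chainbench's profiles may encode different calls. Holder addresses could come from the fetched test data.

```python
import random

CONTRACT_ADDRESSES = [
    "0xdAC17F958D2ee523a2206206994597C13D831ec7",  # USDT
    "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48",  # USDC
    "0x2260FAC5E5542a773Aa44fBCfeDf7C193bc2C599",  # WBTC
]

TOTAL_SUPPLY_SELECTOR = "0x18160ddd"  # first 4 bytes of keccak256("totalSupply()")
BALANCE_OF_SELECTOR = "0x70a08231"  # first 4 bytes of keccak256("balanceOf(address)")


def eth_call_params(rng: random.Random) -> list:
    """Params for eth_call: totalSupply() on a random token contract at the latest block."""
    return [{"to": rng.choice(CONTRACT_ADDRESSES), "data": TOTAL_SUPPLY_SELECTOR}, "latest"]


def balance_of_params(rng: random.Random, holders: list[str]) -> list:
    """Params for eth_call: balanceOf(holder) on a random token contract."""
    holder = rng.choice(holders)
    # ABI encoding: 4-byte selector + address left-padded to 32 bytes (64 hex chars)
    data = BALANCE_OF_SELECTOR + holder[2:].lower().rjust(64, "0")
    return [{"to": rng.choice(CONTRACT_ADDRESSES), "data": data}, "latest"]
```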

Data Consistency

How Chainbench Ensures Consistency

Chainbench uses several techniques to ensure consistent benchmarks:

Seeded RNG: Each parameter factory uses a function-specific random seed
Shared data: All workers use the same test data
Deterministic selection: The same seed and the same test data produce the same selection
Fixed block range: Unless using --use-latest-blocks, the block range is fixed

This makes benchmark results more comparable across runs and helps identify performance changes.

Example: Complete Test Data Workflow

# Step 1: Chainbench connects and identifies the network
# Fetches chain ID: 1 (Ethereum Mainnet)

# Step 2: Determines block range
# Start: 10,000,000 (network default)
# End: 18,500,000 (latest block at the time of the run)
# With --size S, Chainbench fetches 100 random blocks from this range

# Step 3: Fetches blocks with transactions
# Extracts:
# - 100 block numbers and hashes
# - Transaction hashes from those blocks
# - Addresses (from/to) from transactions

# Step 4: Test runs
# Each RPC call uses random data from the fetched set
# eth_getBlockByNumber uses random block number
# eth_getTransactionByHash uses random tx hash
# eth_getBalance uses random address
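Step 4 of this workflow amounts to mapping each RPC method to a parameter factory over the shared data. The sketch below is minimal. The `FACTORIES` table, the `SharedData` fields, and the sample values are hypothetical. The parameter shapes mirror the ones shown under Parameter Factories.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class SharedData:
    block_numbers: list[int]
    tx_hashes: list[str]
    addresses: list[str]


Factory = Callable[[random.Random, SharedData], list]

FACTORIES: dict[str, Factory] = {
    # Random block number (hex) with full transaction objects
    "eth_getBlockByNumber": lambda rng, d: [hex(rng.choice(d.block_numbers)), True],
    # Random transaction hash
    "eth_getTransactionByHash": lambda rng, d: [rng.choice(d.tx_hashes)],
    # Random address at the latest block
    "eth_getBalance": lambda rng, d: [rng.choice(d.addresses), "latest"],
}


def make_request(method: str, rng: random.Random, data: SharedData, request_id: int) -> dict:
    """Build a JSON-RPC 2.0 request for `method` using random shared test data."""
    return {"jsonrpc": "2.0", "method": method, "params": FACTORIES[method](rng, data), "id": request_id}


data = SharedData(block_numbers=[10_000_000, 12_345_678], tx_hashes=["0xabc"],
                  addresses=["0x742d35Cc6634C0532925a3b844Bc454e4438f44e"])
print(make_request("eth_getTransactionByHash", random.Random(7), data, 1))
# {'jsonrpc': '2.0', 'method': 'eth_getTransactionByHash', 'params': ['0xabc'], 'id': 1}
```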

Monitor test data generation progress in the console output. If generation takes too long, consider using a smaller size or a more specific block range.
