Merkle Proofs

Every vector collection in VecLabs has a Merkle root posted to Solana. This 32-byte hash is a cryptographic fingerprint of the vector IDs in your collection. Because it is posted in a Solana transaction, the record is immutable, timestamped, and publicly verifiable by anyone.

The Problem VecLabs Solves

Traditional vector databases (Pinecone, Weaviate, Qdrant) offer no verifiable audit trail:

No proof of what was stored — You trust the vendor’s API responses
No proof of when it was stored — Timestamps are self-reported by the database
No proof it hasn’t been tampered with — If someone modifies your collection, you have no way to detect it

For AI agents making consequential decisions — handling money, processing medical data, executing legal workflows — this is a compliance crisis. VecLabs fixes this with Merkle trees.

What is a Merkle Tree?

A Merkle tree is a binary tree where:

Leaves are hashes of individual data items (vector IDs in our case)
Internal nodes are hashes of their two children
Root is a single 32-byte hash representing the entire dataset

Any change to any leaf produces a completely different root. This makes the root a cryptographic fingerprint of your collection's vector IDs.

                    Root (32 bytes)
                   /              \
            Hash(A,B)           Hash(C,D)
             /    \              /    \
        Hash(A) Hash(B)     Hash(C) Hash(D)
          |       |           |       |
       vec_001 vec_002    vec_003 vec_004
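The diagram can be reproduced in a few lines. This is a minimal sketch, not the merkle.rs code. Here hash_leaf and hash_pair are toy versions that keep the "leaf:" and "node:" prefixes but swap SHA-256 for std's DefaultHasher, a 64-bit, non-cryptographic hash, so the example needs no crates. root_of_four is a hypothetical helper.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Toy digest: a 64-bit DefaultHasher output stands in for a 32-byte SHA-256
// hash. It is NOT collision-resistant; it only illustrates the structure.
type Digest = u64;

// Leaf hash with the same "leaf:" domain prefix VecLabs uses.
fn hash_leaf(id: &str) -> Digest {
    let mut h = DefaultHasher::new();
    h.write(b"leaf:");
    h.write(id.as_bytes());
    h.finish()
}

// Internal-node hash with the "node:" prefix; left child first.
fn hash_pair(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    h.write(b"node:");
    h.write(&left.to_be_bytes());
    h.write(&right.to_be_bytes());
    h.finish()
}

// Hypothetical helper: root of exactly four IDs, laid out as in the diagram.
fn root_of_four(ids: [&str; 4]) -> Digest {
    let [a, b, c, d] = ids.map(hash_leaf);
    hash_pair(hash_pair(a, b), hash_pair(c, d))
}
```

Changing one ID, or even swapping the order of two IDs, yields a different root.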

Merkle trees were invented by Ralph Merkle in 1979. They’re used in Git commits, Bitcoin transactions, IPFS content addressing, and now VecLabs vector collections.

How VecLabs Uses Merkle Trees

After Every Write

When you call collection.upsert() or collection.delete(), VecLabs:

Updates the in-memory HNSW index
Builds a Merkle tree from all vector IDs in the collection
Computes the 32-byte root
Posts the root to Solana via the VecLabs Anchor program
Returns the transaction signature
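The five steps can be modeled as a collection that rebuilds its root after every mutation. This is a simplified sketch that rests on several assumptions and is not the VecLabs implementation:

- The HNSW update is omitted.
- Posting to Solana is replaced by pushing the root onto a local Vec.
- DefaultHasher stands in for SHA-256.
- IDs are kept sorted in a BTreeSet, so the same set of IDs always gives the same root. Whether VecLabs orders IDs this way is an assumption.

Collection, merkle_root, and commit are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::Hasher;

type Digest = u64; // toy stand-in for a 32-byte SHA-256 digest (not secure)

fn hash_leaf(id: &str) -> Digest {
    let mut h = DefaultHasher::new();
    h.write(b"leaf:");
    h.write(id.as_bytes());
    h.finish()
}

fn hash_pair(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    h.write(b"node:");
    h.write(&left.to_be_bytes());
    h.write(&right.to_be_bytes());
    h.finish()
}

// Bottom-up root; an odd last node is paired with itself. Empty -> 0.
fn merkle_root<'a>(ids: impl Iterator<Item = &'a String>) -> Digest {
    let mut layer: Vec<Digest> = ids.map(|id| hash_leaf(id)).collect();
    if layer.is_empty() {
        return 0;
    }
    while layer.len() > 1 {
        layer = layer
            .chunks(2)
            .map(|pair| hash_pair(pair[0], *pair.get(1).unwrap_or(&pair[0])))
            .collect();
    }
    layer[0]
}

// Hypothetical collection model.
struct Collection {
    ids: BTreeSet<String>,
    posted_roots: Vec<Digest>, // stand-in for roots posted to Solana
}

impl Collection {
    fn new() -> Self {
        Collection { ids: BTreeSet::new(), posted_roots: Vec::new() }
    }

    fn upsert(&mut self, id: &str) -> Digest {
        self.ids.insert(id.to_string());
        self.commit()
    }

    fn delete(&mut self, id: &str) -> Digest {
        self.ids.remove(id);
        self.commit()
    }

    // Steps 2-4: rebuild the tree from all IDs, take the root, "post" it.
    fn commit(&mut self) -> Digest {
        let root = merkle_root(self.ids.iter());
        self.posted_roots.push(root);
        root
    }
}
```

Each upsert or delete records a new root. Deleting vec_002 brings the collection back to the root it had before vec_002 was added.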

From merkle.rs:13-24:

pub fn new(vector_ids: &[String]) -> Self {
    // Hash each vector ID to create leaves
    let leaves: Vec<[u8; 32]> = vector_ids
        .iter()
        .map(|id| hash_leaf(id.as_bytes()))
        .collect();
    
    // Build the tree layer by layer
    let tree = build_tree(&leaves);
    
    Self { leaves, tree, original_ids: vector_ids.to_vec() }
}

Leaf Hashing

Each vector ID is hashed with a "leaf:" prefix so that a leaf hash can never coincide with an internal-node hash, which blocks second-preimage attacks on the tree (merkle.rs:126-131):

fn hash_leaf(data: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(b"leaf:");
    hasher.update(data);
    hasher.finalize().into()
}
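Why the prefixes matter: without them, leaves and internal nodes share one hash domain. If you present the bytes of two child digests as a leaf's data, they hash to exactly the parent's value, so an internal node could be passed off as a leaf. The sketch below compares an unprefixed scheme with the prefixed one. All function names are hypothetical, and DefaultHasher stands in for SHA-256 (toy only, not collision-resistant).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Digest = u64; // toy stand-in for SHA-256; NOT collision-resistant

// Hash one contiguous byte string.
fn h(bytes: &[u8]) -> Digest {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

// Unprefixed scheme: leaves and internal nodes share one hash domain.
fn naive_leaf(data: &[u8]) -> Digest {
    h(data)
}

fn naive_node(left: Digest, right: Digest) -> Digest {
    h(&[left.to_be_bytes(), right.to_be_bytes()].concat())
}

// Prefixed scheme, following the merkle.rs excerpts.
fn hash_leaf(data: &[u8]) -> Digest {
    let parts: [&[u8]; 2] = [b"leaf:", data];
    h(&parts.concat())
}

fn hash_pair(left: Digest, right: Digest) -> Digest {
    let (l, r) = (left.to_be_bytes(), right.to_be_bytes());
    let parts: [&[u8]; 3] = [b"node:", &l, &r];
    h(&parts.concat())
}

// The "leaf data" an attacker would present: two child digests back to back.
fn forged_leaf_data(left: Digest, right: Digest) -> Vec<u8> {
    [left.to_be_bytes(), right.to_be_bytes()].concat()
}
```

With the naive scheme the forged leaf and the real parent node hash identically. With the prefixes they differ.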

Node Hashing

Internal nodes hash their children with a "node:" prefix (merkle.rs:133-139):

fn hash_pair(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(b"node:");
    hasher.update(left);
    hasher.update(right);
    hasher.finalize().into()
}
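Because hash_pair feeds the left child before the right one, swapping two children changes the parent hash. That is why each proof node must record whether the sibling sits on the Left or the Right. Here is a quick check with a toy hash_pair that uses the same prefix-then-left-then-right layout, with DefaultHasher in place of SHA-256:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Digest = u64; // toy stand-in for SHA-256; NOT collision-resistant

// Mirrors the shape of hash_pair: "node:" prefix, then left, then right.
fn hash_pair(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    h.write(b"node:");
    h.write(&left.to_be_bytes());
    h.write(&right.to_be_bytes());
    h.finish()
}
```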

Tree Building

The tree is built bottom-up until only one node remains (merkle.rs:141-167):

fn build_tree(leaves: &[[u8; 32]]) -> Vec<Vec<[u8; 32]>> {
    if leaves.is_empty() {
        return vec![vec![[0u8; 32]]];
    }
    
    let mut tree: Vec<Vec<[u8; 32]>> = vec![leaves.to_vec()];
    let mut current_layer = leaves.to_vec();
    
    while current_layer.len() > 1 {
        let mut next_layer = Vec::new();
        let mut i = 0;
        while i < current_layer.len() {
            let left = current_layer[i];
            let right = if i + 1 < current_layer.len() {
                current_layer[i + 1]
            } else {
                current_layer[i]  // Duplicate if odd number of nodes
            };
            next_layer.push(hash_pair(&left, &right));
            i += 2;
        }
        tree.push(next_layer.clone());
        current_layer = next_layer;
    }
    
    tree
}
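As a worked example, five leaves give layers of 5, 3, 2, and 1 nodes. The two odd layers (5 and 3 nodes) each pair their last node with a copy of itself. The hypothetical helpers below compute these sizes without hashing, using the same ceil(len / 2) rule as build_tree.

```rust
// Number of nodes in each layer build_tree produces for n leaves,
// from the leaf layer (index 0) up to the root. An empty collection
// yields a single all-zero root, i.e. one layer of size 1.
fn layer_sizes(n: usize) -> Vec<usize> {
    if n == 0 {
        return vec![1];
    }
    let mut sizes = vec![n];
    let mut len = n;
    while len > 1 {
        len = (len + 1) / 2; // ceil(len / 2): an odd last node pairs with itself
        sizes.push(len);
    }
    sizes
}

// Indices of layers whose last node gets duplicated (odd size, not the root).
fn duplicated_layers(n: usize) -> Vec<usize> {
    layer_sizes(n)
        .iter()
        .enumerate()
        .filter(|&(_, &len)| len > 1 && len % 2 == 1)
        .map(|(i, _)| i)
        .collect()
}
```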

Getting the Root

The Merkle root is the single hash at the top of the tree (merkle.rs:26-32):

pub fn root(&self) -> [u8; 32] {
    match self.tree.last() {
        Some(top) if !top.is_empty() => top[0],
        _ => [0u8; 32],
    }
}

pub fn root_hex(&self) -> String {
    hex::encode(self.root())
}

This 32-byte value is what gets posted to Solana.
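root_hex relies on the hex crate. To compare a root with a hex string without that dependency, you could use a std-only equivalent and its inverse like the ones below. to_hex and from_hex are hypothetical helpers, not VecLabs APIs. An empty collection has the all-zero root, which encodes as 64 zeros.

```rust
// Lowercase hex, two characters per byte (same format as hex::encode).
fn to_hex(bytes: &[u8; 32]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

// Parse a 64-character hex string back into a 32-byte root.
fn from_hex(s: &str) -> Option<[u8; 32]> {
    if s.len() != 64 || !s.bytes().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        let pair = s.get(2 * i..2 * i + 2)?;
        *byte = u8::from_str_radix(pair, 16).ok()?;
    }
    Some(out)
}
```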

Generating Proofs

A Merkle proof lets you prove that a specific vector ID is in the collection without revealing the other IDs: the proof carries only the sibling hashes along the path to the root. From merkle.rs:39-74:

pub fn generate_proof(&self, vector_id: &str) -> Option<MerkleProof> {
    let leaf = hash_leaf(vector_id.as_bytes());
    let leaf_pos = self.leaves.iter().position(|l| l == &leaf)?;
    
    let mut proof_nodes: Vec<ProofNode> = Vec::new();
    let mut current_pos = leaf_pos;
    
    // Collect sibling hashes at each level
    for layer in &self.tree[..self.tree.len().saturating_sub(1)] {
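        // Even index: this node is a left child, so its sibling is on the right
        // (or is the node itself when it is the last node of an odd layer)
        let is_right = current_pos % 2 == 0;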
        let sibling_pos = if is_right {
            (current_pos + 1).min(layer.len() - 1)
        } else {
            current_pos - 1
        };
        
        proof_nodes.push(ProofNode {
            hash: layer[sibling_pos],
            position: if is_right { NodePosition::Right } else { NodePosition::Left },
        });
        
        current_pos /= 2;
    }
    
    Some(MerkleProof {
        vector_id: vector_id.to_string(),
        leaf_hash: leaf,
        proof_nodes,
        root: self.root(),
    })
}
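You can trace the index arithmetic in generate_proof without any hashing. The hypothetical proof_path helper below applies the same rules:

- An even index means the sibling is to the right, or is the node itself at the end of an odd layer.
- An odd index means the sibling is to the left.
- The position halves at each level.

For vec_002 in a four-leaf tree (index 1), it returns sibling 0 on the Left and then sibling 1 on the Right. These are Hash(A) and Hash(C,D).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Side {
    Left,
    Right,
}

// For the leaf at `leaf_pos` in a tree of `n` leaves, return the sibling
// index and side at each level below the root, following generate_proof.
fn proof_path(n: usize, leaf_pos: usize) -> Vec<(usize, Side)> {
    let mut path = Vec::new();
    let mut len = n; // nodes in the current layer
    let mut pos = leaf_pos;
    while len > 1 {
        if pos % 2 == 0 {
            // Left child: sibling is on the right, clamped to itself if last.
            path.push(((pos + 1).min(len - 1), Side::Right));
        } else {
            path.push((pos - 1, Side::Left));
        }
        pos /= 2;
        len = (len + 1) / 2;
    }
    path
}
```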

Proof Structure

From merkle.rs:95-102:

pub struct MerkleProof {
    pub vector_id: String,
    pub leaf_hash: [u8; 32],
    pub proof_nodes: Vec<ProofNode>,
    pub root: [u8; 32],
}

pub struct ProofNode {
    pub hash: [u8; 32],
    pub position: NodePosition,  // Left or Right
}

The proof contains:

The vector ID being proven
The leaf hash (SHA-256 of "leaf:" followed by the ID)
A list of sibling hashes, each tagged Left or Right, needed to reconstruct the path to the root
The expected root

Verifying Proofs

Anyone holding the proof and the expected root can verify it, with no access to the full tree (merkle.rs:104-124):

pub fn verify(&self, expected_root: &[u8; 32]) -> bool {
    let mut current_hash = self.leaf_hash;
    
    // Hash up the tree using the proof nodes
    for node in &self.proof_nodes {
        current_hash = match node.position {
            NodePosition::Right => hash_pair(&current_hash, &node.hash),
            NodePosition::Left => hash_pair(&node.hash, &current_hash),
        };
    }
    
    // Check if we arrived at the expected root
    &current_hash == expected_root
}
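In the excerpt above, verify starts from the leaf_hash stored in the proof and does not recompute it from vector_id. On its own, it therefore checks that the hash chain reaches the root, not that the hash belongs to the claimed ID. A verifier that receives proofs from an untrusted party can add that check.

Below is a minimal sketch of that stricter check. DefaultHasher stands in for SHA-256 (toy only). Proof, fold_to_root, and verify_bound are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Digest = u64; // toy stand-in for SHA-256; NOT collision-resistant

fn hash_leaf(id: &str) -> Digest {
    let mut h = DefaultHasher::new();
    h.write(b"leaf:");
    h.write(id.as_bytes());
    h.finish()
}

fn hash_pair(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    h.write(b"node:");
    h.write(&left.to_be_bytes());
    h.write(&right.to_be_bytes());
    h.finish()
}

#[derive(Clone, Copy)]
enum Side {
    Left,
    Right,
}

struct Proof {
    vector_id: String,
    leaf_hash: Digest,
    nodes: Vec<(Digest, Side)>, // sibling hash and the side it sits on
}

// Same fold as MerkleProof::verify: hash upward, placing each sibling on its side.
fn fold_to_root(leaf: Digest, nodes: &[(Digest, Side)]) -> Digest {
    nodes.iter().fold(leaf, |acc, &(sibling, side)| match side {
        Side::Right => hash_pair(acc, sibling),
        Side::Left => hash_pair(sibling, acc),
    })
}

// Stricter check: the leaf hash must match the claimed ID, and the chain
// must reach the expected root.
fn verify_bound(proof: &Proof, expected_root: Digest) -> bool {
    hash_leaf(&proof.vector_id) == proof.leaf_hash
        && fold_to_root(proof.leaf_hash, &proof.nodes) == expected_root
}
```

A forged proof that reuses an honest leaf hash under a different ID passes the hash-chain check alone. Only the ID check rejects it.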

Example Flow

Proving vec_002 is in the collection:

                    Root
                   /     \
            Hash(A,B)    Hash(C,D)    <- sibling 2 (Right)
             /    \
        Hash(A)  Hash(B)
           ^        |
    sibling 1    vec_002
     (Left)

Proof: [Hash(A) on the Left, Hash(C,D) on the Right]

1. Start with Hash(vec_002), the leaf hash Hash(B)
2. Combine with Hash(A) on the left -> Hash(A,B)
3. Combine with Hash(C,D) on the right -> Root ✓

If the recomputed root matches the on-chain root, the proof is valid.
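Here is the same walkthrough replayed in code for the four-leaf tree. This is a minimal sketch with DefaultHasher in place of SHA-256 (toy only). tree_root and replay_vec_002 are hypothetical helpers named after the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Digest = u64; // toy stand-in for SHA-256; NOT collision-resistant

fn hash_leaf(id: &str) -> Digest {
    let mut h = DefaultHasher::new();
    h.write(b"leaf:");
    h.write(id.as_bytes());
    h.finish()
}

fn hash_pair(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    h.write(b"node:");
    h.write(&left.to_be_bytes());
    h.write(&right.to_be_bytes());
    h.finish()
}

// Root of the four-leaf tree from the diagram.
fn tree_root(ids: [&str; 4]) -> Digest {
    let [a, b, c, d] = ids.map(hash_leaf);
    hash_pair(hash_pair(a, b), hash_pair(c, d))
}

// Replay the proof for vec_002: start at its leaf, combine Hash(A) on the
// left, then Hash(C,D) on the right.
fn replay_vec_002(hash_a: Digest, hash_cd: Digest) -> Digest {
    let leaf = hash_leaf("vec_002");
    let hash_ab = hash_pair(hash_a, leaf);
    hash_pair(hash_ab, hash_cd)
}
```

A Hash(C,D) taken from a collection with different IDs does not lead back to the same root.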

Proof Size

A Merkle proof for a collection of N vectors contains ⌈log₂ N⌉ sibling hashes of 32 bytes each, or ⌈log₂ N⌉ × 32 bytes of hash data:

1M vectors: ~20 hashes = 640 bytes
10M vectors: ~24 hashes = 768 bytes

Proof size grows only logarithmically with collection size, so proofs stay compact even for very large collections.
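These figures follow from the tree shape. Each level below the root contributes one sibling hash. Rounding the layer size up at every level, as build_tree does, gives ⌈log₂ N⌉ levels. The hypothetical helpers below count only the sibling hashes. A serialized MerkleProof also carries the vector ID, the leaf hash, the root, and a Left/Right tag per node.

```rust
// Sibling hashes in a proof for a collection of n vectors: one per level
// below the root. Halving with round-up mirrors build_tree, so this equals
// ceil(log2(n)) for n >= 1.
fn proof_hashes(n: usize) -> u32 {
    let mut len = n;
    let mut levels = 0;
    while len > 1 {
        len = (len + 1) / 2;
        levels += 1;
    }
    levels
}

// Bytes taken by the sibling hashes alone (32-byte SHA-256 digests).
fn proof_hash_bytes(n: usize) -> usize {
    proof_hashes(n) as usize * 32
}
```

The same count is the number of hash_pair calls that verify makes.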

Verification Time

Verifying a proof requires ⌈log₂ N⌉ hash operations, one hash_pair call per proof node:

1M vectors: ~20 SHA-256 hashes
10M vectors: ~24 SHA-256 hashes

Completes in under 1ms on any modern CPU.

On-Chain Storage

The Merkle root is posted to Solana via the VecLabs Anchor program:

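// Pseudocode - actual implementation in programs/solvec/
// (the UpdateRoot accounts struct and the collection account are defined there)
use anchor_lang::prelude::*;
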
pub fn update_merkle_root(
    ctx: Context<UpdateRoot>,
    root: [u8; 32],
) -> Result<()> {
    let collection = &mut ctx.accounts.collection;
    collection.merkle_root = root;
    collection.last_updated = Clock::get()?.unix_timestamp;
    Ok(())
}

The transaction:

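Costs a base fee of 5,000 lamports (0.000005 SOL), about $0.00025 depending on the SOL price
Lands in a block within about one slot (~400ms); reaching finalized commitment takes several seconds longer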
Is immutable and timestamped forever
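The fee is a fixed amount of SOL whose dollar value moves with the market. 5,000 lamports is 0.000005 SOL, and ~$0.00025 corresponds to a SOL price of about $50. Priority fees, if any, come on top of the base fee. The hypothetical helpers below re-run the estimate at a different price; sol_price_usd is a value you supply, and these are not VecLabs APIs.

```rust
// 1 SOL = 1,000,000,000 lamports.
const LAMPORTS_PER_SOL: u64 = 1_000_000_000;

fn lamports_to_sol(lamports: u64) -> f64 {
    lamports as f64 / LAMPORTS_PER_SOL as f64
}

// Dollar cost of a fee at a caller-supplied SOL price.
fn fee_usd(lamports: u64, sol_price_usd: f64) -> f64 {
    lamports_to_sol(lamports) * sol_price_usd
}
```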

The VecLabs Anchor program is live on Solana devnet at 8xjQ2XrdhR4JkGAdTEB7i34DBkbrLRkcgchKjN1Vn5nP. Mainnet deployment is planned.

Use Cases

1. Audit Trail for AI Agents

An AI agent handling financial transactions needs to prove what data it had access to at decision time:

const agent = new AIAgent();

// Agent retrieves context from vector memory
const context = await agent.memory.query(userQuery, 5);

// Agent makes a financial decision
const decision = await agent.decide(context);

// Generate proof of what the agent knew
const proof = await agent.memory.verify();

console.log(`Decision backed by Merkle root: ${proof.rootHex}`);
console.log(`On-chain proof: ${proof.solanaExplorerUrl}`);

If the decision is later audited, the proof is cryptographic and immutable.

2. Compliance for Healthcare AI

A healthcare AI assistant needs to maintain a verifiable record of patient data access:

from solvec import SolVec

sv = SolVec(wallet="~/.config/solana/id.json")
memory = sv.collection("patient-interactions", dimensions=1536)

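# Placeholder: replace with a real 1536-dimensional embedding from your model
embedding = [0.0] * 1536

# Store interaction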
memory.upsert([{
    "id": "interaction_001",
    "values": embedding,
    "metadata": {"patient_id": "P12345", "timestamp": "2026-03-07T10:30:00Z"}
}])

# Generate compliance proof
proof = memory.verify()
print(f"HIPAA audit trail: {proof.solana_explorer_url}")

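The on-chain root proves that a collection containing interaction_001 existed no later than the block time of the transaction that posted it. Because leaves are built from vector IDs only, the root commits to which IDs were stored, not to their embedding values or metadata.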

3. Decentralized Agent Memory

Multiple agents can share a vector collection and independently verify its state:

// Agent A adds memory
const agentA = new Agent("A");
await agentA.memory.upsert(vectors);
const rootA = await agentA.memory.getMerkleRoot();

// Agent B verifies Agent A's update
const agentB = new Agent("B");
const rootB = await agentB.memory.getMerkleRoot();

if (rootA === rootB) {
  console.log("✓ Agents agree on memory state");
} else {
  console.log("✗ Memory divergence detected");
}

No centralized authority needed — the Solana blockchain is the source of truth.

Testing

From the test suite (merkle.rs:177-235), VecLabs validates:

#[test]
fn test_proof_verifies_correctly() {
    let id_list = ids(10);
    let tree = MerkleTree::new(&id_list);
    let root = tree.root();
    
    let proof = tree.generate_proof("vec_5").unwrap();
    assert!(proof.verify(&root), "Proof should verify against root");
}

#[test]
fn test_proof_fails_with_wrong_root() {
    let tree = MerkleTree::new(&ids(10));
    let wrong_root = [1u8; 32];
    let proof = tree.generate_proof("vec_3").unwrap();
    assert!(!proof.verify(&wrong_root), "Proof should fail with wrong root");
}

#[test]
fn test_all_proofs_verify() {
    let id_list = ids(20);
    let tree = MerkleTree::new(&id_list);
    let root = tree.root();
    
    for id in &id_list {
        let proof = tree.generate_proof(id).unwrap();
        assert!(proof.verify(&root), "Proof failed for id: {}", id);
    }
}

Why This Matters

VecLabs is the only vector database with cryptographic proof of collection state.

Pinecone, Weaviate, Qdrant

No audit trail
No proof of data integrity
Trust the vendor’s API
Can’t verify historical state

VecLabs SolVec

Every write generates Merkle root
Root posted to Solana (immutable)
Anyone can verify with just the root
Full historical audit trail on-chain

For enterprise AI, this is the difference between “we think the agent had this context” and “we can cryptographically prove which records were in the agent's memory.”

Code Reference

Full implementation: crates/solvec-core/src/merkle.rs

Key functions:

MerkleTree::new() — merkle.rs:13
root() — merkle.rs:26
generate_proof() — merkle.rs:39
verify() — merkle.rs:107

Next: Encryption

Learn how VecLabs encrypts vectors client-side with wallet-derived keys.
