Every vector collection in VecLabs has a Merkle root posted to Solana. This 32-byte hash is a cryptographic fingerprint of your entire collection — immutable, timestamped, and publicly verifiable by anyone.

The Problem VecLabs Solves

Traditional vector databases (Pinecone, Weaviate, Qdrant) have zero verifiable audit trail:
  • No proof of what was stored — You trust the vendor’s API responses
  • No proof of when it was stored — Timestamps are self-reported by the database
  • No proof it hasn’t been tampered with — If someone modifies your collection, you have no way to detect it
For AI agents making consequential decisions — handling money, processing medical data, executing legal workflows — this is a compliance crisis. VecLabs fixes this with Merkle trees.

What is a Merkle Tree?

A Merkle tree is a binary tree where:
  • Leaves are hashes of individual data items (vector IDs in our case)
  • Nodes are hashes of their two children
  • Root is a single 32-byte hash representing the entire dataset
Any change to any leaf produces a completely different root. This makes the root a cryptographic fingerprint of your collection.
                    Root (32 bytes)
                   /              \
            Hash(A,B)           Hash(C,D)
             /    \              /    \
        Hash(A) Hash(B)     Hash(C) Hash(D)
          |       |           |       |
       vec_001 vec_002    vec_003 vec_004
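As a minimal sketch of the "any change produces a different root" property, the Python below builds the four-leaf tree from the diagram by hand with SHA-256 from the standard library. It is a simplification: it leaves out the "leaf:" and "node:" prefixes that the real implementation adds, and `h` and `root_of_four` are names made up for this illustration.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest (32 bytes)."""
    return hashlib.sha256(data).digest()


def root_of_four(ids: list[str]) -> bytes:
    """Root of a 4-leaf tree shaped like the diagram above (no prefixes)."""
    a, b, c, d = (h(i.encode()) for i in ids)  # leaves
    return h(h(a + b) + h(c + d))              # two inner nodes, then the root


original = root_of_four(["vec_001", "vec_002", "vec_003", "vec_004"])
changed = root_of_four(["vec_001", "vec_002", "vec_003", "vec_005"])

print(original.hex())
print(changed.hex())
print("roots differ:", original != changed)
```

Only the last ID differs between the two calls, yet the two 32-byte roots share no visible structure.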
Merkle trees were invented by Ralph Merkle in 1979. They’re used in Git commits, Bitcoin transactions, IPFS content addressing, and now VecLabs vector collections.

How VecLabs Uses Merkle Trees

After Every Write

When you call collection.upsert() or collection.delete(), VecLabs:
  1. Updates the in-memory HNSW index
  2. Builds a Merkle tree from all vector IDs in the collection
  3. Computes the 32-byte root
  4. Posts the root to Solana via the VecLabs Anchor program
  5. Returns the transaction signature
From merkle.rs:13-24:
pub fn new(vector_ids: &[String]) -> Self {
    // Hash each vector ID to create leaves
    let leaves: Vec<[u8; 32]> = vector_ids
        .iter()
        .map(|id| hash_leaf(id.as_bytes()))
        .collect();
    
    // Build the tree layer by layer
    let tree = build_tree(&leaves);
    
    Self { leaves, tree, original_ids: vector_ids.to_vec() }
}
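The write steps listed above can be sketched in a few lines of Python. This is a toy model, not the SolVec SDK: `ToyCollection`, `posted_roots`, and the fake signature string are hypothetical stand-ins, a plain dict replaces the HNSW index, and nothing is sent to Solana.

```python
import hashlib


def hash_leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"leaf:" + data).digest()


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"node:" + left + right).digest()


def merkle_root(ids: list[str]) -> bytes:
    """Root over the IDs, duplicating the last node of odd-length layers."""
    if not ids:
        return bytes(32)
    layer = [hash_leaf(i.encode()) for i in ids]
    while len(layer) > 1:
        layer = [hash_pair(layer[i], layer[min(i + 1, len(layer) - 1)])
                 for i in range(0, len(layer), 2)]
    return layer[0]


class ToyCollection:
    """Hypothetical stand-in for a collection; not the real API."""

    def __init__(self):
        self.vectors = {}       # id -> values (stands in for the HNSW index)
        self.posted_roots = []  # stands in for roots posted on-chain

    def _commit(self) -> str:
        root = merkle_root(list(self.vectors))             # steps 2 and 3
        self.posted_roots.append(root)                     # step 4 (stubbed)
        return f"fake-signature-{len(self.posted_roots)}"  # step 5 (stubbed)

    def upsert(self, items) -> str:
        for item in items:
            self.vectors[item["id"]] = item["values"]      # step 1
        return self._commit()

    def delete(self, ids) -> str:
        for vector_id in ids:
            self.vectors.pop(vector_id, None)              # step 1
        return self._commit()


col = ToyCollection()
sig1 = col.upsert([{"id": "vec_001", "values": [0.1, 0.2]},
                   {"id": "vec_002", "values": [0.3, 0.4]}])
sig2 = col.delete(["vec_002"])
print(sig1, col.posted_roots[0].hex())
print(sig2, col.posted_roots[1].hex())
```

Each write produces a fresh root, so the sequence of posted roots doubles as a history of the collection's ID set.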

Leaf Hashing

Each vector ID is hashed with a "leaf:" prefix to prevent second-preimage attacks (merkle.rs:126-131):
fn hash_leaf(data: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(b"leaf:");
    hasher.update(data);
    hasher.finalize().into()
}

Node Hashing

Internal nodes hash their children with a "node:" prefix (merkle.rs:133-139):
fn hash_pair(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(b"node:");
    hasher.update(left);
    hasher.update(right);
    hasher.finalize().into()
}
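To see why the two prefixes matter, the sketch below compares hashing with and without them. Without prefixes, a leaf whose payload happens to be the concatenation of two child hashes hashes to exactly the same value as their parent node. With prefixes, leaf and node inputs can never coincide. The helper names mirror the Rust functions above; the attacker scenario is illustrative only.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hash_leaf(data: bytes) -> bytes:
    return sha256(b"leaf:" + data)


def hash_pair(left: bytes, right: bytes) -> bytes:
    return sha256(b"node:" + left + right)


# Without prefixes: two ordinary leaves and their parent.
a, b = sha256(b"vec_001"), sha256(b"vec_002")
unprefixed_parent = sha256(a + b)
# A crafted "leaf" whose bytes are exactly a || b hashes to the parent.
forged_leaf = sha256(a + b)
collision_without_prefix = unprefixed_parent == forged_leaf

# With prefixes: the same trick no longer lines up, because the leaf
# input starts with "leaf:" and the node input starts with "node:".
pa, pb = hash_leaf(b"vec_001"), hash_leaf(b"vec_002")
prefixed_parent = hash_pair(pa, pb)
forged_prefixed_leaf = hash_leaf(pa + pb)
collision_with_prefix = prefixed_parent == forged_prefixed_leaf

print("collision without prefixes:", collision_without_prefix)
print("collision with prefixes:", collision_with_prefix)
```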

Tree Building

The tree is built bottom-up until only one node remains (merkle.rs:141-167):
fn build_tree(leaves: &[[u8; 32]]) -> Vec<Vec<[u8; 32]>> {
    if leaves.is_empty() {
        return vec![vec![[0u8; 32]]];
    }
    
    let mut tree: Vec<Vec<[u8; 32]>> = vec![leaves.to_vec()];
    let mut current_layer = leaves.to_vec();
    
    while current_layer.len() > 1 {
        let mut next_layer = Vec::new();
        let mut i = 0;
        while i < current_layer.len() {
            let left = current_layer[i];
            let right = if i + 1 < current_layer.len() {
                current_layer[i + 1]
            } else {
                current_layer[i]  // Duplicate if odd number of nodes
            };
            next_layer.push(hash_pair(&left, &right));
            i += 2;
        }
        tree.push(next_layer.clone());
        current_layer = next_layer;
    }
    
    tree
}
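The effect of duplicating the last node on odd-length layers is easiest to see with a small input. The following is a Python port of the Rust `build_tree` above, written for illustration; the five `vec_N` IDs are arbitrary sample data.

```python
import hashlib


def hash_leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"leaf:" + data).digest()


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"node:" + left + right).digest()


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every layer, from the leaves (index 0) up to the root."""
    if not leaves:
        return [[bytes(32)]]  # empty collection: a single all-zero root
    tree = [list(leaves)]
    while len(tree[-1]) > 1:
        layer = tree[-1]
        next_layer = []
        for i in range(0, len(layer), 2):
            left = layer[i]
            # Pair the last node with itself when the layer has odd length.
            right = layer[i + 1] if i + 1 < len(layer) else layer[i]
            next_layer.append(hash_pair(left, right))
        tree.append(next_layer)
    return tree


leaves = [hash_leaf(f"vec_{n}".encode()) for n in range(5)]
tree = build_tree(leaves)
layer_sizes = [len(layer) for layer in tree]
print(layer_sizes)  # [5, 3, 2, 1]
```

With five leaves, the fifth leaf is paired with itself on the first pass, and the third node is paired with itself on the second.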

Getting the Root

The Merkle root is the single hash at the top of the tree (merkle.rs:26-32):
pub fn root(&self) -> [u8; 32] {
    match self.tree.last() {
        Some(top) if !top.is_empty() => top[0],
        _ => [0u8; 32],
    }
}

pub fn root_hex(&self) -> String {
    hex::encode(self.root())
}
This 32-byte value is what gets posted to Solana.

Generating Proofs

A Merkle proof lets you prove that a specific vector ID is in the collection without revealing any other IDs. From merkle.rs:39-74:
pub fn generate_proof(&self, vector_id: &str) -> Option<MerkleProof> {
    let leaf = hash_leaf(vector_id.as_bytes());
    let leaf_pos = self.leaves.iter().position(|l| l == &leaf)?;
    
    let mut proof_nodes: Vec<ProofNode> = Vec::new();
    let mut current_pos = leaf_pos;
    
    // Collect sibling hashes at each level
    for layer in &self.tree[..self.tree.len().saturating_sub(1)] {
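        // `is_right` describes the sibling: an even index means the current
        // node is a left child, so its sibling sits to the right.
        let is_right = current_pos % 2 == 0;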
        let sibling_pos = if is_right {
            (current_pos + 1).min(layer.len() - 1)
        } else {
            current_pos - 1
        };
        
        proof_nodes.push(ProofNode {
            hash: layer[sibling_pos],
            position: if is_right { NodePosition::Right } else { NodePosition::Left },
        });
        
        current_pos /= 2;
    }
    
    Some(MerkleProof {
        vector_id: vector_id.to_string(),
        leaf_hash: leaf,
        proof_nodes,
        root: self.root(),
    })
}

Proof Structure

From merkle.rs:95-102:
pub struct MerkleProof {
    pub vector_id: String,
    pub leaf_hash: [u8; 32],
    pub proof_nodes: Vec<ProofNode>,
    pub root: [u8; 32],
}

pub struct ProofNode {
    pub hash: [u8; 32],
    pub position: NodePosition,  // Left or Right
}
The proof contains:
  • The vector ID being proven
  • The leaf hash (SHA-256 of the "leaf:" prefix followed by the ID)
  • A list of sibling hashes needed to reconstruct the path to the root
  • The expected root

Verifying Proofs

Anyone can verify a proof with just the root — no access to the full tree required (merkle.rs:104-124):
pub fn verify(&self, expected_root: &[u8; 32]) -> bool {
    let mut current_hash = self.leaf_hash;
    
    // Hash up the tree using the proof nodes
    for node in &self.proof_nodes {
        current_hash = match node.position {
            NodePosition::Right => hash_pair(&current_hash, &node.hash),
            NodePosition::Left => hash_pair(&node.hash, &current_hash),
        };
    }
    
    // Check if we arrived at the expected root
    &current_hash == expected_root
}
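For readers who want to experiment outside Rust, here is a compact Python port of proof generation and verification. It is a sketch under two assumptions: proofs are plain lists of `(sibling_hash, sibling_is_right)` tuples instead of the `MerkleProof` struct, and `verify` recomputes the leaf hash from the ID instead of reading a stored `leaf_hash` field, so a proof cannot name one ID while carrying another ID's hash.

```python
import hashlib


def hash_leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"leaf:" + data).digest()


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"node:" + left + right).digest()


def build_tree(leaves):
    """All layers from leaves to root, duplicating the last node if odd."""
    tree = [list(leaves)] if leaves else [[bytes(32)]]
    while len(tree[-1]) > 1:
        layer = tree[-1]
        tree.append([hash_pair(layer[i], layer[min(i + 1, len(layer) - 1)])
                     for i in range(0, len(layer), 2)])
    return tree


def generate_proof(tree, vector_id: str):
    """Return [(sibling_hash, sibling_is_right), ...] or None if absent."""
    leaf = hash_leaf(vector_id.encode())
    if leaf not in tree[0]:
        return None
    pos = tree[0].index(leaf)
    proof = []
    for layer in tree[:-1]:  # every layer except the root
        sibling_is_right = pos % 2 == 0
        sibling = min(pos + 1, len(layer) - 1) if sibling_is_right else pos - 1
        proof.append((layer[sibling], sibling_is_right))
        pos //= 2
    return proof


def verify(vector_id: str, proof, expected_root: bytes) -> bool:
    current = hash_leaf(vector_id.encode())
    for sibling, sibling_is_right in proof:
        if sibling_is_right:
            current = hash_pair(current, sibling)
        else:
            current = hash_pair(sibling, current)
    return current == expected_root


ids = [f"vec_{n}" for n in range(5)]
tree = build_tree([hash_leaf(i.encode()) for i in ids])
root = tree[-1][0]
results = [verify(i, generate_proof(tree, i), root) for i in ids]
print("all proofs verify:", all(results))
```

The verifier only needs the ID, the proof list, and a trusted root. It never sees the other IDs or the rest of the tree.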

Example Flow

Proving vec_002 is in the collection:
                    Root
                   /     \
            Hash(A,B)    Hash(C,D)
             /    \         ^
        Hash(A) Hash(B)     |
          |       |         |
       vec_001 vec_002      |
                            |
Proof: [Hash(A), Hash(C,D)] |
                            |
1. Start with Hash(vec_002) |
2. Combine with Hash(A) -> Hash(A,B)
3. Combine with Hash(C,D) -> Root ✓
If the recomputed root matches the on-chain root, the proof is valid.
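The three steps in the diagram can be traced by hand. The sketch below builds the four-leaf tree explicitly and then replays the verifier's side, assuming the same "leaf:" and "node:" prefixes as the Rust code; the variable names are chosen to match the diagram labels.

```python
import hashlib


def hash_leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"leaf:" + data).digest()


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"node:" + left + right).digest()


# Prover side: the full four-leaf tree from the diagram.
h_a, h_b, h_c, h_d = (hash_leaf(i) for i in
                      (b"vec_001", b"vec_002", b"vec_003", b"vec_004"))
h_ab = hash_pair(h_a, h_b)
h_cd = hash_pair(h_c, h_d)
root = hash_pair(h_ab, h_cd)

# Verifier side: only the ID, the two sibling hashes, and the root are known.
step1 = hash_leaf(b"vec_002")   # 1. start with Hash(vec_002)
step2 = hash_pair(h_a, step1)   # 2. Hash(A) is the left sibling
step3 = hash_pair(step2, h_cd)  # 3. Hash(C,D) is the right sibling

print("proof valid:", step3 == root)
```

Note that the side matters at each step: swapping the arguments in step 2 or step 3 yields a different hash and the proof fails.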

Proof Size

A Merkle proof for a collection of N vectors contains about log₂(N) sibling hashes, so its hash payload is log₂(N) * 32 bytes:
  • 1M vectors: ~20 hashes = 640 bytes
  • 10M vectors: ~24 hashes = 768 bytes
Extremely compact regardless of collection size.
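The figures above follow from the tree depth, which is log₂(N) rounded up. A small calculation reproduces them. It counts only the sibling hashes; the ID, leaf hash, root, and position flags carried by a full proof add a little on top. The function names are illustrative.

```python
HASH_BYTES = 32


def proof_hashes(n: int) -> int:
    """Sibling hashes in a proof for n leaves: ceil(log2(n)), exact in integers."""
    return (n - 1).bit_length() if n > 1 else 0


def proof_bytes(n: int) -> int:
    """Size of the sibling-hash payload in bytes."""
    return proof_hashes(n) * HASH_BYTES


for n in (1_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} vectors: {proof_hashes(n)} hashes = {proof_bytes(n)} bytes")
```

Growing the collection tenfold, from 1M to 10M vectors, adds only four hashes (128 bytes) to each proof.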

Verification Time

Verifying a proof requires about log₂(N) hash operations:
  • 1M vectors: ~20 SHA-256 hashes
  • 10M vectors: ~24 SHA-256 hashes
Completes in under 1ms on any modern CPU.

On-Chain Storage

The Merkle root is posted to Solana via the VecLabs Anchor program:
// Pseudocode - actual implementation in programs/solvec/
pub fn update_merkle_root(
    ctx: Context<UpdateRoot>,
    root: [u8; 32],
) -> Result<()> {
    let collection = &mut ctx.accounts.collection;
    collection.merkle_root = root;
    collection.last_updated = Clock::get()?.unix_timestamp;
    Ok(())
}
The transaction:
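  • Costs 5,000 lamports (0.000005 SOL) in base fees, about $0.00025 at a SOL price of $50
  • Lands in a block in ~400ms (Solana slot time); reaching finalized commitment takes longer, typically on the order of 10 to 15 seconds
  • Is immutable and timestamped once it is part of the ledger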
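The dollar cost depends on the SOL price, so it helps to make the arithmetic explicit. In the sketch below the 5,000-lamport fee comes from the list above and the lamports-per-SOL ratio is fixed by Solana. The $50 price is an assumption, chosen because it is the price at which the fee works out to $0.00025.

```python
LAMPORTS_PER_SOL = 1_000_000_000  # fixed by the Solana protocol
BASE_FEE_LAMPORTS = 5_000         # fee quoted above
ASSUMED_SOL_PRICE_USD = 50.0      # assumption; substitute the current price

fee_sol = BASE_FEE_LAMPORTS / LAMPORTS_PER_SOL
fee_usd = fee_sol * ASSUMED_SOL_PRICE_USD

print(f"{fee_sol:.6f} SOL per root update")
print(f"${fee_usd:.5f} per root update at ${ASSUMED_SOL_PRICE_USD:.0f}/SOL")
```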
The VecLabs Anchor program is live on Solana devnet at 8xjQ2XrdhR4JkGAdTEB7i34DBkbrLRkcgchKjN1Vn5nP. Mainnet deployment is planned.

Use Cases

1. Audit Trail for AI Agents

An AI agent handling financial transactions needs to prove what data it had access to at decision time:
const agent = new AIAgent();

// Agent retrieves context from vector memory
const context = await agent.memory.query(userQuery, 5);

// Agent makes a financial decision
const decision = await agent.decide(context);

// Generate proof of what the agent knew
const proof = await agent.memory.verify();

console.log(`Decision backed by Merkle root: ${proof.rootHex}`);
console.log(`On-chain proof: ${proof.solanaExplorerUrl}`);
If the decision is later audited, the proof is cryptographic and immutable.

2. Compliance for Healthcare AI

A healthcare AI assistant needs to maintain a verifiable record of patient data access:
from solvec import SolVec

sv = SolVec(wallet="~/.config/solana/id.json")
memory = sv.collection("patient-interactions", dimensions=1536)

# Store interaction
memory.upsert([{
    "id": "interaction_001",
    "values": embedding,
    "metadata": {"patient_id": "P12345", "timestamp": "2026-03-07T10:30:00Z"}
}])

# Generate compliance proof
proof = memory.verify()
print(f"HIPAA audit trail: {proof.solana_explorer_url}")
The on-chain root, combined with a Merkle proof for interaction_001, shows that the interaction ID was part of the collection when the root was posted.

3. Decentralized Agent Memory

Multiple agents can share a vector collection and independently verify its state:
// Agent A adds memory
const agentA = new Agent("A");
await agentA.memory.upsert(vectors);
const rootA = await agentA.memory.getMerkleRoot();

// Agent B verifies Agent A's update
const agentB = new Agent("B");
const rootB = await agentB.memory.getMerkleRoot();

if (rootA === rootB) {
  console.log("✓ Agents agree on memory state");
} else {
  console.log("✗ Memory divergence detected");
}
No centralized authority needed — the Solana blockchain is the source of truth.
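One detail matters when agents recompute a root independently: the Rust constructor shown earlier hashes IDs in the order it receives them, so the same set of IDs in a different order produces a different root. The sketch below demonstrates this. Sorting the IDs before hashing is an assumption used here for illustration; this page does not say whether VecLabs canonicalizes the order.

```python
import hashlib


def hash_leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"leaf:" + data).digest()


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"node:" + left + right).digest()


def merkle_root(ids: list[str]) -> bytes:
    """Root over the IDs in the given order (last node duplicated if odd)."""
    if not ids:
        return bytes(32)
    layer = [hash_leaf(i.encode()) for i in ids]
    while len(layer) > 1:
        layer = [hash_pair(layer[i], layer[min(i + 1, len(layer) - 1)])
                 for i in range(0, len(layer), 2)]
    return layer[0]


ids_a = ["vec_001", "vec_002", "vec_003"]  # Agent A's view
ids_b = ["vec_003", "vec_001", "vec_002"]  # Agent B: same IDs, other order

same_order_agrees = merkle_root(ids_a) == merkle_root(list(ids_a))
different_order_agrees = merkle_root(ids_a) == merkle_root(ids_b)
canonical_agrees = merkle_root(sorted(ids_a)) == merkle_root(sorted(ids_b))

print("same order:", same_order_agrees)
print("different order:", different_order_agrees)
print("sorted first:", canonical_agrees)
```

A root mismatch can therefore mean either a genuine divergence or just a different ID ordering, which is worth ruling out before raising an alarm.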

Testing

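From the test suite (merkle.rs:177-235), VecLabs validates the following. The ids(n) helper is not shown here; judging by the IDs the tests look up, it appears to generate n IDs of the form vec_0, vec_1, and so on: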
#[test]
fn test_proof_verifies_correctly() {
    let id_list = ids(10);
    let tree = MerkleTree::new(&id_list);
    let root = tree.root();
    
    let proof = tree.generate_proof("vec_5").unwrap();
    assert!(proof.verify(&root), "Proof should verify against root");
}

#[test]
fn test_proof_fails_with_wrong_root() {
    let tree = MerkleTree::new(&ids(10));
    let wrong_root = [1u8; 32];
    let proof = tree.generate_proof("vec_3").unwrap();
    assert!(!proof.verify(&wrong_root), "Proof should fail with wrong root");
}

#[test]
fn test_all_proofs_verify() {
    let id_list = ids(20);
    let tree = MerkleTree::new(&id_list);
    let root = tree.root();
    
    for id in &id_list {
        let proof = tree.generate_proof(id).unwrap();
        assert!(proof.verify(&root), "Proof failed for id: {}", id);
    }
}

Why This Matters

VecLabs is the only vector database with cryptographic proof of collection state.

Pinecone, Weaviate, Qdrant

  • No audit trail
  • No proof of data integrity
  • Trust the vendor’s API
  • Can’t verify historical state

VecLabs SolVec

  • Every write generates a Merkle root
  • Root posted to Solana (immutable)
  • Anyone can verify with just the root
  • Full historical audit trail on-chain
For enterprise AI, this is the difference between “we think the agent said this” and “we can cryptographically prove the agent said this.”

Code Reference

Full implementation: crates/solvec-core/src/merkle.rs
Key functions:
  • MerkleTree::new() — merkle.rs:13
  • root() — merkle.rs:26
  • generate_proof() — merkle.rs:39
  • verify() — merkle.rs:107

Next: Encryption

Learn how VecLabs encrypts vectors client-side with wallet-derived keys.
