The HNSW (Hierarchical Navigable Small World) index is the performance core of VecLabs. It’s what makes sub-5ms queries possible at 100K+ vector scale.

What is HNSW?

HNSW is a graph-based algorithm for approximate nearest neighbor (ANN) search. Instead of comparing your query against every vector in the collection (slow), it builds a multi-layer graph that lets you navigate to the nearest neighbors in logarithmic time. Think of it like a highway system:
  • Top layers: Fast highways that cover long distances between major cities
  • Bottom layer: Local streets that get you to the exact address
When you query, you start at the top layer and greedily descend until you reach the base layer, then do a more thorough search for the top-K results.
HNSW was introduced in the 2016 paper “Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs” by Malkov and Yashunin. It’s the same algorithm used by Pinecone, Qdrant, and Weaviate.

Core Parameters

From hnsw.rs:36-50, the VecLabs HNSW implementation exposes three key parameters:
pub struct HNSWIndex {
    m: usize,               // Max connections per node per layer
    ef_construction: usize, // Beam width during index build
    ef_search: usize,       // Beam width during query
    // ...
}

M

Max connections per node
Default: 16
Higher M = better recall, more memory usage. Standard value is 16. Layer 0 uses M * 2 = 32 connections.

ef_construction

Build beam width
Default: 200
Higher ef_construction = better quality graph, slower inserts. Determines how many candidates are explored when inserting a new vector.

ef_search

Query beam width
Default: 50
Higher ef_search = better recall, slower queries. You can tune this at query time to trade speed for accuracy.

Choosing Parameters

For most users: Use the defaults. Call HNSWIndex::default_cosine() (hnsw.rs:95-97):
pub fn default_cosine() -> Self {
    Self::new(16, 200, DistanceMetric::Cosine)
}
If you need higher recall: Increase ef_search at query time:
index.set_ef_search(100); // Double the beam width for better recall
let results = index.query(&query_vector, 10)?;
If you have lots of memory and want maximum quality: Increase M and ef_construction at index creation:
let mut index = HNSWIndex::new(32, 400, DistanceMetric::Cosine);
// Warning: 2x memory usage, slower inserts, marginal recall improvement

How Insertion Works

When you insert a vector into the HNSW index (hnsw.rs:122-169):
  1. Generate random level — Each vector is assigned a random level (0 to ~4). Higher levels are exponentially less likely.
// hnsw.rs:456-468
fn random_level(&self) -> usize {
    let mut rng = rand::thread_rng();
    let mut level = 0usize;
    loop {
        let r: f64 = rng.gen();
        if r > self.ml || level >= 16 {
            break;
        }
        level += 1;
    }
    level
}
  2. Find nearest neighbors at each layer — Starting from the entry point, greedily descend to the insert level, then search for the M nearest neighbors at each layer.
  3. Create bidirectional connections — Connect the new node to its neighbors, and connect those neighbors back to the new node.
  4. Prune if necessary — If any neighbor now has more than M connections, prune the weakest ones to maintain graph quality.
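Step 1's promotion loop is a geometric draw: a node continues to the next level with a fixed probability and stops otherwise. A standalone sketch of the resulting level distribution — assuming `ml` is that continue-probability, matching the break condition in `random_level`, and using the common choice `ml = 1/ln(M)` — shows why levels rarely exceed ~4:

```rust
// Probability that the promotion loop stops at exactly `level`:
// continue with probability p (`ml` in the struct), stop with 1 - p.
fn level_probability(p: f64, level: usize) -> f64 {
    p.powi(level as i32) * (1.0 - p)
}

fn main() {
    // Hypothetical ml = 1 / ln(16) ~= 0.36, a common choice for M = 16.
    let p = 1.0 / 16f64.ln();
    let mut cumulative = 0.0;
    for level in 0..=4 {
        let prob = level_probability(p, level);
        cumulative += prob;
        println!("level {level}: {prob:.4}");
    }
    // Nearly all nodes land on the first few levels.
    println!("P(level <= 4) = {cumulative:.4}");
}
```

With `p ≈ 0.36`, about 64% of nodes stop at level 0 and over 99% land at level 4 or below, which is why the hard cap at 16 in `random_level` is almost never reached.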
From hnsw.rs:350-438, the connect_new_node function handles this:
fn connect_new_node(&mut self, node_id: &str, insert_level: usize) {
    // Greedy descent from top to insert_level+1
    for layer_idx in (insert_level + 1..=self.entry_point_level).rev() {
        let candidates = self.search_layer(&node_values, &current_nearest, 1, layer_idx);
        // ...
    }
    
    // From insert_level down to 0: search and connect
    for layer_idx in (0..=insert_level).rev() {
        let candidates = self.search_layer(
            &node_values,
            &current_nearest,
            self.ef_construction,
            layer_idx,
        );
        
        // Bidirectional connections with pruning
        // ...
    }
}

How Query Works

Querying is a two-phase process (hnsw.rs:210-259):

Phase 1: Greedy Descent

Start at the entry point (the highest-level node) and greedily descend through layers 1+ with beam width 1. This gets you close to the target region fast.
// hnsw.rs:237-244
let mut current_nearest = entry;
for layer_idx in (1..=self.entry_point_level).rev() {
    let candidates = self.search_layer(query_vector, &current_nearest, 1, layer_idx);
    if let Some(best) = candidates.into_iter().next() {
        current_nearest = best.id;
    }
}

Phase 2: Beam Search at Layer 0

At the base layer (layer 0), use the full ef_search beam width to explore candidates and return the top-K.
// hnsw.rs:246-258
let ef = self.ef_search.max(top_k);
let candidates = self.search_layer(query_vector, &current_nearest, ef, 0);

let results: Vec<QueryResult> = candidates
    .into_iter()
    .take(top_k)
    .map(|c| {
        let vec = &self.vectors[&c.id];
        QueryResult::new(c.id, c.score, vec.metadata.clone())
    })
    .collect();
The search_layer function (hnsw.rs:263-348) implements beam search:
fn search_layer(
    &self,
    query: &[f32],
    entry_id: &str,
    ef: usize,
    layer_idx: usize,
) -> Vec<Candidate> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut candidates: BinaryHeap<Candidate> = BinaryHeap::new();
    let mut results: BinaryHeap<Candidate> = BinaryHeap::new();
    
    // Expand candidates, track visited, maintain top-ef results
    while let Some(current) = candidates.pop() {
        // Explore neighbors, update results if score > worst_result_score
        // ...
    }
    
    // Return results sorted by score descending
}

Distance Metrics

VecLabs supports three distance metrics (distance.rs:14-59):

Cosine

Default metric
Returns similarity in [-1, 1]
Higher = more similar
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    (dot / (norm_a * norm_b)).clamp(-1.0, 1.0)
}

Euclidean

L2 distance
Returns distance in [0, ∞)
Lower = more similar
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

Dot Product

Inner product
Returns scalar
Higher = more similar
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
Best for pre-normalized vectors (e.g., OpenAI embeddings).
Performance optimization: If your vectors are already normalized (like OpenAI embeddings), use DotProduct metric instead of Cosine. They’re mathematically equivalent for unit vectors, but dot product avoids the sqrt computation.

Performance Characteristics

Time Complexity

  • Insert: O(log N * M * ef_construction)
  • Query: O(log N * ef_search)
  • Delete: O(M * L) where L is number of layers
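One way to read these bounds is as a rough count of distance computations per query: about one per upper layer during the greedy descent, plus roughly ef_search * (M * 2) neighbor expansions at layer 0. A back-of-envelope sketch — a model for intuition, not a measurement:

```rust
// Rough distance-computation count per query under the bounds above:
// ~1 comparison per upper layer during descent, plus about
// ef_search * (M * 2) neighbor expansions at layer 0.
// This is an approximation, not an exact accounting.
fn estimated_comparisons(n: usize, m: usize, ef_search: usize) -> usize {
    let descent_steps = (n as f64).ln().ceil() as usize; // ~log N layers
    descent_steps + ef_search * m * 2 // layer 0 uses M*2 connections
}

fn main() {
    // With defaults (M = 16, ef_search = 50) at 100K vectors:
    let est = estimated_comparisons(100_000, 16, 50);
    println!("~{est} distance computations per query");
}
```

The takeaway: the descent contributes almost nothing; query cost is dominated by the `ef_search * M` term at layer 0, which is why `ef_search` is the lever to reach for when tuning latency.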

Space Complexity

Memory usage per vector:
Vector storage: dimension * 4 bytes (f32)
Graph edges:    M * 2 * 8 bytes (layer 0) + M * 8 bytes * num_layers
Metadata:       variable (JSON)
For 384-dimensional vectors with default M=16:
  • Vector: 1,536 bytes
  • Graph: ~256 bytes
  • Total: ~1.8 KB per vector
At 1 million vectors: ~1.8 GB RAM
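The arithmetic above can be packaged as a small estimator. This is an approximation: `avg_upper_layers` is a hypothetical knob for the upper-layer edges (most nodes live only on layer 0), not a VecLabs parameter:

```rust
// Per-vector memory estimate following the breakdown above:
// f32 components + M*2 layer-0 edges + M edges per upper layer.
fn bytes_per_vector(dimension: usize, m: usize, avg_upper_layers: f64) -> f64 {
    let vector = (dimension * 4) as f64;   // f32 components
    let layer0_edges = (m * 2 * 8) as f64; // M*2 neighbor ids, 8 bytes each
    let upper_edges = (m * 8) as f64 * avg_upper_layers;
    vector + layer0_edges + upper_edges
}

fn main() {
    // 384-dim vectors, M = 16, ignoring upper-layer edges and metadata:
    let per_vec = bytes_per_vector(384, 16, 0.0); // 1536 + 256 = 1792
    println!("~{per_vec} bytes per vector");
    println!("~{:.1} GB at 1M vectors", per_vec * 1e6 / 1e9);
}
```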

Benchmarks

Measured on Apple M2, 16GB RAM:
Vectors   Dimensions   p50     p95     p99
10K       384          0.8ms   1.2ms   1.8ms
100K      384          1.9ms   2.8ms   4.3ms
1M        384          3.2ms   5.1ms   7.8ms
From the test suite (hnsw.rs:649-658):
#[test]
fn test_large_index_query_returns_results() {
    let mut index = HNSWIndex::new(16, 200, DistanceMetric::Cosine);
    for i in 0..1000 {
        index
            .insert(make_vector(&format!("v{}", i), random_vector(384)))
            .unwrap();
    }
    let results = index.query(&random_vector(384), 10).unwrap();
    assert_eq!(results.len(), 10);
}

Serialization

The entire index can be serialized to JSON for persistence (hnsw.rs:470-478):
pub fn to_json(&self) -> Result<String, SolVecError> {
    serde_json::to_string(self)
        .map_err(|e| SolVecError::SerializationError(e.to_string()))
}

pub fn from_json(json: &str) -> Result<Self, SolVecError> {
    serde_json::from_str(json)
        .map_err(|e| SolVecError::SerializationError(e.to_string()))
}
This is used for Shadow Drive persistence (in progress) and local caching.

Index Statistics

You can inspect index health with stats() (hnsw.rs:481-491):
pub fn stats(&self) -> IndexStats {
    IndexStats {
        vector_count: self.vectors.len(),
        layer_count: self.layers.len(),
        entry_point_level: self.entry_point_level,
        dimension: self.dimension.unwrap_or(0),
        total_inserts: self.total_inserts,
        total_deletes: self.total_deletes,
        metric: self.metric,
    }
}

Why Rust Matters

The HNSW algorithm itself isn’t unique to VecLabs — Pinecone, Qdrant, and Weaviate all use HNSW. What’s different is the implementation language.

Python/Go (Competitors)

  • Garbage collector pauses query execution unpredictably
  • GC pressure increases with collection size and query load
  • p99 latency degrades as you scale
  • Can’t guarantee sub-10ms under production load

Rust (VecLabs)

  • No garbage collector = no pause-the-world events
  • Memory is managed explicitly with compile-time guarantees
  • p99 latency stays consistent regardless of load
  • Predictable performance is the unfair advantage
This is why VecLabs can promise sub-5ms p99 at 100K vectors and actually deliver it in production.

Code Reference

Full implementation: crates/solvec-core/src/hnsw.rs Key functions:
  • insert() — hnsw.rs:122
  • query() — hnsw.rs:210
  • delete() — hnsw.rs:172
  • search_layer() — hnsw.rs:263
  • connect_new_node() — hnsw.rs:350

Next: Merkle Proofs

Learn how VecLabs uses Merkle trees to create cryptographic proof of collection state.
