Vespa provides efficient vector search using Hierarchical Navigable Small World (HNSW) graphs for approximate nearest neighbor (ANN) search. This enables semantic search, recommendation systems, and similarity matching at scale.

Overview

Vespa's vector search implementation is summarized by the header comment in searchlib/src/vespa/searchlib/tensor/hnsw_index.h:29-41:
Implementation of a hierarchical navigable small world graph (HNSW) that is used for approximate K-nearest neighbor search. The implementation supports 1 write thread and multiple search threads without the use of mutexes. This is achieved by using data stores that use generation tracking and associated memory management. The implementation is mainly based on the algorithms described in “Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs” (Yu. A. Malkov, D. A. Yashunin).

Defining Tensor Fields

First, define a tensor field in your schema with HNSW indexing:
schema product {
  document product {
    field embedding type tensor<float>(x[384]) {
      indexing: attribute | index
      attribute {
        distance-metric: angular
      }
      index {
        hnsw {
          max-links-per-node: 16
          neighbors-to-explore-at-insert: 200
        }
      }
    }
  }
}
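A document matching this schema can be fed through Vespa's /document/v1 HTTP API. The following Python sketch builds the feed payload; the document id p1 and the random vector are illustrative placeholders, and indexed tensors can be fed as a flat "values" array in Vespa's document JSON format:

```python
import json
import random

# Illustrative placeholder vector for the 384-dimensional embedding field.
embedding = [random.random() for _ in range(384)]

doc = {
    "fields": {
        # Indexed tensors can be fed as a flat "values" array.
        "embedding": {"values": embedding}
    }
}

# Hypothetical REST path: /document/v1/<namespace>/<doctype>/docid/<id>
path = "/document/v1/product/product/docid/p1"
payload = json.dumps(doc)
```

POSTing this payload to the path above (against a running Vespa instance) indexes the document into the HNSW graph.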

Distance Metrics

Vespa supports multiple distance metrics:

angular

Computes the angular distance between vectors. Best for normalized embeddings.
attribute {
  distance-metric: angular
}
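Angular distance is the angle between the two vectors, i.e. the arccosine of their cosine similarity. A minimal Python sketch:

```python
import math

def angular_distance(a, b):
    """Angle between two vectors: arccos of their cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] to guard against floating-point drift.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

print(angular_distance([1, 0], [2, 0]))  # 0.0 — same direction
print(angular_distance([1, 0], [0, 1]))  # ~1.5708 — orthogonal (pi/2)
```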
prenormalized-angular

Optimized for pre-normalized vectors. From the test schema (config-model/src/test/derived/hnsw_index/test.sd:4-16):
field t1 type tensor(x[128]) {
  indexing: attribute | index
  attribute {
    distance-metric: prenormalized-angular
  }
  index {
    hnsw {
      max-links-per-node: 32
      neighbors-to-explore-at-insert: 300
      multi-threaded-indexing: false
    }
  }
}
euclidean

Computes the Euclidean (L2) distance between vectors.
attribute {
  distance-metric: euclidean
}
geodegrees

For geographic coordinates (latitude, longitude). From the test schema (config-model/src/test/derived/hnsw_index/test.sd:17-23):
field t2 type tensor(x[2]) {
  indexing: attribute
  attribute {
    distance-metric: geodegrees
  }
}
dotproduct

Computes the inner product. Useful for embeddings where a larger inner product indicates a closer match (maximum inner product search).
attribute {
  distance-metric: dotproduct
}
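For reference, the Euclidean and dot-product metrics can be sketched in a few lines of Python. Note the sign convention, which is an interpretation rather than quoted Vespa internals: with distance-metric: dotproduct, a larger inner product means a closer match.

```python
import math

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dotproduct_score(a, b):
    """Inner product; with distance-metric: dotproduct, larger means closer."""
    return sum(x * y for x, y in zip(a, b))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(dotproduct_score([1, 2], [3, 4]))            # 11
```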

HNSW Configuration

max-links-per-node

Controls the connectivity of the graph. Higher values improve recall but increase memory usage and indexing time.
index {
  hnsw {
    max-links-per-node: 16  # Default: 16, typical range: 8-32
  }
}

neighbors-to-explore-at-insert

Number of neighbors to consider during document insertion. Higher values improve index quality but slow down indexing.
index {
  hnsw {
    neighbors-to-explore-at-insert: 200  # Default: 200, typical range: 100-500
  }
}

multi-threaded-indexing

Enable parallel indexing for faster bulk loading:
index {
  hnsw {
    multi-threaded-indexing: true  # Default: true
  }
}

Querying with nearestNeighbor

Use the nearestNeighbor query operator in YQL:
select * from sources product 
where {targetHits: 10}nearestNeighbor(embedding, query_embedding)
From the Java DSL implementation (client/src/main/java/ai/vespa/client/dsl/Field.java):
public Query nearestNeighbor(Annotation annotation, String rankFeature) {
    return common("nearestNeighbor", annotation, (Object) rankFeature);
}

Query Parameters

targetHits

The number of candidate hits the operator should expose to ranking:
{targetHits: 100}nearestNeighbor(embedding, query_embedding)
targetHits controls the recall-performance tradeoff. Higher values improve recall but increase query latency.

approximate

Control whether to use approximate or exact search:
// Approximate search (fast, default)
{approximate: true}nearestNeighbor(embedding, query_embedding)

// Exact search (slower but perfect recall)
{approximate: false}nearestNeighbor(embedding, query_embedding)

hnsw.exploreAdditionalHits

Fine-tune the search quality:
{targetHits: 10, hnsw.exploreAdditionalHits: 50}nearestNeighbor(embedding, query_embedding)
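The annotations above can also be assembled programmatically. The helper below is a sketch (not part of any Vespa client library) that builds the YQL string from the parameters described in this section:

```python
def nearest_neighbor_yql(field, query_tensor_name, target_hits=10,
                         approximate=True, explore_additional_hits=0):
    """Assemble a nearestNeighbor YQL clause with optional annotations."""
    annotations = [f"targetHits: {target_hits}"]
    if not approximate:
        annotations.append("approximate: false")
    if explore_additional_hits:
        annotations.append(f"hnsw.exploreAdditionalHits: {explore_additional_hits}")
    ann = "{" + ", ".join(annotations) + "}"
    return (f"select * from sources product "
            f"where {ann}nearestNeighbor({field}, {query_tensor_name})")

print(nearest_neighbor_yql("embedding", "query_embedding",
                           target_hits=10, explore_additional_hits=50))
```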

Embedding Integration

Vespa can automatically generate embeddings using the Embedder interface:

Schema Configuration

schema product {
  document product {
    field title type string {
      indexing: summary | index
    }
    
    field embedding type tensor<float>(x[384]) {
      indexing: input title | embed | attribute | index
      attribute {
        distance-metric: angular
      }
      index {
        hnsw {
          max-links-per-node: 16
          neighbors-to-explore-at-insert: 200
        }
      }
    }
  }
}

Query-Time Embedding

Embed query text automatically:
select * from sources product 
where {targetHits: 10}nearestNeighbor(embedding, query_embedding)
Pass the query text via the input.query(query_embedding) parameter; the embed(...) expression asks the configured embedder to turn the text into a tensor:
POST /search/
{
  "yql": "select * from product where {targetHits: 10}nearestNeighbor(embedding, query_embedding)",
  "input.query(query_embedding)": "embed(semantic search engine)"
}
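Building that request body in Python is straightforward. This sketch assumes a configured embedder handles the embed(...) expression at query time:

```python
import json

# Assemble the POST /search/ body. The embed(...) wrapper asks the
# configured embedder to convert the text into a query tensor.
body = {
    "yql": "select * from product where {targetHits: 10}"
           "nearestNeighbor(embedding, query_embedding)",
    "input.query(query_embedding)": "embed(semantic search engine)",
}
payload = json.dumps(body)
```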
Combine vector search with text search and filters:
select * from sources product 
where 
  (
    {targetHits: 100}nearestNeighbor(embedding, query_embedding)
    or title contains "laptop"
  )
  and price < 2000
  and in_stock = true
limit 20
Hits are ordered by relevance by default, so no explicit order by clause is needed.

Distance Calculation Features

Access distance scores in ranking expressions:
rank-profile semantic {
  first-phase {
    expression: closeness(field, embedding)
  }
}
The closeness feature (searchlib/src/vespa/searchlib/features/closenessfeature.h) provides normalized distance scores.
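Closeness is commonly defined as 1 / (1 + distance), which maps any non-negative distance into the range (0, 1]. A sketch, assuming that definition:

```python
def closeness(distance):
    """Map a non-negative distance into (0, 1]: 1 / (1 + distance)."""
    return 1.0 / (1.0 + distance)

print(closeness(0.0))  # 1.0 — identical vectors score highest
print(closeness(1.0))  # 0.5
```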

Performance Tuning

  1. Choose an appropriate max-links-per-node: start with 16 and increase toward 32 for better recall on large datasets.
  2. Tune targetHits: use 2-5x your desired result count for a good recall-latency balance.
  3. Use pre-normalized embeddings: when using angular distance, normalize vectors up front and use prenormalized-angular for better performance.
  4. Enable multi-threaded indexing: for bulk loading, set multi-threaded-indexing: true.
  5. Apply filters after vector search: use targetHits to get candidates, then apply filters for efficiency.

Memory Usage

HNSW index memory usage depends on:
  • Number of documents
  • Vector dimensions
  • max-links-per-node setting
Approximate formula:
memory ≈ num_docs × dimensions × 4 bytes + num_docs × max-links-per-node × 8 bytes
HNSW indexes require significant memory. For a 1M document collection with 384-dimensional float vectors and 16 links per node:
  • Vector data: ~1.5 GB
  • HNSW graph: ~128 MB
  • Total: ~1.6 GB
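The formula is easy to check in Python; the numbers below reproduce the 1M-document estimate:

```python
def hnsw_memory_bytes(num_docs, dimensions, max_links_per_node):
    """Apply the approximate formula above for float (4-byte) vectors."""
    vector_data = num_docs * dimensions * 4          # raw vector storage
    graph_links = num_docs * max_links_per_node * 8  # HNSW neighbor lists
    return vector_data, graph_links

vectors, links = hnsw_memory_bytes(1_000_000, 384, 16)
print(vectors / 1e9)  # 1.536 — about 1.5 GB of vector data
print(links / 1e6)    # 128.0 — about 128 MB of graph links
```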

Best Practices

  1. Normalize embeddings when using angular distance
  2. Use appropriate distance metrics for your embedding model
  3. Start with default HNSW parameters and tune based on metrics
  4. Monitor recall metrics to ensure search quality
  5. Combine with filters for better result relevance
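The first practice, normalizing embeddings, is a one-liner worth sketching; unit-length vectors also make angular and prenormalized-angular results agree:

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length for angular distance metrics."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vector]

print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```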
