Vespa search is a distributed, two-phase process that efficiently finds and ranks documents at scale. Understanding how search works helps you optimize performance and relevance.

Search Architecture

Search in Vespa involves coordination between the stateless container layer and the content nodes.

The Two-Phase Process

Vespa uses a two-phase search process to efficiently handle large-scale queries:

1. Matching Phase: find all documents matching the query criteria
2. Ranking Phase: score and sort the matched documents

Why Two Phases?

  • Efficiency: avoid expensive ranking calculations on documents that don't match
  • Scalability: distribute work across content nodes
  • Flexibility: use different ranking strategies per phase
  • Performance: rank only the most promising candidates
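The two-phase flow can be sketched as a toy model (illustrative Python only, not Vespa's implementation):

```python
import heapq

# Toy corpus: each document has a filterable attribute and a text field.
docs = [
    {"id": 1, "category": "phone", "text": "budget phone with great battery"},
    {"id": 2, "category": "laptop", "text": "gaming laptop"},
    {"id": 3, "category": "phone", "text": "flagship phone camera"},
]

def matches(doc, category):
    # Matching phase: cheap predicate evaluation, no scoring.
    return doc["category"] == category

def score(doc, terms):
    # Ranking phase: more expensive scoring, run only on matched docs.
    return sum(doc["text"].count(t) for t in terms)

def search(docs, category, terms, k=2):
    matched = [d for d in docs if matches(d, category)]                # phase 1
    return heapq.nlargest(k, matched, key=lambda d: score(d, terms))   # phase 2

top = search(docs, "phone", ["phone", "battery"])
print([d["id"] for d in top])  # → [1, 3]
```

Because scoring runs only over the matched subset, the expensive function is never evaluated for documents the filter already excluded.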

Matching Phase

The matching phase identifies which documents satisfy the query. This happens on each content node independently.

Query Types

Filtering on attributes:
select * from product where price < 100 and in_stock = true
  • Uses attribute (forward) indexes
  • Fast numeric and boolean comparisons
  • Supports range queries
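Attribute filtering can be pictured as a tight scan over in-memory value arrays, one per attribute, indexed by local document id (a toy model, not Vespa's actual data structures):

```python
# Toy forward index: one in-memory array per attribute.
price = [79.0, 129.0, 45.0, 99.0]
in_stock = [True, False, True, True]

# Evaluating `price < 100 and in_stock = true` is a scan over both arrays,
# with no text processing involved.
matching_ids = [
    doc_id for doc_id in range(len(price))
    if price[doc_id] < 100 and in_stock[doc_id]
]
print(matching_ids)  # → [0, 2, 3]
```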
Combine multiple query types:
select * from article 
where 
    userQuery() 
    and published_date > 1609459200
    and {targetHits:20}nearestNeighbor(embedding, query_embedding)
  • Combines text, structured, and vector search
  • Efficient query execution with multiple indexes

Query Execution

Query execution is implemented in the searchlib module, which provides the core matching algorithms used by Proton (the content node server).

Query Language

Vespa supports two primary query languages:

YQL (Vespa Query Language)

SQL-like syntax for queries:
select * from music where title contains "love"

Simple Query Language

Simpler syntax for basic queries:
query=laptop&filter=price:<1000

Matching Operators

Vespa provides various operators for text matching:
Basic term matching:
where title contains "vespa"
Exact phrase matching:
where title contains phrase("search", "engine")
Terms within a distance:
where title contains near("search", "engine")
Ordered terms within a distance:
where title contains onear("search", "engine")
Match any of several terms:
where title contains equiv("car", "automobile", "vehicle")
Efficient OR-like matching with many terms:
where weakAnd(title contains "machine", title contains "learning", title contains "AI", title contains "neural")
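The proximity operators near and onear can be pictured over tokenized field content. A toy sketch (assuming the default distance of 2 between term positions):

```python
def positions(tokens, term):
    # All positions where the term occurs in the tokenized field.
    return [i for i, t in enumerate(tokens) if t == term]

def near(tokens, a, b, distance=2):
    # Unordered: some occurrences of a and b within `distance` positions.
    return any(abs(i - j) <= distance
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def onear(tokens, a, b, distance=2):
    # Ordered: a must appear before b, still within `distance`.
    return any(0 < j - i <= distance
               for i in positions(tokens, a)
               for j in positions(tokens, b))

title = ["open", "source", "search", "engine", "internals"]
print(near(title, "engine", "search"))   # True: adjacent, order ignored
print(onear(title, "engine", "search"))  # False: wrong order
```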

Dispatching and Distribution

The container layer coordinates query execution across content nodes:

Scatter-Gather Pattern

1. Query Dispatch: the container sends the query to all content nodes covering the data
2. Parallel Execution: each content node executes the query on its data partition
3. Partial Results: each node returns its top-k results
4. Result Merging: the container merges the partial results into the final ranking
Implementation: container-search module handles query dispatch and result aggregation.
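The gather step can be sketched as a k-way merge of the sorted partial result lists (a simplified model of what container-search does):

```python
import heapq

# Each content node returns its own top-k as (score, doc) pairs,
# already sorted by descending score.
node_results = [
    [(0.91, "doc-a"), (0.55, "doc-d")],   # node 1
    [(0.87, "doc-b"), (0.40, "doc-e")],   # node 2
    [(0.62, "doc-c")],                    # node 3
]

def merge_results(partials, hits):
    # Merge the sorted partial lists into one global ranking and
    # keep only the requested number of hits.
    merged = heapq.merge(*partials, key=lambda hit: -hit[0])
    return [doc for _, doc in list(merged)[:hits]]

print(merge_results(node_results, 3))  # → ['doc-a', 'doc-b', 'doc-c']
```

Because each node only ships its top-k, the container merges small sorted lists rather than full result sets.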

Search Performance

Index Types and Performance

Inverted Index

Fast text search on indexed fields

Attribute Index

Fast filtering and ranking on attributes

HNSW Index

Fast approximate nearest neighbor search

B-tree Index

Fast dictionary lookups on attributes with fast-search enabled

Query Optimization

Vespa optimizes queries before execution:
  • Combining similar terms
  • Eliminating redundant clauses
  • Choosing the optimal execution order
Early termination stops searching when enough results are found:
  • Set the hits parameter for top-k queries
  • Use targetHits for approximate search
  • Combine with ranking thresholds
Parallel execution leverages multiple content nodes:
  • Data is automatically partitioned
  • Queries execute in parallel
  • Near-linear scalability as nodes are added
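Early termination can be sketched with a lazy scan that stops as soon as enough matches are found (a toy model for unranked top-k retrieval; ranked queries additionally need score thresholds):

```python
from itertools import islice

def scan(docs, predicate):
    # Lazily yield matching documents in index order.
    for doc in docs:
        if predicate(doc):
            yield doc

docs = list(range(1_000_000))

# Stop scanning once `hits` matches are found, instead of
# evaluating the predicate over the whole corpus.
hits = 5
first_matches = list(islice(scan(docs, lambda d: d % 7 == 0), hits))
print(first_matches)  # → [0, 7, 14, 21, 28]
```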

Filtering vs Searching

Understanding the difference is key to performance:

Filters

Filters use attributes for fast exact matching:
where price < 100 and category = "electronics"
  • Fast evaluation using forward indexes
  • No text processing overhead
  • Efficient for numeric and boolean comparisons
Search

Search uses inverted indexes for text matching:
where title contains "laptop"
  • Linguistic processing (stemming, tokenization)
  • Relevance scoring
  • Phrase and proximity matching

Combined Queries

Best performance comes from combining both:
select * from product 
where 
    title contains "laptop" and  -- Search
    price < 1000 and             -- Filter
    in_stock = true              -- Filter

Grouping and Aggregation

Vespa can group and aggregate results during search:
select * from product 
where category contains "electronics"
| all(group(brand) each(output(count())))
This returns counts per brand without retrieving all documents.
Implementation: grouping happens on content nodes before results are sent to the container, minimizing data transfer.
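The per-node aggregation and container-side merge can be sketched with counters (a toy model of group(brand) with count(), not Vespa's grouping engine):

```python
from collections import Counter

# Matched documents on one content node; only the grouped field is needed.
matched = [
    {"brand": "acme"}, {"brand": "globex"}, {"brand": "acme"},
    {"brand": "initech"}, {"brand": "acme"},
]

# Per-node aggregation, analogous to all(group(brand) each(output(count()))).
local_counts = Counter(doc["brand"] for doc in matched)

def merge_counts(partials):
    # The container sums the partial counts from every node.
    total = Counter()
    for counts in partials:
        total += counts
    return total

# Merge with a second (hypothetical) node's partial counts.
global_counts = merge_counts([local_counts, Counter({"acme": 1, "globex": 2})])
print(dict(global_counts))  # → {'acme': 4, 'globex': 3, 'initech': 1}
```

Only the small count tables cross the network, never the matched documents themselves.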

Search Implementation Architecture

Content Node (Proton)

The Proton server handles search on content nodes:
  • Module: searchcore
  • Manages document storage and indexes
  • Executes matching and first-phase ranking
  • Returns top results to container

Search Library

Core search algorithms:
  • Module: searchlib
  • Query evaluation and matching
  • Index implementations (inverted, attribute, HNSW)
  • Ranking framework (discussed in Ranking concepts)

Real-World Query Example

Here’s a complete hybrid search query:
{
  "yql": "select * from article where userQuery() and published_date > 1609459200 and {targetHits:10}nearestNeighbor(embedding, query_embedding)",
  "query": "machine learning",
  "ranking": "semantic_bm25_hybrid",
  "input.query(query_embedding)": [0.12, -0.45, 0.78, ...],
  "hits": 20
}
This query:
  1. Matches documents with “machine learning” text (BM25)
  2. Finds nearest neighbors in embedding space (ANN)
  3. Filters by publication date
  4. Ranks using a hybrid profile
  5. Returns top 20 results
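A request body like this can be built programmatically and POSTed as JSON to the container's query endpoint. A sketch (the embedding values and rank profile name are placeholders carried over from the example):

```python
import json

def hybrid_query(text, embedding, hits=20):
    # Assemble a hybrid text + vector query request body.
    return {
        "yql": (
            "select * from article where userQuery() "
            "and published_date > 1609459200 "
            "and {targetHits:10}nearestNeighbor(embedding, query_embedding)"
        ),
        "query": text,
        "ranking": "semantic_bm25_hybrid",
        "input.query(query_embedding)": embedding,
        "hits": hits,
    }

body = hybrid_query("machine learning", [0.12, -0.45, 0.78])
# POST json.dumps(body) to the container's /search/ endpoint,
# e.g. with urllib.request or the requests library.
print(json.dumps(body)[:40])
```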

Best Practices

Use Attributes for Filters

Mark filter fields as attribute in schema

Index Text Fields

Use index for fields you’ll search with text queries

Set targetHits

Control ANN search quality vs speed

Combine Query Types

Use hybrid queries for best relevance

Next Steps

Ranking

Learn how documents are scored

Schemas

Configure fields for search

Tensors

Use tensors for semantic search
