Vespa is an open-source platform for building applications that search, run inference over, and organize vectors, tensors, text, and structured data at serving time, at any scale.

What is Vespa?

Vespa combines the capabilities of a search engine, vector database, and machine learning inference platform into a single unified system. It enables you to select subsets of data from large corpora, evaluate machine-learned models over the selected data, organize and aggregate results, and return them in milliseconds — all while your data corpus is continuously changing.
Originally developed at Yahoo and open-sourced in 2017, Vespa powers several large internet services and applications that serve hundreds of thousands of queries per second.

Key Capabilities

Vespa provides a comprehensive set of features for building modern AI-powered applications:

Text Search

Full-text search with BM25 ranking, stemming, and linguistic processing

Vector Search

Native support for dense and sparse vectors with approximate nearest neighbor search

Structured Data

Store and query structured data with powerful filtering and aggregation

ML Inference

Deploy and serve machine learning models with sub-millisecond latency

Use Cases

Search Applications

Build high-performance search engines that combine traditional text search with modern vector-based semantic search. Vespa supports:
  • Full-text search with linguistic processing
  • BM25 and other text ranking algorithms
  • Faceted search and filtering
  • Real-time indexing and updates
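As a sketch of what a text query looks like, assuming a schema named music with a default fieldset (names are illustrative), a query can be sent as JSON to the /search/ endpoint, where userQuery() parses the query parameter against the default fieldset:

```json
{
    "yql": "select * from music where userQuery()",
    "query": "head full of dreams"
}
```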

Recommendation Systems

Create personalized recommendation engines that evaluate machine-learned models over large item catalogs:
  • Real-time personalization based on user context
  • Content-based and collaborative filtering
  • Multi-stage ranking with ML models
  • A/B testing and experimentation
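Real-time personalization and multi-stage ranking can be sketched in a rank profile (field and input names here are hypothetical): a cheap first phase orders candidates, and a second phase reranks the best ones against a user-profile tensor sent with the query.

```
rank-profile personalized {
    inputs {
        query(user_profile) tensor<float>(x[64])
    }
    first-phase {
        expression: attribute(popularity)
    }
    second-phase {
        rerank-count: 200
        expression: sum(query(user_profile) * attribute(item_profile))
    }
}
```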

Retrieval Augmented Generation (RAG)

Power AI applications with semantic search and vector retrieval:
  • Store and search document embeddings
  • Hybrid search combining text and vectors
  • Fast nearest neighbor search at scale
  • Integration with LLM workflows
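For example, assuming an HNSW-indexed embedding field and, for brevity, a toy 4-dimensional query vector (names are illustrative), an approximate nearest neighbor query looks like:

```json
{
    "yql": "select * from doc where {targetHits: 10}nearestNeighbor(embedding, q_embed)",
    "input.query(q_embed)": [0.1, 0.2, 0.3, 0.4]
}
```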

Real-Time Analytics

Perform aggregations and analytics over continuously changing datasets:
  • Grouping and aggregation queries
  • Time-series data analysis
  • Real-time dashboards and metrics
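Grouping expressions are attached to the query with the | operator. For instance, counting documents and summing a price per category (schema and field names are hypothetical):

```json
{
    "yql": "select * from purchase where true | all(group(category) each(output(count(), sum(price))))"
}
```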

Architecture Overview

Vespa’s architecture is designed for high availability, performance, and scalability:

Stateless Container Layer

The container layer handles:
  • Query processing and execution logic
  • Document feed processing
  • ML model inference
  • Custom application components
Implemented in Java, the container layer is horizontally scalable and provides:
  • Query parsing and transformation
  • Result processing and formatting
  • Federation across multiple content clusters
  • Custom request handlers and searchers
Source: container-search

Content Nodes

Content nodes store data and perform distributed operations:
  • Document storage with forward and reverse indexes
  • Distributed matching and ranking
  • Grouping and aggregation
  • Real-time updates
Implemented in C++, content nodes provide:
  • High-performance matching over billions of documents
  • Feature evaluation and ranking
  • Elastic, auto-recovering storage
  • Multi-threaded query execution
Source: searchcore, searchlib

Configuration System

The configuration system manages:
  • Application deployment
  • Configuration distribution to nodes
  • Cluster management
  • Health monitoring
Source: configserver
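The stateless container, content, and configuration layers described above are wired together in the application package's services.xml. A minimal sketch (cluster ids and node counts are illustrative, using Vespa Cloud-style node specs):

```xml
<services version="1.0">
    <container id="default" version="1.0">
        <search/>
        <document-api/>
        <nodes count="2"/>
    </container>
    <content id="music" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="music" mode="index"/>
        </documents>
        <nodes count="3"/>
    </content>
</services>
```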

Core Concepts

Schemas

Schemas define the structure of your documents, including fields, indexing, and ranking. Here’s a simple example:
schema music {
    document music {
        field artist type string {
            indexing: index | summary
            index: enable-bm25
        }
        field title type string {
            indexing: index | summary
        }
    }
    
    fieldset default {
        fields: artist, title
    }
    
    rank-profile default {
        first-phase {
            expression: nativeRank(artist, title)
        }
    }
}
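Documents matching this schema are fed as JSON, for example via the document API (the namespace and document id here are illustrative):

```json
{
    "put": "id:mynamespace:music::a-head-full-of-dreams",
    "fields": {
        "artist": "Coldplay",
        "title": "A Head Full of Dreams"
    }
}
```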

Tensors and Vectors

Vespa has native support for tensors, enabling vector search and ML inference:
field embedding type tensor<float>(x[768]) {
    indexing: attribute | index
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 200
        }
    }
}

Ranking

Ranking expressions define how documents are scored. Vespa supports multi-phase ranking:
rank-profile hybrid {
    inputs {
        query(q_embed) tensor<float>(x[768])
    }
    first-phase {
        expression: bm25(title) + bm25(body)
    }
    second-phase {
        expression: sum(query(q_embed) * attribute(doc_embed))
    }
}
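A query can then select this profile and combine text and vector retrieval in one YQL expression (field and input names follow the rank profile above; the 4-dimensional vector is a toy value for brevity):

```json
{
    "yql": "select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(doc_embed, q_embed))",
    "query": "how does vespa rank documents",
    "ranking": "hybrid",
    "input.query(q_embed)": [0.1, 0.2, 0.3, 0.4]
}
```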

Why Vespa?

Vespa is built for production workloads:
  • Performance: hundreds of thousands of queries per second across billions of documents with sub-100ms latency
  • Real-time: updates are immediately available for queries, with no batch processing or reindexing required
  • Unified: text search, vector search, structured data, and ML inference in a single platform instead of stitching together multiple systems
  • Operable: high availability, automatic data distribution, self-healing clusters, and comprehensive monitoring built in
  • Open source: Apache 2.0 licensed with active development; all code is available at github.com/vespa-engine/vespa

Next Steps

Quickstart

Get Vespa running locally in minutes

Installation

Learn about installation options

Sample Applications

Explore example applications

Documentation

Deep dive into Vespa features

Community and Support

Get help and connect with the Vespa community