Vespa consists of approximately 1.7 million lines of code, split equally between Java and C++. This guide maps Vespa's functional elements to the most important of its roughly 150 modules, which are organized in a flat structure.
The code is written by a team selected for their ability to do this work unusually well and given time to dedicate to it long-term. While the code itself is mostly easy to work with, the module structure was not designed with newcomers in mind: it is simply a flat list of modules.

Architecture Overview

Vespa’s architecture consists of three major subsystems:

Stateless Container

Request handling, query processing, and document operations (Java)

Content Nodes

Data storage, indexing, matching, and ranking (C++)

Configuration

Application deployment, configuration management, and administration (Java)
Code Map

The Stateless Container

When a request enters Vespa, it is first handled by a stateless container cluster running jDisc, Vespa's container framework. The container is implemented entirely in Java and consists of multiple layers:

jDisc Core

Provides the foundation for request-response handling:

jdisc_core

Core jDisc functionality providing:
  • Application model for running services
  • Protocol-independent request-response handling
  • Various protocol implementations
  • Network I/O abstraction

jDisc Container

Layered on jDisc core, providing component infrastructure:

container-disc

Core container functionality:
  • Metrics collection
  • OSGi integration for component bundles
  • Dependency injection
  • HTTP connector
  • Integration between container and core layers
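Vespa's container injects a component's dependencies through its constructor: a component declares what it needs as constructor parameters, and the container builds and supplies them. The sketch below illustrates constructor injection with plain reflection under that assumption; the `Injector`, `Metrics`, and `QueryLogger` classes are hypothetical, and the real container-disc does far more (component configuration, OSGi bundles, lifecycle and reconfiguration).

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

public class ConstructorInjectionSketch {

    // Hypothetical component with no dependencies.
    static final class Metrics {}

    // Hypothetical component that declares its dependency in the constructor.
    static final class QueryLogger {
        final Metrics metrics;

        QueryLogger(Metrics metrics) {
            this.metrics = metrics;
        }
    }

    // Hypothetical injector: creates each component by first resolving the
    // parameters of its (single) constructor, and shares one instance per type.
    // No cycle detection or multiple-constructor handling: this is only a sketch.
    static final class Injector {
        private final Map<Class<?>, Object> instances = new HashMap<>();

        <T> T get(Class<T> type) {
            Object existing = instances.get(type);
            if (existing != null) {
                return type.cast(existing);
            }
            try {
                Constructor<?> ctor = type.getDeclaredConstructors()[0];
                Class<?>[] paramTypes = ctor.getParameterTypes();
                Object[] args = new Object[paramTypes.length];
                for (int i = 0; i < paramTypes.length; i++) {
                    args[i] = get(paramTypes[i]); // resolve dependencies recursively
                }
                ctor.setAccessible(true);
                T instance = type.cast(ctor.newInstance(args));
                instances.put(type, instance);
                return instance;
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create " + type.getName(), e);
            }
        }
    }

    public static void main(String[] args) {
        Injector injector = new Injector();
        QueryLogger logger = injector.get(QueryLogger.class);
        // The logger received the same Metrics instance the injector hands out to others.
        System.out.println(logger.metrics == injector.get(Metrics.class)); // true
    }
}
```

Sharing one instance per type mirrors the common expectation that container components are singletons within a container; a real container also has to handle cycles and alternative constructors, which this sketch deliberately omits.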

component

The component model: Java components implement or subclass types from this module.

Search Container

Layered on jDisc container for query processing:

container-search

Query processing framework including:
  • Query-Result processing (Searchers)
  • Generic processing framework
  • Query profiles
  • Global query execution logic
  • Dispatch (scatter-gather)
  • Grouping and aggregation
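Query-Result processing in container-search follows a chain-of-responsibility pattern: each Searcher may rewrite the query, pass it on to the rest of the chain, and post-process the result on the way back. A minimal, self-contained sketch of that pattern follows. `Query`, `Result`, `Searcher`, and `Execution` here are simplified stand-ins written for illustration, not Vespa's classes (real searchers subclass `com.yahoo.search.Searcher`).

```java
import java.util.ArrayList;
import java.util.List;

public class SearcherChainSketch {

    // Simplified stand-ins for Vespa's Query and Result (hypothetical, not the real classes).
    record Query(String text) {}
    record Result(List<String> hits) {}

    // A searcher gets the query plus an Execution that runs the rest of the chain.
    interface Searcher {
        Result search(Query query, Execution execution);
    }

    // Invokes the searcher at `index`, handing it an Execution for the remaining searchers.
    static final class Execution {
        private final List<Searcher> chain;
        private final int index;

        Execution(List<Searcher> chain, int index) {
            this.chain = chain;
            this.index = index;
        }

        Result search(Query query) {
            if (index >= chain.size()) {
                return new Result(List.of()); // end of chain: no hits
            }
            return chain.get(index).search(query, new Execution(chain, index + 1));
        }
    }

    static List<String> demo(String queryText) {
        // Rewrites the query on the way down.
        Searcher lowercase = (query, execution) ->
                execution.search(new Query(query.text().toLowerCase()));
        // Post-processes the result on the way back up.
        Searcher annotate = (query, execution) -> {
            List<String> hits = new ArrayList<>(execution.search(query).hits());
            hits.add("annotated");
            return new Result(hits);
        };
        // Stand-in for the backend: echoes the query it received as a single hit.
        Searcher backend = (query, execution) -> new Result(List.of("hit:" + query.text()));

        return new Execution(List.of(lowercase, annotate, backend), 0)
                .search(new Query(queryText))
                .hits();
    }

    public static void main(String[] args) {
        System.out.println(demo("Vespa")); // [hit:vespa, annotated]
    }
}
```

Because each searcher decides whether and how to call the rest of the chain, the same mechanism supports query rewriting, result filtering, and short-circuiting (for example, answering from a cache) without any central coordinator.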

Document Operation Modules

Handling document writes and updates:

document

Location: vespa/document
The document model, implemented in both Java and C++:
  • Documents, fields, and document types
  • Operations on documents (put, update, remove)
  • Document serialization

messagebus

Location: vespa/messagebus
Generic asynchronous, multi-hop message passing:
  • Implemented in both Java and C++
  • Reliable message routing
  • Load balancing across nodes
  • Throttling and flow control
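Flow control in a message bus is commonly done by bounding how many messages a sender may have in flight at once. The sketch below shows a fixed-window throttle under that assumption; `Throttle` and `drain` are hypothetical names, and this is a simplified illustration rather than messagebus's own throttle policy implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class WindowThrottleSketch {

    // Hypothetical fixed-window throttle: at most maxPending messages in flight.
    static final class Throttle {
        private final int maxPending;
        private int pending = 0;

        Throttle(int maxPending) {
            this.maxPending = maxPending;
        }

        // Returns true and reserves a slot if the window has room.
        boolean tryAcquire() {
            if (pending >= maxPending) return false;
            pending++;
            return true;
        }

        // Called when a reply arrives, freeing a slot in the window.
        void release() {
            if (pending > 0) pending--;
        }

        int pending() {
            return pending;
        }
    }

    // Sends as many queued messages as the window allows; returns how many were sent.
    static int drain(Deque<String> queue, Throttle throttle) {
        int sent = 0;
        while (!queue.isEmpty() && throttle.tryAcquire()) {
            queue.poll(); // a real sender would hand the message to the network here
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < 5; i++) queue.add("msg-" + i);

        Throttle throttle = new Throttle(2);
        System.out.println(drain(queue, throttle)); // 2: window is now full
        throttle.release();                          // one reply came back
        System.out.println(drain(queue, throttle)); // 1
        System.out.println(queue.size());           // 2 still waiting
    }
}
```

A fixed window is the simplest policy; an adaptive policy would grow or shrink `maxPending` based on observed throughput, at the cost of more state.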

documentapi

Location: vespa/documentapi
API for issuing document operations to Vespa over messagebus:
  • Document put, update, remove operations
  • Visiting (bulk read) API
  • Batch operations

docproc

Location: vespa/docproc
Chainable document processors:
  • Process documents before indexing
  • Transform, enrich, or validate documents
  • Custom document processing logic
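Chaining document processors amounts to passing each document through an ordered list of steps, any of which may transform, enrich, or reject it. A minimal sketch of that idea, assuming a `Map<String, Object>` as a stand-in for a Vespa document and a hypothetical `Processor` interface (real processors subclass `com.yahoo.docproc.DocumentProcessor`):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class DocprocChainSketch {

    // Hypothetical processor: returns the (possibly modified) document, or empty to reject it.
    interface Processor {
        Optional<Map<String, Object>> process(Map<String, Object> doc);
    }

    // Runs a copy of the document through each processor in order, stopping if one rejects it.
    static Optional<Map<String, Object>> runChain(List<Processor> chain, Map<String, Object> doc) {
        Optional<Map<String, Object>> current = Optional.of(new HashMap<>(doc));
        for (Processor p : chain) {
            current = current.flatMap(p::process);
            if (current.isEmpty()) break;
        }
        return current;
    }

    static List<Processor> exampleChain() {
        // Validate: reject documents without a title.
        Processor validate = d -> d.containsKey("title") ? Optional.of(d) : Optional.empty();
        // Enrich: derive a lowercase copy of the title.
        Processor enrich = d -> {
            d.put("title_lc", ((String) d.get("title")).toLowerCase());
            return Optional.of(d);
        };
        return List.of(validate, enrich);
    }

    public static void main(String[] args) {
        System.out.println(runChain(exampleChain(), Map.of("title", "Hello Vespa")));
        System.out.println(runChain(exampleChain(), Map.of("body", "no title")));
    }
}
```

Ordering matters: placing validation first means later processors can assume the fields they depend on are present.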

indexinglanguage

Location: vespa/indexinglanguage
Implementation of the “indexing” language:
  • Expressions used in the “indexing:” statements of schemas
  • Field transformations
  • Derived field computation

docprocs

Location: vespa/docprocs
Document processor components bundled with Vespa:
  • IndexingProcessor - Executes indexing language statements
  • Standard document transformations

vespaclient-container-plugin

Location: vespa/vespaclient-container-plugin
Implements the HTTP APIs for document operations:
  • /document/v1/ REST API
  • Internal API used by the Java HTTP client
  • Forwards to Document API
  • Forwards to Document API

vespa-feed-client

Location: vespa/vespa-feed-client
High-performance client for writing documents:
  • Asynchronous, pipelined writes
  • Automatic retries and throttling
  • Uses the internal API for best performance
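Automatic retries in a feed client are often implemented as bounded exponential backoff. The synchronous sketch below illustrates only that idea; the `sendWithRetry` helper, its parameters, and the backoff constants are hypothetical, and neither vespa-feed-client's API nor its actual retry policy (which operates on asynchronous, pipelined requests).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.LongConsumer;

public class RetrySketch {

    // Whether the operation eventually succeeded, and the backoff delays that were used.
    record Outcome(boolean succeeded, List<Long> delaysMillis) {}

    // Hypothetical helper: calls attempt(n) until it returns true or maxAttempts is reached,
    // doubling the delay after each failure, capped at maxDelayMillis.
    static Outcome sendWithRetry(IntPredicate attempt, int maxAttempts,
                                 long initialDelayMillis, long maxDelayMillis,
                                 LongConsumer sleeper) {
        List<Long> delays = new ArrayList<>();
        long delay = initialDelayMillis;
        for (int n = 1; n <= maxAttempts; n++) {
            if (attempt.test(n)) return new Outcome(true, delays);
            if (n == maxAttempts) break; // no wait after the final failed attempt
            delays.add(delay);
            sleeper.accept(delay); // a real client would sleep or schedule the retry here
            delay = Math.min(delay * 2, maxDelayMillis);
        }
        return new Outcome(false, delays);
    }

    public static void main(String[] args) {
        // Simulated operation that fails twice (e.g. the server is overloaded), then succeeds.
        Outcome ok = sendWithRetry(n -> n >= 3, 5, 100, 1000, d -> {});
        System.out.println(ok); // Outcome[succeeded=true, delaysMillis=[100, 200]]

        // Operation that never succeeds: gives up after 5 attempts.
        Outcome failed = sendWithRetry(n -> false, 5, 100, 300, d -> {});
        System.out.println(failed); // Outcome[succeeded=false, delaysMillis=[100, 200, 300, 300]]
    }
}
```

Capping the delay keeps a long outage from producing unbounded waits, while doubling backs off quickly enough to relieve an overloaded server.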

Content Nodes

Content nodes store all data, maintain indexes, and perform distributed query execution. This subsystem is written in C++ for maximum performance.

Core Content Components

searchcore

Location: vespa/searchcore
Core functionality for content nodes:
  • Proton - The content node server itself
  • Index maintenance (real-time indexing)
  • Matching (document selection)
  • Data storage and retrieval
  • Grouping and aggregation
  • Document-level operations

searchlib

Location: vespa/searchlib
Libraries invoked by searchcore.
Ranking:
  • Feature execution framework (fef)
  • Rank feature implementations
  • Ranking expression evaluation
Indexing:
  • Index implementations
  • B-tree structures
  • Attributes (forward indexes)
Java components:
  • Java ranking libraries
  • Query tree representations
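Rank features are named numeric values computed per document, and a ranking expression combines them into a score. A minimal sketch of that idea, with features held in a plain `Map` and the expression written as a Java lambda. The feature names `bm25(title)` and `attribute(popularity)` follow Vespa's feature naming, but the evaluation code is hypothetical and far simpler than the feature execution framework (fef).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

public class RankingSketch {

    // A document id together with its precomputed rank features.
    record Doc(String id, Map<String, Double> features) {}

    // Stand-in for the ranking expression: 2 * bm25(title) + attribute(popularity).
    static final ToDoubleFunction<Map<String, Double>> EXPRESSION =
            f -> 2.0 * f.getOrDefault("bm25(title)", 0.0)
                    + f.getOrDefault("attribute(popularity)", 0.0);

    // Scores each document and returns ids ordered by descending score.
    static List<String> rank(List<Doc> docs) {
        return docs.stream()
                .sorted(Comparator.comparingDouble((Doc d) -> EXPRESSION.applyAsDouble(d.features())).reversed())
                .map(Doc::id)
                .toList();
    }

    static List<Doc> exampleDocs() {
        return List.of(
                new Doc("a", Map.of("bm25(title)", 1.0, "attribute(popularity)", 0.5)), // 2.5
                new Doc("b", Map.of("bm25(title)", 0.2, "attribute(popularity)", 3.0)), // 3.4
                new Doc("c", Map.of("bm25(title)", 1.5)));                               // 3.0
    }

    public static void main(String[] args) {
        System.out.println(rank(exampleDocs())); // [b, c, a]
    }
}
```

Treating missing features as 0.0 keeps the expression total over all documents; a real engine resolves which features an expression needs up front and computes only those.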

storage

Location: vespa/storage
Elastic, auto-recovering data storage:
  • Distribution across cluster nodes
  • Bucket management
  • Replica maintenance
  • Consistency guarantees
  • Garbage collection
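Distributing buckets across nodes requires every node to compute the same replica placement independently, and to move as little data as possible when nodes come and go. Vespa's storage uses its own "ideal state" distribution algorithm, which is more involved; the sketch below instead uses rendezvous (highest-random-weight) hashing, a related and simpler technique, to illustrate deterministic replica placement. `score` and `replicasFor` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

public class BucketPlacementSketch {

    // Deterministic pseudo-random 64-bit score for a (bucket, node) pair (SplitMix64-style mixing).
    static long score(long bucketId, int node) {
        long z = bucketId * 0x9E3779B97F4A7C15L + node;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Rendezvous hashing: rank available nodes by score for this bucket and keep the top `replicas`.
    static List<Integer> replicasFor(long bucketId, List<Integer> upNodes, int replicas) {
        return upNodes.stream()
                .sorted(Comparator.comparingLong((Integer n) -> score(bucketId, n)).reversed())
                .limit(replicas)
                .toList();
    }

    public static void main(String[] args) {
        List<Integer> nodes = IntStream.range(0, 5).boxed().toList();
        List<Integer> before = replicasFor(42L, nodes, 2);
        System.out.println("bucket 42 -> nodes " + before);

        // Removing a node that holds no replica of this bucket leaves its placement unchanged.
        int notUsed = nodes.stream().filter(n -> !before.contains(n)).findFirst().orElseThrow();
        List<Integer> withoutUnused = nodes.stream().filter(n -> n != notUsed).toList();
        System.out.println(before.equals(replicasFor(42L, withoutUnused, 2))); // true

        // Removing a replica node keeps the surviving replica and picks one new node.
        int failed = before.get(0);
        List<Integer> withoutFailed = nodes.stream().filter(n -> n != failed).toList();
        System.out.println(replicasFor(42L, withoutFailed, 2).contains(before.get(1))); // true
    }
}
```

Because placement depends only on the bucket id and the set of available nodes, no central placement table is needed, and a node failure only moves the replicas that node held.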

eval

Location: vespa/eval
Efficient evaluation of ranking expressions:
  • Tensor API and implementation
  • Expression optimization
  • ONNX model integration
  • SIMD and GPU acceleration
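Much tensor computation in ranking reduces to a few primitive operations such as join and reduce. As a rough illustration, here is a Java sketch of a sparse (mapped) one-dimensional dot product built from those two operations. The `join` and `reduceSum` helpers are hypothetical; this is not the API of eval (which is C++) or of the Java tensor implementation in vespajlib.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;

public class TensorSketch {

    // Joins two 1-dimensional mapped tensors on their shared labels using the given function;
    // labels present in only one tensor produce no cell.
    static Map<String, Double> join(Map<String, Double> a, Map<String, Double> b, DoubleBinaryOperator f) {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                out.put(e.getKey(), f.applyAsDouble(e.getValue(), other));
            }
        }
        return out;
    }

    // Reduces a tensor to a scalar by summing all cells.
    static double reduceSum(Map<String, Double> t) {
        double sum = 0.0;
        for (double v : t.values()) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> query = Map.of("sports", 0.5, "news", 1.0);
        Map<String, Double> doc = Map.of("sports", 2.0, "tech", 4.0);
        // Roughly the ranking expression reduce(join(query, doc, f(x,y)(x*y)), sum).
        System.out.println(reduceSum(join(query, doc, (x, y) -> x * y))); // 1.0
    }
}
```

Expressing the dot product as join followed by reduce is what lets an evaluator optimize the combination as a whole, for example by fusing the two steps so no intermediate tensor is materialized.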

storageapi

Location: vespa/storageapi
MessageBus messages for storage:
  • Document API protocol implementation
  • Storage operation messages
  • Internal storage communication

clustercontroller-core

Location: vespa/clustercontroller-core
Cluster controller for storage (Java):
  • Node-level decision-making
  • State management via ZooKeeper
  • Cluster health monitoring
  • Automatic failover

Configuration and Administration

The third major subsystem manages configuration, clusters, and application deployment. It is implemented in Java.

Configuration System

configserver

Location: vespa/configserver
The server where applications are deployed:
  • Application package deployment
  • Configuration generation
  • Serving config to nodes
  • Application lifecycle management

config-model

Location: vespa/config-model
Model of the running system:
  • Derives configuration from application package
  • Returns config instances by type and id
  • Validates application structure
  • Manages service topology

config

Location: vespa/config
Client-side configuration library (Java and C++):
  • Subscribing to configs by type and id
  • Reading config payloads
  • Automatic config updates
  • Config caching
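Config clients subscribe to a config by type and id and are handed new payloads when that config changes. A minimal in-process sketch of this subscribe/notify model with a per-key cache follows; `ConfigKey`, `ConfigSource`, the string payloads, and the config type and id values are hypothetical simplifications, not the config library's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

public class ConfigSubscriptionSketch {

    // A config is identified by its type (definition name) and a config id.
    record ConfigKey(String type, String id) {}

    // Hypothetical config source: caches the latest payload per key and notifies subscribers on change.
    static final class ConfigSource {
        private final Map<ConfigKey, String> cache = new HashMap<>();
        private final Map<ConfigKey, List<Consumer<String>>> subscribers = new HashMap<>();

        // Registers a callback and immediately delivers the cached payload, if any.
        void subscribe(ConfigKey key, Consumer<String> callback) {
            subscribers.computeIfAbsent(key, k -> new ArrayList<>()).add(callback);
            String current = cache.get(key);
            if (current != null) callback.accept(current);
        }

        // Publishes a new payload; subscribers are only notified if it actually changed.
        void publish(ConfigKey key, String payload) {
            String previous = cache.put(key, payload);
            if (Objects.equals(previous, payload)) return;
            for (Consumer<String> cb : subscribers.getOrDefault(key, List.of())) cb.accept(payload);
        }
    }

    public static void main(String[] args) {
        ConfigSource source = new ConfigSource();
        ConfigKey key = new ConfigKey("hypothetical.threadpool", "container/0");
        List<String> seen = new ArrayList<>();

        source.publish(key, "threads=8");
        source.subscribe(key, seen::add);   // receives cached "threads=8"
        source.publish(key, "threads=8");   // unchanged: no callback
        source.publish(key, "threads=16");  // changed: callback fires
        System.out.println(seen);           // [threads=8, threads=16]
    }
}
```

Delivering the cached value on subscribe means a newly started component gets its configuration immediately, and suppressing unchanged payloads avoids needless reconfiguration.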

configgen

Location: vespa/configgen
Code generation for configs:
  • Generates C++ config classes
  • Generates Java config classes
  • Type-safe config reading and building
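configgen turns a config definition (.def file) into typed config classes with builders. The hand-written sketch below shows roughly the shape such a class could take for a hypothetical definition with two fields; both the definition in the comment and the class are illustrative assumptions, not actual configgen output.

```java
import java.util.Objects;

public class GeneratedConfigSketch {

    /*
     * Hypothetical config definition (example.def):
     *   namespace=example
     *   threads int default=8
     *   name string default="search"
     */
    static final class ExampleConfig {
        private final int threads;
        private final String name;

        private ExampleConfig(Builder b) {
            this.threads = b.threads;
            this.name = b.name;
        }

        int threads() { return threads; }
        String name() { return name; }

        // Builder pre-populated with the defaults from the definition.
        static final class Builder {
            private int threads = 8;
            private String name = "search";

            Builder threads(int value) {
                this.threads = value;
                return this;
            }

            Builder name(String value) {
                this.name = Objects.requireNonNull(value);
                return this;
            }

            ExampleConfig build() {
                return new ExampleConfig(this);
            }
        }
    }

    public static void main(String[] args) {
        ExampleConfig defaults = new ExampleConfig.Builder().build();
        ExampleConfig tuned = new ExampleConfig.Builder().threads(16).build();
        System.out.println(defaults.threads() + " " + defaults.name()); // 8 search
        System.out.println(tuned.threads() + " " + tuned.name());       // 16 search
    }
}
```

Generating immutable classes with typed accessors means a misspelled field or wrong type is caught at compile time rather than when a node reads its config.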

config-proxy

Location: vespa/config-proxy
Node-local config proxy:
  • Caches configs on each node
  • Reduces config server load
  • Provides config during restarts

configdefinitions

Location: vespa/configdefinitions
Shared config type definitions:
  • Config .def files
  • Referenced by multiple modules
  • System-wide configuration schemas

General Utility Libraries

Libraries used throughout the Vespa codebase:

vespalib

General utility library for C++:
  • Data structures (hash maps, arrays)
  • Threading and synchronization
  • Memory management
  • String utilities
  • Network utilities

vespajlib

General utility library for Java:
  • Collections and data structures
  • Java tensor implementation
  • Text processing
  • Utilities and helpers

Finding Your Way

When working with the Vespa codebase:
1. Identify the functional area

Determine which subsystem your change affects:
  • Query processing → Container modules
  • Indexing/ranking → Content node modules
  • Configuration/deployment → Config system
2. Find the relevant module

Use this code map to identify the specific module:
  • Module names usually indicate their purpose
  • Check module README files for details
  • Look at OWNERS files for experts
3. Explore the module structure

Within each module:
  • src/main/ or src/vespa/ - Production code
  • src/test/ or src/tests/ - Test code
  • README.md - Module documentation
  • CMakeLists.txt or pom.xml - Build configuration

Module Categories

Modules follow naming patterns:
| Pattern     | Purpose               | Examples                            |
|-------------|-----------------------|-------------------------------------|
| container-* | Container components  | container-search, container-disc    |
| config*     | Configuration system  | configserver, config-model          |
| search*     | Search and ranking    | searchcore, searchlib               |
| *-plugin    | Build plugins         | bundle-plugin, config-class-plugin  |
| vespa*      | Core utilities        | vespalib, vespajlib                 |
| jdisc*      | jDisc framework       | jdisc_core, jdisc_http_service      |

Additional Resources

TODO List

Larger features nobody is working on yet

Module READMEs

Each module has detailed documentation in its README.md file

OWNERS Files

Find subject matter experts for code areas

Code Search

Search the codebase on GitHub

Not Covered Here

This map focuses on modules you’re most likely to encounter as a developer. Other modules are either:
  • Small and self-explanatory
  • Implementing specific technical requirements
  • Part of the Vespa Cloud service (not expected to be modified externally)
For a complete list, browse the Vespa repository.

Next Steps

Building Vespa

Build the modules you want to work on

Running Tests

Test your changes

Development Overview

Return to development overview

Contributing

Learn the contribution process
