The code is written by a team selected for their ability to do this work unusually well, with time to dedicate to it long-term. While the code is mostly easy to work with, the module structure wasn’t designed to be newcomer-friendly: the modules are simply laid out in a flat structure.
Architecture Overview
Vespa’s architecture consists of three major subsystems:
Stateless Container
Request handling, query processing, and document operations (Java)
Content Nodes
Data storage, indexing, matching, and ranking (C++)
Configuration
Application deployment, configuration management, and administration (Java)
The Stateless Container
When a request enters Vespa, it first goes through a stateless container cluster called jDisc. The container is implemented entirely in Java and consists of multiple layers:
jDisc Core
Provides the foundation for request-response handling:
jdisc_core
Core jDisc functionality providing:
- Application model for running services
- Protocol-independent request-response handling
- Various protocol implementations
- Network I/O abstraction
jDisc Container
Layered on jDisc core, providing component infrastructure:
container-disc
Core container functionality:
- Metrics collection
- OSGi integration for component bundles
- Dependency injection
- HTTP connector
- Integration between container and core layers
component
The component model - Java components implement or subclass types from this module
Search Container
Layered on jDisc container for query processing:
container-search
Query processing framework including:
- Query-Result processing (Searchers)
- Generic processing framework
- Query profiles
- Global query execution logic
- Dispatch (scatter-gather)
- Grouping and aggregation
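The dispatch (scatter-gather) item above can be illustrated with a minimal sketch. This is not Vespa's implementation (the container is written in Java); the node data and the names `NODES`, `search_node`, and `scatter_gather` are hypothetical and serve only to show the pattern: send the query to every content node, collect each node's partial hits, and merge them into a global top-k by relevance.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node data: node name -> list of (doc_id, relevance) pairs.
NODES = {
    "node0": [("a", 0.9), ("b", 0.4)],
    "node1": [("c", 0.7), ("d", 0.1)],
    "node2": [("e", 0.8)],
}


def search_node(node, hits_wanted):
    """Stand-in for a per-node query: return that node's best hits."""
    return heapq.nlargest(hits_wanted, NODES[node], key=lambda hit: hit[1])


def scatter_gather(hits_wanted):
    """Query all nodes in parallel and merge the partial results."""
    # Scatter: one query per node, executed concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda n: search_node(n, hits_wanted), NODES))
    # Gather: flatten the partial results and keep the global top-k.
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(hits_wanted, merged, key=lambda hit: hit[1])


if __name__ == "__main__":
    print(scatter_gather(3))
```

Each node returns at most `hits_wanted` hits, which is enough to guarantee a correct global top-k after merging.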
Document Operation Modules
Handling document writes and updates:
document
Location: vespa/document
The document model implemented in both Java and C++:
- Documents, fields, and document types
- Operations on documents (put, update, remove)
- Document serialization
messagebus
Location: vespa/messagebus
Generic async, multi-hop message passing:
- Implemented in both Java and C++
- Reliable message routing
- Load balancing across nodes
- Throttling and flow control
documentapi
Location: vespa/documentapi
API for issuing document operations to Vespa over messagebus:
- Document put, update, remove operations
- Visiting (bulk read) API
- Batch operations
docproc
Location: vespa/docproc
Chainable document processors:
- Process documents before indexing
- Transform, enrich, or validate documents
- Custom document processing logic
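The idea of chainable document processors can be sketched as follows. This is an illustration of the pattern only, not the docproc API (which is Java); the processor functions and `run_chain` are hypothetical names. Each processor receives a document and returns a possibly transformed document, and a chain applies them in order before the document would be indexed.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Processor = Callable[[Document], Document]


def require_id(doc: Document) -> Document:
    """Validate: reject documents that have no id."""
    if "id" not in doc:
        raise ValueError("document has no id")
    return doc


def trim_title(doc: Document) -> Document:
    """Transform: strip surrounding whitespace from the title."""
    return {**doc, "title": str(doc["title"]).strip()}


def add_title_length(doc: Document) -> Document:
    """Enrich: add a field derived from the title."""
    return {**doc, "title_length": len(str(doc["title"]))}


def run_chain(chain: List[Processor], doc: Document) -> Document:
    """Apply each processor in order, passing the result along."""
    for processor in chain:
        doc = processor(doc)
    return doc


if __name__ == "__main__":
    chain = [require_id, trim_title, add_title_length]
    print(run_chain(chain, {"id": "doc1", "title": "  Hello "}))
```

Order matters in a chain: here the title is trimmed before its length is computed.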
indexinglanguage
Location: vespa/indexinglanguage
Implementation of the “indexing” language:
- Expressions used in schema indexing: statements
- Field transformations
- Derived field computation
docprocs
Location: vespa/docprocs
Document processor components bundled with Vespa:
- IndexingProcessor - Executes indexing language statements
- Standard document transformations
vespaclient-container-plugin
Location: vespa/vespaclient-container-plugin
Implements HTTP APIs for document operations:
- /document/v1/ REST API
- Internal API used by Java HTTP client
- Forwards to Document API
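As a rough illustration of what a request to the /document/v1/ REST API looks like, the sketch below builds a URL and a JSON body without sending anything. The path layout (`/document/v1/<namespace>/<document-type>/docid/<id>`) and the `{"fields": ...}` body shape are taken from the public Vespa documentation as best understood, not from this page, so treat them as assumptions and check the API reference. The host, port, namespace, and document type are hypothetical, as are the helper names.

```python
import json
from urllib.parse import quote


def document_v1_url(base_url, namespace, doc_type, doc_id):
    """Build a /document/v1/ URL, percent-encoding each path segment."""
    ns, dt, did = (quote(part, safe="") for part in (namespace, doc_type, doc_id))
    return f"{base_url.rstrip('/')}/document/v1/{ns}/{dt}/docid/{did}"


def put_body(fields):
    """Build the JSON body for a put: field values wrapped in 'fields'."""
    return json.dumps({"fields": fields})


if __name__ == "__main__":
    # Hypothetical endpoint and document identifiers.
    url = document_v1_url("http://localhost:8080", "mynamespace", "music", "doc 1")
    print("POST", url)
    print(put_body({"title": "Example"}))
```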
vespa-feed-client
Location: vespa/vespa-feed-client
High-performance client for writing documents:
- Async, pipelined writes
- Automatic retries and throttling
- Uses internal API for optimal performance
Content Nodes
Content nodes store all data, maintain indexes, and perform distributed query execution. This subsystem is written in C++ for maximum performance.
Core Content Components
searchcore
Location: vespa/searchcore
Core functionality for content nodes:
- Proton - The content node server itself
- Index maintenance (real-time indexing)
- Matching (document selection)
- Data storage and retrieval
- Grouping and aggregation
- Document-level operations
searchlib
Location: vespa/searchlib
Libraries invoked by searchcore:
Ranking:
- Feature execution framework (fef)
- Rank feature implementations
- Ranking expression evaluation
- Index implementations
- B-tree structures
- Attributes (forward indexes)
- Java ranking libraries
- Query tree representations
storage
Location: vespa/storage
Elastic, auto-recovering data storage:
- Distribution across cluster nodes
- Bucket management
- Replica maintenance
- Consistency guarantees
- Garbage collection
eval
Location: vespa/eval
Efficient evaluation of ranking expressions:
- Tensor API and implementation
- Expression optimization
- ONNX model integration
- SIMD and GPU acceleration
storageapi
Location: vespa/storageapi
MessageBus messages for storage:
- Document API protocol implementation
- Storage operation messages
- Internal storage communication
clustercontroller-core
Location: vespa/clustercontroller-core
Cluster controller for storage (Java):
- Node-level decision-making
- State management via ZooKeeper
- Cluster health monitoring
- Automatic failover
Configuration and Administration
The third major subsystem manages configuration, clusters, and application deployment. It is implemented in Java.
Configuration System
configserver
Location: vespa/configserver
The server where applications are deployed:
- Application package deployment
- Configuration generation
- Serving config to nodes
- Application lifecycle management
config-model
Location: vespa/config-model
Model of the running system:
- Derives configuration from application package
- Returns config instances by type and id
- Validates application structure
- Manages service topology
config
Location: vespa/config
Client-side configuration library (Java and C++):
- Subscribing to configs by type and id
- Reading config payloads
- Automatic config updates
- Config caching
configgen
Location: vespa/configgen
Code generation for configs:
- Generates C++ config classes
- Generates Java config classes
- Type-safe config reading and building
config-proxy
Location: vespa/config-proxy
Node-local config proxy:
- Caches configs on each node
- Reduces config server load
- Provides config during restarts
configdefinitions
Location: vespa/configdefinitions
Shared config type definitions:
- Config .def files
- Referenced by multiple modules
- System-wide configuration schemas
General Utility Libraries
Libraries used throughout the Vespa codebase:
vespalib
General utility library for C++:
- Data structures (hash maps, arrays)
- Threading and synchronization
- Memory management
- String utilities
- Network utilities
vespajlib
General utility library for Java:
- Collections and data structures
- Java tensor implementation
- Text processing
- Utilities and helpers
Finding Your Way
When working with the Vespa codebase:
Identify the functional area
Determine which subsystem your change affects:
- Query processing → Container modules
- Indexing/ranking → Content node modules
- Configuration/deployment → Config system
Find the relevant module
Use this code map to identify the specific module:
- Module names usually indicate their purpose
- Check module README files for details
- Look at OWNERS files for experts
Module Categories
Modules follow naming patterns:

| Pattern | Purpose | Examples |
|---|---|---|
| container-* | Container components | container-search, container-disc |
| config* | Configuration system | configserver, config-model |
| search* | Search and ranking | searchcore, searchlib |
| *-plugin | Build plugins | bundle-plugin, config-class-plugin |
| vespa* | Core utilities | vespalib, vespajlib |
| jdisc* | jDisc framework | jdisc_core, jdisc_http_service |
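
The naming patterns in the table can be applied mechanically, for example when scanning the flat module list. The sketch below is a hypothetical helper, not part of the Vespa repository; it copies the patterns from the table and uses shell-style wildcard matching. Because the patterns overlap (config-class-plugin matches both config* and *-plugin), it returns every matching purpose rather than a single one.

```python
from fnmatch import fnmatchcase

# Patterns and purposes copied from the table above, in table order.
PATTERNS = [
    ("container-*", "Container components"),
    ("config*", "Configuration system"),
    ("search*", "Search and ranking"),
    ("*-plugin", "Build plugins"),
    ("vespa*", "Core utilities"),
    ("jdisc*", "jDisc framework"),
]


def categories(module):
    """Return the purpose of every naming pattern the module name matches."""
    return [purpose for pattern, purpose in PATTERNS if fnmatchcase(module, pattern)]


if __name__ == "__main__":
    for name in ("container-search", "config-class-plugin", "jdisc_core", "document"):
        print(name, "->", categories(name))
```

A module that matches no pattern, such as document, yields an empty list; in that case the module README is the place to look.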
Additional Resources
TODO List
Larger features nobody is working on yet
Module READMEs
Each module has detailed documentation in its README.md file
OWNERS Files
Find subject matter experts for code areas
Code Search
Search the codebase on GitHub
Not Covered Here
This map focuses on modules you’re most likely to encounter as a developer. Other modules are either:
- Small and self-explanatory
- Implementing specific technical requirements
- Part of the Vespa Cloud service (not expected to be modified externally)
Next Steps
Building Vespa
Build the modules you want to work on
Running Tests
Test your changes
Development Overview
Return to development overview
Contributing
Learn the contribution process