System Overview
Vespa is built around three core subsystems: the stateless container, the content nodes, and the configuration and administration system.
The Stateless Container
The stateless container layer is the entry point for all requests to Vespa. It’s built on the jDisc framework and consists entirely of Java components.
Container Components
jDisc Core
Provides the foundational request-response handling model:
- Protocol-independent request processing
- HTTP and other protocol implementations
- Application lifecycle management
Module: jdisc_core
jDisc Container
Builds on jDisc core with component management:
- OSGi integration for component bundles
- Dependency injection framework
- Metrics and monitoring
- HTTP connector
Modules: container-disc, component
Search Middleware
Query and result processing:
- Query-Result processing framework (Searchers)
- Query execution logic and dispatch
- Scatter-gather across content nodes
- Grouping and aggregation coordination
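The dispatch and scatter-gather steps listed above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Vespa's dispatch code: `Hit`, `ContentNode`, and `ScatterGather` are hypothetical names, and each "node" is just an in-memory function that returns its local best hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical hit type: a document id plus the relevance score a node assigned.
final class Hit {
    final String docId;
    final double score;

    Hit(String docId, double score) {
        this.docId = docId;
        this.score = score;
    }
}

// Hypothetical stand-in for one content node: returns its local best hits.
interface ContentNode {
    List<Hit> search(String query, int limit);
}

final class ScatterGather {
    // Scatter: send the query to every node in parallel.
    // Gather: wait for all partial results, merge them, keep the global top `limit`.
    static List<Hit> search(List<ContentNode> nodes, String query, int limit) {
        List<CompletableFuture<List<Hit>>> pending = new ArrayList<>();
        for (ContentNode node : nodes) {
            pending.add(CompletableFuture.supplyAsync(() -> node.search(query, limit)));
        }
        List<Hit> merged = new ArrayList<>();
        for (CompletableFuture<List<Hit>> partial : pending) {
            merged.addAll(partial.join());
        }
        // Highest score first.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(limit, merged.size())));
    }
}
```

Each node only needs to return its own top `limit` hits, because no hit outside a node's local top `limit` can appear in the global top `limit`.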
Module: container-search
Document Operations
The container layer also handles document write operations:
- document - Document model and operations
- messagebus - Async multi-hop message passing
- docproc - Document processing chains
- indexinglanguage - Indexing language implementation
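Searchers and document processors are both arranged in chains, where each component does its work and hands off to the next. The sketch below shows that pattern in plain Java. `Processor`, `Chain`, and `ChainDemo` are hypothetical names chosen for illustration; they are not the container-search or docproc APIs, and a string stands in for the query or document.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical component: handles a request and may delegate to the rest of the chain.
interface Processor {
    String process(String request, Chain rest);
}

// Hypothetical chain: invokes processors in order.
final class Chain {
    private final List<Processor> processors;
    private final int index;

    Chain(List<Processor> processors) {
        this(processors, 0);
    }

    private Chain(List<Processor> processors, int index) {
        this.processors = processors;
        this.index = index;
    }

    String process(String request) {
        if (index >= processors.size()) {
            // End of the chain: a stand-in for the backend that produces the result.
            return "result(" + request + ")";
        }
        return processors.get(index).process(request, new Chain(processors, index + 1));
    }
}

final class ChainDemo {
    static String run(String query) {
        // Rewrites the request on the way down the chain.
        Processor lowercase = (req, rest) -> rest.process(req.toLowerCase(Locale.ROOT));
        // Decorates the result on the way back up.
        Processor tagResult = (req, rest) -> rest.process(req) + " [post-processed]";
        return new Chain(List.of(lowercase, tagResult)).process(query);
    }
}
```

Passing the remainder of the chain to each component lets a component act both before and after the downstream work, which is what makes the pattern useful for query rewriting and result post-processing alike.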
Content Nodes
Content nodes are where the data lives and where the heavy lifting of search happens. They are written entirely in C++ for performance.
Core Responsibilities
Data Storage
Persistent storage with automatic recovery and replication
Indexing
Maintains forward and reverse indexes in real time
Matching
Finds documents matching the query criteria
Ranking
Scores documents using configurable rank profiles
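Matching and ranking are separate phases: matching selects candidate documents, and ranking scores them. A toy sketch in Java follows, assuming whole-word matching and a term-count score. Both are placeholder choices rather than Vespa's matching or rank-profile semantics, and `MatchAndRank` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class MatchAndRank {
    // Matching: keep documents that contain the query term at least once.
    // Ranking: score each match by how often the term occurs (a placeholder score).
    // Returns matching document ids, best score first.
    static List<String> search(Map<String, String> docs, String term) {
        Map<String, Long> scores = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            long count = Arrays.stream(doc.getValue().toLowerCase(Locale.ROOT).split("\\s+"))
                    .filter(term::equals)
                    .count();
            if (count > 0) {
                scores.put(doc.getKey(), count);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingLong((String id) -> scores.get(id)).reversed());
        return ranked;
    }
}
```

Keeping the two phases apart means the scoring function can be swapped without touching the matching logic, which is the role a configurable rank profile plays.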
Key Components
Proton - The content node server
- Module: searchcore - Core functionality for indexes, matching, storage, and grouping
- Module: searchlib - Ranking framework (feature execution)
  - Index and btree implementations
  - Attributes (forward indexes)
  - Java libraries for ranking
- Module: storage - Elastic, auto-recovering data storage
  - Distribution and replication across clusters
- Module: eval - Efficient evaluation of ranking expressions
  - Tensor API and operations
Configuration and Administration
The configuration system manages the entire Vespa deployment. It is implemented in Java.
Configuration Flow
Key Components
Config Server
Central configuration management:
- Receives application deployments
- Serves configuration to all nodes
- Validates application packages
Module: configserver
Config Model
Models the running system:
- Processes application package into configs
- Returns config instances by type and ID
- Validates configuration consistency
Module: config-model
Config Client
Node-side configuration:
- Subscribes to configs by type and ID
- Available in both Java and C++
- Automatic updates on config changes
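Subscribing by type and ID with automatic updates can be pictured as a keyed publish/subscribe registry. The sketch below is hypothetical and in-memory only: `ConfigRegistry`, `subscribe`, and `publish` are illustrative names, not the Vespa config API, and the payload is a plain string for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class ConfigRegistry {
    private final Map<String, String> current = new HashMap<>();
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    private static String key(String type, String id) {
        return type + "/" + id;
    }

    // Register a callback. It fires immediately if a config already exists,
    // and again whenever a new payload is published for the same type and ID.
    void subscribe(String type, String id, Consumer<String> callback) {
        String k = key(type, id);
        subscribers.computeIfAbsent(k, unused -> new ArrayList<>()).add(callback);
        String existing = current.get(k);
        if (existing != null) {
            callback.accept(existing);
        }
    }

    // Store a new payload and push it to every subscriber of that type and ID.
    void publish(String type, String id, String payload) {
        String k = key(type, id);
        current.put(k, payload);
        for (Consumer<String> callback : subscribers.getOrDefault(k, List.of())) {
            callback.accept(payload);
        }
    }
}
```

A subscriber never polls: it receives the current value once and then every later change for its own key, while updates to other keys are not delivered to it.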
Modules: config, config-proxy
Code Map Reference
Vespa consists of approximately 1.7 million lines of code, split roughly equally between Java and C++. The codebase is organized into about 150 modules in a flat structure.
For a complete reference of all modules and their relationships, see the Code Map in the Vespa repository.
General Utility Libraries
Request Flow
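Here’s how a typical search request flows through Vespa: the request enters through the stateless container, a chain of Searchers processes the query and dispatches it to the content nodes, each content node matches and ranks its own documents, and the container merges the partial results into the response.
Scalability and Distribution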
Horizontal Scaling
Add more container or content nodes as needed
Data Distribution
Automatic sharding across content nodes
Replication
Configurable redundancy for high availability
Auto-Recovery
Automatic data redistribution on node failures
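Data distribution with configurable redundancy amounts to mapping each bucket of documents to several nodes and recomputing that mapping when nodes join or fail. The sketch below uses rendezvous (highest-random-weight) hashing as a simple stand-in. It is not Vespa's actual distribution algorithm, and `Placement`, `weight`, and `replicas` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class Placement {
    // Deterministic pseudo-random weight for a (bucket, node) pair.
    // The mixing constants are arbitrary odd numbers, chosen only to scatter bits.
    private static long weight(int bucket, String node) {
        long h = node.hashCode() * 0x9E3779B97F4A7C15L + bucket;
        h ^= (h >>> 32);
        h *= 0xD6E8FEB86659FD93L;
        h ^= (h >>> 32);
        return h;
    }

    // Pick the `redundancy` nodes with the highest weight for this bucket.
    static List<String> replicas(int bucket, List<String> nodes, int redundancy) {
        List<String> sorted = new ArrayList<>(nodes);
        sorted.sort((a, b) -> {
            int byWeight = Long.compare(weight(bucket, b), weight(bucket, a)); // descending
            return byWeight != 0 ? byWeight : a.compareTo(b);                  // stable tie-break
        });
        return new ArrayList<>(sorted.subList(0, Math.min(redundancy, sorted.size())));
    }
}
```

Because each bucket ranks the nodes independently, losing a node only affects the buckets that had a replica on it; every other bucket keeps its placement, which keeps redistribution after a failure small.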
Next Steps
Documents
Learn about the document model
Schemas
Define your data structures
Search
Understand how search works