Documents are the fundamental unit of data in Vespa. Each document represents an instance of an entity you want to store and search, such as an article, product, or user profile.

What is a Document?

A document is an identifiable set of value bindings of a document type. Think of it as a structured record with:
  • A unique document ID
  • A document type defining its structure
  • A set of fields containing values
Documents in Vespa are implemented in both Java and C++, allowing them to be used throughout the system from the container layer to content nodes.

Document Structure

Here’s an abridged excerpt from the Java implementation showing the core document structure:
package com.yahoo.document;

public class Document extends StructuredFieldValue {
    private DocumentId docId;
    private Struct content;
    private Long lastModified = null;
    
    /**
     * Create a document with the given document type and identifier.
     */
    public Document(DocumentType docType, String id) {
        this(docType, new DocumentId(id));
    }
    
    public DocumentId getId() { 
        return docId; 
    }
    
    public FieldValue getFieldValue(Field field) {
        return content.getFieldValue(field);
    }
}
Source: document/src/main/java/com/yahoo/document/Document.java:41

Document ID

Every document must have a unique identifier. Document IDs follow this format:
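id:<namespace>:<document-type>:<key/value-pair>:<user-specified>
The key/value-pair part is optional; when it is empty, the ID contains two consecutive colons.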

Example

id:music:song::love-is-here-to-stay
Document IDs are immutable. To change a document’s ID, you must delete the old document and create a new one.
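
As a rough illustration of the format, the sketch below splits an ID string into its parts using plain Java. It is not Vespa's own DocumentId implementation. The class name DocIdParts and its fields are made up for this example, and it assumes a five-part layout (scheme, namespace, document type, key/value part, user-specified part) in which the key/value part may be empty, as in the example above, or hold a modifier such as n=<number> or g=<group>.

```java
import java.util.Objects;

/** Illustrative parser for document ID strings; a hypothetical helper, not Vespa's DocumentId. */
final class DocIdParts {
    final String namespace;
    final String documentType;
    final String keyValue;      // empty, or a modifier such as "n=42" or "g=mygroup"
    final String userSpecified; // may itself contain ':'

    private DocIdParts(String namespace, String documentType, String keyValue, String userSpecified) {
        this.namespace = namespace;
        this.documentType = documentType;
        this.keyValue = keyValue;
        this.userSpecified = userSpecified;
    }

    static DocIdParts parse(String id) {
        Objects.requireNonNull(id, "id");
        // A limit of 5 keeps any ':' inside the user-specified part intact.
        String[] parts = id.split(":", 5);
        if (parts.length != 5 || !parts[0].equals("id")) {
            throw new IllegalArgumentException("Not a document ID: " + id);
        }
        return new DocIdParts(parts[1], parts[2], parts[3], parts[4]);
    }

    public static void main(String[] args) {
        DocIdParts p = parse("id:music:song::love-is-here-to-stay");
        System.out.println(p.namespace + " / " + p.documentType + " / " + p.userSpecified);
    }
}
```

Splitting with a limit rather than on every colon is the main design choice here: it lets the user-specified part carry colons of its own without being cut apart.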

Document Types

A document type defines the structure and fields of documents. It’s similar to a table schema in relational databases.
public class DocumentType extends StructuredDataType {
    private StructDataType contentStructType;
    private List<DocumentType> inherits = new ArrayList<>(1);
    
    public DocumentType(String name) {
        this(name, createContentStructType(name));
    }
    
    public void addField(Field field) {
        if (isRegistered()) {
            throw new IllegalStateException(
                "You cannot add fields to a document type that is already registered."
            );
        }
        contentStructType.addField(field);
    }
}
Source: document/src/main/java/com/yahoo/document/DocumentType.java:34
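
The guard in addField follows a common "freeze on registration" pattern. The toy model below shows that pattern in isolation, assuming nothing about Vespa beyond the excerpt above: SketchType, its register() method, and the use of plain strings for type names are all hypothetical stand-ins, not Vespa classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a type whose field list is frozen once registered; not the Vespa DocumentType. */
final class SketchType {
    private final String name;
    private final Map<String, String> fields = new LinkedHashMap<>(); // field name -> type name
    private boolean registered = false;

    SketchType(String name) {
        this.name = name;
    }

    void addField(String fieldName, String typeName) {
        if (registered) {
            throw new IllegalStateException("Type '" + name + "' is already registered");
        }
        fields.put(fieldName, typeName);
    }

    /** Hypothetical hook: in this sketch, registering simply freezes the type. */
    void register() {
        registered = true;
    }

    boolean hasField(String fieldName) {
        return fields.containsKey(fieldName);
    }

    public static void main(String[] args) {
        SketchType music = new SketchType("music");
        music.addField("title", "string");
        music.register();
        try {
            music.addField("artist", "string");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Freezing the structure after registration means that code holding a reference to a registered type can rely on its field list not changing underneath it.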

Document Type Features

  • Inheritance - Document types can inherit fields from other document types
  • Field Sets - Named groups of fields that can be searched or retrieved together
  • Struct Types - Nested structured data within documents
  • Imported Fields - Fields made available from referenced document types

Fields

Fields are the individual data elements within a document. Each field has:
  • A name (identifier)
  • A data type (string, int, tensor, etc.)
  • Optional settings that control indexing and storage behavior

Field Types

Basic scalar values:
  • string - Text data
  • int, long - Integer numbers
  • float, double - Floating-point numbers
  • bool - Boolean values
  • byte - Single byte values
Structured data:
  • array<T> - Ordered collection of values
  • weightedset<T> - Set with associated weights
  • map<K,V> - Key-value pairs
  • tensor<T>(dimensions) - Multi-dimensional arrays
  • struct - Custom structured types
Links to other documents:
  • reference<document-type> - Link to another document
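
To make the difference between the collection types concrete, here is a loose analogy in plain Java collections. These are not the Vespa field value classes; the class name FieldShapes and the sample values are invented for illustration only.

```java
import java.util.List;
import java.util.Map;

/** Plain-Java analogues of three collection field types; illustrative only. */
final class FieldShapes {
    // array<string>: ordered, and the same value may appear more than once
    static final List<String> GENRES = List.of("jazz", "vocal", "jazz");

    // weightedset<string>: each element appears once and carries an integer weight
    static final Map<String, Integer> TAGS = Map.of("standard", 10, "ballad", 3);

    // map<string,int>: arbitrary key-value pairs
    static final Map<String, Integer> PLAYS_BY_COUNTRY = Map.of("NO", 120, "US", 950);

    public static void main(String[] args) {
        System.out.println("genres: " + GENRES.size());
        System.out.println("weight of 'standard': " + TAGS.get("standard"));
        System.out.println("plays in NO: " + PLAYS_BY_COUNTRY.get("NO"));
    }
}
```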

Working with Documents

Creating Documents

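import com.yahoo.document.*;
import com.yahoo.document.datatypes.StringFieldValue;

// Build the type by hand for illustration and declare the field before using it;
// otherwise getField("title") returns null.
DocumentType docType = new DocumentType("music");
docType.addField(new Field("title", DataType.STRING));

Document doc = new Document(docType, "id:music:music::song-1");

// Set field values
Field titleField = docType.getField("title");
doc.setFieldValue(titleField, new StringFieldValue("Love is Here to Stay"));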

Reading Documents

public FieldValue getFieldValue(Field field) {
    return content.getFieldValue(field);
}

public Field getField(String fieldName) {
    Field field = content.getField(fieldName);
    if (field == null) {
        for(DocumentType parent : getDataType().getInheritedTypes()) {
            field = parent.getField(fieldName);
            if (field != null) break;
        }
    }
    return field;
}
Source: Document.java:165-176

Document Operations

  1. PUT - Create or completely replace a document
  2. UPDATE - Modify specific fields without reading the entire document
  3. GET - Retrieve a document by its ID
  4. DELETE - Remove a document from the system
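
One way to issue these operations is over HTTP through the /document/v1 API. The sketch below only builds the requests with the JDK's java.net.http classes (Java 11 or later) and does not send them. The class name DocumentV1Requests, the localhost:8080 endpoint, and the sample field values are assumptions for illustration. Note the mapping: a PUT operation is sent as HTTP POST, while an UPDATE is sent as HTTP PUT.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

/** Builds (but does not send) /document/v1 requests for the four operations. */
final class DocumentV1Requests {
    // Assumption: a Vespa container listening on localhost:8080.
    private static final String ENDPOINT = "http://localhost:8080";

    // IDs with characters outside the URL-safe set would need percent-encoding; this sketch skips that.
    static URI uri(String namespace, String docType, String userId) {
        return URI.create(ENDPOINT + "/document/v1/" + namespace + "/" + docType + "/docid/" + userId);
    }

    static HttpRequest put(URI uri, String documentJson) {   // PUT operation -> HTTP POST
        return json(uri).POST(BodyPublishers.ofString(documentJson)).build();
    }

    static HttpRequest update(URI uri, String updateJson) {  // UPDATE operation -> HTTP PUT
        return json(uri).PUT(BodyPublishers.ofString(updateJson)).build();
    }

    static HttpRequest get(URI uri) {
        return HttpRequest.newBuilder(uri).GET().build();
    }

    static HttpRequest delete(URI uri) {
        return HttpRequest.newBuilder(uri).DELETE().build();
    }

    private static HttpRequest.Builder json(URI uri) {
        return HttpRequest.newBuilder(uri).header("Content-Type", "application/json");
    }

    public static void main(String[] args) {
        URI song = uri("music", "song", "love-is-here-to-stay");
        HttpRequest[] requests = {
            put(song, "{\"fields\": {\"title\": \"Love is Here to Stay\"}}"),
            update(song, "{\"fields\": {\"year\": {\"assign\": 1959}}}"),
            get(song),
            delete(song)
        };
        for (HttpRequest r : requests) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```

Actually sending a request, for example with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), requires a running Vespa instance with a matching schema deployed.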

Document Serialization

Vespa supports multiple serialization formats:

JSON Format

public String toJson() {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    JsonWriter writer = new JsonWriter(buffer);
    writer.write(this);
    return buffer.toString(StandardCharsets.UTF_8);
}
Source: Document.java:237
For example, a put operation in Vespa’s JSON feed format looks like this:
{
  "put": "id:music:song::love-is-here-to-stay",
  "fields": {
    "title": "Love is Here to Stay",
    "artist": "Ella Fitzgerald",
    "year": 1959,
    "duration": 210
  }
}

Document Inheritance

Document types support inheritance, which lets related types share field definitions:
public void inherit(DocumentType type) {
    verifyTypeConsistency(type);
    if (isRegistered()) {
        throw new IllegalStateException(
            "You cannot add inheritance to a document type that is already registered."
        );
    }
    
    // If it inherits the exact same type
    if (inherits.contains(type)) return;
    
    inherits.add(type);
    for (var field : type.getAllUniqueFields()) {
        if (!contentStructType.hasField(field)) {
            contentStructType.addField(field);
        }
    }
}
Source: DocumentType.java:267
When a document type inherits from another, it cannot change the type of inherited fields - this ensures type consistency.
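
The rule can be sketched with a toy model in plain Java. This is a simplified stand-in rather than the Vespa DocumentType class: InheritingType, its string-based type names, and the exact exception thrown on a conflict are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of document type inheritance; not the Vespa DocumentType class. */
final class InheritingType {
    private final String name;
    private final Map<String, String> fields = new LinkedHashMap<>(); // field name -> type name
    private final List<InheritingType> parents = new ArrayList<>();

    InheritingType(String name) {
        this.name = name;
    }

    void addField(String fieldName, String typeName) {
        fields.put(fieldName, typeName);
    }

    void inherit(InheritingType parent) {
        // Reject a parent that declares an existing field name with a different type.
        for (Map.Entry<String, String> e : parent.fields.entrySet()) {
            String mine = fields.get(e.getKey());
            if (mine != null && !mine.equals(e.getValue())) {
                throw new IllegalArgumentException("Field '" + e.getKey() + "' is " + mine
                        + " in " + name + " but " + e.getValue() + " in " + parent.name);
            }
        }
        if (parents.contains(parent)) return; // inheriting the same type twice is a no-op
        parents.add(parent);
        // Copy the parent's fields (including those it inherited earlier) if not already present.
        parent.fields.forEach(fields::putIfAbsent);
    }

    String fieldType(String fieldName) {
        return fields.get(fieldName);
    }

    public static void main(String[] args) {
        InheritingType base = new InheritingType("base");
        base.addField("title", "string");
        InheritingType song = new InheritingType("song");
        song.addField("year", "int");
        song.inherit(base);
        System.out.println("song.title is " + song.fieldType("title"));
    }
}
```

Checking for conflicts before touching any state keeps a failed inherit call from leaving the child type half-updated.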

Best Practices

  • Choose IDs Carefully - Use meaningful, stable identifiers that won’t change
  • Plan Field Types - Select appropriate data types for your use case
  • Use Inheritance - Share common fields across related document types
  • Partial Updates - Update only changed fields for better performance

Document Module Reference

The document module contains the core document implementation:
  • Module: document
  • Language: Java and C++
  • Key Classes:
    • Document - Main document class
    • DocumentType - Document type definition
    • Field - Field definition
    • DocumentId - Document identifier
    • FieldValue - Field value abstraction

Next Steps

  • Schemas - Define document structures in schema files
  • Document Operations - Use the Document API
  • Indexing - Learn about indexing fields
