Schemas

Schemas define the structure of your documents and how Vespa processes, indexes, and ranks them. They are written in Vespa's schema definition language and stored in files with the .sd extension.

What is a Schema?

A schema is a declarative specification that defines:

Document structure (fields and types)
Indexing behavior (how fields are processed)
Search configuration (how queries match documents)
Ranking profiles (how results are scored)

Schemas were previously called “search definitions”, which is why the file extension is .sd.

Basic Schema Structure

Here’s a simple schema example:

music.sd

schema music {
    document music {
        field title type string {
            indexing: index | summary
        }
        
        field artist type string {
            indexing: index | summary
        }
        
        field year type int {
            indexing: summary | attribute
        }
    }
}

This example is similar to schemas found in the Vespa codebase.

Schema Components

Document Block

The document block defines the fields that are stored:

schema product {
    document product {
        field title type string {
            indexing: index | summary
        }
        
        field price type double {
            indexing: summary | attribute
        }
        
        field in_stock type bool {
            indexing: attribute
        }
    }
}

This example is inspired by msmarco.sd:4-49

Field Configuration

Each field has several configuration options:

Indexing Directive

Controls how the field is processed:

field title type string {
    indexing: index | summary
}

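index - Create an inverted index for text search
summary - Store the field so it can be returned in search results
attribute - Create an in-memory forward index for ranking, sorting, and grouping

Multiple operations are chained with the pipe operator |, which passes the field value from one step to the next.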
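As a rough mental model, the three operations write the field value into different structures. The toy Python below is a conceptual sketch only: the documents and variable names are hypothetical, and Vespa's real index, summary, and attribute stores are far more elaborate.

```python
from collections import defaultdict

# Hypothetical documents keyed by document id.
docs = {
    1: {"title": "vespa schema guide", "year": 2021},
    2: {"title": "search engine guide", "year": 2019},
}

# "index": an inverted index mapping each term to the ids of documents containing it.
inverted = defaultdict(set)
for doc_id, doc in docs.items():
    for term in doc["title"].split():
        inverted[term].add(doc_id)

# "attribute": a forward index (column) mapping each document id to its value.
year_attribute = {doc_id: doc["year"] for doc_id, doc in docs.items()}

# "summary": the stored values returned with each hit.
summaries = {doc_id: {"title": doc["title"], "year": doc["year"]} for doc_id, doc in docs.items()}

# Text lookup goes term -> documents; attribute lookup goes document -> value.
hits = sorted(inverted["guide"])
years = [year_attribute[h] for h in hits]
```

The inverted index answers "which documents contain this term?", while the attribute answers "what is this document's value?", which is why the two serve text matching and ranking/sorting respectively.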

Index Configuration

Additional indexing options:

field content type string {
    indexing: index | summary
    index: enable-bm25
    stemming: best
}

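enable-bm25 - Enable the bm25 rank feature for this field
stemming - Reduce words to their stems (see Stemming)

Embedding-based (semantic) search is configured on tensor fields rather than with an index option; see Tensor Fields.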
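For intuition, the textbook Okapi BM25 formula that the bm25 rank feature is based on is sketched below in Python. It uses the commonly cited defaults k1 = 1.2 and b = 0.75 and a standard IDF variant; Vespa's exact parameters and IDF computation may differ, so treat this as an illustration rather than Vespa's implementation.

```python
import math


def bm25(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document against a query.

    corpus is a list of tokenized documents, used for IDF and average length.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        # Rarer terms get a higher inverse document frequency.
        n_with_term = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (n_docs - n_with_term + 0.5) / (n_with_term + 0.5))
        # Term frequency, saturated by k1 and normalized by document length via b.
        tf = doc_terms.count(term)
        norm = tf + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score


corpus = [["vespa", "schema", "guide"], ["search", "engine", "guide"]]
print(bm25(["vespa"], corpus[0], corpus))  # log(2), about 0.693
```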

Summary Configuration

Control how the field appears in results:

field body type string {
    indexing: index | summary
    summary: dynamic
}

dynamic - Return a snippet of the field around the matched query terms, with the terms highlighted
Without summary: dynamic, the full field content is returned
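The idea behind a dynamic summary can be sketched in a few lines of Python: find a matched term, keep a window of surrounding words, and mark the matches. This is a simplified, hypothetical model; Vespa's snippet generation is more sophisticated, and the <hi> markup and the window parameter here are illustrative choices.

```python
def dynamic_snippet(text, query_terms, window=3):
    """Return the words around the first query-term match, with matches wrapped in <hi> tags.

    If no term matches, fall back to the first 2 * window + 1 words.
    """
    words = text.split()
    terms = {t.lower() for t in query_terms}
    first = next((i for i, w in enumerate(words) if w.lower() in terms), None)
    if first is None:
        return " ".join(words[: 2 * window + 1])
    start = max(0, first - window)
    end = min(len(words), first + window + 1)
    marked = [f"<hi>{w}</hi>" if w.lower() in terms else w for w in words[start:end]]
    return " ".join(marked)


text = "Vespa is an open source engine for search and recommendation"
print(dynamic_snippet(text, ["search"], window=2))
# engine for <hi>search</hi> and recommendation
```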

Stemming

Configure linguistic processing:

field title type string {
    indexing: index
    stemming: best
}

best - Use the best-rated stem of each word (the default)
none - No stemming
shortest - Use the shortest stem of each word
multiple - Index all stems of each word

Indexing Language

The indexing directive is a statement in Vespa's indexing language, a small expression language for processing field values before they are stored:

Indexing Expressions

field title type string {
    indexing: input title | index | summary
}

The indexing language is implemented in the indexinglanguage module.
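Conceptually, each step in an indexing statement receives the value produced by the previous step. The Python below models input title | index | summary that way. The stores dict and the make_output and run_pipeline helpers are hypothetical stand-ins for illustration, not the indexinglanguage module's API.

```python
# Hypothetical stores standing in for the index, summary and attribute targets.
stores = {"index": {}, "summary": {}, "attribute": {}}


def make_output(target, field):
    """Return a pipeline step that writes the value to a store and passes it on unchanged."""
    def step(value):
        stores[target][field] = value
        return value
    return step


def run_pipeline(value, steps):
    # Each step receives the previous step's output, like the | operator.
    for step in steps:
        value = step(value)
    return value


# Roughly: indexing: input title | index | summary
doc = {"title": "Vespa schemas"}
result = run_pipeline(doc["title"], [make_output("index", "title"), make_output("summary", "title")])
```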

Attributes

Attributes are in-memory forward indexes that give fast access to field values during ranking, sorting, grouping, and filtering.

When to Use Attributes

Ranking - Fields used in rank expressions
Grouping - Fields used for aggregation
Sorting - Fields used to order results
Filtering - Numeric or boolean filters
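Because an attribute holds one value per document, filtering, sorting, and grouping reduce to lookups by document id. The Python below is a hypothetical illustration of that access pattern with made-up columns; it is not how Vespa executes queries.

```python
from collections import Counter

# Hypothetical attribute columns: one value per document id.
price = {1: 19.0, 2: 5.5, 3: 12.0}
in_stock = {1: True, 2: False, 3: True}
category = {1: "books", 2: "music", 3: "books"}

hits = [1, 2, 3]  # documents matched by some query

filtered = [d for d in hits if in_stock[d]]          # filtering on a boolean attribute
by_price = sorted(filtered, key=lambda d: price[d])  # sorting on a numeric attribute
groups = Counter(category[d] for d in hits)          # grouping: count hits per value
```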

Attribute Configuration

field price type double {
    indexing: summary | attribute
    attribute: fast-search
}

field tags type array<string> {
    indexing: summary | attribute
    attribute: fast-search
}

field user_vector type tensor<float>(x[128]) {
    indexing: attribute | index
    attribute {
        distance-metric: angular
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 200
        }
    }
}

Attribute configuration patterns from Vespa schema implementation
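The distance-metric setting decides how vector distance is measured. As we understand Vespa's documentation, angular is the angle between two vectors and a distance is turned into a closeness score as 1 / (1 + distance); the Python below restates those definitions for intuition only, so check the distance-metric reference before relying on exact values.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angular_distance(a, b):
    """Angle between two vectors in radians, in the range [0, pi]."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / norms))  # clamp against rounding error
    return math.acos(cos)


def closeness(distance):
    # Map a distance to a score where larger means closer.
    return 1.0 / (1.0 + distance)
```

Angular distance ignores vector length: [1, 0] and [2, 0] are at distance 0, which suits embeddings whose magnitude carries no meaning.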

Field Sets

Field sets group fields for convenient searching:

schema article {
    document article {
        field title type string {
            indexing: index | summary
        }
        
        field body type string {
            indexing: index | summary
        }
    }
    
    fieldset default {
        fields: title, body
    }
}

Example from msmarco.sd:57

Queries against the default field set now search both title and body:

select * from article where default contains "vespa"
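Logically, default contains "vespa" matches a document if the term occurs in any field of the field set. The Python below is a toy model with hypothetical data and plain whitespace tokenization; real matching also applies Vespa's linguistic processing (tokenization, normalization, stemming).

```python
def matches_fieldset(doc, fieldset, term):
    """True if the term occurs in any field of the fieldset (whitespace tokenization)."""
    return any(term.lower() in doc.get(field, "").lower().split() for field in fieldset)


default = ["title", "body"]
articles = [
    {"title": "Intro to Vespa", "body": "Schemas define documents"},
    {"title": "Search basics", "body": "Vespa ranks documents"},
    {"title": "Unrelated", "body": "Nothing here"},
]
hits = [a["title"] for a in articles if matches_fieldset(a, default, "vespa")]
# ['Intro to Vespa', 'Search basics']
```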

Tensor Fields

Tensor fields store multi-dimensional numeric data, such as embedding vectors produced by machine-learned models:

field title_embedding type tensor<float>(x[768]) {
    indexing: attribute
}

field image_features type tensor<float>(x[512]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 100
        }
    }
}

Based on tensor fields in msmarco.sd:27-49

Tensor fields can be indexed with HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search.
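HNSW approximates exact nearest neighbor search. For comparison, the exact version is a full scan over every vector, sketched below in Python with euclidean distance (the metric image_features uses above). The function name and data are hypothetical, and this is not how Vespa executes a nearest neighbor query.

```python
import heapq
import math


def exact_nearest_neighbors(query, vectors, k):
    """Return the ids of the k vectors closest to query by euclidean distance.

    Every vector is scanned, so cost grows linearly with corpus size; an HNSW
    index trades a little accuracy to avoid this full scan.
    """
    def dist(v):
        return math.sqrt(sum((q - x) ** 2 for q, x in zip(query, v)))

    nearest = heapq.nsmallest(k, vectors.items(), key=lambda item: dist(item[1]))
    return [doc_id for doc_id, _ in nearest]


vectors = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
print(exact_nearest_neighbors([0.9, 0.9], vectors, 2))  # ['b', 'a']
```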

Document Summaries

Control which fields are returned in search results:

schema product {
    document product {
        field id type string {
            indexing: summary | attribute
        }
        
        field title type string {
            indexing: index | summary
        }
        
        field description type string {
            indexing: index | summary
        }
    }
    
    document-summary minimal {
        summary id type string {}
        summary title type string {}
    }
    
    document-summary full {
        summary id type string {}
        summary title type string {}
        summary description type string {}
    }
}

Document summary pattern from msmarco.sd:53
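At query time one summary class is chosen (in Vespa, through the presentation.summary request parameter), and each hit carries only that class's fields. The Python below is a hypothetical sketch of that projection using the minimal and full classes defined above.

```python
# Field lists mirroring the document-summary blocks in the schema.
SUMMARY_CLASSES = {
    "minimal": ["id", "title"],
    "full": ["id", "title", "description"],
}


def render_hit(doc, summary_class):
    """Return only the fields listed in the chosen document summary."""
    return {field: doc[field] for field in SUMMARY_CLASSES[summary_class] if field in doc}


doc = {"id": "p1", "title": "Desk lamp", "description": "An adjustable LED desk lamp"}
print(render_hit(doc, "minimal"))  # {'id': 'p1', 'title': 'Desk lamp'}
```

Smaller summary classes reduce the amount of data fetched and transferred per hit, which is useful when a result page only needs a few fields.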

Schema Inheritance

Schemas can inherit from other schemas:

schema base_document {
    document base_document {
        field created_time type long {
            indexing: summary | attribute
        }
        
        field modified_time type long {
            indexing: summary | attribute
        }
    }
}

schema article inherits base_document {
    document article inherits base_document {
        field title type string {
            indexing: index | summary
        }
    }
}
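The effect of inheritance on the field list can be pictured as a merge: article ends up with its own title plus created_time and modified_time from base_document. The Python below is a hypothetical sketch; rejecting a redeclared field is a simplifying assumption of this sketch, so check Vespa's schema reference for the actual overriding rules.

```python
# Hypothetical field maps: field name -> type name.
base_document = {"created_time": "long", "modified_time": "long"}
article_own = {"title": "string"}


def inherit(parent_fields, child_fields):
    """Merge parent and child fields; this sketch rejects redeclared fields."""
    clash = parent_fields.keys() & child_fields.keys()
    if clash:
        raise ValueError(f"fields already defined in parent: {sorted(clash)}")
    return {**parent_fields, **child_fields}


article = inherit(base_document, article_own)
```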

Real-World Example

Here’s a fuller schema that combines the concepts above:

msmarco.sd

schema msmarco {
    document msmarco {
        field id type string {
            indexing: summary | attribute
        }

        field title type string {
            indexing: index | summary
            index: enable-bm25
            stemming: best
        }

        field body type string {
            indexing: index | summary
            index: enable-bm25
            summary: dynamic
            stemming: best
        }

        field title_embedding type tensor<float>(x[768]) {
            indexing: attribute
        }

        field body_embedding type tensor<float>(x[768]) {
            indexing: attribute
        }
    }

    fieldset default {
        fields: title, body
    }

    rank-profile bm25 {
        first-phase {
            expression: bm25(title) + bm25(body)
        }
    }

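    rank-profile semantic {
        # Declare the query tensor's type so it can be multiplied with the document embedding
        inputs {
            query(query_embedding) tensor<float>(x[768])
        }

        function title_similarity() {
            expression: sum(query(query_embedding) * attribute(title_embedding))
        }
        
        first-phase {
            expression: title_similarity()
        }
    }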
}

Simplified from msmarco.sd
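In the semantic profile, sum(query(query_embedding) * attribute(title_embedding)) multiplies the two tensors element by element along x and sums the result, which is a dot product. The Python below illustrates that arithmetic with hypothetical 3-dimensional vectors standing in for the 768-dimensional embeddings; it is not Vespa's tensor engine.

```python
def title_similarity(query_embedding, title_embedding):
    """Element-wise product over the shared dimension x, then summed (a dot product)."""
    if len(query_embedding) != len(title_embedding):
        raise ValueError("tensors must have the same x dimension size")
    return sum(q * t for q, t in zip(query_embedding, title_embedding))


query = [0.1, 0.2, 0.3]
title_embeddings = {"d1": [0.3, 0.2, 0.1], "d2": [0.0, 0.5, 1.0]}

# first-phase ranking: order documents by title_similarity, highest first.
ranked = sorted(title_embeddings, key=lambda d: title_similarity(query, title_embeddings[d]), reverse=True)
print(ranked)  # ['d2', 'd1']
```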

Schema Processing Implementation

When an application package is deployed, the config model processes each schema in four steps:

Parse Schema File - The schema definition is parsed into an abstract syntax tree (AST)
Build Document Type - The document type structure is created from the schema
Generate Configurations - Index, attribute, and rank configurations are generated
Deploy to Nodes - The configurations are distributed to content and container nodes

Key Modules:

config-model - Schema parsing and validation
indexinglanguage - Indexing expression execution

Best Practices

Index Text Fields - Use index for fields you search with text queries
Attribute Numeric Fields - Use attribute for numeric fields used in ranking, sorting, or filtering
Enable BM25 - Add index: enable-bm25 to text fields you rank with bm25()
Use Tensors for ML - Store embeddings as tensor fields for semantic search

Next Steps

Search

Learn how search works

Ranking

Configure ranking profiles

Tensors

Work with tensor fields
