Schemas define the structure of your documents and how they are processed, indexed, and ranked in Vespa. They are written in the Vespa schema language and stored in .sd files.

What is a Schema?

A schema is a declarative specification that defines:
  • Document structure (fields and types)
  • Indexing behavior (how fields are processed)
  • Search configuration (how queries match documents)
  • Ranking profiles (how results are scored)
Schemas were previously called “search definitions”, which is why the file extension is .sd.

Basic Schema Structure

Here’s a simple schema example:
music.sd
schema music {
    document music {
        field title type string {
            indexing: index | summary
        }
        
        field artist type string {
            indexing: index | summary
        }
        
        field year type int {
            indexing: summary | attribute
        }
    }
}
This example is similar to schemas found in the Vespa codebase.

Schema Components

Document Block

The document block defines the fields that are stored:
schema product {
    document product {
        field title type string {
            indexing: index | summary
        }
        
        field price type double {
            indexing: summary | attribute
        }
        
        field in_stock type bool {
            indexing: attribute
        }
    }
}
The complex fields example is inspired by msmarco.sd:4-49

Field Configuration

Each field has several configuration options:
The indexing directive controls how the field is processed:
field title type string {
    indexing: index | summary
}
  • index - Create an inverted index for text search
  • summary - Include the field in search results
  • attribute - Create a forward index for ranking, sorting, and grouping
Multiple directives are combined with |.
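To make the difference between index and attribute concrete, the following Python sketch models both structures with plain dictionaries. It is a simplified illustration and not Vespa's implementation: the corpus, the helper names build_inverted_index and build_attribute, and the naive lowercase whitespace tokenization are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical two-document corpus, keyed by document id.
docs = {
    1: {"title": "Blue Train", "year": 1957},
    2: {"title": "Kind of Blue", "year": 1959},
}

def build_inverted_index(docs, field):
    """Like `index`: map each term to the ids of the documents containing it."""
    inverted = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc[field].lower().split():
            inverted[term].add(doc_id)
    return inverted

def build_attribute(docs, field):
    """Like `attribute`: map each document id straight to its field value."""
    return {doc_id: doc[field] for doc_id, doc in docs.items()}

title_index = build_inverted_index(docs, "title")
year_attribute = build_attribute(docs, "year")

print(sorted(title_index["blue"]))  # ids of documents whose title contains "blue"
print(year_attribute[2])            # value of `year` for document 2
```

The inverted index answers "which documents contain this term?", while the attribute answers "what is this document's value?", which is the lookup direction ranking, sorting, and grouping need.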
The index directive sets additional indexing options:
field content type string {
    indexing: index | summary
    index: enable-bm25
    stemming: best
}
  • enable-bm25 - Enable BM25 text ranking
  • Semantic search is configured separately, by storing embeddings in tensor fields (see Tensor Fields)
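As a rough illustration of what BM25 text ranking computes, here is a minimal Python sketch of the standard Okapi BM25 formula. The function name bm25_score, the toy corpus, and the parameter values k1=1.2 and b=0.75 are assumptions for this example; Vespa's own bm25 feature may differ in details.

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against the query terms with Okapi BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)  # term frequency in this document
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: longer-than-average documents are penalized.
        norm = tf + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score

# Hypothetical corpus of already tokenized documents.
corpus = [
    ["vespa", "schema", "guide"],
    ["vespa", "ranking"],
    ["cooking", "guide"],
]
scores = [bm25_score(["vespa", "schema"], d, corpus) for d in corpus]
print(scores)
```

The first document matches both query terms, including the rarer term "schema", so it scores highest; the last document matches neither and scores zero.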
The summary directive controls how the field appears in results:
field body type string {
    indexing: index | summary
    summary: dynamic
}
  • dynamic - Generate snippets with query term highlighting
  • static - Return full field content
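The idea behind a dynamic summary can be sketched in a few lines of Python: pick a window of words around the first query match and mark the matching terms. This is only a toy model. The function name dynamic_snippet, the window parameter, and the <hi> markers are choices made for this sketch and say nothing about how Vespa selects or formats snippets.

```python
def dynamic_snippet(text, query_terms, window=3):
    """Return a few words around the first query match, with matches marked."""
    words = text.split()
    terms = {t.lower() for t in query_terms}
    # Positions of words that equal a query term, ignoring trailing punctuation.
    hits = [i for i, w in enumerate(words) if w.lower().strip(".,") in terms]
    if not hits:
        # No match: fall back to the start of the text.
        return " ".join(words[: 2 * window + 1])
    start = max(0, hits[0] - window)
    end = min(len(words), hits[0] + window + 1)
    marked = [f"<hi>{words[i]}</hi>" if i in hits else words[i]
              for i in range(start, end)]
    return " ".join(marked)

body = "Vespa is a platform for search and ranking over large data sets."
snippet = dynamic_snippet(body, ["ranking"], window=2)
print(snippet)
```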
The stemming directive configures linguistic processing:
field title type string {
    indexing: index
    stemming: best
}
  • best - Use best available stemmer for language
  • none - No stemming
  • shortest, multiple - Stemming variants

Indexing Language

The indexing directive is written in an expression language in which the field value flows left to right through expressions separated by |.

Indexing Expressions

field title type string {
    indexing: input title | index | summary
}
The indexing language is implemented in the indexinglanguage module.
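The left-to-right flow of an indexing statement can be modeled as a chain of small functions. The Python sketch below is an assumption-laden illustration, not the indexinglanguage module: the helper names input_field, lowercase, write_to, and run_pipeline are hypothetical, and the lowercase step is added only to show a transforming expression between input and output.

```python
def input_field(name):
    """`input <name>`: read the field from the document being fed."""
    return lambda doc, value, outputs: doc[name]

def lowercase():
    """`lowercase`: lowercase the current value."""
    return lambda doc, value, outputs: value.lower()

def write_to(target, name):
    """`index` / `summary` / `attribute`: store the value, then pass it on."""
    def step(doc, value, outputs):
        outputs[target][name] = value
        return value
    return step

def run_pipeline(doc, steps):
    """Run the steps left to right, as `a | b | c` does."""
    outputs = {"index": {}, "summary": {}, "attribute": {}}
    value = None
    for step in steps:
        value = step(doc, value, outputs)
    return outputs

# Models: indexing: input title | lowercase | index | summary
steps = [
    input_field("title"),
    lowercase(),
    write_to("index", "title"),
    write_to("summary", "title"),
]
result = run_pipeline({"title": "Kind of Blue"}, steps)
print(result)
```

Because each output step passes the value on unchanged, the same transformed value reaches both the index and the summary.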

Attributes

Attributes are forward indexes that enable fast access to field values during ranking and grouping.

When to Use Attributes

  • Ranking - Fields used in rank expressions
  • Grouping - Fields used for aggregation
  • Sorting - Fields used to order results
  • Filtering - Numeric or boolean filters
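Filtering, sorting, and grouping all need a document's value given its id, which is what a forward index provides. A minimal Python sketch, assuming each attribute is a dictionary from document id to value (the data and variable names are hypothetical):

```python
from collections import Counter

# Hypothetical attributes: one dictionary per field, keyed by document id.
price = {1: 9.5, 2: 24.0, 3: 14.0}
category = {1: "book", 2: "game", 3: "book"}
in_stock = {1: True, 2: True, 3: False}

# Filtering: keep documents whose boolean attribute is set.
available = [doc_id for doc_id in price if in_stock[doc_id]]

# Sorting: order the remaining documents by a numeric attribute.
by_price = sorted(available, key=price.get)

# Grouping: count the remaining documents per category value.
per_category = Counter(category[doc_id] for doc_id in available)

print(available, by_price, dict(per_category))
```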

Attribute Configuration

field price type double {
    indexing: summary | attribute
    attribute: fast-search
}

field tags type array<string> {
    indexing: summary | attribute
    attribute: fast-search
}

field user_vector type tensor<float>(x[128]) {
    indexing: attribute | index
    attribute {
        distance-metric: angular
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 200
        }
    }
}
Attribute configuration patterns from Vespa schema implementation

Field Sets

Field sets group fields for convenient searching:
schema article {
    document article {
        field title type string {
            indexing: index | summary
        }
        
        field body type string {
            indexing: index | summary
        }
    }
    
    fieldset default {
        fields: title, body
    }
}
Example from msmarco.sd:57
Now queries automatically search both fields:
select * from article where default contains "vespa"
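Conceptually, a query against a field set matches a document when any field in the set matches. The Python sketch below models that behavior under simplifying assumptions: the fieldsets dictionary, the contains helper, and the whitespace tokenization are hypothetical and do not reflect how Vespa evaluates queries.

```python
# Hypothetical field set definition, mirroring `fieldset default { fields: title, body }`.
fieldsets = {"default": ["title", "body"]}

def contains(doc, target, term):
    """Return True if `term` occurs in the field, or in any field of the field set."""
    fields = fieldsets.get(target, [target])
    return any(term.lower() in doc[f].lower().split() for f in fields)

articles = [
    {"title": "Vespa schemas", "body": "How fields are declared"},
    {"title": "Ranking basics", "body": "First-phase ranking in vespa"},
    {"title": "Unrelated", "body": "Nothing to see"},
]
hits = [a["title"] for a in articles if contains(a, "default", "vespa")]
print(hits)
```

The second article matches through its body only, which a query against title alone would miss.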

Tensor Fields

Tensor fields store multi-dimensional arrays, essential for machine learning:
field title_embedding type tensor<float>(x[768]) {
    indexing: attribute
}

field image_features type tensor<float>(x[512]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 100
        }
    }
}
Based on tensor fields in msmarco.sd:27-49
Tensor fields can be indexed with HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search.
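For reference, the exact search that an approximate index such as HNSW tries to reproduce is a brute-force scan: compute the distance from the query to every vector and keep the closest ones. The Python sketch below shows that baseline with Euclidean and angular distance. The function names and sample vectors are assumptions for this example, and no HNSW graph is built.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angular(a, b):
    """Angle in radians between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def nearest(query, vectors, distance, k=2):
    """Exact nearest neighbors: rank every vector by distance to the query."""
    ranked = sorted(vectors, key=lambda doc_id: distance(query, vectors[doc_id]))
    return ranked[:k]

# Hypothetical 2-dimensional embeddings keyed by document id.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
top_euclidean = nearest([1.0, 0.0], vectors, euclidean)
top_angular = nearest([1.0, 0.0], vectors, angular)
print(top_euclidean, top_angular)
```

The scan touches every vector, so its cost grows linearly with the number of documents; an approximate index trades some accuracy to avoid that.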

Document Summaries

Control which fields are returned in search results:
schema product {
    document product {
        field id type string {
            indexing: summary | attribute
        }
        
        field title type string {
            indexing: index | summary
        }
        
        field description type string {
            indexing: index | summary
        }
    }
    
    document-summary minimal {
        summary id type string {}
        summary title type string {}
    }
    
    document-summary full {
        summary id type string {}
        summary title type string {}
        summary description type string {}
    }
}
Document summary pattern from msmarco.sd:53
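The effect of choosing a document summary can be pictured as a projection of each hit onto the fields that summary lists. A small Python sketch, assuming summary classes are plain lists of field names (the summary_classes dictionary, the render_hit helper, and the sample product are hypothetical):

```python
# Hypothetical mirror of the `minimal` and `full` document summaries above.
summary_classes = {
    "minimal": ["id", "title"],
    "full": ["id", "title", "description"],
}

def render_hit(doc, summary_class):
    """Keep only the fields listed in the chosen summary class."""
    return {name: doc[name] for name in summary_classes[summary_class]}

product = {"id": "p1", "title": "Desk lamp", "description": "Adjustable LED desk lamp"}
minimal_hit = render_hit(product, "minimal")
full_hit = render_hit(product, "full")
print(minimal_hit)
print(full_hit)
```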

Schema Inheritance

Schemas can inherit from other schemas:
schema base_document {
    document base_document {
        field created_time type long {
            indexing: summary | attribute
        }
        
        field modified_time type long {
            indexing: summary | attribute
        }
    }
}

schema article inherits base_document {
    document article inherits base_document {
        field title type string {
            indexing: index | summary
        }
    }
}
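One way to think about inheritance is that the child schema ends up with its parent's fields plus its own. The Python sketch below models that resolution with dictionaries. The schemas structure and the resolve_fields helper are hypothetical and only illustrate the resulting field set, not how Vespa resolves inheritance internally.

```python
def resolve_fields(schemas, name):
    """Collect fields from the parent chain first, then the schema's own fields."""
    schema = schemas[name]
    fields = {}
    parent = schema.get("inherits")
    if parent is not None:
        fields.update(resolve_fields(schemas, parent))
    fields.update(schema["fields"])
    return fields

# Hypothetical mirror of the two schemas above: field name -> field type.
schemas = {
    "base_document": {"fields": {"created_time": "long", "modified_time": "long"}},
    "article": {"inherits": "base_document", "fields": {"title": "string"}},
}
article_fields = resolve_fields(schemas, "article")
print(article_fields)
```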

Real-World Example

Here’s a complete schema for a search application:
msmarco.sd
schema msmarco {
    document msmarco {
        field id type string {
            indexing: summary | attribute
        }

        field title type string {
            indexing: index | summary
            index: enable-bm25
            stemming: best
        }

        field body type string {
            indexing: index | summary
            index: enable-bm25
            summary: dynamic
            stemming: best
        }

        field title_embedding type tensor<float>(x[768]) {
            indexing: attribute
        }

        field body_embedding type tensor<float>(x[768]) {
            indexing: attribute
        }
    }

    fieldset default {
        fields: title, body
    }

    rank-profile bm25 {
        first-phase {
            expression: bm25(title) + bm25(body)
        }
    }

    rank-profile semantic {
        function title_similarity() {
            expression: sum(query(query_embedding) * attribute(title_embedding))
        }
        
        first-phase {
            expression: title_similarity()
        }
    }
}
Simplified from msmarco.sd
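The semantic profile's expression multiplies the query tensor and the document tensor cell by cell and sums the products, which for two vectors over the same dimension is a dot product. A minimal Python sketch of that computation, using made-up 3-dimensional embeddings instead of the 768 dimensions in the schema:

```python
def title_similarity(query_embedding, title_embedding):
    """sum(query * attribute): multiply matching cells, then sum them."""
    return sum(q * t for q, t in zip(query_embedding, title_embedding))

# Hypothetical embeddings; real ones would have 768 cells each.
query_embedding = [1.0, 0.0, 2.0]
title_embeddings = {
    "doc1": [0.5, 1.0, 0.5],
    "doc2": [1.0, 1.0, 1.0],
}

scores = {doc_id: title_similarity(query_embedding, emb)
          for doc_id, emb in title_embeddings.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Documents are then ordered by this score, highest first, as the first-phase expression does.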

Schema Processing Implementation

Schemas are processed by the config model:
  1. Parse Schema File - The schema definition is parsed into an AST
  2. Build Document Type - The document type structure is created from the schema
  3. Generate Configurations - Index, attribute, and rank configurations are generated
  4. Deploy to Nodes - Configurations are distributed to content and container nodes
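The configuration-generation step can be pictured as sorting fields into per-subsystem lists based on their indexing directives. The Python sketch below is a deliberately rough model: the generate_configs helper and its dictionary output are hypothetical and do not represent the config model's actual code or config formats.

```python
def generate_configs(fields):
    """Derive per-subsystem field lists from each field's indexing directives."""
    configs = {"index": [], "attribute": [], "summary": []}
    for name, indexing in fields.items():
        for directive in (part.strip() for part in indexing.split("|")):
            if directive in configs:
                configs[directive].append(name)
    return configs

# Hypothetical mirror of the music schema: field name -> indexing statement.
music_fields = {
    "title": "index | summary",
    "artist": "index | summary",
    "year": "summary | attribute",
}
configs = generate_configs(music_fields)
print(configs)
```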

Best Practices

  • Index Text Fields - Use index for fields you’ll search with text queries
  • Attribute Numeric Fields - Use attribute for numeric fields used in ranking or filtering
  • Enable BM25 - Add index: enable-bm25 to text fields so that rank profiles can use bm25()
  • Use Tensors for ML - Store embeddings as tensor fields for semantic search

Next Steps

  • Search - Learn how search works
  • Ranking - Configure ranking profiles
  • Tensors - Work with tensor fields
