Tensors are Vespa’s native multi-dimensional array type. They’re essential for machine learning, semantic search, and advanced ranking operations.

What are Tensors?

A tensor is a multi-dimensional array that can be used in computations. In Vespa:
  • Tensors have named dimensions
  • Each dimension can be sparse (mapped) or dense (indexed)
  • Cells contain scalar values (float, double, int8, bfloat16)
  • Operations are optimized for machine learning workloads
Vespa’s tensor implementation is available in both Java (vespajlib) and C++ (eval) for use throughout the system.
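For example, a sparse tensor over user and item dimensions can be written as a literal, where each cell’s address is a set of dimension-label pairs (the labels and values here are illustrative):

```
tensor<float>(user{},item{}):{
    {user:alice, item:tv}:    3.0,
    {user:alice, item:radio}: 1.0
}
```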

Tensor Implementation

Here’s the core Tensor interface from the Java implementation:
package com.yahoo.tensor;

import java.util.Iterator;

/**
 * A multidimensional array which can be used in computations.
 * 
 * A tensor consists of a set of dimension names and a set of cells 
 * containing scalar values. Each cell is identified by its address, 
 * which consists of a set of dimension-label pairs.
 */
public interface Tensor {
    /** Returns the type of this tensor */
    TensorType type();
    
    /** Returns whether this has any cells */
    default boolean isEmpty() { return size() == 0; }
    
    /** Returns the number of cells in this tensor */
    long size();
    
    /** Returns the value of a cell, or 0.0 if this cell does not exist */
    double get(TensorAddress address);
    
    /** Returns true if this cell exists */
    boolean has(TensorAddress address);
    
    /** Returns the cells of this in some undefined order */
    Iterator<Cell> cellIterator();
}
From vespajlib/src/main/java/com/yahoo/tensor/Tensor.java:54

Tensor Types

Dense Tensors (Indexed)

Dense tensors have dimensions with integer indices:
tensor<float>(x[768])
Dense tensors are stored contiguously in memory, which makes them efficient for embeddings and neural network layers.

Sparse Tensors (Mapped)

Sparse tensors have dimensions with string labels:
tensor<float>(user{}, item{})
Sparse tensors store only non-zero cells, which makes them efficient for very large, sparse data.

Mixed Tensors

Mixed tensors combine indexed and mapped dimensions:
tensor<float>(user{}, feature[100])
Useful for per-user embeddings or similar use cases.

Tensor Cell Types

Control memory usage and precision:

float

32-bit floating point (default)

double

64-bit floating point

bfloat16

16-bit brain float (ML optimized)

int8

8-bit integer (quantized)
Example with different cell types:
tensor<bfloat16>(x[768])  // Half the memory of float
tensor<int8>(x[768])      // 1/4 the memory of float

Using Tensors in Schemas

Embedding Fields

Store vector embeddings as tensor fields:
schema article {
    document article {
        field title type string {
            indexing: index | summary
        }
        
        field title_embedding type tensor<float>(x[768]) {
            indexing: attribute
        }
        
        field body_embedding type tensor<float>(x[768]) {
            indexing: attribute | index
            attribute {
                distance-metric: angular
            }
            index {
                hnsw {
                    max-links-per-node: 16
                    neighbors-to-explore-at-insert: 200
                }
            }
        }
    }
}
Based on tensor fields in msmarco.sd:27-49

Distance Metrics

When indexing tensors for nearest neighbor search:
Cosine distance (angular separation):
attribute {
    distance-metric: angular
}
Best for normalized embeddings.
L2 distance:
attribute {
    distance-metric: euclidean
}
Standard Euclidean distance.
Negative dot product:
attribute {
    distance-metric: dotproduct
}
For maximum inner product search.
Hamming distance:
attribute {
    distance-metric: hamming
}
For binary vectors.

Tensor Operations

Vespa provides rich tensor operations on the Java Tensor interface:
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

import com.yahoo.tensor.functions.*;

public interface Tensor {
    // Arithmetic operations
    Tensor map(DoubleUnaryOperator mapper);
    Tensor join(Tensor other, DoubleBinaryOperator combiner);
    Tensor reduce(Reduce.Aggregator aggregator, String... dimensions);
    
    // Linear algebra
    Tensor matmul(Tensor other, String dimension);
    
    // Tensor manipulation
    Tensor rename(String fromDimension, String toDimension);
    Tensor concat(Tensor other, String dimension);
}
From Tensor.java:1-23

Common Operations

Apply operations to each cell:
tensor1 + tensor2           // Addition
tensor1 * tensor2           // Multiplication
tensor1 - tensor2           // Subtraction
tensor1 / tensor2           // Division
pow(tensor1, 2)            // Power
exp(tensor1)               // Exponential
Aggregate across dimensions:
sum(tensor, dim)           // Sum
max(tensor, dim)           // Maximum
min(tensor, dim)           // Minimum
avg(tensor, dim)           // Average
count(tensor, dim)         // Count
Linear algebra:
tensor1 * tensor2          // Element-wise product; wrap in sum() for a dot product
matmul(tensor1, tensor2, dim)  // Matrix multiplication
Advanced operations:
concat(tensor1, tensor2, dim)  // Concatenation
rename(tensor, from, to)       // Rename dimension
expand(tensor, dim)            // Add dimension

Tensors in Ranking

Semantic Search Example

Compute similarity between query and document embeddings:
rank-profile semantic {
    function similarity() {
        expression: sum(query(query_embedding) * attribute(doc_embedding))
    }
    
    first-phase {
        expression: similarity()
    }
}
From msmarco.sd:73-88. The * operator performs element-wise multiplication, and sum() reduces the result to a scalar score.
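In plain Python, the same multiply-then-sum is just a dot product:

```python
# Element-wise multiply two equal-length vectors, then sum the products:
# the same computation as sum(query(q) * attribute(d)) over a shared dimension.
def similarity(query_embedding, doc_embedding):
    return sum(q * d for q, d in zip(query_embedding, doc_embedding))

print(similarity([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # 3.0
```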

Multi-Field Embeddings

Combine embeddings from multiple fields:
rank-profile multi_field_semantic {
    function title_similarity() {
        expression: sum(query(query_embedding) * attribute(title_embedding))
    }
    
    function body_similarity() {
        expression: sum(query(query_embedding) * attribute(body_embedding))
    }
    
    first-phase {
        expression: 2.0 * title_similarity() + body_similarity()
    }
}
From msmarco.sd:73-88

Tensor Literal Syntax

Create tensors directly in expressions:
tensor<float>(x[3]):[1.0, 2.0, 3.0]
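Sparse and mixed tensors have literal forms as well; the labels below are illustrative:

```
tensor<float>(user{}):{alice:1.0, bob:2.0}
tensor<float>(user{},feature[2]):{alice:[0.1, 0.2], bob:[0.3, 0.4]}
```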

Tensor Use Cases

1. Vector Search

Store and query document embeddings:
field embedding type tensor<float>(x[768]) {
    indexing: attribute | index
    attribute {
        distance-metric: angular
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 200
        }
    }
}
Query:
select * from documents 
where {targetHits:10}nearestNeighbor(embedding, query_embedding)
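A complete query request might look like the sketch below, assuming the rank profile declares a query(query_embedding) input of matching type (the vector is shortened for readability):

```json
{
  "yql": "select * from documents where {targetHits:10}nearestNeighbor(embedding, query_embedding)",
  "input.query(query_embedding)": [0.12, -0.45, 0.78]
}
```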

2. Neural Network Inference

Store model weights as tensors:
function neural_net() {
    expression {
        # "input" names the dimension joined by matmul
        relu(matmul(attribute(input_features), constant(weights_layer1), input) +
             constant(bias_layer1))
    }
}
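A fully connected layer with a ReLU activation can be sketched in plain Python (tiny illustrative shapes):

```python
def dense_layer(x, weights, bias):
    """One fully connected layer: y_j = relu(sum_i x_i * W[i][j] + b_j)."""
    return [
        max(0.0, sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j])
        for j in range(len(bias))
    ]

print(dense_layer([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, -3.0]))  # [1.0, 0.0]
```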

3. Personalized Ranking

Per-user feature vectors:
field user_preferences type tensor<float>(category[50]) {
    indexing: attribute
}

field product_features type tensor<float>(category[50]) {
    indexing: attribute
}

rank-profile personalized {
    first-phase {
        expression: sum(query(user_prefs) * attribute(product_features))
    }
}

4. Collaborative Filtering

Sparse user-item matrices:
field interactions type tensor<float>(user{}, item{}) {
    indexing: attribute
}
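Mapped dimensions are fed as a list of cells, each with an explicit address; the document id and labels below are illustrative:

```json
{
  "put": "id:mynamespace:interactions::alice",
  "fields": {
    "interactions": {
      "cells": [
        { "address": { "user": "alice", "item": "tv" }, "value": 3.0 },
        { "address": { "user": "alice", "item": "radio" }, "value": 1.0 }
      ]
    }
  }
}
```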

Tensor Performance

Optimization Tips

Choose Right Type

Use dense for embeddings, sparse for categorical data

Use Appropriate Precision

Consider bfloat16 or int8 for large tensors

Index for ANN

Add HNSW index for nearest neighbor queries

Minimize Dimension

Smaller embeddings = faster computations

Memory Usage

Tensor memory depends on cell type and dimensions:
tensor<float>(x[768])    = 768 * 4 bytes  = 3 KB
tensor<bfloat16>(x[768]) = 768 * 2 bytes  = 1.5 KB
tensor<int8>(x[768])     = 768 * 1 byte   = 768 bytes
For 1M documents:
  • float: 3 GB
  • bfloat16: 1.5 GB
  • int8: 768 MB
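The per-document figures above follow from cells times bytes per cell; a minimal sketch (actual usage also includes per-attribute overhead):

```python
# Bytes per cell for each supported tensor cell type.
CELL_BYTES = {"double": 8, "float": 4, "bfloat16": 2, "int8": 1}

def tensor_bytes(cell_type, sizes):
    """Bytes needed for a dense tensor with the given dimension sizes."""
    cells = 1
    for size in sizes:
        cells *= size
    return cells * CELL_BYTES[cell_type]

for cell_type in ("float", "bfloat16", "int8"):
    print(cell_type, tensor_bytes(cell_type, [768]))
```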

Feeding Tensor Data

Send tensor values when feeding documents:
{
  "put": "id:article:article::123",
  "fields": {
    "title": "Understanding Vespa Tensors",
    "embedding": {
      "values": [0.12, -0.45, 0.78, ...]
    }
  }
}

Tensor Evaluation Engine

The evaluation engine optimizes tensor operations:
  • Module: eval
  • Compiles tensor expressions into efficient code
  • Supports multiple backends (CPU, GPU)
  • Automatic optimization based on tensor types
The eval module provides efficient evaluation of ranking expressions and tensor operations on content nodes.

Advanced Tensor Operations

Matrix Multiplication

function matrix_mult() {
    expression: sum(attribute(matrix1) * attribute(matrix2), common_dimension)
}

Normalization

function l2_normalize() {
    expression: attribute(vector) / sqrt(sum(pow(attribute(vector), 2)))
}
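The same normalization in plain Python, for reference:

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length: each element divided by the L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```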

Softmax

function softmax() {
    expression {
        exp(attribute(logits)) / sum(exp(attribute(logits)))
    }
}
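A plain Python sketch of the same computation (production implementations usually subtract max(logits) first for numerical stability):

```python
import math

def softmax(logits):
    """Exponentiate each value and divide by the sum; results sum to 1."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(sum(probs))  # ~1.0
```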

Cosine Similarity

function cosine_similarity() {
    expression {
        sum(query(q) * attribute(d)) / 
        (sqrt(sum(pow(query(q), 2))) * sqrt(sum(pow(attribute(d), 2))))
    }
}
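The equivalent computation in plain Python: the dot product divided by the product of the two L2 norms.

```python
import math

def cosine_similarity(q, d):
    """Dot product of q and d, normalized by both vector lengths."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    return dot / (norm_q * norm_d)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```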

Best Practices

Start Simple

Begin with basic embeddings, add complexity as needed

Profile Memory

Monitor tensor memory usage in production

Test Precision

Validate that bfloat16/int8 maintains quality

Index Large Tensors

Use HNSW for tensors with >100 dimensions

Next Steps

Ranking

Use tensors in ranking expressions

Schemas

Define tensor fields

Search

Query tensor fields
