Multi-vector search allows you to store multiple embedding vectors per document (as a matrix) and perform late-interaction searches using MaxSim scoring. This is ideal for token-level embeddings, ColBERT-style models, or any scenario where you need multiple vectors to represent a single document.

Overview

With multi-vector search, you can:
  • Store a matrix of embeddings per document (e.g., one per token)
  • Use MaxSim (maximum similarity) for late-interaction retrieval
  • Support various matrix value types (f32, f16, f8, u8, i8)
  • Optimize with quantization and sketch-based indexing
  • Control candidate selection for better speed/accuracy tradeoff

Schema Setup

Define a matrix field with a multiVectorIndex():
import { Client } from "topk-js";
import { matrix, multiVectorIndex, text } from "topk-js/schema";

const client = new Client({
  apiKey: "YOUR_API_KEY",
  region: "aws-us-east-1-elastica"
});

await client.collections().create("documents", {
  title: text(),
  token_embeddings: matrix({ 
    dimension: 128,  // Dimension of each vector
    valueType: "f32" 
  }).index(
    multiVectorIndex({ metric: "maxsim" })
  )
});

Matrix Value Types

TopK supports multiple matrix value types:
  • f32 - 32-bit floating point (standard precision)
  • f16 - 16-bit floating point (half precision)
  • f8 - 8-bit floating point
  • u8 - 8-bit unsigned integer
  • i8 - 8-bit signed integer
import { matrix, multiVectorIndex } from "topk-js/schema";

const schema = {
  // Standard precision
  embeddings_f32: matrix({ dimension: 128, valueType: "f32" })
    .index(multiVectorIndex({ metric: "maxsim" })),
  
  // Half precision - 50% storage savings
  embeddings_f16: matrix({ dimension: 128, valueType: "f16" })
    .index(multiVectorIndex({ metric: "maxsim" })),
  
  // 8-bit quantized - 75% storage savings
  embeddings_u8: matrix({ dimension: 128, valueType: "u8" })
    .index(multiVectorIndex({ metric: "maxsim", quantization: "scalar" }))
};

Index Options

Customize the multi-vector index behavior:
import { matrix, multiVectorIndex } from "topk-js/schema";

const schema = {
  token_embeddings: matrix({ dimension: 128, valueType: "f32" }).index(
    multiVectorIndex({
      metric: "maxsim",
      sketchBits: 128,  // Number of bits for sketching (optional)
      quantization: "1bit"  // 1bit, 2bit, or scalar (optional)
    })
  )
};
MaxSim is currently the only supported metric for multi-vector search. It computes the maximum similarity between each query vector and all document vectors, then sums these maximum similarities.

Inserting Documents with Matrices

Provide embeddings as a matrix (array of arrays):
import { matrix } from "topk-js/data";

await client.collection("documents").upsert([
  {
    _id: "doc1",
    title: "Machine Learning Basics",
    // Each row is a token embedding (e.g., 5 tokens, 128 dimensions each)
    token_embeddings: [
      [0.1, 0.2, 0.3, /* ... 128 dimensions */],
      [0.4, 0.5, 0.6, /* ... 128 dimensions */],
      [0.7, 0.8, 0.9, /* ... 128 dimensions */],
      [0.2, 0.3, 0.4, /* ... 128 dimensions */],
      [0.5, 0.6, 0.7, /* ... 128 dimensions */]
    ]
  },
  {
    _id: "doc2",
    title: "Deep Learning Guide",
    // Explicit matrix constructor for non-f32 types
    token_embeddings: matrix([
      [12, 24, 36, /* ... */],
      [48, 60, 72, /* ... */],
      [84, 96, 108, /* ... */]
    ], "u8")
  }
]);

Querying with MaxSim

Use fn.multiVectorDistance() to compute MaxSim scores:
import { select, field, fn } from "topk-js/query";

// Query with multiple token embeddings
const queryTokens = [
  [0.11, 0.22, 0.33, /* ... 128 dimensions */],
  [0.44, 0.55, 0.66, /* ... 128 dimensions */],
  [0.77, 0.88, 0.99, /* ... 128 dimensions */]
];

const results = await client.collection("documents").query(
  select({
    title: field("title"),
    score: fn.multiVectorDistance("token_embeddings", queryTokens)
  })
  .topk(field("score"), 10)
);

Controlling Candidates

Limit the number of candidate vectors considered for better performance:
import { select, field, fn } from "topk-js/query";

const results = await client.collection("documents").query(
  select({
    title: field("title"),
    score: fn.multiVectorDistance(
      "token_embeddings", 
      queryTokens,
      100  // Limit to 100 candidate vectors
    )
  })
  .topk(field("score"), 10)
);
Reducing the number of candidates can significantly improve query performance, especially for large documents with many token embeddings. Start with a higher value and reduce it until you reach an acceptable speed/accuracy balance.

Using Explicit Matrix Types

For non-f32 matrix types, use the explicit matrix constructor:
import { matrix } from "topk-js/data";
import { select, field, fn } from "topk-js/query";

// Query with u8 matrix
const queryMatrix = matrix([
  [12, 24, 36],
  [48, 60, 72],
  [84, 96, 108]
], "u8");

const results = await client.collection("documents").query(
  select({
    score: fn.multiVectorDistance("token_embeddings", queryMatrix)
  })
  .topk(field("score"), 10)
);

Combining with Filters

Apply filters before multi-vector search:
import { select, filter, field, fn } from "topk-js/query";

const results = await client.collection("documents").query(
  select({
    title: field("title"),
    score: fn.multiVectorDistance("token_embeddings", queryTokens)
  })
  .filter(field("category").eq("machine-learning"))
  .topk(field("score"), 10)
);

Use Cases

Multi-vector search is ideal for:
  • Token-level embeddings: Store embeddings for each token in a document
  • ColBERT-style models: Late interaction models that benefit from MaxSim
  • Multi-representation documents: Documents with multiple semantic aspects
  • Fine-grained matching: Match specific parts of documents rather than whole-document embeddings
The matrix dimension parameter specifies the length of each individual vector (number of columns), not the number of vectors. The number of vectors (rows) can vary per document.
Ensure your query matrix has the same dimension (number of columns) as the indexed field, but the number of rows can differ.
