Multi-vector search allows you to store multiple embedding vectors per document (as a matrix) and perform late-interaction searches using MaxSim scoring. This is ideal for token-level embeddings, ColBERT-style models, or any scenario where you need multiple vectors to represent a single document.

Overview

With multi-vector search, you can:
  • Store a matrix of embeddings per document (e.g., one per token)
  • Use MaxSim (maximum similarity) for late-interaction retrieval
  • Support various matrix value types (f32, f16, f8, u8, i8)
  • Optimize with quantization and sketch-based indexing
  • Control candidate selection for better speed/accuracy tradeoff

Schema Setup

Define a matrix field with a multiVectorIndex():
import { Client } from "topk-js";
import { matrix, multiVectorIndex, text } from "topk-js/schema";

const client = new Client({
  apiKey: "YOUR_API_KEY",
  region: "aws-us-east-1-elastica"
});

await client.collections().create("documents", {
  title: text(),
  token_embeddings: matrix({ 
    dimension: 128,  // Dimension of each vector
    valueType: "f32" 
  }).index(
    multiVectorIndex({ metric: "maxsim" })
  )
});

Matrix Value Types

TopK supports multiple matrix value types:
  • f32 - 32-bit floating point (standard precision)
  • f16 - 16-bit floating point (half precision)
  • f8 - 8-bit floating point
  • u8 - 8-bit unsigned integer
  • i8 - 8-bit signed integer
import { matrix, multiVectorIndex } from "topk-js/schema";

const schema = {
  // Standard precision
  embeddings_f32: matrix({ dimension: 128, valueType: "f32" })
    .index(multiVectorIndex({ metric: "maxsim" })),
  
  // Half precision - 50% storage savings
  embeddings_f16: matrix({ dimension: 128, valueType: "f16" })
    .index(multiVectorIndex({ metric: "maxsim" })),
  
  // 8-bit quantized - 75% storage savings
  embeddings_u8: matrix({ dimension: 128, valueType: "u8" })
    .index(multiVectorIndex({ metric: "maxsim", quantization: "scalar" }))
};

Index Options

Customize the multi-vector index behavior:
import { matrix, multiVectorIndex } from "topk-js/schema";

const schema = {
  token_embeddings: matrix({ dimension: 128, valueType: "f32" }).index(
    multiVectorIndex({
      metric: "maxsim",
      sketchBits: 128,  // Number of bits for sketching (optional)
      quantization: "1bit"  // 1bit, 2bit, or scalar (optional)
    })
  )
};
MaxSim is currently the only supported metric for multi-vector search. It computes the maximum similarity between each query vector and all document vectors, then sums these maximum similarities.

Inserting Documents with Matrices

Provide embeddings as a matrix (array of arrays):
import { matrix } from "topk-js/data";

await client.collection("documents").upsert([
  {
    _id: "doc1",
    title: "Machine Learning Basics",
    // Each row is a token embedding (e.g., 5 tokens, 128 dimensions each)
    token_embeddings: [
      [0.1, 0.2, 0.3, /* ... 128 dimensions */],
      [0.4, 0.5, 0.6, /* ... 128 dimensions */],
      [0.7, 0.8, 0.9, /* ... 128 dimensions */],
      [0.2, 0.3, 0.4, /* ... 128 dimensions */],
      [0.5, 0.6, 0.7, /* ... 128 dimensions */]
    ]
  },
  {
    _id: "doc2",
    title: "Deep Learning Guide",
    // Explicit matrix constructor for non-f32 types
    token_embeddings: matrix([
      [12, 24, 36, /* ... */],
      [48, 60, 72, /* ... */],
      [84, 96, 108, /* ... */]
    ], "u8")
  }
]);

Querying with MaxSim

Use fn.multiVectorDistance() to compute MaxSim scores:
import { select, field, fn } from "topk-js/query";

// Query with multiple token embeddings
const queryTokens = [
  [0.11, 0.22, 0.33, /* ... 128 dimensions */],
  [0.44, 0.55, 0.66, /* ... 128 dimensions */],
  [0.77, 0.88, 0.99, /* ... 128 dimensions */]
];

const results = await client.collection("documents").query(
  select({
    title: field("title"),
    score: fn.multiVectorDistance("token_embeddings", queryTokens)
  })
  .topk(field("score"), 10)
);

Controlling Candidates

Limit the number of candidate vectors considered for better performance:
import { select, field, fn } from "topk-js/query";

const results = await client.collection("documents").query(
  select({
    title: field("title"),
    score: fn.multiVectorDistance(
      "token_embeddings", 
      queryTokens,
      100  // Limit to 100 candidate vectors
    )
  })
  .topk(field("score"), 10)
);
Reducing the number of candidates can significantly improve query performance, especially for large documents with many token embeddings. Start with a higher value and reduce it until you reach an acceptable speed/accuracy balance.

Using Explicit Matrix Types

For non-f32 matrix types, use the explicit matrix constructor:
import { matrix } from "topk-js/data";
import { select, field, fn } from "topk-js/query";

// Query with u8 matrix
const queryMatrix = matrix([
  [12, 24, 36],
  [48, 60, 72],
  [84, 96, 108]
], "u8");

const results = await client.collection("documents").query(
  select({
    score: fn.multiVectorDistance("token_embeddings", queryMatrix)
  })
  .topk(field("score"), 10)
);

Combining with Filters

Apply filters before multi-vector search:
import { select, filter, field, fn } from "topk-js/query";

const results = await client.collection("documents").query(
  select({
    title: field("title"),
    score: fn.multiVectorDistance("token_embeddings", queryTokens)
  })
  .filter(field("category").eq("machine-learning"))
  .topk(field("score"), 10)
);

Use Cases

Multi-vector search is ideal for:
  • Token-level embeddings: Store embeddings for each token in a document
  • ColBERT-style models: Late interaction models that benefit from MaxSim
  • Multi-representation documents: Documents with multiple semantic aspects
  • Fine-grained matching: Match specific parts of documents rather than whole-document embeddings
The matrix dimension parameter specifies the length of each individual vector (number of columns), not the number of vectors. The number of vectors (rows) can vary per document.
Ensure your query matrix has the same dimension (number of columns) as the indexed field, but the number of rows can differ.
