True Hybrid Search

True hybrid search lets you combine multiple search techniques (semantic search, vector search, keyword search, sparse vectors, and multi-vector search) in a single query, so one ranked result set can draw on the strengths of each approach.

Overview

With true hybrid search, you can:

Combine semantic similarity with keyword matching
Blend vector search with BM25 scoring
Weight different search signals based on importance
Apply filters across all search types
Dynamically boost results based on conditions

Basic Hybrid Search

Combine semantic search with keyword search:

import { Client } from "topk-js";
import { text, semanticIndex, keywordIndex } from "topk-js/schema";
import { select, filter, field, fn, match } from "topk-js/query";

const client = new Client({
  apiKey: "YOUR_API_KEY",
  region: "aws-us-east-1-elastica"
});

// Create collection with both semantic and keyword indexes
await client.collections().create("books", {
  title: text().index(semanticIndex()).index(keywordIndex()),
  summary: text().index(keywordIndex())
});

// Hybrid query: semantic similarity + keyword matching
const results = await client.collection("books").query(
  select({
    title: field("title"),
    semantic_score: fn.semanticSimilarity("title", "classic literature"),
    bm25_score: fn.bm25Score()
  })
  .filter(match("american", { field: "summary" }))
  .topk(
    field("semantic_score").mul(0.7).add(field("bm25_score").mul(0.3)),
    10
  )
);

Experiment with different weight combinations to find the optimal balance for your use case. Start with equal weights (0.5/0.5) and adjust based on result quality.

Vector + Keyword Hybrid

Combine custom vector embeddings with BM25 scoring:

import { f32Vector, vectorIndex, text, keywordIndex } from "topk-js/schema";
import { select, filter, field, fn, match } from "topk-js/query";

// Schema with vector and keyword indexes
await client.collections().create("documents", {
  title: text(),
  content: text().index(keywordIndex()),
  embedding: f32Vector({ dimension: 1536 }).index(
    vectorIndex({ metric: "cosine" })
  )
});

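// Query combining vector similarity and keyword relevance.
// generateEmbedding is a placeholder for your own embedding function;
// it must return a 1536-dimensional vector to match the schema above.
const queryEmbedding = await generateEmbedding("machine learning");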

const results = await client.collection("documents").query(
  select({
    title: field("title"),
    vector_score: fn.vectorDistance("embedding", queryEmbedding),
    keyword_score: fn.bm25Score()
  })
  .filter(
    // JavaScript has no operator overloading, so combine conditions with .or()
    match("machine", { field: "content", weight: 30.0 }).or(
      match("learning", { field: "content", weight: 20.0 })
    )
  )
  .topk(
    // Invert vector distance and combine with BM25
    field("vector_score").mul(-100).add(field("keyword_score")),
    20
  )
);

Multiple Field Hybrid Search

Search across multiple fields with different techniques:

import { select, filter, field, fn, match } from "topk-js/query";

const results = await client.collection("books").query(
  select({
    title: field("title"),
    title_semantic: fn.semanticSimilarity("title", "dystopian novel"),
    summary_semantic: fn.semanticSimilarity("summary", "totalitarian society"),
    keyword_score: fn.bm25Score()
  })
  .filter(
    // JavaScript has no operator overloading, so combine conditions with .or()
    match("dystopian", { field: "title" }).or(
      match("totalitarian", { field: "summary" })
    )
  )
  .topk(
    field("title_semantic")
      .add(field("summary_semantic"))
      .add(field("keyword_score").mul(0.5)),
    10
  )
);

Sparse + Dense Hybrid

Combine sparse and dense vector search:

import { f32Vector, f32SparseVector, vectorIndex } from "topk-js/schema";
import { select, field, fn } from "topk-js/query";
import { f32SparseVector as sparseVec } from "topk-js/data";

// Schema with both dense and sparse vectors
await client.collections().create("documents", {
  dense_embedding: f32Vector({ dimension: 768 }).index(
    vectorIndex({ metric: "cosine" })
  ),
  sparse_embedding: f32SparseVector().index(
    vectorIndex({ metric: "dot_product" })
  )
});

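// Hybrid query with both vector types.
// generateDenseEmbedding and generateSparseEmbedding are placeholders
// for your own embedding functions.
const denseQuery = await generateDenseEmbedding("machine learning");
const sparseQuery = await generateSparseEmbedding("machine learning");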

const results = await client.collection("documents").query(
  select({
    dense_score: fn.vectorDistance("dense_embedding", denseQuery),
    sparse_score: fn.vectorDistance("sparse_embedding", sparseVec(sparseQuery))
  })
  .topk(
    field("dense_score").mul(-50).add(field("sparse_score")),
    10
  )
);

Conditional Boosting

Dynamically boost results based on metadata or keyword matches:

import { select, field, fn } from "topk-js/query";

const results = await client.collection("books").query(
  select({
    title: field("title"),
    similarity: fn.vectorDistance("embedding", queryEmbedding)
  })
  .topk(
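    // Shrink the distance to 10% of its value when the summary matches
    // "racial injustice". Results are sorted ascending (third argument is
    // true), so a smaller distance ranks higher: effectively a 10x boost.
    field("similarity").boost(
      field("summary").matchAll("racial injustice"),
      0.1
    ),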
    10,
    true
  )
);

// Equivalent using choose()
const results2 = await client.collection("books").query(
  select({
    title: field("title"),
    similarity: fn.vectorDistance("embedding", queryEmbedding)
  })
  .topk(
    field("similarity").mul(
      field("summary").matchAll("racial injustice").choose(0.1, 1.0)
    ),
    10,
    true
  )
);

The boost() method is a convenience wrapper around choose(). Use it to multiply a score by a boost factor when a condition is true.
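To make that relationship concrete, here is a plain TypeScript model of the two expressions, independent of the SDK. The local `choose` and `boost` functions are hypothetical stand-ins written for illustration, not the library's implementation; they only mirror the behavior described above.

```typescript
// Hypothetical local model of the semantics described above; not SDK code.

// choose: pick one of two values depending on a condition.
function choose(condition: boolean, whenTrue: number, whenFalse: number): number {
  return condition ? whenTrue : whenFalse;
}

// boost: multiply the score by `factor` only when the condition holds,
// which is the same as multiplying by choose(condition, factor, 1.0).
function boost(score: number, condition: boolean, factor: number): number {
  return score * choose(condition, factor, 1.0);
}

// With a distance-based score and ascending order, a factor below 1
// moves matching documents up the ranking.
const matched = boost(0.8, true, 0.1);    // roughly 0.08
const unmatched = boost(0.8, false, 0.1); // unchanged
console.log(matched, unmatched);
```

When the score is a similarity sorted in descending order instead, a factor above 1 (for example 10) has the equivalent effect.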

Handling Null Values

Use coalesce() to provide default values for null fields:

import { select, field, fn } from "topk-js/query";

const results = await client.collection("books").query(
  select({
    title: field("title"),
    main_score: fn.vectorDistance("embedding", queryEmbedding),
    backup_score: fn.vectorDistance("backup_embedding", queryEmbedding)
  })
  .topk(
    field("main_score").add(
      // Use backup_score only if it's not null, otherwise add 0
      field("backup_score").coalesce(0.0)
    ),
    10,
    true
  )
);
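The effect of the fallback on ranking can be checked with a small stand-alone sketch. The `coalesce` function and the `ScoredDoc` shape below are hypothetical and only model the behavior described above on the client side; they are not part of the SDK.

```typescript
// Hypothetical model of coalesce(): use a fallback when the value is null.
function coalesce(value: number | null | undefined, fallback: number): number {
  return value ?? fallback;
}

interface ScoredDoc {
  id: string;
  mainScore: number;          // distance from the main embedding
  backupScore: number | null; // null when the document has no backup embedding
}

// Same shape as the topk expression above: main + coalesce(backup, 0.0).
function combinedScore(doc: ScoredDoc): number {
  return doc.mainScore + coalesce(doc.backupScore, 0.0);
}

const scoredDocs: ScoredDoc[] = [
  { id: "a", mainScore: 0.25, backupScore: 0.5 },
  { id: "b", mainScore: 0.5, backupScore: null },
];

// Ascending sort, because lower combined distance is better.
const rankedDocs = [...scoredDocs].sort((x, y) => combinedScore(x) - combinedScore(y));
console.log(rankedDocs.map((d) => d.id));
```

In this toy data, document "b" ranks first even though its main distance is larger, because the missing backup score contributes 0. With distances sorted ascending, a fallback of 0 favors documents that lack the field, so a larger default may suit some use cases better.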

Multi-Stage Retrieval

Combine hybrid search with reranking:

import { select, filter, field, fn, match } from "topk-js/query";

const results = await client.collection("books").query(
  select({
    title: field("title"),
    semantic_score: fn.semanticSimilarity("title", "dystopian fiction"),
    keyword_score: fn.bm25Score()
  })
  .filter(match("dystopian", { field: "summary" }))
  .topk(
    field("semantic_score").add(field("keyword_score").mul(0.3)),
    50  // Retrieve 50 candidates
  )
  .rerank({
    model: "cohere/rerank-v4",
    query: "best dystopian novels with social commentary",
    fields: ["title", "summary"],
    topkMultiple: 2  // Consider 100 documents (50 * 2)
  })
  .limit(10)  // Return top 10 after reranking
);
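The retrieve-then-rerank pattern itself can be sketched without the SDK. Everything in the block below is a hypothetical illustration: the `Doc` shape, the helper names, and the toy term-overlap scorer that stands in for a real reranking model.

```typescript
interface Doc {
  id: string;
  text: string;
  firstStageScore: number; // e.g. the weighted hybrid score, higher is better
}

// Stage 1: keep the `candidateCount` best documents by the cheap first-stage score.
function retrieveCandidates(docs: Doc[], candidateCount: number): Doc[] {
  return [...docs]
    .sort((a, b) => b.firstStageScore - a.firstStageScore)
    .slice(0, candidateCount);
}

// Stage 2: re-score only the candidates with a more expensive function,
// then return the best `limit` of them.
function rerank(candidates: Doc[], scorer: (doc: Doc) => number, limit: number): Doc[] {
  return candidates
    .map((doc) => ({ doc, score: scorer(doc) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.doc);
}

// Toy stand-in for a reranking model: counts query terms found in the text.
function termOverlapScorer(query: string): (doc: Doc) => number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return (doc) => terms.filter((t) => doc.text.toLowerCase().includes(t)).length;
}

const corpus: Doc[] = [
  { id: "1984", text: "dystopian novel with social commentary", firstStageScore: 0.7 },
  { id: "brave-new-world", text: "dystopian society", firstStageScore: 0.9 },
  { id: "cookbook", text: "recipes for dinner", firstStageScore: 0.1 },
];

const stageOne = retrieveCandidates(corpus, 2);
const finalResults = rerank(stageOne, termOverlapScorer("dystopian social commentary"), 1);
console.log(finalResults.map((d) => d.id));
```

The first stage is cheap and wide; the second is expensive and narrow. Retrieving more candidates than you finally return gives the reranker room to promote documents that the first-stage score underrated.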

Best Practices

Score Normalization

Different search techniques produce scores on different scales:

Vector distances: Usually small values (0.0 to 2.0)
BM25 scores: Can be larger values (0 to 100+)
Semantic similarity: Model-dependent scale

Normalize or weight scores appropriately:

// Negate the vector distance (lower is better) and scale it by 100
// so it is comparable to the BM25 scale (higher is better)
field("vector_distance").mul(-100).add(field("bm25_score"))

// Or weight them differently
field("vector_distance").mul(-50).add(field("bm25_score").mul(0.5))
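Fixed multipliers such as -100 assume you already know the typical range of each score. An alternative, sketched below, is to min-max normalize the scores of one result set on the client before fusing them. This is a stand-alone illustration with hypothetical helper names (`minMax`, `distancesToSimilarities`, `fuse`); it is not something the query engine does for you.

```typescript
// Rescale values to [0, 1]. If all values are equal, every value maps to 0.
function minMax(values: number[]): number[] {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  if (hi === lo) return values.map(() => 0);
  return values.map((v) => (v - lo) / (hi - lo));
}

// Turn distances (lower is better) into similarities (higher is better).
function distancesToSimilarities(distances: number[]): number[] {
  return minMax(distances).map((d) => 1 - d);
}

// Weighted sum of normalized vector similarity and normalized BM25 score.
function fuse(
  distances: number[],
  bm25Scores: number[],
  vectorWeight: number,
  keywordWeight: number
): number[] {
  const similarities = distancesToSimilarities(distances);
  const keywords = minMax(bm25Scores);
  return similarities.map((s, i) => vectorWeight * s + keywordWeight * keywords[i]);
}

// Three documents: vector distances and BM25 scores in the same order.
const fused = fuse([0.2, 0.6, 1.0], [12, 30, 3], 0.5, 0.5);
console.log(fused); // higher is better
```

Min-max normalization depends on the candidates in the current result set, so fused scores are not comparable across queries.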

Filter Before Scoring

Apply filters early to reduce the search space:

// Good: Filter first
const results = await client.collection("books").query(
  select({ score: fn.vectorDistance("embedding", query) })
    .filter(field("published_year").gte(2000))  // Filter first
    .topk(field("score"), 10, true)
);

Experiment with Weights

Start with these baseline weight combinations:

Semantic-heavy: 0.7 semantic + 0.3 keyword
Balanced: 0.5 semantic + 0.5 keyword
Keyword-heavy: 0.3 semantic + 0.7 keyword

Adjust based on your evaluation metrics.

Use offline evaluation with ground truth data to find optimal weight combinations for your specific use case.
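A minimal sketch of such an offline evaluation follows. It assumes you have already collected, for each labeled query, the candidate documents with normalized semantic and keyword scores plus the set of relevant document IDs. The types and function names are hypothetical, and recall@k is just one possible metric.

```typescript
interface Candidate {
  id: string;
  semantic: number; // assumed normalized to [0, 1], higher is better
  keyword: number;  // assumed normalized to [0, 1], higher is better
}

interface LabeledQuery {
  candidates: Candidate[];
  relevant: Set<string>; // ground-truth relevant document IDs
}

// Fraction of relevant documents found in the top k for a given semantic weight.
function recallAtK(query: LabeledQuery, semanticWeight: number, k: number): number {
  const keywordWeight = 1 - semanticWeight;
  const score = (c: Candidate) => semanticWeight * c.semantic + keywordWeight * c.keyword;
  const top = [...query.candidates].sort((a, b) => score(b) - score(a)).slice(0, k);
  const hits = top.filter((c) => query.relevant.has(c.id)).length;
  return query.relevant.size === 0 ? 0 : hits / query.relevant.size;
}

// Try each candidate weight and keep the one with the highest mean recall@k.
function bestSemanticWeight(queries: LabeledQuery[], weights: number[], k: number) {
  if (queries.length === 0 || weights.length === 0) {
    throw new Error("need at least one query and one weight");
  }
  let best = { weight: weights[0], recall: -1 };
  for (const w of weights) {
    const mean = queries.reduce((sum, q) => sum + recallAtK(q, w, k), 0) / queries.length;
    if (mean > best.recall) best = { weight: w, recall: mean };
  }
  return best;
}

const labeled: LabeledQuery[] = [
  {
    candidates: [
      { id: "a", semantic: 0.9, keyword: 0.1 },
      { id: "b", semantic: 0.2, keyword: 0.9 },
      { id: "c", semantic: 0.5, keyword: 0.5 },
    ],
    relevant: new Set(["a"]),
  },
];

// Evaluate the three baseline combinations listed above.
const bestWeight = bestSemanticWeight(labeled, [0.3, 0.5, 0.7], 1);
console.log(bestWeight);
```

With a single toy query the result is not meaningful; in practice you would run this over a held-out set of labeled queries and a finer grid of weights.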

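When combining distance metrics with scores, remember that the two point in opposite directions: a lower distance means a closer match, while a higher BM25 or similarity score means a better match. Negate the distance (multiply by -1, or by a negative scaling factor) before adding it to a similarity score.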

Related Concepts

Semantic Search - Natural language search
Vector Search - Custom embedding search
Keyword Search - Term-based search
Reranking - Improve hybrid results
