True hybrid search lets you combine multiple search techniques (semantic search, vector search, keyword search, sparse vectors, and multi-vector search) in a single query. This lets you draw on the strengths of each approach to get better search results.
Overview
With true hybrid search, you can:
- Combine semantic similarity with keyword matching
- Blend vector search with BM25 scoring
- Weight different search signals based on importance
- Apply filters across all search types
- Dynamically boost results based on conditions
Basic Hybrid Search
Combine semantic search with keyword search:
import { Client } from "topk-js";
import { text, semanticIndex, keywordIndex } from "topk-js/schema";
import { select, filter, field, fn, match } from "topk-js/query";
const client = new Client({
apiKey: "YOUR_API_KEY",
region: "aws-us-east-1-elastica"
});
// Create collection with both semantic and keyword indexes
await client.collections().create("books", {
title: text().index(semanticIndex()).index(keywordIndex()),
summary: text().index(keywordIndex())
});
// Hybrid query: semantic similarity + keyword matching
const results = await client.collection("books").query(
select({
title: field("title"),
semantic_score: fn.semanticSimilarity("title", "classic literature"),
bm25_score: fn.bm25Score()
})
.filter(match("american", { field: "summary" }))
.topk(
field("semantic_score").mul(0.7).add(field("bm25_score").mul(0.3)),
10
)
);
Experiment with different weight combinations to find the optimal balance for your use case. Start with equal weights (0.5/0.5) and adjust based on result quality.
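To see how the weights change a ranking, the expression passed to `.topk()` can be reproduced locally. This is a minimal sketch, not the topk-js API: the `Candidate` type and its scores are hypothetical stand-ins for what `fn.semanticSimilarity()` and `fn.bm25Score()` return, and the BM25 values are kept in a small range so the weights alone decide the order.

```typescript
// Local sketch of the ranking expression in the query above:
//   semantic_score * 0.7 + bm25_score * 0.3, highest score first.
// The Candidate shape and sample scores are hypothetical; in TopK these
// scores are computed server-side.
interface Candidate {
  id: string;
  semantic: number; // stand-in for fn.semanticSimilarity(...)
  bm25: number;     // stand-in for fn.bm25Score()
}

function hybridScore(c: Candidate, wSemantic: number, wKeyword: number): number {
  return c.semantic * wSemantic + c.bm25 * wKeyword;
}

// Return the IDs of the k best candidates under the given weights.
function rankByWeights(cands: Candidate[], k: number, wSemantic: number, wKeyword: number): string[] {
  return [...cands]
    .sort((a, b) => hybridScore(b, wSemantic, wKeyword) - hybridScore(a, wSemantic, wKeyword))
    .slice(0, k)
    .map((c) => c.id);
}

const candidates: Candidate[] = [
  { id: "a", semantic: 0.9, bm25: 0.2 },
  { id: "b", semantic: 0.3, bm25: 0.9 },
  { id: "c", semantic: 0.6, bm25: 0.5 },
];

console.log("semantic-heavy:", rankByWeights(candidates, 2, 0.7, 0.3));
console.log("keyword-heavy:", rankByWeights(candidates, 2, 0.3, 0.7));
```

With these sample scores, the semantic-heavy weighting returns `a` and `c`, while the keyword-heavy weighting returns `b` and `c`.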
Vector + Keyword Hybrid
Combine custom vector embeddings with BM25 scoring:
import { f32Vector, vectorIndex, text, keywordIndex } from "topk-js/schema";
import { select, filter, field, fn, match } from "topk-js/query";
// Schema with vector and keyword indexes
await client.collections().create("documents", {
title: text(),
content: text().index(keywordIndex()),
embedding: f32Vector({ dimension: 1536 }).index(
vectorIndex({ metric: "cosine" })
)
});
// Query combining vector similarity and keyword relevance
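// generateEmbedding is a placeholder for your own embedding model (1536 dimensions to match the schema); it is not part of topk-js
const queryEmbedding = await generateEmbedding("machine learning");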
const results = await client.collection("documents").query(
select({
title: field("title"),
vector_score: fn.vectorDistance("embedding", queryEmbedding),
keyword_score: fn.bm25Score()
})
.filter(
// Keep documents that match either term; the weights set each term's relative importance
match("machine", { field: "content", weight: 30.0 }).or(
match("learning", { field: "content", weight: 20.0 })
)
)
.topk(
// Negate the distance (lower = closer), scale it toward the BM25 range, then add BM25
field("vector_score").mul(-100).add(field("keyword_score")),
20
)
);
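The sign matters in the combined expression above. The sketch below recomputes it locally, assuming (as the example does) that `vector_score` is a distance where lower means closer and that results are ranked by highest combined score. The documents and scores are made up, and `rankBy` is a hypothetical helper.

```typescript
// Local sketch of: field("vector_score").mul(-100).add(field("keyword_score"))
// Assumes vector_score is a distance (lower = closer) and that the highest
// combined score ranks first. Documents and scores are made up.
interface Doc {
  title: string;
  vectorDistance: number; // lower = more similar
  bm25: number;           // higher = better keyword match
}

const combined = (d: Doc): number => d.vectorDistance * -100 + d.bm25;
// Same expression without negating the distance (the mistake to avoid).
const unnegated = (d: Doc): number => d.vectorDistance * 100 + d.bm25;

// Sort highest score first and return titles.
function rankBy(docs: Doc[], score: (d: Doc) => number): string[] {
  return [...docs].sort((a, b) => score(b) - score(a)).map((d) => d.title);
}

const docs: Doc[] = [
  { title: "close, weak keyword match", vectorDistance: 0.1, bm25: 5 },
  { title: "far, strong keyword match", vectorDistance: 0.4, bm25: 12 },
];

console.log(rankBy(docs, combined));
console.log(rankBy(docs, unnegated));
```

Negating the distance ranks the close document first; adding the raw distance instead rewards the farther one.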
Multiple Field Hybrid Search
Search across multiple fields with different techniques:
import { select, filter, field, fn, match } from "topk-js/query";
const results = await client.collection("books").query(
select({
title: field("title"),
title_semantic: fn.semanticSimilarity("title", "dystopian novel"),
// Requires a semanticIndex() on `summary`; the books schema above indexes it for keywords only
summary_semantic: fn.semanticSimilarity("summary", "totalitarian society"),
keyword_score: fn.bm25Score()
})
.filter(
match("dystopian", { field: "title" }).or(
match("totalitarian", { field: "summary" })
)
)
.topk(
field("title_semantic")
.add(field("summary_semantic"))
.add(field("keyword_score").mul(0.5)),
10
)
);
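Conceptually, the query above first keeps documents that match either keyword condition and then ranks only those by the summed score. A rough local sketch follows; its lowercase word-splitting is far simpler than TopK's keyword analysis, and the per-field scores are invented stand-ins for the server-side functions.

```typescript
// Local sketch: OR filter on keywords, then rank survivors by
// title_semantic + summary_semantic + 0.5 * keyword_score.
// Tokenization and scores are simplified, hypothetical stand-ins.
interface Book {
  title: string;
  summary: string;
  titleSemantic: number;
  summarySemantic: number;
  keywordScore: number;
}

// Naive word check: lowercase, split on non-word characters.
const hasWord = (text: string, word: string): boolean =>
  text.toLowerCase().split(/\W+/).includes(word);

function searchBooks(books: Book[], k: number): string[] {
  return books
    .filter((b) => hasWord(b.title, "dystopian") || hasWord(b.summary, "totalitarian"))
    .map((b) => ({
      title: b.title,
      score: b.titleSemantic + b.summarySemantic + b.keywordScore * 0.5,
    }))
    .sort((a, b) => b.score - a.score) // highest combined score first
    .slice(0, k)
    .map((r) => r.title);
}

const books: Book[] = [
  { title: "A Dystopian Tale", summary: "Life under surveillance.", titleSemantic: 0.8, summarySemantic: 0.5, keywordScore: 2 },
  { title: "Nineteen Eighty-Four", summary: "A totalitarian state rewrites history.", titleSemantic: 0.3, summarySemantic: 0.9, keywordScore: 4 },
  { title: "Garden Poems", summary: "Seasons and flowers.", titleSemantic: 0.9, summarySemantic: 0.9, keywordScore: 0 },
];

console.log(searchBooks(books, 10));
```

`Garden Poems` has the highest semantic scores but is never ranked, because it fails both keyword conditions.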
Sparse + Dense Hybrid
Combine sparse and dense vector search:
import { f32Vector, f32SparseVector, vectorIndex } from "topk-js/schema";
import { select, field, fn } from "topk-js/query";
import { f32SparseVector as sparseVec } from "topk-js/data";
// Schema with both dense and sparse vectors
await client.collections().create("documents", {
dense_embedding: f32Vector({ dimension: 768 }).index(
vectorIndex({ metric: "cosine" })
),
sparse_embedding: f32SparseVector().index(
vectorIndex({ metric: "dot_product" })
)
});
// Hybrid query with both vector types
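// generateDenseEmbedding and generateSparseEmbedding are placeholders for your own dense and sparse embedding models (not part of topk-js)
const denseQuery = await generateDenseEmbedding("machine learning");
const sparseQuery = await generateSparseEmbedding("machine learning");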
const results = await client.collection("documents").query(
select({
dense_score: fn.vectorDistance("dense_embedding", denseQuery),
sparse_score: fn.vectorDistance("sparse_embedding", sparseVec(sparseQuery))
})
.topk(
field("dense_score").mul(-50).add(field("sparse_score")),
10
)
);
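The dense and sparse signals are computed differently. This sketch assumes the dense score is cosine distance (1 minus cosine similarity, lower is closer) and the sparse score is a dot product over shared indices (higher is better), which is why only the dense term is negated. The `SparseVec` type and the sample vectors are illustrative only.

```typescript
// Local sketch of the two signals combined above.
// Assumptions: dense score = cosine distance (lower = closer),
// sparse score = dot product over shared indices (higher = better).
type SparseVec = Record<number, number>; // index -> weight

function sparseDot(a: SparseVec, b: SparseVec): number {
  let sum = 0;
  for (const key of Object.keys(a)) {
    const idx = Number(key);
    if (idx in b) sum += a[idx] * b[idx]; // only indices present in both contribute
  }
  return sum;
}

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Mirrors field("dense_score").mul(-50).add(field("sparse_score")).
function hybrid(denseDistance: number, sparseScore: number): number {
  return denseDistance * -50 + sparseScore;
}

const querySparse: SparseVec = { 3: 1.0, 7: 0.5 };
const docSparse: SparseVec = { 3: 2.0, 9: 1.0 };
console.log(hybrid(cosineDistance([0.6, 0.8], [0.8, 0.6]), sparseDot(querySparse, docSparse)));
```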
Conditional Boosting
Dynamically boost results based on metadata or keyword matches:
import { select, field, fn } from "topk-js/query";
// Assumes `books` has an `embedding` vector field and `queryEmbedding` holds the query vector
const results = await client.collection("books").query(
select({
title: field("title"),
similarity: fn.vectorDistance("embedding", queryEmbedding)
})
.topk(
// Multiply the distance by 0.1 when the summary matches all terms in "racial injustice";
// with ascending order (true below), a smaller distance ranks higher, so matches move up
field("similarity").boost(
field("summary").matchAll("racial injustice"),
0.1
),
10,
true
)
);
// Equivalent using choose()
const results2 = await client.collection("books").query(
select({
title: field("title"),
similarity: fn.vectorDistance("embedding", queryEmbedding)
})
.topk(
field("similarity").mul(
field("summary").matchAll("racial injustice").choose(0.1, 1.0)
),
10,
true
)
);
The boost() method is a convenience wrapper around choose(). Use it to multiply a score by a boost factor when a condition is true.
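Per the description above, `boost(condition, factor)` multiplies the score by `factor` when the condition holds and by 1.0 otherwise. The local sketch below mimics that rule (the `Hit` type and data are hypothetical) to show why a factor below 1 promotes matches when ranking ascending by distance; with a descending similarity score you would use a factor above 1.

```typescript
// Local sketch of boost(condition, factor): score * (condition ? factor : 1.0),
// the same result as multiplying by condition.choose(factor, 1.0).
function boost(score: number, condition: boolean, factor: number): number {
  return score * (condition ? factor : 1.0);
}

interface Hit {
  title: string;
  distance: number; // lower = more similar
  matches: boolean; // stand-in for matchAll("racial injustice")
}

// Ascending order, like .topk(..., 10, true): smallest boosted distance first.
function rankAscending(hits: Hit[], factor: number): string[] {
  return [...hits]
    .sort((a, b) => boost(a.distance, a.matches, factor) - boost(b.distance, b.matches, factor))
    .map((h) => h.title);
}

const hits: Hit[] = [
  { title: "closest, no match", distance: 0.2, matches: false },
  { title: "farther, matches", distance: 0.5, matches: true },
];

console.log(rankAscending(hits, 0.1)); // boosted
console.log(rankAscending(hits, 1.0)); // no boost
```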
Handling Null Values
Use coalesce() to provide default values for null fields:
import { select, field, fn } from "topk-js/query";
const results = await client.collection("books").query(
select({
title: field("title"),
main_score: fn.vectorDistance("embedding", queryEmbedding),
backup_score: fn.vectorDistance("backup_embedding", queryEmbedding)
})
.topk(
field("main_score").add(
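// Add backup_score when present; coalesce() substitutes 0.0 when it is null.
// With ascending (distance) order, a 0.0 default favors documents that lack
// a backup embedding, so pick a default that suits your metric.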
field("backup_score").coalesce(0.0)
),
10,
true
)
);
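`coalesce()` replaces a null value with the given default. The sketch below mimics that locally with made-up rows and shows a side effect worth checking: under ascending distance order, a 0.0 default ranks a document without a backup embedding ahead of one that has both. The 1.0 fallback is an arbitrary illustrative choice.

```typescript
// Local sketch of coalesce(): null is replaced by the fallback value.
function coalesce(value: number | null, fallback: number): number {
  return value === null ? fallback : value;
}

interface Row {
  title: string;
  main: number;          // main_score (a distance)
  backup: number | null; // backup_score, null when there is no backup embedding
}

// Ascending order on main + coalesce(backup, fallback), like .topk(..., true).
function rankRows(rows: Row[], fallback: number): string[] {
  const total = (r: Row): number => r.main + coalesce(r.backup, fallback);
  return [...rows].sort((a, b) => total(a) - total(b)).map((r) => r.title);
}

const rows: Row[] = [
  { title: "has both", main: 0.3, backup: 0.3 },
  { title: "no backup", main: 0.4, backup: null },
];

console.log(rankRows(rows, 0.0)); // missing backup counts as a perfect match
console.log(rankRows(rows, 1.0)); // missing backup counts as a poor match
```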
Multi-Stage Retrieval
Combine hybrid search with reranking:
import { select, filter, field, fn, match } from "topk-js/query";
const results = await client.collection("books").query(
select({
title: field("title"),
semantic_score: fn.semanticSimilarity("title", "dystopian fiction"),
keyword_score: fn.bm25Score()
})
.filter(match("dystopian", { field: "summary" }))
.topk(
field("semantic_score").add(field("keyword_score").mul(0.3)),
50 // Retrieve 50 candidates
)
.rerank({
model: "cohere/rerank-v4",
query: "best dystopian novels with social commentary",
fields: ["title", "summary"],
topkMultiple: 2 // Rerank 100 candidates (50 * 2)
})
.limit(10) // Return the top 10 after reranking
);
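The two stages can be sketched locally: sort by a cheap first-stage score, keep `k * topkMultiple` candidates (the 50 * 2 = 100 candidates in the example), rescore only those, and return the best `n`. `rerankScore` is a hypothetical stand-in for the hosted reranking model; this shows the general shape of two-stage retrieval, not how TopK implements reranking internally.

```typescript
// Local sketch of two-stage retrieval. `Item.rerankScore` is a hypothetical
// stand-in for a reranking model's relevance score.
interface Item {
  id: string;
  firstStage: number;  // cheap hybrid score from stage one
  rerankScore: number; // expensive relevance score from stage two
}

function twoStage(items: Item[], k: number, topkMultiple: number, n: number): string[] {
  // Stage 1: keep the k * topkMultiple best candidates by first-stage score.
  const candidates = [...items]
    .sort((a, b) => b.firstStage - a.firstStage)
    .slice(0, k * topkMultiple);
  // Stage 2: reorder only those candidates and return the best n.
  return candidates
    .sort((a, b) => b.rerankScore - a.rerankScore)
    .slice(0, n)
    .map((item) => item.id);
}

const items: Item[] = [
  { id: "x", firstStage: 0.9, rerankScore: 0.2 },
  { id: "y", firstStage: 0.8, rerankScore: 0.7 },
  { id: "z", firstStage: 0.1, rerankScore: 0.99 },
];

console.log(twoStage(items, 1, 2, 1));
```

Item `z` would win the rerank but never reaches it with only two candidates, which is why the first-stage candidate count matters.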
Best Practices
Score Normalization
Different search techniques produce scores on different scales:
- Vector distances: usually small values (cosine distance, for example, ranges from 0.0 to 2.0)
- BM25 scores: unbounded and often much larger (0 to 100+)
- Semantic similarity: scale depends on the model
Normalize or weight scores appropriately:
// Negate the vector distance (lower = closer) and scale it by 100 to roughly match the BM25 range
field("vector_distance").mul(-100).add(field("bm25_score"))
// Or weight them differently
field("vector_distance").mul(-50).add(field("bm25_score").mul(0.5))
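Hand-tuned multipliers are one option. A common alternative, not part of the query API shown here, is min-max normalization: rescale each signal to [0, 1] over the candidate set before weighting, then flip distances so higher means better. A minimal client-side sketch, assuming you already have the raw scores for a set of candidates:

```typescript
// Min-max normalization (a client-side alternative to fixed multipliers).
// Rescales values to [0, 1] over the given candidate set.
function minMax(values: number[]): number[] {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  if (hi === lo) return values.map(() => 0); // all equal: this signal cannot discriminate
  return values.map((v) => (v - lo) / (hi - lo));
}

// Weighted fusion of normalized signals; distances are flipped so higher = better.
function fuse(distances: number[], bm25: number[], wVector: number, wKeyword: number): number[] {
  const vector = minMax(distances).map((d) => 1 - d);
  const keyword = minMax(bm25);
  return vector.map((v, i) => wVector * v + wKeyword * keyword[i]);
}

// Two candidates whose raw scales differ by two orders of magnitude.
console.log(fuse([0.1, 0.5], [40, 10], 0.5, 0.5));
```

Because normalization depends on the candidate set, the same document can receive a different normalized score in different queries; that is usually acceptable when ranking within one result list.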
Filter Before Scoring
Apply filters early to reduce the search space:
// Good: Filter first
const results = await client.collection("books").query(
select({ score: fn.vectorDistance("embedding", queryEmbedding) })
.filter(field("published_year").gte(2000)) // Filter first
.topk(field("score"), 10, true)
);
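The benefit of filtering first is that scoring work runs only on rows that survive the filter. The local sketch below counts scoring calls to make that visible; `BookRow`, the data, and `countingScore` are hypothetical, and TopK's own query execution is not modeled.

```typescript
// Local sketch: filter on a cheap condition, then score and rank survivors.
interface BookRow {
  title: string;
  publishedYear: number;
  distance: number;
}

function filterThenTopK(
  rows: BookRow[],
  minYear: number,
  k: number,
  score: (r: BookRow) => number,
): string[] {
  return rows
    .filter((r) => r.publishedYear >= minYear)         // cheap check first
    .map((r) => ({ title: r.title, score: score(r) })) // scoring runs only on survivors
    .sort((a, b) => a.score - b.score)                 // ascending: lower distance first
    .slice(0, k)
    .map((r) => r.title);
}

let scoreCalls = 0;
const countingScore = (r: BookRow): number => {
  scoreCalls++;
  return r.distance;
};

const rows: BookRow[] = [
  { title: "Old Classic", publishedYear: 1949, distance: 0.05 },
  { title: "Recent Novel", publishedYear: 2015, distance: 0.3 },
  { title: "New Release", publishedYear: 2021, distance: 0.2 },
];

const top = filterThenTopK(rows, 2000, 10, countingScore);
console.log(top, `scored ${scoreCalls} of ${rows.length} rows`);
```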
Experiment with Weights
Start with these baseline weight combinations:
- Semantic-heavy: 0.7 semantic + 0.3 keyword
- Balanced: 0.5 semantic + 0.5 keyword
- Keyword-heavy: 0.3 semantic + 0.7 keyword
Adjust based on your evaluation metrics.
Use offline evaluation with ground truth data to find optimal weight combinations for your specific use case.
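One way to run that offline evaluation is a small grid search: for each candidate weighting, rank each query's candidates, measure recall@k against the ground-truth relevant IDs, and keep the weighting with the best mean. The data structures are hypothetical, and recall@k is one metric among several (MRR or nDCG would also work).

```typescript
// Minimal offline weight sweep using mean recall@k.
// Scored / EvalQuery and the sample data are hypothetical.
interface Scored {
  id: string;
  semantic: number;
  keyword: number;
}

interface EvalQuery {
  candidates: Scored[];
  relevant: Set<string>; // ground-truth relevant IDs
}

// Fraction of relevant IDs found in the top k under the given weights.
function recallAtK(q: EvalQuery, wSem: number, wKw: number, k: number): number {
  const score = (c: Scored): number => wSem * c.semantic + wKw * c.keyword;
  const top = [...q.candidates].sort((a, b) => score(b) - score(a)).slice(0, k);
  const hits = top.filter((c) => q.relevant.has(c.id)).length;
  return hits / q.relevant.size;
}

// Return the weighting with the highest mean recall@k (first one wins ties).
function bestWeights(queries: EvalQuery[], grid: [number, number][], k: number): [number, number] {
  let best = grid[0];
  let bestScore = -Infinity;
  for (const [wSem, wKw] of grid) {
    const mean = queries.reduce((sum, q) => sum + recallAtK(q, wSem, wKw, k), 0) / queries.length;
    if (mean > bestScore) {
      bestScore = mean;
      best = [wSem, wKw];
    }
  }
  return best;
}

const evalSet: EvalQuery[] = [
  {
    candidates: [
      { id: "a", semantic: 0.9, keyword: 0.1 },
      { id: "b", semantic: 0.2, keyword: 0.9 },
    ],
    relevant: new Set(["b"]),
  },
];

const grid: [number, number][] = [
  [0.7, 0.3],
  [0.5, 0.5],
  [0.3, 0.7],
];

console.log(bestWeights(evalSet, grid, 1));
```

On this single toy query, 0.7/0.3 misses the relevant document while 0.5/0.5 finds it; ties go to the first weighting in the grid.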
When combining distances with similarity scores, remember the inverse relationship: a lower distance means more similar. Negate the distance (multiply by -1, plus any scaling you need) before adding it to a score where higher is better.