Query enhancement rewrites user queries to improve retrieval coverage. Instead of searching with a single query, you generate multiple query variations that capture different aspects of the information need. This addresses vocabulary mismatch between queries and documents.

Why query enhancement matters

A single query may miss relevant documents because:
  • Users phrase questions differently than document content
  • Important terms have synonyms or alternative phrasings
  • Specific queries miss broader context needed for complex answers
Query enhancement solves this by casting a wider retrieval net, then fusing results.

Techniques

Multi-query generation

Generates 3-5 alternative phrasings of the original query. Each variation may match different documents in your vector store.
from vectordb.haystack.query_enhancement.search import (
    PineconeQueryEnhancementSearchPipeline
)

pipeline = PineconeQueryEnhancementSearchPipeline(
    "configs/pinecone_triviaqa.yaml"
)

results = pipeline.run(
    query="What causes photosynthesis?",
    top_k=10
)

print(f"Found {len(results['documents'])} documents")
if "answer" in results:
    print(results["answer"])
When to use: Best for factual queries where different terminology might match different documents. Effective for domain-specific vocabulary.
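Under the hood, multi-query generation is a single LLM call whose output is parsed into a list of rewrites. A minimal sketch of the idea (the prompt wording and the `generate_variations` helper are illustrative, not the library's API):

```python
def generate_variations(query: str, llm, num_queries: int = 3) -> list[str]:
    # One LLM call yields several rephrasings; the original query is kept
    # so exact-term matches are never lost.
    prompt = (
        f"Rewrite the following question in {num_queries} different ways, "
        f"one per line:\n{query}"
    )
    variations = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return [query] + variations[:num_queries]

# Stub LLM for illustration; in practice this would be a Groq API call.
fake_llm = lambda prompt: "What triggers photosynthesis?\nHow does photosynthesis start?"

queries = generate_variations("What causes photosynthesis?", fake_llm, num_queries=2)
print(len(queries))  # 3: the original plus two rewrites
```

Each variation is then embedded and searched independently before fusion.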

HyDE (Hypothetical Document Embeddings)

Generates a hypothetical answer to the query, then searches for documents similar to that answer. This bridges the distribution gap between questions (short, interrogative) and documents (long, declarative).
from vectordb.haystack.components import QueryEnhancer

enhancer = QueryEnhancer(model="llama-3.3-70b-versatile")

# Generate hypothetical documents
hyde_docs = enhancer.generate_hypothetical_documents(
    "What is backpropagation?",
    num_docs=3
)

# Each hypothetical doc is embedded and used for retrieval
for doc in hyde_docs:
    print(doc[:100])
Output:
Backpropagation is the process by which neural networks learn...
In machine learning, backpropagation computes gradients...
The backpropagation algorithm adjusts weights to minimize error...
When to use: Best when queries are very short or when query/document distributions differ significantly (e.g., questions vs. encyclopedia articles).
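The distribution-gap claim is visible even with a toy bag-of-words similarity: a declarative hypothetical answer overlaps the target document far more than the short question does (illustrative code; real pipelines use dense sentence embeddings):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for a dense embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = "backpropagation computes gradients to adjust network weights"
question = "what is backpropagation"
hypothetical = "backpropagation computes gradients that adjust weights"

# The hypothetical answer is much closer to the document than the question is.
print(cosine(embed(question), embed(doc)) < cosine(embed(hypothetical), embed(doc)))  # True
```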

Step-back prompting

Generates broader, more abstract questions that retrieve background context before answering the specific query.
from langchain_groq import ChatGroq
from vectordb.langchain.components import QueryEnhancer

llm = ChatGroq(model="llama-3.3-70b-versatile")
enhancer = QueryEnhancer(llm)

queries = enhancer.generate_queries(
    "What is backpropagation?",
    mode="step_back"
)

print(queries)
Output:
[
    "What is machine learning?",
    "How do neural networks learn?",
    "What is gradient descent?",
    "What is backpropagation?"  # Original query
]
When to use: Best for complex questions requiring background knowledge. The step-back questions retrieve context that improves answer quality.

Configuration

query_enhancement:
  type: multi_query  # or "hyde", "step_back"
  num_queries: 3
  num_hyde_docs: 3
  rrf_k: 60  # Reciprocal rank fusion parameter
  llm:
    model: llama-3.3-70b-versatile
    api_key: ${GROQ_API_KEY}

embeddings:
  model: sentence-transformers/all-MiniLM-L6-v2

pinecone:
  api_key: ${PINECONE_API_KEY}
  index_name: triviaqa
  namespace: default

rag:
  enabled: true
  model: llama-3.3-70b-versatile
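The `${GROQ_API_KEY}` and `${PINECONE_API_KEY}` placeholders suggest environment-variable interpolation at load time. A minimal sketch of such expansion (hypothetical helper; the library's actual config loader may differ):

```python
import os
import re

def expand_env(value: str) -> str:
    # Replace ${VAR} placeholders with values from the environment,
    # substituting an empty string for unknown variables.
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["GROQ_API_KEY"] = "demo-key"
print(expand_env("${GROQ_API_KEY}"))  # demo-key
```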

How result fusion works

Each enhanced query retrieves top_k documents. Results are merged using Reciprocal Rank Fusion (RRF):
from vectordb.haystack.query_enhancement.utils.fusion import rrf_fusion_many

# Retrieve for each query variation
all_results = []
for query_variation in enhanced_queries:
    docs = vector_db.search(query_variation, top_k=10)
    all_results.append(docs)

# Fuse results using RRF
fused = rrf_fusion_many(all_results, k=60, top_k=10)
RRF scoring formula:
score(doc) = Σ_i 1 / (k + rank_i)
Where:
  • rank_i is the document's rank in the result list for query i
  • k controls the weight of lower-ranked results (default: 60)
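The scoring formula takes only a few lines to implement. A self-contained sketch, with documents identified by their content strings (a simplification of the library's `rrf_fusion_many`):

```python
from collections import defaultdict

def rrf_fusion(result_lists, k=60, top_k=10):
    # Sum 1 / (k + rank) for each document across all ranked lists
    # (ranks are 1-based), then keep the top_k highest-scoring documents.
    scores = defaultdict(float)
    for docs in result_lists:
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# "doc_a" ranks first in both lists, so it tops the fused ranking.
fused = rrf_fusion([["doc_a", "doc_b"], ["doc_a", "doc_c"]], k=60, top_k=3)
print(fused[0])  # doc_a
```

Because RRF uses only ranks, it needs no score normalization across the per-query result lists.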

Implementation details

Parallel execution

Haystack pipelines execute searches in parallel using ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor, as_completed

all_results = []
with ThreadPoolExecutor(max_workers=len(enhanced_queries)) as executor:
    futures = {
        executor.submit(self._search_single_query, q, top_k): q
        for q in enhanced_queries
    }
    
    for future in as_completed(futures):
        results = future.result()
        all_results.append(results)

Deduplication

After fusion, duplicate documents are removed based on content similarity:
from vectordb.haystack.query_enhancement.utils.fusion import (
    deduplicate_by_content
)

fused_results = rrf_fusion_many(all_results, k=60, top_k=50)
deduplicated = deduplicate_by_content(fused_results)
final_results = deduplicated[:top_k]
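A sketch of what such deduplication can look like; the library's `deduplicate_by_content` may use fuzzier content similarity, whereas this version drops only exact duplicates:

```python
import hashlib

def dedupe_exact(docs: list[str]) -> list[str]:
    # Keep the first (highest-fused-rank) occurrence of each distinct
    # content string, detected by hashing the text.
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

print(dedupe_exact(["a", "b", "a"]))  # ['a', 'b']
```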

Cost considerations

Multi-query
  • LLM cost: 1 call to generate variations
  • Embedding cost: 3-5 query embeddings
  • Search cost: 3-5 vector searches
  • Total latency: ~2-3x single query (parallel execution)

HyDE
  • LLM cost: 1 call to generate hypothetical docs
  • Embedding cost: 3 document embeddings
  • Search cost: 3 vector searches
  • Trade-off: Higher embedding cost (documents longer than queries)

Step-back
  • LLM cost: 1 call to generate step-back questions
  • Embedding cost: 4 query embeddings
  • Search cost: 4 vector searches
  • Benefit: Retrieves broader context for complex questions

Best practices

Choose the right technique

  • Multi-query: General-purpose, works for most queries
  • HyDE: When queries are very short or domain-specific
  • Step-back: For complex questions needing background context

Tune fusion parameters

  • Lower rrf_k (30-40): Concentrates weight on top-ranked results
  • Higher rrf_k (80-100): Flattens scores, so lower-ranked results contribute relatively more
  • The default k=60 works well for most cases
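The effect of k is easy to quantify: compare how much a rank-1 hit contributes relative to a rank-20 hit under different values (illustrative arithmetic):

```python
def contribution_ratio(k: int, best_rank: int = 1, worst_rank: int = 20) -> float:
    # How many times more a rank-1 hit contributes than a rank-20 hit.
    return (1 / (k + best_rank)) / (1 / (k + worst_rank))

print(round(contribution_ratio(30), 2))   # 1.61: top ranks dominate
print(round(contribution_ratio(100), 2))  # 1.19: flatter weighting
```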

Cache enhanced queries

For frequently asked questions, cache the generated query variations to avoid repeated LLM calls.
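One way to sketch such a cache is with `functools.lru_cache`; the `generate_queries` stub below stands in for the LLM-backed enhancer, and caching assumes generation is deterministic enough to reuse (e.g. temperature 0):

```python
from functools import lru_cache

call_count = 0

def generate_queries(query: str) -> list[str]:
    # Stand-in for the LLM call; counts invocations so the cache is visible.
    global call_count
    call_count += 1
    return [query, f"{query} (rephrased)"]

@lru_cache(maxsize=1024)
def cached_variations(query: str) -> tuple[str, ...]:
    # lru_cache needs hashable values, so return a tuple rather than a list.
    return tuple(generate_queries(query))

cached_variations("What is backpropagation?")
cached_variations("What is backpropagation?")
print(call_count)  # 1: the second call was served from the cache
```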

Combine with reranking

Query enhancement increases recall (more relevant docs retrieved). Follow with reranking to improve precision.
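The recall-then-precision pattern can be sketched as follows; `score_fn` would normally be a cross-encoder reranker, and the lexical-overlap scorer here is only a stand-in:

```python
def rerank(query: str, docs: list[str], score_fn, top_k: int = 5) -> list[str]:
    # Score every retrieved document against the query and keep the best.
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)[:top_k]

def overlap_score(query: str, doc: str) -> float:
    # Toy scorer: fraction of query terms that appear in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = ["backpropagation computes gradients", "rain expected tomorrow"]
best = rerank("what is backpropagation", docs, overlap_score, top_k=1)
print(best[0])  # backpropagation computes gradients
```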
