Query enhancement rewrites user queries to improve retrieval coverage. Instead of searching with a single query, you generate multiple query variations that capture different aspects of the information need. This addresses vocabulary mismatch between queries and documents.

Why query enhancement matters

A single query may miss relevant documents because:
  • Users phrase questions differently than document content
  • Important terms have synonyms or alternative phrasings
  • Specific queries miss broader context needed for complex answers
Query enhancement solves this by casting a wider retrieval net, then fusing results.

Techniques

Multi-query generation

Generates 3-5 alternative phrasings of the original query. Each variation may match different documents in your vector store.
from vectordb.haystack.query_enhancement.search import (
    PineconeQueryEnhancementSearchPipeline
)

pipeline = PineconeQueryEnhancementSearchPipeline(
    "configs/pinecone_triviaqa.yaml"
)

results = pipeline.run(
    query="What causes photosynthesis?",
    top_k=10
)

print(f"Found {len(results['documents'])} documents")
if "answer" in results:
    print(results["answer"])
When to use: Best for factual queries where different terminology might match different documents. Effective for domain-specific vocabulary.
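Under the hood, multi-query generation is a single LLM call whose output is parsed into a list of rewrites. A minimal sketch of the idea (the prompt wording and the `generate_variations` helper are illustrative, not the library's API):

```python
def generate_variations(query: str, llm, num_queries: int = 3) -> list[str]:
    # One LLM call yields several rephrasings; the original query is kept
    # so exact-term matches are never lost.
    prompt = (
        f"Rewrite the following question in {num_queries} different ways, "
        f"one per line:\n{query}"
    )
    variations = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return [query] + variations[:num_queries]

# Stub LLM for illustration; in practice this would be a Groq API call.
fake_llm = lambda prompt: "What triggers photosynthesis?\nHow does photosynthesis start?"

queries = generate_variations("What causes photosynthesis?", fake_llm, num_queries=2)
print(len(queries))  # 3: the original plus two rewrites
```

Each variation is then embedded and searched independently before fusion.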

HyDE (Hypothetical Document Embeddings)

Generates a hypothetical answer to the query, then searches for documents similar to that answer. This bridges the distribution gap between questions (short, interrogative) and documents (long, declarative).
from vectordb.haystack.components import QueryEnhancer

enhancer = QueryEnhancer(model="llama-3.3-70b-versatile")

# Generate hypothetical documents
hyde_docs = enhancer.generate_hypothetical_documents(
    "What is backpropagation?",
    num_docs=3
)

# Each hypothetical doc is embedded and used for retrieval
for doc in hyde_docs:
    print(doc[:100])
Output:
Backpropagation is the process by which neural networks learn...
In machine learning, backpropagation computes gradients...
The backpropagation algorithm adjusts weights to minimize error...
When to use: Best when queries are very short or when query/document distributions differ significantly (e.g., questions vs. encyclopedia articles).
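The distribution-gap claim is visible even with a toy bag-of-words similarity: a declarative hypothetical answer overlaps the target document far more than the short question does (illustrative code; real pipelines use dense sentence embeddings):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for a dense embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = "backpropagation computes gradients to adjust network weights"
question = "what is backpropagation"
hypothetical = "backpropagation computes gradients that adjust weights"

# The hypothetical answer is much closer to the document than the question is.
print(cosine(embed(question), embed(doc)) < cosine(embed(hypothetical), embed(doc)))  # True
```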

Step-back prompting

Generates broader, more abstract questions that retrieve background context before answering the specific query.
from langchain_groq import ChatGroq
from vectordb.langchain.components import QueryEnhancer

llm = ChatGroq(model="llama-3.3-70b-versatile")
enhancer = QueryEnhancer(llm)

queries = enhancer.generate_queries(
    "What is backpropagation?",
    mode="step_back"
)

print(queries)
Output:
[
    "What is machine learning?",
    "How do neural networks learn?",
    "What is gradient descent?",
    "What is backpropagation?"  # Original query
]
When to use: Best for complex questions requiring background knowledge. The step-back questions retrieve context that improves answer quality.

Configuration

query_enhancement:
  type: multi_query  # or "hyde", "step_back"
  num_queries: 3
  num_hyde_docs: 3
  rrf_k: 60  # Reciprocal rank fusion parameter
  llm:
    model: llama-3.3-70b-versatile
    api_key: ${GROQ_API_KEY}

embeddings:
  model: sentence-transformers/all-MiniLM-L6-v2

pinecone:
  api_key: ${PINECONE_API_KEY}
  index_name: triviaqa
  namespace: default

rag:
  enabled: true
  model: llama-3.3-70b-versatile
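The `${GROQ_API_KEY}` and `${PINECONE_API_KEY}` placeholders suggest environment-variable interpolation at load time. A minimal sketch of such expansion (hypothetical helper; the library's actual config loader may differ):

```python
import os
import re

def expand_env(value: str) -> str:
    # Replace ${VAR} placeholders with values from the environment,
    # substituting an empty string for unknown variables.
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["GROQ_API_KEY"] = "demo-key"
print(expand_env("${GROQ_API_KEY}"))  # demo-key
```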

How result fusion works

Each enhanced query retrieves top_k documents. Results are merged using Reciprocal Rank Fusion (RRF):
from vectordb.haystack.query_enhancement.utils.fusion import rrf_fusion_many

# Retrieve for each query variation
all_results = []
for query_variation in enhanced_queries:
    docs = vector_db.search(query_variation, top_k=10)
    all_results.append(docs)

# Fuse results using RRF
fused = rrf_fusion_many(all_results, k=60, top_k=10)
RRF scoring formula:
score(doc) = Σ_i 1 / (k + rank_i)
Where:
  • rank_i is the document's rank in the result list for query i
  • k controls the weight of lower-ranked results (default: 60)
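The scoring formula takes only a few lines to implement. A self-contained sketch, with documents identified by their content strings (a simplification of the library's `rrf_fusion_many`):

```python
from collections import defaultdict

def rrf_fusion(result_lists, k=60, top_k=10):
    # Sum 1 / (k + rank) for each document across all ranked lists
    # (ranks are 1-based), then keep the top_k highest-scoring documents.
    scores = defaultdict(float)
    for docs in result_lists:
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# "doc_a" ranks first in both lists, so it tops the fused ranking.
fused = rrf_fusion([["doc_a", "doc_b"], ["doc_a", "doc_c"]], k=60, top_k=3)
print(fused[0])  # doc_a
```

Because RRF uses only ranks, it needs no score normalization across the per-query result lists.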

Implementation details

Parallel execution

Haystack pipelines execute searches in parallel using ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor, as_completed

all_results = []
with ThreadPoolExecutor(max_workers=len(enhanced_queries)) as executor:
    futures = {
        executor.submit(self._search_single_query, q, top_k): q
        for q in enhanced_queries
    }
    
    for future in as_completed(futures):
        results = future.result()
        all_results.append(results)

Deduplication

After fusion, duplicate documents are removed based on content similarity:
from vectordb.haystack.query_enhancement.utils.fusion import (
    deduplicate_by_content
)

fused_results = rrf_fusion_many(all_results, k=60, top_k=50)
deduplicated = deduplicate_by_content(fused_results)
final_results = deduplicated[:top_k]
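A sketch of what such deduplication can look like; the library's `deduplicate_by_content` may use fuzzier content similarity, whereas this version drops only exact duplicates:

```python
import hashlib

def dedupe_exact(docs: list[str]) -> list[str]:
    # Keep the first (highest-fused-rank) occurrence of each distinct
    # content string, detected by hashing the text.
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

print(dedupe_exact(["a", "b", "a"]))  # ['a', 'b']
```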

Cost considerations

Multi-query
  • LLM cost: 1 call to generate variations
  • Embedding cost: 3-5 query embeddings
  • Search cost: 3-5 vector searches
  • Total latency: ~2-3x single query (parallel execution)

HyDE
  • LLM cost: 1 call to generate hypothetical docs
  • Embedding cost: 3 document embeddings
  • Search cost: 3 vector searches
  • Trade-off: Higher embedding cost (documents longer than queries)

Step-back
  • LLM cost: 1 call to generate step-back questions
  • Embedding cost: 4 query embeddings
  • Search cost: 4 vector searches
  • Benefit: Retrieves broader context for complex questions

Best practices

Choose the right technique

  • Multi-query: General-purpose, works for most queries
  • HyDE: When queries are very short or domain-specific
  • Step-back: For complex questions needing background context

Tune fusion parameters

  • Lower rrf_k (30-40): Concentrates weight on top-ranked results
  • Higher rrf_k (80-100): Flattens scores, so lower-ranked results contribute relatively more
  • The default k=60 works well for most cases
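The effect of k is easy to quantify: compare how much a rank-1 hit contributes relative to a rank-20 hit under different values (illustrative arithmetic):

```python
def contribution_ratio(k: int, best_rank: int = 1, worst_rank: int = 20) -> float:
    # How many times more a rank-1 hit contributes than a rank-20 hit.
    return (1 / (k + best_rank)) / (1 / (k + worst_rank))

print(round(contribution_ratio(30), 2))   # 1.61: top ranks dominate
print(round(contribution_ratio(100), 2))  # 1.19: flatter weighting
```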

Cache enhanced queries

For frequently asked questions, cache the generated query variations to avoid repeated LLM calls.
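One way to sketch such a cache is with `functools.lru_cache`; the `generate_queries` stub below stands in for the LLM-backed enhancer, and caching assumes generation is deterministic enough to reuse (e.g. temperature 0):

```python
from functools import lru_cache

call_count = 0

def generate_queries(query: str) -> list[str]:
    # Stand-in for the LLM call; counts invocations so the cache is visible.
    global call_count
    call_count += 1
    return [query, f"{query} (rephrased)"]

@lru_cache(maxsize=1024)
def cached_variations(query: str) -> tuple[str, ...]:
    # lru_cache needs hashable values, so return a tuple rather than a list.
    return tuple(generate_queries(query))

cached_variations("What is backpropagation?")
cached_variations("What is backpropagation?")
print(call_count)  # 1: the second call was served from the cache
```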

Combine with reranking

Query enhancement increases recall (more relevant docs retrieved). Follow with reranking to improve precision.
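The recall-then-precision pattern can be sketched as follows; `score_fn` would normally be a cross-encoder reranker, and the lexical-overlap scorer here is only a stand-in:

```python
def rerank(query: str, docs: list[str], score_fn, top_k: int = 5) -> list[str]:
    # Score every retrieved document against the query and keep the best.
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)[:top_k]

def overlap_score(query: str, doc: str) -> float:
    # Toy scorer: fraction of query terms that appear in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = ["backpropagation computes gradients", "rain expected tomorrow"]
best = rerank("what is backpropagation", docs, overlap_score, top_k=1)
print(best[0])  # backpropagation computes gradients
```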
