Why query enhancement matters
A single query may miss relevant documents because:
- Users phrase questions differently than document content
- Important terms have synonyms or alternative phrasings
- Specific queries miss broader context needed for complex answers
Techniques
Multi-query generation
Generates 3-5 alternative phrasings of the original query. Each variation may match different documents in your vector store.
HyDE (Hypothetical Document Embeddings)
Generates a hypothetical answer to the query, then searches for documents similar to that answer. This bridges the distribution gap between questions (short, interrogative) and documents (long, declarative).
Step-back prompting
Generates broader, more abstract questions that retrieve background context before answering the specific query.
Configuration
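The exact configuration surface is not shown on this page; as an illustration, the parameters discussed here might be grouped like this (the dict shape and key names are assumptions, not a documented schema):

```python
# Hypothetical configuration for query enhancement. The key names
# (technique, n_variations, top_k, rrf_k) are illustrative assumptions
# drawn from the parameters described on this page.
query_enhancement_config = {
    "technique": "multi_query",  # or "hyde", "step_back"
    "n_variations": 4,           # alternative phrasings to generate (3-5)
    "top_k": 10,                 # documents retrieved per enhanced query
    "rrf_k": 60,                 # RRF constant; see "How result fusion works"
}
```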
How result fusion works
Each enhanced query retrieves top_k documents. Results are merged using Reciprocal Rank Fusion (RRF):

score(d) = Σ_i 1 / (k + rank_i(d))

where:
- rank_i is the document's rank in query result i
- k controls the weight of lower-ranked results (default: 60)
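A minimal sketch of the RRF fusion described above (the function name and signature are illustrative, not Haystack's API):

```python
from collections import defaultdict

def rrf_fuse(result_lists, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion.

    result_lists holds one ranked list of document contents per
    enhanced query. Illustrative sketch, not Haystack's API.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            # Lower rank -> larger contribution; k dampens the tail.
            scores[doc] += 1.0 / (k + rank)
    # Sort by fused score, highest first. Identical contents collapse
    # into one key, which already removes exact duplicates; the
    # content-similarity deduplication under Implementation details
    # additionally catches near-duplicates.
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `rrf_fuse([["a", "b"], ["b", "c"]])` ranks `"b"` first because it appears in both result lists.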
Implementation details
Parallel execution
Haystack pipelines execute the enhanced-query searches in parallel using a ThreadPoolExecutor, keeping total latency close to that of a single search.
Deduplication
After fusion, duplicate documents are removed based on content similarity.
Cost considerations
Multi-query
- LLM cost: 1 call to generate variations
- Embedding cost: 3-5 query embeddings
- Search cost: 3-5 vector searches
- Total latency: ~2-3x single query (parallel execution)
HyDE
- LLM cost: 1 call to generate hypothetical docs
- Embedding cost: 3 document embeddings
- Search cost: 3 vector searches
- Trade-off: Higher embedding cost (documents longer than queries)
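The HyDE flow behind these costs can be sketched as follows (the generation and embedding calls are stub placeholders, not a real client):

```python
def generate_hypothetical_docs(query, n=3):
    # Stub for the single LLM call that drafts n hypothetical answers.
    return [f"A plausible answer #{i} to: {query}" for i in range(n)]

def embed(text):
    # Stub embedder; a real one returns a dense vector for the text.
    return [float(len(word)) for word in text.split()]

def hyde_search_vectors(query, n=3):
    # Embed the hypothetical *documents*, not the query, so the search
    # vectors live in the same distribution as the stored documents.
    docs = generate_hypothetical_docs(query, n)
    return [embed(d) for d in docs]  # one vector search per embedding
```

Because the hypothetical documents are longer than the query, each `embed` call costs more than a query embedding, which is the trade-off noted above.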
Step-back
- LLM cost: 1 call to generate step-back questions
- Embedding cost: 4 query embeddings
- Search cost: 4 vector searches
- Benefit: Retrieves broader context for complex questions
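A sketch of step-back retrieval matching the costs above (the generation step is a stub for the LLM call):

```python
def generate_step_back(query, n=3):
    # Stub for the single LLM call that produces n broader questions.
    return [f"What background is needed to answer: {query}? (angle {i})"
            for i in range(n)]

def step_back_retrieve(query, search, top_k=5):
    # Search with the original plus the broader questions (here
    # 1 + 3 = 4 searches, matching the costs above) so the specific
    # answer is grounded in background context.
    queries = [query] + generate_step_back(query)
    return [search(q, top_k) for q in queries]
```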
Best practices
Choose the right technique
- Multi-query: General-purpose, works for most queries
- HyDE: When queries are very short or domain-specific
- Step-back: For complex questions needing background context
Tune fusion parameters
- Lower rrf_k (30-40): Prioritizes top-ranked results
- Higher rrf_k (80-100): Gives more weight to lower ranks
- Default rrf_k=60 works well for most cases
Cache enhanced queries
For frequently-asked questions, cache the generated query variations to avoid repeated LLM calls.
Combine with reranking
Query enhancement increases recall (more relevant docs retrieved). Follow with reranking to improve precision.
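The recall-then-precision split in one sketch (the `score` argument is a placeholder; in practice it would be a cross-encoder or similar reranker):

```python
def rerank(query, docs, score, top_n=5):
    # Query enhancement widened the candidate pool (recall); the
    # reranker re-scores each (query, doc) pair and keeps only the
    # best top_n (precision).
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:top_n]
```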
See also
- Contextual compression - Reduce retrieved context before generation
- Reranking - Two-stage retrieval for higher precision
- Hybrid search - Combine dense and sparse retrieval