Reusable components for building custom Haystack RAG pipelines.

AgenticRouter

Route and orchestrate RAG with agent-like behavior for tool selection and self-reflection.

Constructor

AgenticRouter(
    model: str = "llama-3.3-70b-versatile",
    api_key: str | None = None,
    api_base_url: str = "https://api.groq.com/openai/v1"
)
model
str
default:"llama-3.3-70b-versatile"
LLM model name for routing decisions
api_key
str | None
default:"None"
API key for the LLM provider. Falls back to the GROQ_API_KEY environment variable
api_base_url
str
default:"https://api.groq.com/openai/v1"
Base URL for OpenAI-compatible API endpoint

Methods

select_tool

Select the best tool for a given query.
select_tool(query: str) -> str
query
str
required
User query to route
tool
str
Selected tool name: "retrieval", "web_search", "calculation", or "reasoning"
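The real component delegates this decision to an LLM; the sketch below only illustrates the routing contract (a query in, one of the four tool names out) with a hypothetical keyword heuristic:

```python
# Minimal keyword-heuristic sketch of the select_tool contract: map a query
# to one of the four tool names the router can return. The real component
# makes this decision with an LLM; this fallback is purely illustrative.

def heuristic_select_tool(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("latest", "today", "news", "current")):
        return "web_search"       # time-sensitive queries need fresh data
    if any(w in q for w in ("calculate", "sum", "percent", "+", "*")):
        return "calculation"      # arithmetic is better done by a tool
    if any(w in q for w in ("why", "explain", "compare")):
        return "reasoning"        # open-ended analysis, no retrieval needed
    return "retrieval"            # default: answer from the document store

print(heuristic_select_tool("What is the latest news on RAG?"))  # web_search
```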

evaluate_answer_quality

Evaluate generated answer quality with relevance, completeness, and grounding scores.
evaluate_answer_quality(
    query: str,
    answer: str,
    context: str = ""
) -> dict[str, Any]
query
str
required
Original user query
answer
str
required
Generated answer to evaluate
context
str
default:""
Retrieved context used to generate the answer (for grounding check)
evaluation
dict[str, Any]
Dictionary containing:
  • relevance (0-100): Does it answer the query?
  • completeness (0-100): Is it sufficiently detailed?
  • grounding (0-100): Is it grounded in the context?
  • issues (list): List of identified problems (max 3)
  • suggestions (list): List of improvement suggestions (max 2)
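A sketch of consuming the returned dictionary, whose shape follows the field list above. The `passes_quality_gate` helper is hypothetical (not part of the API); it averages the three scores against a threshold:

```python
# Sketch of consuming evaluate_answer_quality output. The dict shape mirrors
# the field list above; passes_quality_gate is a hypothetical helper.

def passes_quality_gate(evaluation: dict, threshold: int = 75) -> bool:
    scores = [evaluation[k] for k in ("relevance", "completeness", "grounding")]
    return sum(scores) / len(scores) >= threshold

sample = {
    "relevance": 90,
    "completeness": 70,
    "grounding": 80,
    "issues": ["missing citation"],
    "suggestions": ["quote the source passage"],
}
print(passes_quality_gate(sample))  # True: mean score is 80
```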

self_reflect_loop

Run self-reflection loop to iteratively improve answer quality.
self_reflect_loop(
    query: str,
    answer: str,
    context: str = "",
    max_iterations: int = 2,
    quality_threshold: int = 75
) -> str
query
str
required
Original user query
answer
str
required
Initial answer to refine
context
str
default:""
Retrieved context for grounding
max_iterations
int
default:"2"
Maximum refinement iterations
quality_threshold
int
default:"75"
Target quality score (0-100) to stop refinement early
refined_answer
str
Final refined answer after iterative improvement
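The loop's control flow can be sketched with stub callables standing in for the class's LLM-backed evaluate and refine steps (the stubs below are purely illustrative):

```python
# Control-flow sketch of a self-reflection loop: evaluate, stop early once the
# quality threshold is met, otherwise refine and repeat up to max_iterations.
# evaluate and refine stand in for the class's LLM-backed methods.

from typing import Callable

def reflect_loop(
    answer: str,
    evaluate: Callable[[str], int],
    refine: Callable[[str], str],
    max_iterations: int = 2,
    quality_threshold: int = 75,
) -> str:
    for _ in range(max_iterations):
        if evaluate(answer) >= quality_threshold:
            break                 # good enough: stop refining early
        answer = refine(answer)   # otherwise produce an improved draft
    return answer

# Stub scoring: longer answers score higher; refinement appends detail.
final = reflect_loop("Short.", evaluate=lambda a: len(a) * 10,
                     refine=lambda a: a + " More detail.")
```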

ContextCompressor

Compress and summarize retrieved context to reduce token usage and improve answer quality.

Constructor

ContextCompressor(
    model: str = "llama-3.3-70b-versatile",
    api_key: str | None = None
)
model
str
default:"llama-3.3-70b-versatile"
LLM model for compression (Groq API)
api_key
str | None
default:"None"
Groq API key. Falls back to the GROQ_API_KEY environment variable

Methods

compress

Compress context using specified compression technique.
compress(
    context: str,
    query: str,
    compression_type: str = "abstractive",
    **kwargs: Any
) -> str
context
str
required
Retrieved context to compress
query
str
required
Original query for relevance filtering
compression_type
str
default:"abstractive"
Compression technique: "abstractive", "extractive", or "relevance_filter"
**kwargs
Any
Additional parameters:
  • max_tokens (int): For abstractive compression (default: 2048)
  • num_sentences (int): For extractive compression (default: 5)
  • relevance_threshold (float): For relevance filtering (default: 0.5)
compressed
str
Compressed context text
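Internally this is a dispatch on `compression_type`; a minimal sketch of that pattern, with illustrative stubs in place of the real techniques:

```python
# Dispatch sketch for compress(): route compression_type to the matching
# technique and reject unknown values. The lambdas are stand-in stubs; in the
# real component each entry would call the corresponding method with **kwargs.

def compress(context: str, query: str,
             compression_type: str = "abstractive", **kwargs) -> str:
    techniques = {
        "abstractive": lambda: f"[summary of {len(context)} chars]",
        "extractive": lambda: context.split(". ")[0],
        "relevance_filter": lambda: context,
    }
    if compression_type not in techniques:
        raise ValueError(f"unknown compression_type: {compression_type!r}")
    return techniques[compression_type]()
```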

compress_abstractive

Abstractive compression using LLM summarization.
compress_abstractive(
    context: str,
    query: str,
    max_tokens: int = 2048
) -> str
context
str
required
Context to compress
query
str
required
Query for relevance
max_tokens
int
default:"2048"
Maximum tokens in compressed output
summary
str
Compressed context summary

compress_extractive

Extractive compression by selecting key sentences.
compress_extractive(
    context: str,
    query: str,
    num_sentences: int = 5
) -> str
context
str
required
Context to compress
query
str
required
Query for relevance
num_sentences
int
default:"5"
Number of sentences to extract
selected
str
Selected sentences joined together
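The technique can be illustrated locally: score each sentence by word overlap with the query and keep the top scorers in document order. The real component may score sentences differently (e.g. via the LLM); this is only a sketch of the shape of extractive compression:

```python
# Minimal extractive-compression sketch: score sentences by word overlap with
# the query, keep the top num_sentences, and restore original order so the
# result still reads coherently. Illustrative scoring only.

import re

def extract_sentences(context: str, query: str, num_sentences: int = 5) -> str:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    q_words = set(query.lower().split())
    # Higher overlap with query terms = more relevant sentence.
    scored = sorted(
        enumerate(sentences),
        key=lambda p: len(q_words & set(p[1].lower().split())),
        reverse=True,
    )[:num_sentences]
    # Sort selected sentences back into document order before joining.
    return " ".join(s for _, s in sorted(scored))
```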

filter_by_relevance

Filter context chunks by relevance threshold.
filter_by_relevance(
    context: str,
    query: str,
    relevance_threshold: float = 0.5
) -> str
context
str
required
Context to filter
query
str
required
Query for relevance scoring
relevance_threshold
float
default:"0.5"
Minimum relevance score (0-1) to keep chunks
filtered
str
Filtered context containing only relevant chunks
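A sketch of the filtering idea, using paragraph chunks and a query-term overlap ratio as a stand-in for the component's real relevance score:

```python
# Sketch of relevance filtering: split context into chunks (paragraphs here),
# score each against the query, and drop chunks below the threshold. The
# overlap ratio is a stand-in for the component's actual relevance scoring.

def filter_chunks(context: str, query: str,
                  relevance_threshold: float = 0.5) -> str:
    q_words = set(query.lower().split())
    kept = []
    for chunk in context.split("\n\n"):
        words = set(chunk.lower().split())
        # Fraction of query terms found in this chunk, in [0, 1].
        score = len(q_words & words) / len(q_words) if q_words else 0.0
        if score >= relevance_threshold:
            kept.append(chunk)
    return "\n\n".join(kept)
```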

QueryEnhancer

Enhance queries using multi-query generation and query expansion techniques.

Constructor

QueryEnhancer(
    model: str = "llama-3.3-70b-versatile",
    api_key: str | None = None
)
model
str
default:"llama-3.3-70b-versatile"
LLM model for query enhancement
api_key
str | None
default:"None"
API key. Falls back to the GROQ_API_KEY environment variable

Methods

generate_multi_queries

Generate multiple query variations for better retrieval coverage.
generate_multi_queries(
    query: str,
    num_queries: int = 3
) -> list[str]
query
str
required
Original query
num_queries
int
default:"3"
Number of query variations to generate
queries
list[str]
List of query variations including the original
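Since the returned list includes the original and LLM output may repeat it, it is worth deduplicating before firing one retrieval call per variation. A small sketch (the variations below are hypothetical LLM output):

```python
# Deduplicate query variations while preserving order, so each retrieval call
# is made at most once. Case- and whitespace-insensitive comparison.

def dedupe_queries(queries: list[str]) -> list[str]:
    seen: set[str] = set()
    unique = []
    for q in queries:
        key = q.strip().lower()
        if key not in seen:       # keep first occurrence, drop repeats
            seen.add(key)
            unique.append(q)
    return unique

variations = [
    "How do I reset my password?",
    "password reset steps",
    "How do I reset my password?",   # duplicate of the original
]
print(dedupe_queries(variations))    # two unique queries remain
```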

ResultMerger

Merge and deduplicate results from multiple retrieval sources.

Methods

merge_results

Merge document results using specified strategy.
merge_results(
    results_list: list[list[Document]],
    strategy: str = "rrf",
    top_k: int = 10
) -> list[Document]
results_list
list[list[Document]]
required
List of result lists from different sources
strategy
str
default:"rrf"
Merging strategy: "rrf" (Reciprocal Rank Fusion) or "interleave"
top_k
int
default:"10"
Number of results to return after merging
merged
list[Document]
Merged and deduplicated document list
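Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over the ranked lists it appears in, with k = 60 as the conventional constant. The real method operates on Haystack Document objects; the sketch below uses plain string IDs for illustration:

```python
# Reciprocal Rank Fusion sketch: each document scores sum(1 / (k + rank))
# across the ranked lists it appears in (rank is 1-based). Documents found by
# several sources accumulate score and rise to the top; duplicates collapse
# into one entry because the dict is keyed by document ID.

def rrf_merge(results_list: list[list[str]], top_k: int = 10,
              k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in results_list:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first, truncated to top_k.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

merged = rrf_merge([["a", "b", "c"], ["b", "a", "d"]], top_k=3)
```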
