Reusable components for building custom LangChain RAG applications.

AgenticRouter

Route queries to search, reflect, or generate actions using LLM reasoning for agentic RAG patterns.

Constructor

AgenticRouter(llm: ChatGroq)
Parameters
  llm (ChatGroq, required): ChatGroq LLM instance used for routing decisions. Should be configured with a low temperature (0.0-0.3) for consistent routing.

Methods

route

Route a query to the appropriate action based on current pipeline state.
route(
    query: str,
    has_documents: bool = False,
    current_answer: str | None = None,
    iteration: int = 1,
    max_iterations: int = 3
) -> dict[str, Any]
Parameters
  query (str, required): The user's original query text.
  has_documents (bool, default: False): Whether documents have already been retrieved in previous iterations.
  current_answer (str | None, default: None): The answer generated so far, if any. Used to assess whether reflection or generation is appropriate.
  iteration (int, default: 1): Current iteration number (1-indexed). Used to track progress and enforce iteration limits.
  max_iterations (int, default: 3): Maximum number of routing iterations allowed. Prevents infinite loops.

Returns
  dict with keys:
    action (str): One of 'search', 'reflect', or 'generate'.
    reasoning (str): Human-readable explanation of the routing decision.

ContextCompressor

Compress retrieved context using reranking or LLM-based extraction to reduce token usage.

Constructor

ContextCompressor(
    mode: str = "reranking",
    llm: ChatGroq | None = None,
    reranker: HuggingFaceCrossEncoder | None = None
)
Parameters
  mode (str, default: "reranking"): Compression mode: "reranking" or "llm_extraction".
  llm (ChatGroq | None, default: None): ChatGroq instance for LLM extraction mode. Required when mode is "llm_extraction".
  reranker (HuggingFaceCrossEncoder | None, default: None): HuggingFaceCrossEncoder instance for reranking mode. Required when mode is "reranking".

Methods

compress

Compress documents using the configured compression strategy.
compress(
    query: str,
    documents: list[Document],
    top_k: int = 5
) -> list[Document]
Parameters
  query (str, required): The user's query text, used to determine relevance.
  documents (list[Document], required): List of LangChain Document objects to compress.
  top_k (int, default: 5): Number of documents to return (only used in reranking mode).

Returns
  compressed (list[Document]): Compressed list of documents. Structure depends on mode:
    • reranking: list of top_k Document objects, sorted by relevance
    • llm_extraction: list containing a single synthesized Document

compress_reranking

Compress documents using cross-encoder reranking.
compress_reranking(
    query: str,
    documents: list[Document],
    top_k: int = 5
) -> list[Document]
Parameters
  query (str, required): Query text for relevance scoring.
  documents (list[Document], required): Documents to rerank.
  top_k (int, default: 5): Number of top documents to return.

Returns
  reranked (list[Document]): Top-k documents sorted by relevance score (highest first).

compress_llm_extraction

Compress documents using LLM-based passage extraction.
compress_llm_extraction(
    query: str,
    documents: list[Document]
) -> list[Document]
Parameters
  query (str, required): Query text to guide extraction.
  documents (list[Document], required): Documents to extract from.

Returns
  extracted (list[Document]): List containing a single Document with the extracted passages. Its metadata includes 'source': 'compressed' and 'original_doc_count'.

QueryEnhancer

Enhance queries using multi-query generation, HyDE (Hypothetical Document Embeddings), and step-back techniques.

Constructor

QueryEnhancer(llm: ChatGroq)
Parameters
  llm (ChatGroq, required): ChatGroq LLM instance used for query enhancement.

Methods

generate_multi_queries

Generate multiple query variations for better retrieval coverage.
generate_multi_queries(
    query: str,
    num_queries: int = 3
) -> list[str]
Parameters
  query (str, required): Original query.
  num_queries (int, default: 3): Number of query variations to generate.

Returns
  queries (list[str]): List of query variations, including the original query.
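
A common retrieval pattern is to run every variation and deduplicate the union of results. A minimal sketch, assuming enhancer is a QueryEnhancer (as constructed in the usage examples below) and retriever is any LangChain retriever; both names are illustrative:

queries = enhancer.generate_multi_queries("What are the applications of AI?")

# Retrieve with each variation, keeping the first occurrence of each document
seen = set()
fused = []
for q in queries:
    for doc in retriever.invoke(q):
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            fused.append(doc)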

generate_hyde_document

Generate a hypothetical document that would answer the query.
generate_hyde_document(query: str) -> str
Parameters
  query (str, required): Query to generate a hypothetical document for.

Returns
  document (str): Hypothetical document text that can be embedded and used for retrieval.
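
HyDE is typically applied by embedding the hypothetical document and searching by vector similarity, rather than embedding the raw query. A sketch, assuming embedder is a LangChain embeddings model and vector_store is a LangChain vector store (both names are illustrative):

hyde_doc = enhancer.generate_hyde_document("How does machine learning work?")

# Search with the hypothetical document's embedding instead of the query's
vector = embedder.embed_query(hyde_doc)
results = vector_store.similarity_search_by_vector(vector, k=5)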

generate_step_back_query

Generate a step-back query that asks a more general question.
generate_step_back_query(query: str) -> str
Parameters
  query (str, required): Specific query to generalize.

Returns
  step_back (str): More general query, useful for retrieving background context.
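
The step-back query usually complements the original rather than replacing it: retrieving with both surfaces background context alongside the specifics. A sketch, again with an illustrative retriever:

original = "What is the training process for GPT-4?"
step_back = enhancer.generate_step_back_query(original)

# Background context first, then documents matching the specific question
docs = retriever.invoke(step_back) + retriever.invoke(original)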

MMRHelper

Maximal Marginal Relevance utilities for diversity-optimized retrieval.

Methods

mmr_rerank

Rerank documents using the MMR algorithm to balance relevance and diversity.
MMRHelper.mmr_rerank(
    documents: list[Document],
    embeddings: list[list[float]],
    query_embedding: list[float],
    k: int = 10,
    lambda_param: float = 0.5
) -> list[Document]
Parameters
  documents (list[Document], required): Documents to rerank.
  embeddings (list[list[float]], required): Document embeddings, in the same order as the documents list.
  query_embedding (list[float], required): Query embedding vector.
  k (int, default: 10): Number of documents to return.
  lambda_param (float, default: 0.5): Balance between relevance (1.0) and diversity (0.0). The default of 0.5 weights both equally.

Returns
  reranked (list[Document]): Reranked documents optimized for both relevance and diversity.
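
For reference, MMR greedily selects the next document d that maximizes

    lambda_param * sim(d, query) - (1 - lambda_param) * max(sim(d, s) for s in selected)

where sim is a similarity measure over the supplied embeddings (typically cosine similarity; the exact measure is an implementation detail). Higher lambda_param favors closeness to the query; lower values penalize redundancy with documents already selected.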

Usage Examples

Agentic routing

from langchain_groq import ChatGroq
from vectordb.langchain.components import AgenticRouter

llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0)
router = AgenticRouter(llm)

# Initial routing - should suggest 'search'
decision = router.route(
    "What is quantum computing?",
    has_documents=False
)
print(decision)
# {'action': 'search', 'reasoning': 'No documents retrieved yet'}

# After retrieval - may suggest 'reflect' or 'generate'
decision = router.route(
    "What is quantum computing?",
    has_documents=True,
    current_answer="Quantum computing uses qubits...",
    iteration=2
)
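
Tying it together, a hypothetical driver loop might look like the following sketch. The search and generate branches are stubs: the real retrieval and prompt-assembly calls depend on your pipeline, and retriever is an illustrative name.

answer = None
documents = []
for i in range(1, 4):
    decision = router.route(
        "What is quantum computing?",
        has_documents=bool(documents),
        current_answer=answer,
        iteration=i,
        max_iterations=3,
    )
    if decision["action"] == "search":
        documents = retriever.invoke("What is quantum computing?")
    elif decision["action"] == "reflect":
        pass  # critique the draft answer, then route again
    else:  # "generate"
        answer = "..."  # call the LLM with the query plus retrieved documents
        break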

Context compression with reranking

from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from vectordb.langchain.components import ContextCompressor

reranker = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = ContextCompressor(mode="reranking", reranker=reranker)

# Compress 10 documents down to top 3
compressed = compressor.compress(
    query="What is AI?",
    documents=retrieved_documents,
    top_k=3
)

Context compression with LLM extraction

from langchain_groq import ChatGroq
from vectordb.langchain.components import ContextCompressor

llm = ChatGroq(model="llama-3.3-70b-versatile")
compressor = ContextCompressor(mode="llm_extraction", llm=llm)

# Extract relevant passages from documents
compressed = compressor.compress(
    query="Explain neural networks",
    documents=retrieved_documents
)
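
As documented above, llm_extraction returns a single synthesized Document whose metadata records the compression:

print(len(compressed))                               # 1
print(compressed[0].metadata["source"])              # 'compressed'
print(compressed[0].metadata["original_doc_count"])  # number of input documents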

Query enhancement

from langchain_groq import ChatGroq
from vectordb.langchain.components import QueryEnhancer

llm = ChatGroq(model="llama-3.3-70b-versatile")
enhancer = QueryEnhancer(llm)

# Generate multiple query variations
queries = enhancer.generate_multi_queries(
    "What are the applications of AI?",
    num_queries=3
)

# Generate hypothetical document
hyde_doc = enhancer.generate_hyde_document(
    "How does machine learning work?"
)

# Generate step-back query
step_back = enhancer.generate_step_back_query(
    "What is the training process for GPT-4?"
)
# Example output: "What are the general principles of training large language models?"
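
MMR reranking

Diversity-aware reranking with MMRHelper. A sketch, assuming the embeddings come from a LangChain embeddings model (HuggingFaceEmbeddings and the model name are illustrative) and retrieved_documents is a list of Documents:

from langchain_huggingface import HuggingFaceEmbeddings
from vectordb.langchain.components import MMRHelper

embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Embed the query and the candidate documents
query_embedding = embedder.embed_query("What is AI?")
embeddings = embedder.embed_documents([d.page_content for d in retrieved_documents])

# Keep 5 documents, weighting relevance and diversity equally
diverse = MMRHelper.mmr_rerank(
    documents=retrieved_documents,
    embeddings=embeddings,
    query_embedding=query_embedding,
    k=5,
    lambda_param=0.5,
)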
