What you get
- Seventeen retrieval and RAG patterns implemented using LangChain’s retriever, chain, and document store abstractions
- Full portability across Pinecone, Weaviate, Chroma, Milvus, and Qdrant with feature-specific notes on backend support
- YAML-driven configuration with environment variable substitution so credentials stay out of code
- Evaluation support via shared `utils/evaluation.py` metrics
- Shared reusable components (`components/`) and helper factories (`utils/`) that all feature pipelines draw from
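The environment-variable substitution mentioned above can be sketched with the standard library alone. The function name `expand_config` and the YAML snippet are illustrative, not the repo's actual loader:

```python
import os

def expand_config(raw: str) -> str:
    """Replace ${VAR} / $VAR placeholders in raw config text with
    environment values, so credentials never live in the file itself."""
    return os.path.expandvars(raw)

# Illustrative usage: the key name and YAML shape are hypothetical.
os.environ["PINECONE_API_KEY"] = "pc-demo-key"
raw = "pinecone:\n  api_key: ${PINECONE_API_KEY}\n  index: demo"
expanded = expand_config(raw)
```

After expansion, the text can be handed to any YAML parser; unset variables are left as-is by `os.path.expandvars`, which makes missing credentials easy to spot.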
Vector database support
All pipelines support five backends:

- Pinecone: Managed vector database with hybrid search capabilities
- Weaviate: Open-source vector search with schema-based filtering
- Chroma: Embedded database for prototyping and local use
- Milvus: Scalable vector search with partition support
- Qdrant: High-performance search with payload indexing
Module structure
Each feature directory follows the same layout: ingestion scripts embed documents with `EmbedderHelper` and upsert them into the target backend, while search scripts embed a query, retrieve candidates, apply post-retrieval processing, and optionally generate an answer using `RAGHelper`.
Feature catalog
- Semantic search: Dense vector similarity search with HuggingFace embeddings
- Hybrid search: Combine dense and sparse embeddings with Reciprocal Rank Fusion
- Reranking: Two-stage retrieval with HuggingFace cross-encoder models
- MMR diversity: Maximal Marginal Relevance for balancing relevance and diversity
- Query enhancement: Multi-query, HyDE, and step-back prompting for better recall
- Contextual compression: Compress retrieved documents to query-relevant fragments
- Agentic RAG: Multi-step iterative RAG with reflection and routing
- Metadata filtering: Structured filter constraints applied at query time
- Multi-tenancy: Tenant-scoped indexing and retrieval with isolation
- Namespaces: Logical data partitioning within shared indexes
- Parent document retrieval: Index child chunks, return parent documents
- JSON indexing: Structured fields from JSON preserved as metadata
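The hybrid-search feature in the catalog fuses dense and sparse result lists with Reciprocal Rank Fusion. The fusion step itself is small enough to sketch directly; the doc ids and the conventional `k=60` constant here are illustrative:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by multiple retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# d1 is ranked well by both retrievers, so it wins overall.
dense = ["d1", "d2", "d3"]
sparse = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([dense, sparse])
```

The same fusion applies whether the inputs come from a dense retriever and a BM25 retriever or from two different backends.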
Embedding configuration
All LangChain feature pipelines read embedding configuration from YAML.

RAG configuration
Generation is controlled by the `rag` section:
`RAGHelper` uses ChatGroq for generation. Set `enabled: false` to run retrieval-only pipelines.
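A configuration along these lines would drive both sections; the key names and model identifiers below are illustrative, not the repo's exact schema:

```yaml
# Illustrative shape only; actual keys may differ in this repo.
embedding:
  provider: huggingface
  model: sentence-transformers/all-MiniLM-L6-v2

rag:
  enabled: true          # set to false for retrieval-only pipelines
  provider: groq
  api_key: ${GROQ_API_KEY}
```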
Recommended onboarding path
1. Baseline semantic search: Run `semantic_search` on your target backend with a small dataset limit and verify the pipeline completes successfully.
2. Measure baseline metrics: Extract evaluation queries from the dataset and measure baseline retrieval metrics.
3. Add improvements: Add one improvement feature at a time; start with `reranking` (usually the highest single-step gain) or `hybrid_indexing` (for mixed query types).

Feature selection guide
| If you need… | Use |
|---|---|
| Starting point and baseline | semantic_search |
| Both semantic and keyword precision | hybrid_indexing |
| Pure keyword/lexical precision | sparse_indexing |
| Better final ranking | reranking |
| Relevant + diverse result set | mmr |
| Less redundant context | diversity_filtering |
| Structured constraints | metadata_filtering |
| JSON-native documents | json_indexing |
| Better query recall | query_enhancement |
| Shorter, cleaner context | contextual_compression |
| Token/cost budget control | cost_optimized_rag |
| Iterative multi-step reasoning | agentic_rag |
| Long docs with fragment search | parent_document_retrieval |
| Per-customer data isolation | multi_tenancy |
| Logical data segmentation | namespaces |
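The `mmr` row above trades relevance against diversity via Maximal Marginal Relevance. A minimal greedy sketch on toy 2-D vectors (not the library implementation) shows the idea: with a low lambda, the near-duplicate of the top hit is skipped in favor of a dissimilar document:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr(query, docs, lam=0.5, k=2):
    """Greedy MMR: at each step pick the candidate maximizing
    lam * relevance(query) - (1 - lam) * max similarity to already-picked."""
    selected, candidates = [], list(range(len(docs)))
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda i: lam * cosine(query, docs[i])
            - (1 - lam) * max((cosine(docs[i], docs[j]) for j in selected),
                              default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected

query = [1.0, 0.0]
docs = [[1.0, 0.0],    # exact match
        [0.99, 0.1],   # near-duplicate of the match
        [0.0, 1.0]]    # unrelated but diverse
picked = mmr(query, docs, lam=0.3, k=2)
```

With lambda closer to 1 the near-duplicate would be picked second instead, since relevance then dominates the diversity penalty.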
LangChain vs Haystack
Both frameworks provide similar capabilities but with different design philosophies:

Abstraction style

- LangChain: Component-oriented with chains and runnables. Uses `Document` objects and retriever interfaces.
- Haystack: Pipeline-oriented with nodes and pipelines. Uses `Document` objects and node interfaces.

LLM integration

- LangChain: Native integration with ChatGroq, OpenAI, Anthropic via `langchain-*` packages.
- Haystack: Integration via generator nodes and prompt builders.
Embedding models

- LangChain: `HuggingFaceEmbeddings` from `langchain-huggingface`.
- Haystack: `SentenceTransformersDocumentEmbedder` and `SentenceTransformersTextEmbedder`.

Hybrid search

- LangChain: Manual fusion with `ResultMerger` using Reciprocal Rank Fusion.
- Haystack: Built-in support with `DocumentJoiner` and RRF ranker.

Choose LangChain if you prefer chain composition and already use LangChain in your stack. Choose Haystack for pipeline-based workflows and deeper integration with Hugging Face models.
Next steps
- Semantic search: Start with baseline dense vector retrieval
- Hybrid search: Combine dense and sparse for robust retrieval
- Components: Explore reusable LangChain components
- Chains: Build agentic RAG with routing and reflection