The LangChain integration provides retrieval chains and pipelines for building retrieval-augmented generation (RAG) applications on top of vector databases.
Chain Types
VectorDB provides several pre-built chain types for LangChain:
- Semantic Search: Dense vector retrieval using embedding models
- Hybrid Indexing: Combined dense and sparse vector indexing
- Sparse Indexing: BM25-style keyword-based retrieval
- MMR (Maximal Marginal Relevance): Diversity-optimized retrieval
- Parent Document Retrieval: Hierarchical chunking with parent-child relationships
- Query Enhancement: Multi-query and HyDE (Hypothetical Document Embeddings)
- Reranking: Cross-encoder reranking of retrieved results
- Contextual Compression: Token optimization through context compression
- Agentic RAG: Self-reflective retrieval with routing decisions
- Multi-tenancy: Namespace-based data isolation
- Metadata Filtering: Advanced filtering on document metadata
- JSON Indexing: Indexing and filtering on nested JSON fields
- Diversity Filtering: MMR-based diversity in retrieval
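For intuition on the MMR-based chain types above: Maximal Marginal Relevance greedily picks the candidate that balances relevance to the query against redundancy with documents already selected. A minimal standalone sketch of that scoring (illustrative only, not this library's implementation):

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: trade off query relevance against similarity
    to documents that were already selected."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance = [cos(query_vec, d) for d in doc_vecs]
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate vectors plus one distinct vector: after taking the
# most relevant document, MMR prefers the diverse one over the duplicate.
docs = [np.array([1.0, 0.0]), np.array([0.99, 0.01]), np.array([0.0, 1.0])]
query = np.array([1.0, 0.1])
print(mmr_select(query, docs, k=2))  # [1, 2]
```

With `lambda_mult=1.0` the selection degenerates to plain relevance ranking; lower values weight diversity more heavily.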
Supported Vector Databases
All LangChain chains support these vector databases:
- Chroma: ChromaSemanticSearchPipeline, ChromaMmrSearchPipeline, etc.
- Milvus: MilvusSemanticSearchPipeline, MilvusHybridSearchPipeline, etc.
- Pinecone: PineconeSemanticSearchPipeline, PineconeHybridSearchPipeline, etc.
- Qdrant: QdrantSemanticSearchPipeline, QdrantMmrSearchPipeline, etc.
- Weaviate: WeaviateSemanticSearchPipeline, WeaviateHybridSearchPipeline, etc.
Common Chain Methods
All LangChain chains share common initialization patterns and methods:
Constructor Pattern
```python
Pipeline(
    config_path: str,
    collection_name: Optional[str] = None,
    embedding_model: Optional[str] = None,
    **kwargs
)
```

- config_path: Path to the YAML configuration file containing database credentials and settings
- collection_name: Overrides the collection name from the config
- embedding_model: Overrides the embedding model from the config (e.g., "sentence-transformers/all-MiniLM-L6-v2")
- **kwargs: Additional chain-specific parameters
search
Performs a retrieval search and returns LangChain Documents.

```python
search(
    query: str,
    top_k: int = 10,
    filters: Optional[Dict[str, Any]] = None,
    **kwargs
) -> List[Document]
```

- query: Search query text
- top_k: Number of results to return
- filters: Metadata filters to apply
- **kwargs: Chain-specific search parameters

Returns: Retrieved LangChain Document objects ordered by relevance
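The exact filter dialect is backend-specific, but the simplest interpretation of the `filters` argument is a dict of exact-match `{field: value}` pairs that restricts results to documents whose metadata matches. A standalone sketch of that semantics (names here are illustrative, not part of the API):

```python
def matches(metadata, filters):
    """True when every filter key is present in the metadata
    with an equal value (exact-match AND semantics)."""
    return all(metadata.get(k) == v for k, v in filters.items())

docs = [
    {"text": "Q1 report", "metadata": {"year": 2024, "dept": "finance"}},
    {"text": "Q2 report", "metadata": {"year": 2024, "dept": "sales"}},
    {"text": "Old memo",  "metadata": {"year": 2020, "dept": "finance"}},
]
hits = [d["text"] for d in docs
        if matches(d["metadata"], {"dept": "finance", "year": 2024})]
print(hits)  # ['Q1 report']
```

Real backends typically also support range and set operators; consult the target database's filter documentation for the supported syntax.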
as_retriever
Converts the pipeline to the LangChain Retriever interface.

```python
as_retriever(**kwargs) -> BaseRetriever
```

- **kwargs: Retriever configuration parameters

Returns: A LangChain BaseRetriever instance for use in chains
Example Usage
Semantic search
```python
from vectordb.langchain.semantic_search import ChromaSemanticSearchPipeline

# Initialize pipeline
pipeline = ChromaSemanticSearchPipeline(
    config_path="config.yaml",
    collection_name="my_docs",
    embedding_model="text-embedding-3-small"
)

# Search
results = pipeline.search(
    query="What is machine learning?",
    top_k=5
)

# Use as retriever in a chain
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

retriever = pipeline.as_retriever()
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4"),
    retriever=retriever
)
result = qa_chain.invoke({"query": "Explain quantum computing"})
```
Hybrid search
```python
from vectordb.langchain.hybrid_indexing import MilvusHybridSearchPipeline

pipeline = MilvusHybridSearchPipeline(
    config_path="config.yaml",
    collection_name="hybrid_docs"
)

# Hybrid search combines dense and sparse vectors
results = pipeline.search(
    query="quantum computing applications",
    top_k=10,
    ranker_type="rrf"  # Reciprocal Rank Fusion
)
```
Agentic RAG
```python
from langchain_groq import ChatGroq
from vectordb.langchain.agentic_rag import ChromaAgenticRAGPipeline

llm = ChatGroq(model="llama-3.3-70b-versatile")

pipeline = ChromaAgenticRAGPipeline(
    config_path="config.yaml",
    llm=llm
)

# Agentic search with self-reflection
result = pipeline.search(
    query="Complex multi-hop question",
    max_iterations=3
)
```
Multi-tenancy
```python
from langchain_core.documents import Document
from vectordb.langchain.multi_tenancy import PineconeMultiTenancyPipeline

pipeline = PineconeMultiTenancyPipeline(config_path="config.yaml")

# Index documents for tenant A
documents = [
    Document(page_content="Financial report Q1"),
    Document(page_content="Financial report Q2")
]
pipeline.index(documents, namespace="tenant_a")

# Search within tenant A only
results = pipeline.search(
    query="financial reports",
    namespace="tenant_a",
    top_k=5
)
```