What is RAG?
RAG works in three steps:
- Index: Embed documents and store them in a vector database
- Retrieve: Find relevant documents based on a query
- Generate: Use retrieved context to generate an answer
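The three steps above can be sketched without any library. This is only an illustration under stated assumptions: `embed` is a toy word-count stand-in for a real embedding model, the in-memory `index` list stands in for a vector database, and `build_prompt` stops short of calling a language model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Index: embed each document and keep the vectors (stand-in for a vector database).
documents = [
    "LangChain retrievers return documents for a query",
    "Vector databases store embeddings for similarity search",
    "Bananas are rich in potassium",
]
index = [(doc, embed(doc)) for doc in documents]

# 2. Retrieve: rank indexed documents by similarity to the query.
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 3. Generate: place the retrieved context in the prompt a model would receive.
def build_prompt(query: str, context: list[str]) -> str:
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

query = "how do vector databases store embeddings"
context = retrieve(query)
prompt = build_prompt(query, context)
```

In a real pipeline the final `prompt` would be sent to a chat model; that call is left out here on purpose.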
Building a Retriever
Retrievers implement the interface for finding relevant documents.
From Vector Store
The most common pattern is to create a retriever from a vector store.
Search Types
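- Similarity: the default search; returns the most similar documents
- MMR (Maximal Marginal Relevance): balances relevance to the query against diversity among the results
- Similarity with Threshold: returns only documents whose similarity score meets a minimum threshold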
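To make the difference between the search types concrete, here is a library-free sketch over a tiny in-memory store. The document names and 2-D vectors are made up for illustration. In LangChain itself the search type is normally selected through `vectorstore.as_retriever(search_type=..., search_kwargs=...)` with the values `"similarity"`, `"mmr"`, or `"similarity_score_threshold"`; the functions below only imitate that behavior.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical pre-computed embeddings (illustrative values only).
store = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.98, 0.2],
    "doc_c": [0.6, 0.8],
    "doc_d": [0.0, 1.0],
}

def similarity_search(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, store[d]), reverse=True)
    return ranked[:k]

def threshold_search(query_vec: list[float], score_threshold: float) -> list[str]:
    """Return every document whose similarity is at least score_threshold."""
    scored = [(d, cosine(query_vec, v)) for d, v in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [d for d, score in scored if score >= score_threshold]

def mmr_search(query_vec: list[float], k: int = 2, lambda_mult: float = 0.5) -> list[str]:
    """Greedy MMR: reward relevance, penalize similarity to already selected docs."""
    selected: list[str] = []
    candidates = list(store)
    while candidates and len(selected) < k:
        def mmr_score(doc_id: str) -> float:
            relevance = cosine(query_vec, store[doc_id])
            redundancy = max(
                (cosine(store[doc_id], store[s]) for s in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

query = [0.8, 0.6]
top_similar = similarity_search(query)          # picks the two closest documents
top_diverse = mmr_search(query)                 # second pick is less similar to the first
above_cutoff = threshold_search(query, 0.7)     # drops documents scoring below 0.7
```

With these vectors, plain similarity returns `doc_c` and `doc_b`, while MMR swaps `doc_b` for `doc_a` because `doc_a` overlaps less with the already selected `doc_c`.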
Custom Retriever
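In LangChain, a custom retriever is usually written by subclassing `BaseRetriever` and implementing `_get_relevant_documents`. The sketch below shows the same idea without the library so that it runs on its own: the `Document` dataclass and the `KeywordRetriever` class are hypothetical stand-ins, and the scoring rule (count of shared words) is just one possible choice.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Minimal stand-in for a document with content and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

class KeywordRetriever:
    """Hypothetical retriever that ranks documents by words shared with the query."""

    def __init__(self, documents: list[Document], k: int = 2):
        self.documents = documents
        self.k = k

    def invoke(self, query: str) -> list[Document]:
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(doc.page_content.lower().split())), doc)
            for doc in self.documents
        ]
        # Keep only documents that share at least one word with the query.
        matches = [pair for pair in scored if pair[0] > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in matches[: self.k]]

docs = [
    Document("A retriever returns documents for a query"),
    Document("A vector store indexes embeddings"),
    Document("Prompts format model input"),
]
retriever = KeywordRetriever(docs)
results = retriever.invoke("vector store retriever")
```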
Implement custom retrieval logic.
RAG Chain with LCEL
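An LCEL RAG chain is typically written as a pipeline joined with `|`, along the lines of `{"context": retriever, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser()`. To show the composition idea in a form that runs without LangChain or a model, the sketch below imitates the pipe operator with a hypothetical `Step` class. It is not LangChain's implementation, and `fake_model` is a placeholder for a real chat model.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Piping creates a new step that feeds this step's output into the next.
        return Step(lambda value: other.invoke(self.invoke(value)))

DOCS = [
    "Retrievers return documents relevant to a query.",
    "Vector stores hold document embeddings.",
]

def fetch_context(question: str) -> dict:
    """Retrieve documents that share a word with the question."""
    words = set(question.lower().rstrip("?").split())
    hits = [d for d in DOCS if words & set(d.lower().rstrip(".").split())]
    return {"question": question, "context": hits}

def format_prompt(inputs: dict) -> str:
    context = "\n".join(inputs["context"])
    return f"Context:\n{context}\n\nQuestion: {inputs['question']}"

def fake_model(prompt: str) -> str:
    # Placeholder: a real chain would call a chat model here.
    return f"(placeholder answer based on {len(prompt)} prompt characters)"

chain = Step(fetch_context) | Step(format_prompt) | Step(fake_model)
answer = chain.invoke("What do retrievers return?")
```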
Use LangChain Expression Language (LCEL) to build RAG chains.
Multi-Query Retrieval
Generate multiple search queries for better recall.
Contextual Compression
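Contextual compression trims each retrieved document down to the parts that matter for the query. LangChain provides `ContextualCompressionRetriever` for this, usually paired with an LLM-based or embedding-based compressor. The sketch below uses a much simpler, hypothetical rule instead: keep only the sentences that share a meaningful word with the query.

```python
import re

def compress(document: str, query: str) -> str:
    """Keep only sentences that share a word (longer than 3 letters) with the query."""
    terms = {w for w in re.findall(r"\w+", query.lower()) if len(w) > 3}
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept = [s for s in sentences if terms & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)

retrieved = (
    "The museum opened in 1990. "
    "Tickets cost 12 euros for adults. "
    "The cafe serves lunch daily."
)
compressed = compress(retrieved, "How much do tickets cost?")
```

Only the ticket sentence survives, so the prompt sent to the model carries less irrelevant text.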
Compress retrieved documents to keep only relevant parts.
Metadata Filtering
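Metadata filtering narrows the candidate set before ranking. With LangChain vector stores a filter is commonly passed through `search_kwargs`, but the exact filter syntax depends on the store. The following library-free sketch assumes exact-match filters on a hypothetical `metadata` dict and a simple word-overlap ranking.

```python
DOCS = [
    {"text": "Vacation policy allows 25 days of leave",
     "metadata": {"source": "handbook", "year": 2024}},
    {"text": "Vacation policy allowed 20 days of leave",
     "metadata": {"source": "handbook", "year": 2021}},
    {"text": "Blog post about planning a vacation",
     "metadata": {"source": "blog", "year": 2024}},
]

def matches(metadata: dict, filters: dict) -> bool:
    """True when every filter key is present with exactly the requested value."""
    return all(metadata.get(key) == value for key, value in filters.items())

def filtered_search(query: str, filters: dict, k: int = 2) -> list[dict]:
    # Filter first, then rank only the documents that passed the filter.
    candidates = [d for d in DOCS if matches(d["metadata"], filters)]
    terms = set(query.lower().split())
    ranked = sorted(
        candidates,
        key=lambda d: len(terms & set(d["text"].lower().split())),
        reverse=True,
    )
    return ranked[:k]

current_policy = filtered_search(
    "vacation policy leave", {"source": "handbook", "year": 2024}
)
```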
Filter retrieval by metadata.
Parent Document Retrieval
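The idea behind parent document retrieval is to search over small chunks, which match queries precisely, and then hand the model the larger document each chunk came from. LangChain offers `ParentDocumentRetriever` for this. The sketch below reproduces only the core bookkeeping, assuming sentence-sized chunks and word-overlap scoring; the sample texts and ids are made up.

```python
PARENTS = {
    "p1": ("Refunds are issued within 14 days. Contact support to start a return. "
           "Shipping fees are not refundable."),
    "p2": "Our office is open on weekdays. Visitors must sign in at reception.",
}

# Split each parent into small chunks and remember which parent each chunk came from.
CHUNKS = [
    (parent_id, sentence.strip().rstrip("."))
    for parent_id, text in PARENTS.items()
    for sentence in text.split(". ")
]

def retrieve_parents(query: str, k: int = 1) -> list[str]:
    """Rank small chunks, then return the full parent documents they belong to."""
    terms = set(query.lower().split())
    ranked = sorted(
        CHUNKS,
        key=lambda chunk: len(terms & set(chunk[1].lower().split())),
        reverse=True,
    )
    parent_ids: list[str] = []
    for parent_id, _ in ranked:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
        if len(parent_ids) == k:
            break
    return [PARENTS[p] for p in parent_ids]

full_docs = retrieve_parents("start a return")
```

The query matches a single sentence, yet the caller receives the whole refund document, including the surrounding policy details.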
Retrieve small chunks but return full parent documents.
Async Retrieval
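LangChain retrievers expose an async `ainvoke` method, which makes it possible to run several retrievals concurrently. The pattern itself is plain `asyncio`, sketched here with a hypothetical `fake_retrieve` coroutine that sleeps briefly in place of a network call.

```python
import asyncio

async def fake_retrieve(source: str, query: str, delay: float = 0.05) -> list[str]:
    """Hypothetical async retriever; the sleep simulates I/O latency."""
    await asyncio.sleep(delay)
    return [f"{source}: result for '{query}'"]

async def retrieve_all(query: str) -> list[str]:
    # gather() runs the coroutines concurrently and preserves argument order.
    batches = await asyncio.gather(
        fake_retrieve("docs", query),
        fake_retrieve("wiki", query),
        fake_retrieve("tickets", query),
    )
    return [doc for batch in batches for doc in batch]

results = asyncio.run(retrieve_all("rag"))
```

Because the three calls overlap, the total wait is close to the slowest single retrieval rather than the sum of all three.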
Use async for parallel retrieval.
Hybrid Search
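One common way to merge a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF), which is also the approach behind LangChain's `EnsembleRetriever`. The sketch below assumes the two rankings have already been produced; the document ids are hypothetical, and `k=60` is the conventional smoothing constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each document by the sum of 1 / (k + rank) over all rankings."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a vector (semantic) retriever and a keyword retriever.
semantic_ranking = ["d2", "d1", "d3"]
keyword_ranking = ["d3", "d2", "d4"]

fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
```

Documents that appear in both rankings (`d2`, `d3`) rise above documents found by only one method.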
Combine semantic and keyword search.
Best Practices
Chunk documents appropriately
Chunk size affects retrieval quality. As a starting point, test chunks of 500-1000 characters with 100-200 characters of overlap.
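To see what chunk size and overlap mean in practice, here is a minimal fixed-width character splitter. It is a simplified sketch: LangChain's `RecursiveCharacterTextSplitter` takes comparable `chunk_size` and `chunk_overlap` parameters but also tries to break on natural separators, which this version does not.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is not fully contained in the previous one.
    last_start_limit = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start_limit, step)]

sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample, chunk_size=500, overlap=100)
```

A 1200-character input yields three chunks, and the last 100 characters of each chunk reappear at the start of the next one.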
Consider multi-query retrieval
Generate alternative queries to improve recall for complex questions.
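The effect on recall can be shown with a small sketch. In LangChain, `MultiQueryRetriever` asks a language model to produce the alternative queries; here that step is replaced by a hard-coded, hypothetical `REWRITES` table so the example runs on its own, and retrieval is a simple word-overlap match.

```python
DOCS = {
    "d1": "chunk size affects retrieval quality",
    "d2": "overlap between chunks preserves context",
    "d3": "splitting text into passages helps search",
}

# Stand-in for LLM-generated query variants (hypothetical, hard-coded).
REWRITES = {
    "how should I split documents": [
        "what chunk size works best",
        "splitting text into passages",
    ],
}

def keyword_retrieve(query: str, k: int = 2) -> list[str]:
    """Return ids of documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in DOCS.items()]
    matches = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [doc_id for _, doc_id in matches[:k]]

def multi_query_retrieve(question: str) -> list[str]:
    """Retrieve for the question and each variant, keeping unique ids in order."""
    found: list[str] = []
    for query in [question, *REWRITES.get(question, [])]:
        for doc_id in keyword_retrieve(query):
            if doc_id not in found:
                found.append(doc_id)
    return found

question = "how should I split documents"
single = keyword_retrieve(question)      # the original wording matches nothing
combined = multi_query_retrieve(question)
```

The original phrasing misses every document, while the rephrased queries surface two relevant ones.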
Common Patterns
- Question Answering: Retrieve docs and generate answers
- Chatbots: Add conversation history to retrieval context
- Summarization: Retrieve related docs before summarizing
- Citation: Return source documents with generated answers
Next Steps
- Learn about Embeddings for vector search
- Explore Chat Models for generation
- Check out Vector Store integrations
- Build production RAG with LangSmith