Basic RAG Chain
A basic RAG (Retrieval-Augmented Generation) chain demonstrates the fundamental pattern of retrieving relevant documents and using them to generate contextually accurate responses.
Overview
This example shows how to build a complete RAG system called PharmaQuery - a pharmaceutical insight retrieval system that helps users gain meaningful insights from research papers.
Document Loading: Load and process PDF documents with PyPDFLoader
Text Splitting: Split documents into token-based chunks using SentenceTransformers
Vector Storage: Store embeddings in ChromaDB for fast retrieval
Query Processing: Retrieve relevant chunks and generate answers with Gemini
Architecture
Implementation
Installation
Install dependencies
pip install streamlit langchain-google-genai langchain-chroma \
langchain-community chromadb sentence-transformers \
PyPDF2 python-dotenv
Set up API keys
export GOOGLE_API_KEY='your-gemini-api-key'
Get your API key from Google AI Studio
Complete Code
import streamlit as st
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import SentenceTransformersTokenTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_chroma import Chroma
from langchain.chains import RetrievalQA
import tempfile
import os

st.title("PharmaQuery - Pharmaceutical Insight Retrieval")

# Sidebar for API key and document upload
with st.sidebar:
    google_api_key = st.text_input("Google API Key", type="password")
    uploaded_files = st.file_uploader(
        "Upload Research Papers (PDF)",
        type="pdf",
        accept_multiple_files=True
    )

if google_api_key:
    # Initialize embedding model
    embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
        google_api_key=google_api_key
    )

    # Initialize vector store
    vectorstore = Chroma(
        embedding_function=embeddings,
        persist_directory="./chroma_db"
    )

    # Process uploaded documents
    if uploaded_files:
        for uploaded_file in uploaded_files:
            # Write the upload to a temporary file so PyPDFLoader can read it
            with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
                tmp_file.write(uploaded_file.read())
                tmp_path = tmp_file.name

            # Load the document
            loader = PyPDFLoader(tmp_path)
            documents = loader.load()

            # Split into chunks
            text_splitter = SentenceTransformersTokenTextSplitter(
                chunk_size=512,
                chunk_overlap=50
            )
            chunks = text_splitter.split_documents(documents)

            # Add to vector store
            vectorstore.add_documents(chunks)
            os.unlink(tmp_path)
            st.sidebar.success(f"Processed {uploaded_file.name}")

    # Query interface
    query = st.text_input("Enter your query:")

    if query:
        # Initialize LLM
        llm = ChatGoogleGenerativeAI(
            model="gemini-1.5-pro",
            google_api_key=google_api_key,
            temperature=0.3
        )

        # Create retrieval chain
        qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=vectorstore.as_retriever(search_kwargs={"k": 4})
        )

        # Get response
        with st.spinner("Searching and generating response..."):
            response = qa_chain.invoke({"query": query})
            st.write("### Answer")
            st.write(response["result"])

            # Show retrieved documents
            with st.expander("View Source Documents"):
                docs = vectorstore.similarity_search(query, k=4)
                for i, doc in enumerate(docs, 1):
                    st.write(f"**Document {i}:**")
                    st.write(doc.page_content[:500] + "...")
else:
    st.warning("Please enter your Google API Key in the sidebar")
requirements.txt
streamlit
langchain-google-genai
langchain-chroma
langchain-community
langchain-core
chromadb
sentence-transformers
PyPDF2
python-dotenv
Key Components
Document Loading
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("research_paper.pdf")
documents = loader.load()
PyPDFLoader extracts text while preserving document structure and metadata.
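Conceptually, loader.load() returns one Document per PDF page, each pairing extracted text with its metadata. A minimal stand-in for LangChain's Document class (a simplified sketch, not the real implementation; the sample contents are invented):

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Simplified stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

# PyPDFLoader produces a list shaped roughly like this, one entry per page:
documents = [
    Document(page_content="Phase III trial results showed...",
             metadata={"source": "research_paper.pdf", "page": 0}),
    Document(page_content="Adverse events were recorded in...",
             metadata={"source": "research_paper.pdf", "page": 1}),
]

# Metadata travels with every chunk, so answers can be traced back to a page.
print(documents[0].metadata["page"])   # 0
```

Because the metadata survives splitting and indexing, the retrieved chunks can later be cited by source file and page number.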
Text Splitting
from langchain_text_splitters import SentenceTransformersTokenTextSplitter

text_splitter = SentenceTransformersTokenTextSplitter(
    chunk_size=512,   # Tokens per chunk
    chunk_overlap=50  # Overlap between chunks
)
chunks = text_splitter.split_documents(documents)
Use token-based splitting with SentenceTransformers for better semantic coherence compared to character-based splitting.
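To see what fixed-size splitting with overlap actually does, here is a toy version using whitespace tokens (the real splitter uses the SentenceTransformers tokenizer, so its boundaries differ):

```python
def split_tokens(text, chunk_size=8, chunk_overlap=2):
    """Toy token splitter: whitespace tokens, fixed windows with overlap."""
    tokens = text.split()
    stride = chunk_size - chunk_overlap  # window advances by this many tokens
    chunks = []
    for start in range(0, max(len(tokens) - chunk_overlap, 1), stride):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
    return chunks

text = " ".join(f"tok{i}" for i in range(20))
chunks = split_tokens(text, chunk_size=8, chunk_overlap=2)
for c in chunks:
    print(c)
# Each chunk repeats the last 2 tokens of the previous one, so a sentence
# cut at a boundary still appears whole in at least one chunk.
```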
Vector Storage
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = Chroma(
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)

# Add documents
vectorstore.add_documents(chunks)
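Under the hood, a similarity search embeds the query and ranks stored chunks by vector similarity. A toy version with hand-written 3-dimensional vectors and cosine similarity (the numbers are invented stand-ins for real embedding-001 vectors, and Chroma's index is far more sophisticated):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "vector store": chunk text -> hand-written embedding
store = {
    "Aspirin inhibits COX enzymes.":    [0.9, 0.1, 0.0],
    "The trial enrolled 400 patients.": [0.1, 0.9, 0.2],
    "Dosage was 81 mg daily.":          [0.7, 0.2, 0.1],
}

def similarity_search(query_vec, k=2):
    """Return the k chunk texts most similar to the query vector."""
    ranked = sorted(store, key=lambda t: cosine(query_vec, store[t]), reverse=True)
    return ranked[:k]

query_vec = [0.8, 0.1, 0.05]  # pretend embedding of "How does aspirin work?"
print(similarity_search(query_vec))  # mechanism chunk ranks first
```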
Retrieval & Generation
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # Stuff all docs into context
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4})
)
response = qa_chain.invoke({"query": "What are the side effects?"})
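The "stuff" chain type simply concatenates all retrieved chunks into a single prompt and sends it to the LLM. Roughly (the actual prompt template LangChain uses differs in wording):

```python
def build_stuff_prompt(question, doc_texts):
    """Sketch of the 'stuff' strategy: put every retrieved chunk in one prompt."""
    context = "\n\n".join(doc_texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = ["Chunk about reported side effects...", "Chunk about dosing schedule..."]
prompt = build_stuff_prompt("What are the side effects?", docs)
print(prompt)
```

Because everything is stuffed into one prompt, this strategy fails once k chunks exceed the model's context window; that is why LangChain also offers chain types such as "map_reduce" and "refine" that process chunks incrementally.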
Usage
Enter API key
Paste your Google API Key in the sidebar
Upload documents
Upload research papers (PDFs) to build your knowledge base
Query the system
Ask questions about the uploaded documents
Example Queries
“What are the clinical trial results for this drug?”
“Summarize the methodology used in this study”
“What safety concerns were identified?”
“Compare efficacy across different patient groups”
“What statistical methods were used?”
“What were the inclusion/exclusion criteria?”
“How was the sample size determined?”
“What were the limitations of this study?”
Customization Options
Embedding Models
Google Gemini
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001"
)

OpenAI
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

HuggingFace (Local)
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
Vector Databases
ChromaDB
from langchain_chroma import Chroma

vectorstore = Chroma(
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)

Qdrant
from langchain_qdrant import Qdrant
from qdrant_client import QdrantClient

client = QdrantClient(path="./qdrant_db")
vectorstore = Qdrant(
    client=client,
    collection_name="documents",
    embeddings=embeddings
)

FAISS
from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("./faiss_index")
Best Practices
Chunk Size
Small chunks (256-512 tokens): Better precision, but more chunks must be retrieved
Medium chunks (512-1024 tokens): Balanced approach (recommended)
Large chunks (1024-2048 tokens): More context per chunk, but less precise retrieval
Adjust based on your document type and query patterns.

Chunk Overlap
Use 10-20% overlap to maintain context across chunks
Too much overlap increases storage and redundancy
Too little overlap may lose important context at chunk boundaries
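The storage cost of overlap is easy to estimate: with chunk size C and overlap O, the window advances C − O tokens per chunk, so a document of N tokens yields roughly N / (C − O) chunks holding about N · C / (C − O) stored tokens. A quick calculation:

```python
def overlap_overhead(chunk_size, chunk_overlap):
    """Fractional extra tokens stored due to overlap (ignoring edge chunks)."""
    stride = chunk_size - chunk_overlap
    return chunk_size / stride - 1.0

# The tutorial's settings (512 tokens, 50 overlap) cost about 11% extra storage:
print(f"{overlap_overhead(512, 50):.0%}")    # 11%
# 50% overlap stores every token twice:
print(f"{overlap_overhead(512, 256):.0%}")   # 100%
```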
Retriever Configuration
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",  # or "similarity" / "mmr" (diversity)
    search_kwargs={
        "k": 4,                 # Number of documents to retrieve
        "score_threshold": 0.7  # Minimum similarity score
    }
)
Note that score_threshold only takes effect with search_type="similarity_score_threshold"; plain "similarity" search ignores it.
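The "mmr" search type re-ranks candidates by Maximal Marginal Relevance: each pick balances similarity to the query against similarity to chunks already selected, so near-duplicate chunks don't crowd out distinct information. A toy version over precomputed similarity scores (the scores are assumed values, not real embeddings):

```python
def mmr_select(query_sim, pairwise_sim, k=2, lam=0.5):
    """Greedy MMR: score = lam * relevance - (1 - lam) * max similarity to picks."""
    selected = []
    candidates = list(query_sim)
    while candidates and len(selected) < k:
        def score(c):
            redundancy = max((pairwise_sim[frozenset((c, s))] for s in selected),
                             default=0.0)
            return lam * query_sim[c] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate chunks (A, B) and one distinct chunk (C)
query_sim = {"A": 0.95, "B": 0.93, "C": 0.80}
pairwise_sim = {frozenset(("A", "B")): 0.99,
                frozenset(("A", "C")): 0.20,
                frozenset(("B", "C")): 0.25}

print(mmr_select(query_sim, pairwise_sim, k=2))   # ['A', 'C']: skips duplicate B
```

Plain similarity search would return A and B here; MMR trades B's slightly higher relevance for C's diversity.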
Troubleshooting
Empty or irrelevant responses
Solutions:
Increase k value to retrieve more documents
Lower score_threshold to allow less similar matches
Improve document chunking strategy
Use query expansion or reformulation
High memory usage with large document sets
Solutions:
Process documents in batches
Use a disk-based vector store (ChromaDB, Qdrant)
Reduce embedding dimensions if possible
Clean up temporary files after processing
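Processing in batches (the first suggestion above) keeps memory bounded and avoids oversized embedding requests. A minimal batching helper (the batch size of 100 is an assumption; tune it to your embedding provider's limits):

```python
def batched(items, batch_size=100):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

chunks = [f"chunk-{i}" for i in range(250)]
for batch in batched(chunks, batch_size=100):
    # In the real app this would be: vectorstore.add_documents(batch)
    print(f"Indexed {len(batch)} chunks")
```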
Next Steps
Agentic RAG: Add reasoning and self-correction to your RAG system
Corrective RAG: Implement self-evaluating retrieval with fallback strategies
Hybrid Search: Combine vector search with keyword search for better results
Local RAG: Build a privacy-focused RAG system with local models