
Basic RAG Chain

A basic RAG (Retrieval-Augmented Generation) chain demonstrates the fundamental pattern of retrieving relevant documents and using them to generate contextually accurate responses.

Overview

This example shows how to build PharmaQuery, a complete RAG system for retrieving pharmaceutical insights from research papers.

Document Loading

Load and process PDF documents with PyPDFLoader

Text Splitting

Split documents into token-sized chunks with SentenceTransformersTokenTextSplitter

Vector Storage

Store embeddings in ChromaDB for fast retrieval

Query Processing

Retrieve relevant chunks and generate answers with Gemini

Architecture

PDF upload → PyPDFLoader → SentenceTransformers token splitting → Gemini embeddings → ChromaDB → top-k similarity retrieval → Gemini answer generation

Implementation

Installation

1. Install dependencies

pip install streamlit langchain langchain-google-genai langchain-chroma \
            langchain-community langchain-text-splitters chromadb \
            sentence-transformers pypdf python-dotenv

2. Set up API keys

export GOOGLE_API_KEY='your-gemini-api-key'

Get your API key from Google AI Studio.
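If you prefer not to export the key in every shell session, python-dotenv (already in the dependency list above) can load it from a `.env` file instead — a sketch:

```shell
# Create a .env file next to app.py; python-dotenv can load it at startup
echo "GOOGLE_API_KEY=your-gemini-api-key" > .env
```

Then call `load_dotenv()` near the top of `app.py` and read the key with `os.environ.get("GOOGLE_API_KEY")`.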

Complete Code

import streamlit as st
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import SentenceTransformersTokenTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_chroma import Chroma
from langchain.chains import RetrievalQA
import tempfile
import os

st.title("PharmaQuery - Pharmaceutical Insight Retrieval")

# Sidebar for API key and document upload
with st.sidebar:
    google_api_key = st.text_input("Google API Key", type="password")
    uploaded_files = st.file_uploader(
        "Upload Research Papers (PDF)", 
        type="pdf",
        accept_multiple_files=True
    )

if google_api_key:
    # Initialize embedding model
    embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
        google_api_key=google_api_key
    )
    
    # Initialize vector store
    vectorstore = Chroma(
        embedding_function=embeddings,
        persist_directory="./chroma_db"
    )
    
    # Process uploaded documents
    if uploaded_files:
        for uploaded_file in uploaded_files:
            with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
                tmp_file.write(uploaded_file.read())
                tmp_path = tmp_file.name
            
            # Load and split document
            loader = PyPDFLoader(tmp_path)
            documents = loader.load()
            
            # Split into chunks
            text_splitter = SentenceTransformersTokenTextSplitter(
                tokens_per_chunk=256,  # must not exceed the splitter model's max sequence length
                chunk_overlap=50
            )
            chunks = text_splitter.split_documents(documents)
            
            # Add to vector store
            vectorstore.add_documents(chunks)
            
            os.unlink(tmp_path)
            st.sidebar.success(f"Processed {uploaded_file.name}")
    
    # Query interface
    query = st.text_input("Enter your query:")
    
    if query:
        # Initialize LLM
        llm = ChatGoogleGenerativeAI(
            model="gemini-1.5-pro",
            google_api_key=google_api_key,
            temperature=0.3
        )
        
        # Create retrieval chain
        qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=vectorstore.as_retriever(search_kwargs={"k": 4})
        )
        
        # Get response
        with st.spinner("Searching and generating response..."):
            response = qa_chain.invoke({"query": query})
            st.write("### Answer")
            st.write(response["result"])
            
            # Show retrieved documents
            with st.expander("View Source Documents"):
                docs = vectorstore.similarity_search(query, k=4)
                for i, doc in enumerate(docs, 1):
                    st.write(f"**Document {i}:**")
                    st.write(doc.page_content[:500] + "...")
else:
    st.warning("Please enter your Google API Key in the sidebar")

Key Components

Document Loading

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("research_paper.pdf")
documents = loader.load()
PyPDFLoader extracts text page by page, attaching source and page-number metadata to each Document.

Text Splitting

from langchain_text_splitters import SentenceTransformersTokenTextSplitter

text_splitter = SentenceTransformersTokenTextSplitter(
    tokens_per_chunk=256,  # tokens per chunk; capped by the splitter model's max sequence length
    chunk_overlap=50       # overlap between chunks
)
chunks = text_splitter.split_documents(documents)
Use token-based splitting with SentenceTransformers for better semantic coherence compared to character-based splitting.
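The mechanics of token-window splitting with overlap can be sketched in plain Python (a toy illustration only — `chunk_tokens` is a hypothetical helper, and the real splitter uses the SentenceTransformers tokenizer, not whitespace tokens):

```python
# Toy sketch of token-window chunking with overlap.
def chunk_tokens(tokens, tokens_per_chunk=512, overlap=50):
    step = tokens_per_chunk - overlap  # each window starts this many tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + tokens_per_chunk])
        if start + tokens_per_chunk >= len(tokens):
            break  # last window reached the end of the document
    return chunks

tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens, tokens_per_chunk=512, overlap=50)
# Each chunk starts 462 tokens after the previous one, repeating the last 50.
```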

Vector Storage

from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = Chroma(
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)

# Add documents
vectorstore.add_documents(chunks)
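Conceptually, a similarity search embeds the query and ranks every stored vector by cosine similarity, returning the top k. The brute-force sketch below illustrates the idea with toy 3-dimensional vectors — it is not the Chroma API, which uses an approximate-nearest-neighbor index instead:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "vector store": chunk text -> embedding
store = {
    "chunk about dosage": [0.9, 0.1, 0.0],
    "chunk about trial design": [0.1, 0.9, 0.1],
    "chunk about side effects": [0.8, 0.2, 0.1],
}

query_vec = [1.0, 0.0, 0.0]  # stand-in for an embedded query
ranked = sorted(store, key=lambda text: cosine(store[text], query_vec), reverse=True)
top_k = ranked[:2]  # the two most similar chunks
```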

Retrieval & Generation

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",          # Stuff all docs into context
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4})
)

response = qa_chain.invoke({"query": "What are the side effects?"})
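The "stuff" strategy simply concatenates all retrieved chunks into a single prompt. A toy sketch of the idea (RetrievalQA does this internally with a prompt template; the template text below is illustrative, not LangChain's actual one):

```python
# Toy sketch of chain_type="stuff": concatenate retrieved chunks into one prompt.
def build_stuff_prompt(chunks, question):
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = ["Chunk A text.", "Chunk B text."]
prompt = build_stuff_prompt(chunks, "What are the side effects?")
```

Because everything is stuffed into one prompt, the retrieved chunks must fit within the model's context window; other chain types (e.g. "map_reduce") trade extra LLM calls for larger inputs.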

Usage

1. Run the application

streamlit run app.py

2. Enter API key

Paste your Google API Key in the sidebar.

3. Upload documents

Upload research papers (PDFs) to build your knowledge base.

4. Query the system

Ask questions about the uploaded documents.

Example Queries

  • “What are the clinical trial results for this drug?”
  • “Summarize the methodology used in this study”
  • “What safety concerns were identified?”
  • “Compare efficacy across different patient groups”
  • “What statistical methods were used?”
  • “What were the inclusion/exclusion criteria?”
  • “How was the sample size determined?”
  • “What were the limitations of this study?”

Customization Options

Embedding Models

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001"
)

Vector Databases

from langchain_chroma import Chroma

vectorstore = Chroma(
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)

Best Practices

Chunk size

  • Small chunks (256-512 tokens): Better precision, but more retrievals needed
  • Medium chunks (512-1024 tokens): Balanced approach (recommended)
  • Large chunks (1024-2048 tokens): More context, but less precise

Adjust based on your document type and query patterns.

Chunk overlap

  • Use 10-20% overlap to maintain context across chunks
  • Too much overlap increases storage and redundancy
  • Too little overlap may lose important context at boundaries

Retriever configuration

retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",  # or "mmr" for diversity
    search_kwargs={
        "k": 4,                   # Number of documents to retrieve
        "score_threshold": 0.7    # Minimum similarity score
    }
)
Note that score_threshold is only honored when search_type is "similarity_score_threshold"; with plain "similarity" it is ignored.
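The "mmr" search type is based on maximal marginal relevance: each pick trades off relevance to the query against similarity to documents already selected. A toy sketch of the idea on precomputed similarity scores (illustrative only — the retriever computes these from embeddings):

```python
# Toy sketch of maximal marginal relevance (MMR).
def mmr(query_sim, pairwise_sim, k=2, lam=0.5):
    selected = []
    candidates = list(query_sim)
    while candidates and len(selected) < k:
        def score(d):
            # Penalize candidates similar to anything already selected
            redundancy = max((pairwise_sim[(d, s)] for s in selected), default=0.0)
            return lam * query_sim[d] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query_sim = {"doc1": 0.95, "doc2": 0.94, "doc3": 0.70}
pairwise_sim = {}
for a in query_sim:
    for b in query_sim:
        if a != b:
            # doc1 and doc2 are near-duplicates; everything else is dissimilar
            pairwise_sim[(a, b)] = 0.98 if {a, b} == {"doc1", "doc2"} else 0.10

picked = mmr(query_sim, pairwise_sim, k=2)
# doc2 is nearly a duplicate of doc1, so MMR prefers the more diverse doc3.
```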

Troubleshooting

Irrelevant or incomplete answers

Solutions:
  • Increase k to retrieve more documents
  • Lower score_threshold to allow less similar matches
  • Improve document chunking strategy
  • Use query expansion or reformulation

Slow document processing

Solutions:
  • Reduce chunk size to decrease embedding time
  • Use a faster embedding model
  • Enable vector store caching
  • Consider using a more efficient vector database

High memory usage

Solutions:
  • Process documents in batches
  • Use a disk-based vector store (ChromaDB, Qdrant)
  • Reduce embedding dimensions if possible
  • Clean up temporary files after processing
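The "process documents in batches" suggestion can be sketched with a small batching helper — `batched` is a hypothetical utility, and `vectorstore` stands for the Chroma instance from the complete code above:

```python
# Sketch: add chunks to the vector store in fixed-size batches
# instead of all at once, to bound memory per call.
def batched(items, batch_size=100):
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

chunks = list(range(250))  # stand-in for a list of Document chunks
batches = list(batched(chunks, batch_size=100))

# In the app this would be:
# for batch in batched(chunks):
#     vectorstore.add_documents(batch)
```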

Agentic RAG

Add reasoning and self-correction to your RAG system

Corrective RAG

Implement self-evaluating retrieval with fallback strategies

Hybrid Search

Combine vector search with keyword search for better results

Local RAG

Build a privacy-focused RAG system with local models
