What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by providing them with relevant context from external knowledge sources. This approach combines the power of information retrieval with generative AI to produce accurate, contextual responses.

Core RAG Architecture

Key Components

Document Processing

Load and chunk documents into manageable pieces for embedding and retrieval

Embedding Models

Convert text into vector representations for semantic similarity search

Vector Databases

Store and efficiently retrieve embedded document chunks

Language Models

Generate contextual responses using retrieved information

RAG Pipeline Stages

1. Indexing Phase

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Load documents
loader = PyPDFLoader("research_paper.pdf")
documents = loader.load()

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)
chunks = text_splitter.split_documents(documents)

# Create embeddings and store in vector database
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="my_knowledge_base"
)

2. Retrieval Phase

# Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 5}
)

# Retrieve relevant documents
query = "What are the key findings?"
relevant_docs = retriever.invoke(query)
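Under the hood, a similarity retriever embeds the query and ranks stored chunks by vector similarity, typically cosine similarity. A minimal sketch with toy vectors standing in for real embeddings (the values and chunk labels are hypothetical):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional "embeddings" for three stored chunks
chunk_vectors = [
    [0.9, 0.1, 0.0, 0.1],  # chunk about key findings
    [0.1, 0.8, 0.2, 0.0],  # chunk about methodology
    [0.2, 0.1, 0.9, 0.1],  # chunk about references
]
query_vector = [0.8, 0.2, 0.1, 0.0]  # embedded query

# Rank chunk indices by similarity to the query, keep the top k=2
scores = [cosine_similarity(query_vector, v) for v in chunk_vectors]
top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
print(top_k)  # [0, 1]
```

The retriever's `k` parameter corresponds to the slice at the end: it bounds how many ranked chunks are passed on to the generation step.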

3. Generation Phase

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Define prompt template
prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context:
{context}

Question: {question}

Provide a detailed answer based on the context above.
""")

# Create LLM
llm = ChatOpenAI(model="gpt-4", temperature=0)

# Build RAG chain
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query the chain
response = rag_chain.invoke(query)
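The pipe syntax above is LangChain's LCEL; conceptually the chain is just retrieve, format, prompt, generate. A plain-Python equivalent with stub components (the `retrieve` and `generate` callables here are hypothetical stand-ins, not real API calls):

```python
def answer(question, retrieve, generate):
    """Plain-Python equivalent of the RAG chain above."""
    docs = retrieve(question)        # retriever
    context = "\n\n".join(docs)      # format_docs
    prompt = (                       # ChatPromptTemplate
        "Answer the question based only on the following context:\n"
        f"{context}\n\nQuestion: {question}\n"
    )
    return generate(prompt)          # llm | StrOutputParser

# Stub components for illustration
fake_retrieve = lambda q: ["Finding A: accuracy improved.", "Finding B: latency dropped."]
fake_generate = lambda p: "The key findings are A and B."

print(answer("What are the key findings?", fake_retrieve, fake_generate))
```

Keeping this mental model makes it easier to debug a chain: each stage can be inspected in isolation by swapping in a stub like the ones above.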

Vector Database Setup

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma(
    collection_name="documents",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)

Common Embedding Models

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",  # or text-embedding-3-small
    api_key="your-api-key"
)
  • Models: text-embedding-3-large, text-embedding-3-small
  • Dimensions: 3072 (large), 1536 (small)
  • Best for: High-quality semantic search
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="your-api-key"
)
  • Model: embedding-001
  • Dimensions: 768
  • Best for: Multilingual support
from langchain_cohere import CohereEmbeddings

embeddings = CohereEmbeddings(
    model="embed-english-v3.0",
    cohere_api_key="your-api-key"
)
  • Model: embed-english-v3.0
  • Best for: English text with high accuracy
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434"
)
  • Models: nomic-embed-text, openhermes
  • Best for: Privacy-focused local deployments
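The text-embedding-3 models also accept a `dimensions` parameter that returns shortened ("Matryoshka") embeddings for lower storage cost. If you instead truncate full-length vectors yourself, re-normalize them before computing cosine similarity. A minimal sketch (pure Python; the input vector is hypothetical):

```python
import math

def shorten_embedding(vec, dims):
    """Truncate an embedding to `dims` components and re-normalize to unit length."""
    v = vec[:dims]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Hypothetical 6-dimensional embedding, shortened to 3 dimensions
short = shorten_embedding([0.4, 0.2, 0.4, 0.1, 0.05, 0.1], dims=3)
print(len(short))  # 3
```

Without the re-normalization step, similarity scores from truncated vectors are systematically deflated and not comparable across chunks.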

RAG Use Cases

Question Answering

Build intelligent Q&A systems over custom documents and knowledge bases

Document Search

Semantic search across large document collections with context

Customer Support

AI assistants that answer questions using company documentation

Research Assistant

Query and synthesize information from research papers and articles

Code Documentation

Answer questions about codebases using documentation

Legal Analysis

Search and analyze legal documents with precise citations

RAG Variants Covered

1. Basic RAG: Simple retrieval and generation pipeline with vector search
2. Agentic RAG: RAG with reasoning capabilities and tool usage
3. Advanced Techniques: Corrective RAG, hybrid search, knowledge graphs, and multi-hop reasoning
4. Local RAG: Privacy-focused implementations using Ollama and local models

Next Steps

Basic RAG

Start with fundamental RAG patterns and implementations

Agentic RAG

Learn about RAG with reasoning and autonomous capabilities

Advanced Techniques

Explore CRAG, hybrid search, and knowledge graphs

Local RAG

Build privacy-focused RAG with Ollama

Best Practice: Always evaluate your RAG system’s retrieval quality before focusing on generation. Poor retrieval cannot be fixed by better prompts.
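One lightweight way to follow this advice: assemble a small set of questions with known-relevant chunks and measure hit rate, the fraction of queries whose gold chunk appears in the top-k retrieved results. A minimal sketch (the queries, chunk ids, and gold labels are hypothetical):

```python
def hit_rate_at_k(retrieved_per_query, gold_per_query, k=5):
    """Fraction of queries whose known-relevant chunk id appears in the top-k results."""
    hits = sum(
        1 for retrieved, gold in zip(retrieved_per_query, gold_per_query)
        if gold in retrieved[:k]
    )
    return hits / len(gold_per_query)

# Hypothetical ranked chunk ids per query, plus the gold chunk id for each query
retrieved = [["c1", "c7", "c3"], ["c2", "c9", "c4"], ["c8", "c5", "c6"]]
gold = ["c7", "c4", "c1"]
print(hit_rate_at_k(retrieved, gold, k=3))  # 2 of 3 queries hit
```

If this number is low, tune chunking, embeddings, or `k` before touching the prompt or the LLM.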

Build docs developers (and LLMs) love