What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by providing them with relevant context from external knowledge sources. This approach combines the power of information retrieval with generative AI to produce accurate, contextual responses.
Core RAG Architecture
Key Components
Document Processing: Load and chunk documents into manageable pieces for embedding and retrieval
Embedding Models: Convert text into vector representations for semantic similarity search
Vector Databases: Store and efficiently retrieve embedded document chunks
Language Models: Generate contextual responses using retrieved information
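At its core, the retrieval component is a nearest-neighbor search over embedding vectors. A dependency-free sketch using cosine similarity makes the idea concrete; the three-dimensional "embeddings" and document IDs below are toy values standing in for a real model's output:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, index, k=2):
    """index: {doc_id: embedding}; return the k doc IDs most similar to the query."""
    ranked = sorted(index, key=lambda d: cosine_similarity(query_vec, index[d]), reverse=True)
    return ranked[:k]

# Toy 3-dimensional "embeddings" (real models produce hundreds to thousands of dimensions)
index = {
    "intro": [0.9, 0.1, 0.0],
    "methods": [0.1, 0.9, 0.2],
    "results": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))  # → ['intro', 'results']
```

A production vector database does the same ranking, but with approximate nearest-neighbor indexes so it scales to millions of chunks.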
RAG Pipeline Stages
1. Indexing Phase
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
# Load documents
loader = PyPDFLoader("research_paper.pdf")
documents = loader.load()
# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)
chunks = text_splitter.split_documents(documents)
# Create embeddings and store in vector database
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="my_knowledge_base"
)
2. Retrieval Phase
# Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)
# Retrieve relevant documents
query = "What are the key findings?"
relevant_docs = retriever.invoke(query)  # get_relevant_documents() is deprecated
3. Generation Phase
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# Define prompt template
prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context:

{context}

Question: {question}

Provide a detailed answer based on the context above.
""")
# Create LLM
llm = ChatOpenAI(model="gpt-4", temperature=0)
# Build RAG chain
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Query the chain
response = rag_chain.invoke(query)
Popular Vector Databases
Chroma
Qdrant
LanceDB
PgVector
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
vectorstore = Chroma(
    collection_name="documents",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)
from langchain_community.vectorstores import Qdrant
from qdrant_client import QdrantClient
client = QdrantClient(
    url="http://localhost:6333",
    api_key="your-api-key"
)
vectorstore = Qdrant(
    client=client,
    collection_name="documents",
    embeddings=OpenAIEmbeddings()
)
from agno.vectordb.lancedb import LanceDb, SearchType
from agno.knowledge.embedder.openai import OpenAIEmbedder
vector_db = LanceDb(
    uri="tmp/lancedb",
    table_name="knowledge_base",
    search_type=SearchType.vector,
    embedder=OpenAIEmbedder(api_key="your-api-key")
)
from langchain_postgres import PGVector
# langchain_postgres uses the psycopg3 driver and the `connection`/`embeddings` parameters
connection_string = "postgresql+psycopg://user:pass@localhost:5432/db"
vectorstore = PGVector(
    collection_name="documents",
    connection=connection_string,
    embeddings=OpenAIEmbeddings()
)
Common Embedding Models
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",  # or text-embedding-3-small
    api_key="your-api-key"
)
Models: text-embedding-3-large, text-embedding-3-small
Dimensions: 3072 (large), 1536 (small)
Best for: High-quality semantic search
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="your-api-key"
)
Model: embedding-001
Dimensions: 768
Best for: Multilingual support
from langchain_cohere import CohereEmbeddings
embeddings = CohereEmbeddings(
    model="embed-english-v3.0",
    cohere_api_key="your-api-key"
)
Model: embed-english-v3.0
Best for: English text with high accuracy
Local Embeddings (Ollama)
from agno.knowledge.embedder.ollama import OllamaEmbedder
embeddings = OllamaEmbedder(
    id="nomic-embed-text",  # or openhermes
    host="http://localhost:11434"
)
Models: nomic-embed-text, openhermes
Best for: Privacy-focused local deployments
RAG Use Cases
Question Answering: Build intelligent Q&A systems over custom documents and knowledge bases
Document Search: Semantic search across large document collections with context
Customer Support: AI assistants that answer questions using company documentation
Research Assistant: Query and synthesize information from research papers and articles
Code Documentation: Answer questions about codebases using documentation
Legal Analysis: Search and analyze legal documents with precise citations
RAG Variants Covered
Basic RAG
Simple retrieval and generation pipeline with vector search
Agentic RAG
RAG with reasoning capabilities and tool usage
Advanced Techniques
Corrective RAG, hybrid search, knowledge graphs, and multi-hop reasoning
Local RAG
Privacy-focused implementations using Ollama and local models
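Hybrid search, listed under advanced techniques, merges a keyword ranking (e.g. BM25) with a vector ranking. Many hybrid retrievers combine the two lists with reciprocal rank fusion (RRF); here is a minimal dependency-free sketch, where the document IDs and ranked lists are made up for illustration:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it;
    k=60 is the constant proposed in the original RRF paper.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from a BM25 retriever and a vector retriever
keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc5", "doc3"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # → ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents that appear near the top of both lists (doc1, doc3) rise above documents that only one retriever found, which is exactly the behavior hybrid search aims for.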
Next Steps
Basic RAG: Start with fundamental RAG patterns and implementations
Agentic RAG: Learn about RAG with reasoning and autonomous capabilities
Advanced Techniques: Explore CRAG, hybrid search, and knowledge graphs
Local RAG: Build privacy-focused RAG with Ollama
Best Practice : Always evaluate your RAG system’s retrieval quality before focusing on generation. Poor retrieval cannot be fixed by better prompts.
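As a concrete starting point for that evaluation, hit rate@k over a small labeled query set needs no framework at all. The queries, retrieved document IDs, and ground-truth labels below are invented for illustration:

```python
def hit_rate_at_k(results, gold, k=5):
    """Fraction of queries whose top-k retrieved IDs include a relevant document.

    results: {query: [retrieved_doc_ids in rank order]}
    gold:    {query: set of relevant doc IDs}
    """
    hits = sum(
        1 for query, retrieved in results.items()
        if set(retrieved[:k]) & gold[query]
    )
    return hits / len(results)

# Hypothetical retrieval runs and ground-truth labels
results = {
    "what are the key findings?": ["d2", "d9", "d4"],
    "which dataset was used?": ["d7", "d1", "d3"],
}
gold = {
    "what are the key findings?": {"d4"},
    "which dataset was used?": {"d8"},
}
print(hit_rate_at_k(results, gold, k=3))  # → 0.5 (one of two queries hits)
```

If this number is low, tuning chunk size, `k`, or the embedding model will usually help more than rewriting the generation prompt.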