
What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines the power of large language models (LLMs) with external knowledge retrieval. Instead of relying solely on the model’s training data, RAG dynamically retrieves relevant information from your documents to generate accurate, context-aware responses.
RAG mitigates the “hallucination” problem by grounding LLM responses in actual document content, so answers draw on your specific data rather than the model’s general knowledge.

Why Use RAG?

RAG offers several key advantages:

Up-to-date Information

Query your latest documents without retraining the model

Source Attribution

Answers are grounded in retrievable document chunks

Domain Expertise

Works with specialized knowledge not in the model’s training data

Cost Effective

Cheaper than fine-tuning models on custom data

The RAG Pipeline

RAG Chat implements the classic three-step RAG pipeline:

1. Retrieval

When you ask a question, the system searches the vector store for the most relevant document chunks:
app.py
retriever = vector_store.as_retriever()
The retriever converts your question into a vector embedding and uses semantic similarity search to return the chunks whose embeddings are closest to it.
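A minimal sketch of what semantic similarity search does under the hood, with tiny hand-made vectors standing in for real model embeddings (`cosine_similarity`, `chunks`, and `retrieve` are illustrative names, not the vector store’s actual API):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend these embeddings came from an embedding model; real embeddings
# have hundreds or thousands of dimensions.
chunks = {
    "RAG grounds answers in documents": [0.9, 0.1, 0.2],
    "Vector stores index embeddings":   [0.7, 0.6, 0.1],
    "Bananas are rich in potassium":    [0.0, 0.1, 0.9],
}

def retrieve(query_embedding, k=2):
    # Rank every stored chunk by similarity to the query and keep the top k.
    ranked = sorted(
        chunks,
        key=lambda text: cosine_similarity(query_embedding, chunks[text]),
        reverse=True,
    )
    return ranked[:k]

print(retrieve([0.8, 0.2, 0.1]))
```

A query embedding close to the first two chunks ranks them ahead of the unrelated one, which is exactly the behavior the retriever relies on.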

2. Augmentation

Retrieved chunks are injected into the prompt as context:
app.py
# The prompt is written in Portuguese; in English it reads: "Use the context
# to answer the questions. If no answer is found in the context, explain that
# the information is not available. Answer in markdown format with elaborate,
# interactive visualizations."
system_prompt = '''
Use o contexto para responder as perguntas.
Se não encontrar uma resposta no contexto,
explique que não há informações disponíveis.
Responda em formato de markdown e com visualizações
elaboradas e interativas.
Contexto: {context}
'''
The {context} placeholder is filled with the most relevant chunks from your documents.
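The augmentation step itself amounts to string substitution. A minimal sketch, assuming the chunks have already been retrieved (the variable names here are illustrative):

```python
# Join the retrieved chunks and substitute them into the {context}
# placeholder of the system prompt.
system_prompt = "Use the context to answer questions.\nContext: {context}"

retrieved_chunks = [
    "RAG combines retrieval with generation.",
    "Answers are grounded in document chunks.",
]

# Separate chunks with blank lines so the model can tell them apart.
context = "\n\n".join(retrieved_chunks)
augmented_prompt = system_prompt.format(context=context)
print(augmented_prompt)
```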

3. Generation

The LLM generates a response based on both the question and the retrieved context:
app.py
chain = (
    {
        'context': retriever,
        'input': RunnablePassthrough()
    }
    | prompt
    | llm
)
response = chain.invoke(query)
LangChain’s LCEL (LangChain Expression Language) chains these steps together elegantly using the | operator.
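The data flow that LCEL expresses can be emulated in plain Python, which makes the chain easier to see. This toy sketch replaces the retriever and the model with stubs (`retrieve`, `build_prompt`, `fake_llm`, and `pipe` are hypothetical stand-ins, not LangChain APIs):

```python
def retrieve(query):
    # Stand-in for vector_store.as_retriever(): pairs the query with
    # a hard-coded "retrieved" chunk.
    return {"context": "RAG grounds answers in documents.", "input": query}

def build_prompt(inputs):
    # Stand-in for the prompt template: fills context and question slots.
    return f"Context: {inputs['context']}\nQuestion: {inputs['input']}"

def fake_llm(prompt):
    # Stand-in for ChatOpenAI; a real chain would call the model here.
    return f"[answer based on: {prompt.splitlines()[0]}]"

def pipe(value, *stages):
    # Thread a value through each stage, as LCEL's | operator does.
    for stage in stages:
        value = stage(value)
    return value

response = pipe("What is RAG?", retrieve, build_prompt, fake_llm)
print(response)
```

Each stage receives the previous stage’s output, which is the whole idea behind chaining with `|`.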

Complete RAG Implementation

Here’s the full ask_question() function that orchestrates the RAG pipeline:
app.py
def ask_question(model, query, vector_store):
    llm = ChatOpenAI(model=model)
    retriever = vector_store.as_retriever()

    system_prompt = '''
    Use o contexto para responder as perguntas.
    Se não encontrar uma resposta no contexto,
    explique que não há informações disponíveis.
    Responda em formato de markdown e com visualizações
    elaboradas e interativas.
    Contexto: {context}
    '''

    # Rebuild the prompt with the chat history from the Streamlit session
    # so the model sees prior turns in multi-turn conversations.
    messages = [('system', system_prompt)]
    for message in st.session_state.messages:
        messages.append((message.get('role'), message.get('content')))

    prompt = ChatPromptTemplate.from_messages(messages)
    chain = (
        {
            'context': retriever,            # retrieved chunks fill {context}
            'input': RunnablePassthrough()   # the raw query passes through
        }
        | prompt
        | llm
    )
    response = chain.invoke(query)
    return response.content

How It Works in Practice

  1. User asks: “What are the main findings in the research paper?”
  2. Retrieval: The question is embedded and used to search the vector store
  3. Top chunks retrieved: The 4 most relevant chunks from the paper are found
  4. Augmentation: These chunks are inserted into the system prompt as context
  5. Generation: The selected model (e.g., GPT-4) reads the context and generates a summary of findings
  6. Response: The user receives an answer grounded in the actual document content
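The six steps above can be compressed into a self-contained toy. Keyword-count vectors stand in for real embeddings, and the final LLM call is stubbed out, so every name here (`embed`, `answer`, `VOCAB`) is illustrative rather than part of the app:

```python
VOCAB = ["findings", "method", "banana"]

def embed(text):
    # Steps 1-2 stand-in: "embed" text by counting vocabulary words.
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [words.count(term) for term in VOCAB]

documents = [
    "The main findings show retrieval improves accuracy.",
    "The method section describes the chunking strategy.",
    "Banana bread recipes are unrelated to this paper.",
]

def answer(question, k=2):
    query_vec = embed(question)
    # Step 3: retrieve the top-k chunks by dot-product score.
    scored = sorted(
        documents,
        key=lambda d: sum(q * c for q, c in zip(query_vec, embed(d))),
        reverse=True,
    )
    context = "\n".join(scored[:k])
    # Step 4: augment the prompt with the retrieved chunks.
    prompt = f"Context: {context}\nQuestion: {question}"
    # Steps 5-6: a real system would send this prompt to the LLM here.
    return prompt

result = answer("What are the main findings?")
print(result)
```

The question about findings pulls in the findings chunk and leaves the irrelevant one out, mirroring how the real pipeline keeps the model’s context focused.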

Key Benefits

The RAG approach in RAG Chat provides:
  • Accuracy: Answers based on your actual documents, not generic knowledge
  • Transparency: The system can only answer based on uploaded content
  • Flexibility: Works with any PDF documents you upload
  • Conversation: Maintains chat history for multi-turn conversations
  • Model Choice: Switch between GPT-3.5, GPT-4, and other models
The system prompt explicitly instructs the model to say when information isn’t available in the context, which reduces hallucinations.

Next Steps

Vector Store

Learn how ChromaDB stores and retrieves document embeddings

Document Processing

Understand how documents are chunked and prepared for RAG
