What is Retrieval-Augmented Generation?
Retrieval-Augmented Generation (RAG) is an AI architecture that combines large language models (LLMs) with retrieval from an external knowledge base. Instead of relying solely on the model’s training data, RAG systems:
- Retrieve relevant information from a knowledge base
- Augment the user’s query with retrieved context
- Generate accurate responses based on both the query and retrieved information
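The three steps above can be sketched in plain Python. This is an illustration only, not DeenPAL code: a toy keyword-overlap retriever stands in for a real embedding-based vector store, and the knowledge base is placeholder text.

```python
# Minimal retrieve -> augment -> generate sketch. The keyword-overlap
# retriever is a toy stand-in for embedding-based similarity search.
def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many lowercase words they share with the query."""
    words = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def augment(query: str, context: list[str]) -> str:
    """Prepend the retrieved context to the user's query."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query

kb = ["Fasting in Ramadan is obligatory.",
      "Charity purifies wealth.",
      "Prayer is performed five times daily."]

# The augmented prompt, not the bare query, is what the LLM would receive.
prompt = augment("When is fasting obligatory?",
                 retrieve("When is fasting obligatory?", kb))
```

The generation step would then pass `prompt` to the LLM, which answers from the supplied context rather than from its parametric memory alone.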
RAG is particularly valuable for domain-specific applications where accuracy and source attribution are critical, such as Islamic knowledge systems.
How DeenPAL Implements RAG
DeenPAL uses RAG to provide accurate Hadith-based responses by ensuring every answer is grounded in authentic Islamic sources (Sahih al-Bukhari and Sahih Muslim). This grounding sharply reduces the risk of the LLM generating unsupported information.
The system is designed to:
- Deliver personalized responses tailored to user queries
- Utilize reliable sources from trusted Hadith collections
- Provide citations with book numbers, Hadith numbers, and chapters
- Offer explanations that connect retrieved Hadiths to user questions
The RAG Pipeline
DeenPAL’s RAG architecture follows a four-stage pipeline:
1. Data Loading
Hadith PDFs are loaded from the data/ directory and processed:
# From loader.py
folder_path = "data/"
loader = PyPDFDirectoryLoader(folder_path)
documents = loader.load()
Metadata is extracted and structured to identify sources:
for doc in documents:
    split_source = doc.metadata['source'].split("/")[-1]
    exact_source_with_ext = split_source.split('_', maxsplit=1)[1]
    exact_source = exact_source_with_ext.split('.')[0]
    doc.metadata = {'source': exact_source}
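Traced on a hypothetical file name such as data/1_Bukhari.pdf (the underscore-prefixed numbering is an assumption about the repository's naming scheme), the three splits proceed as follows:

```python
# Walk through the same three splits on a hypothetical path
# (the "number_Name.pdf" naming scheme is assumed, not confirmed).
source = "data/1_Bukhari.pdf"
split_source = source.split("/")[-1]               # "1_Bukhari.pdf"
with_ext = split_source.split("_", maxsplit=1)[1]  # "Bukhari.pdf"
exact_source = with_ext.split(".")[0]              # "Bukhari"
```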
2. Text Splitting & Embedding
Documents are split into semantic chunks based on Hadith structure:
# From loader.py
pattern = r"(?:Chapter\s\d+:)|(?:Book\s\d+,\sNumber\s\d+:)"
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,
    separators=[pattern],
    is_separator_regex=True
)
chunks = text_splitter.split_documents(documents)
DeenPAL uses regex-based splitting to preserve each Hadith as a complete semantic unit, rather than cutting at arbitrary character boundaries, so the integrity of each Hadith is maintained.
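The effect of the separator pattern can be seen on a small synthetic excerpt (the text below is placeholder content, not an actual Hadith, and re.split is used here as a simplified stand-in for the LangChain splitter):

```python
import re

# Same separator pattern as loader.py, applied to synthetic placeholder text.
pattern = r"(?:Chapter\s\d+:)|(?:Book\s\d+,\sNumber\s\d+:)"
text = ("Book 2, Number 13: First placeholder narration. "
        "Book 2, Number 14: Second placeholder narration.")

# Each surviving element holds exactly one narration body, so a chunk
# never starts or ends mid-Hadith.
parts = [p.strip() for p in re.split(pattern, text) if p.strip()]
```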
Embeddings are generated using a lightweight sentence transformer model:
embeddings = HuggingFaceEmbeddings(
    model_name='sentence-transformers/all-MiniLM-L6-v2'
)
3. Vector Storage & Retrieval
Embedded chunks are stored in ChromaDB for efficient similarity search:
# From loader.py
persist_directory = 'database/chroma_db'
db = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=persist_directory
)
When a user asks a question, the retriever finds the most relevant Hadiths using Maximal Marginal Relevance (MMR):
# From chains.py
retriever = db.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 10}
)
4. Response Generation
The LLM receives the user’s question along with retrieved Hadith context:
# From chains.py
llm = ChatOpenAI(
    model="deepseek/deepseek-chat-v3-0324:free",
    base_url="https://openrouter.ai/api/v1"
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
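Conceptually, the "stuff" strategy simply concatenates every retrieved chunk into one prompt before a single LLM call. A simplified sketch of that formatting step (the prompt wording and document shape below are illustrative, not DeenPAL's actual qa_prompt):

```python
# Simplified "stuff" step: concatenate all retrieved chunks into one prompt.
def stuff_prompt(question: str, docs: list[dict]) -> str:
    context = "\n\n".join(f"[{d['source']}] {d['text']}" for d in docs)
    return ("Answer using only the Hadiths below, and cite each source.\n\n"
            f"{context}\n\nQuestion: {question}")

docs = [{"source": "Bukhari", "text": "Placeholder narration one."},
        {"source": "Muslim", "text": "Placeholder narration two."}]
prompt = stuff_prompt("What does Islam say about charity?", docs)
```

This works well when the retrieved chunks fit comfortably in the model's context window, which k=4 Hadith-sized chunks do.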
Benefits of RAG for Hadith Retrieval
1. Accuracy & Attribution
Every response is backed by authentic Hadith sources with proper citations, which greatly reduces hallucinations.
2. Up-to-date Knowledge
The system can be updated with new sources without retraining the LLM.
3. Transparency
Users can verify responses by checking the cited Hadith sources.
4. Semantic Understanding
The embedding model understands meaning, not just keywords, allowing for natural language queries.
5. Scalability
New Hadith collections can be added to the vector store without modifying the core system.
DeenPAL uses Streamlit’s @st.cache_resource decorator so that data loading and indexing run only once rather than on every script rerun, preventing redundant processing and improving response times.
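The run-once behaviour can be imitated in plain Python with a memoizing decorator (a toy stand-in for @st.cache_resource, which additionally shares the cached resource across reruns and sessions):

```python
import functools

calls = []  # records each actual (non-cached) execution

# Toy stand-in for @st.cache_resource: build the resource once, reuse after.
@functools.lru_cache(maxsize=None)
def load_pipeline() -> str:
    calls.append(1)
    return "vector store + rag_chain"  # placeholder for the real objects

load_pipeline()
load_pipeline()  # served from the cache; the body does not run again
```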
Architecture Diagram
┌─────────────────┐
│ User Query │
│ "What does │
│ Islam say..." │
└────────┬────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ RETRIEVAL PHASE │
│ │
│ 1. Query → Embedding Model │
│ (sentence-transformers/all-MiniLM-L6-v2) │
│ │
│ 2. Vector Search in ChromaDB │
│ (MMR: k=4 from fetch_k=10) │
│ │
│ 3. Retrieve Top 4 Diverse Hadiths │
└────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ GENERATION PHASE │
│ │
│ 1. Inject Retrieved Context into Prompt │
│ │
│ 2. LLM Processes Query + Context │
│ (DeepSeek Chat via OpenRouter) │
│ │
│ 3. Generate Response with Citations │
└────────┬────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Response: │
│ • Hadiths │
│ • Explanation │
│ • Answer │
└─────────────────┘
This architecture ensures that every response is grounded in authentic Islamic sources while leveraging the natural language understanding capabilities of modern LLMs.