
Prerequisites

Before you begin, ensure you have the following installed:
  • Python 3.10 or higher
  • Conda (Anaconda or Miniconda)
  • Git for cloning the repository
  • DeepSeek API Key (or any OpenAI-compatible LLM API key)
DeenPAL uses the DeepSeek model via the OpenRouter API. You can sign up for a free API key at OpenRouter.

Installation

Follow these steps to install and set up DeenPAL:
1. Clone the repository

Clone the DeenPAL repository from GitHub:
git clone https://github.com/Raza-Aziz/DeenPAL-RAG-based-Islamic-Hadith-Chatbot.git
cd DeenPAL-RAG-based-Islamic-Hadith-Chatbot
2. Set up a virtual environment

Create and activate a new conda environment with Python 3.10:
conda create -n deen-pal python=3.10 -y
conda activate deen-pal
Using a virtual environment ensures dependency isolation and prevents conflicts with other Python projects.
3. Configure environment variables

Create a .env file in the root directory and add your OpenRouter API key:
.env
OPENAI_API_KEY="your_openrouter_api_key_here"
DeenPAL uses OpenRouter to access the DeepSeek model. Get your API key from OpenRouter.
Keep your API key secure and never commit the .env file to version control. Add it to .gitignore if not already included.
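To illustrate what loading this file amounts to, here is a minimal hand-rolled .env parser. This is a sketch only: the helper name `load_env` and the commented-out client setup are illustrative and not part of DeenPAL (projects typically use python-dotenv's `load_dotenv()` for this).

```python
import os

def load_env(path=".env"):
    # Minimal .env parser: reads KEY="value" lines into os.environ.
    # Illustrative only -- python-dotenv's load_dotenv() is the usual choice.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

# With the key loaded, an OpenAI-compatible client can target OpenRouter, e.g.:
#   client = OpenAI(base_url="https://openrouter.ai/api/v1",
#                   api_key=os.environ["OPENAI_API_KEY"])
```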
4. Install dependencies

Install the required Python packages. You can use either pip or the uv package manager:
pip install -r colab_requirements.txt
# or, equivalently, with uv:
uv pip install -r colab_requirements.txt
The uv package manager is significantly faster than pip for dependency installation. If you install frequently, consider using uv.
5. Prepare your data

Place your Hadith documents in the data/ directory in PDF format:
mkdir -p data/
# Copy your Hadith PDF files into the data/ directory
The original implementation uses Sahih Muslim and Sahih Bukhari books (all volumes) as the data source. You can use any Hadith PDF files that follow a similar structure with chapter and book numbering.
Expected PDF naming format:
  • <prefix>_Sahih_Bukhari_Volume_1.pdf
  • <prefix>_Sahih_Muslim_Volume_1.pdf
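The source name and volume are recoverable from filenames in this format. As a hedged sketch of what such metadata extraction could look like (the `parse_source` helper is hypothetical; the real extraction lives in loader.py):

```python
import re

def parse_source(filename):
    # Hypothetical parser for names like '<prefix>_Sahih_Bukhari_Volume_1.pdf'.
    # Returns (book name, volume number), or None if the pattern doesn't match.
    m = re.search(r"(Sahih_[A-Za-z]+)_Volume_(\d+)\.pdf$", filename)
    if m is None:
        return None
    return m.group(1).replace("_", " "), int(m.group(2))
```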

Running DeenPAL

Once installation is complete, you can start the chatbot:
1. Start the Streamlit application

Run the following command in your terminal:
streamlit run app.py
The first run will take longer as the system loads PDFs, generates embeddings, and initializes the ChromaDB vector store. Subsequent runs will be much faster due to caching.
2. Access the chatbot interface

Open your web browser and navigate to:
http://localhost:8501
Streamlit runs on port 8501 by default. If you need to use a different port, run streamlit run app.py --server.port 8080.
You should see the Deen Pal Chatbot interface.
3. Ask your first question

Try asking a question in the chat input at the bottom of the page. For example:

Example Query

“What does the Hadith say about prayer?”
The chatbot will:
  1. Retrieve relevant Hadiths from the database
  2. Display each Hadith with source citations (book number, hadith number, chapter)
  3. Provide a brief explanation for each Hadith
  4. Generate a concise answer to your question

Understanding the First Run

During the first execution, DeenPAL performs several initialization steps:
# From loader.py - cached for performance
@st.cache_resource
def load_and_prepare_data():
    # 1. Loading Hadith PDFs from data/ directory
    # 2. Processing metadata (extracting source names)
    # 3. Splitting documents into semantic chunks
    # 4. Generating embeddings using HuggingFace model
    # 5. Storing embeddings in ChromaDB
The @st.cache_resource decorator ensures this data loading happens only once per app session, significantly improving response times for subsequent queries.
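Within a single process, this caching behaves much like memoizing the loader. A rough stdlib analogy, assuming a placeholder return value rather than the real PDF/embedding setup:

```python
from functools import lru_cache

call_count = {"n": 0}

@lru_cache(maxsize=1)
def load_and_prepare_data():
    # Stand-in for the expensive PDF loading / embedding / ChromaDB setup.
    call_count["n"] += 1
    return {"db": "vector-store handle (placeholder)"}

first = load_and_prepare_data()
second = load_and_prepare_data()  # served from cache; the body does not run again
```

Unlike `lru_cache`, `st.cache_resource` also shares the cached object across Streamlit reruns and user sessions, which is why the expensive setup happens only on the first run.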

What Happens Behind the Scenes

When you submit a query, here’s what DeenPAL does:
  1. Semantic Search: Your query is converted to an embedding and compared against the Hadith database
  2. MMR Retrieval: The system retrieves the top 4 diverse results from 10 candidates using Maximal Marginal Relevance
  3. Context Building: Retrieved Hadiths are formatted with their metadata
  4. LLM Generation: The DeepSeek model generates a response based on the retrieved context
  5. Response Display: The answer is shown with proper Hadith citations and explanations
From chains.py:15-18:
retriever = db.as_retriever(
    search_type="mmr",  # Use Maximal Marginal Relevance
    search_kwargs={"k": 4, "fetch_k": 10}  # Retrieve top 4 diverse results from 10 candidates
)
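To make the MMR step concrete, here is a minimal greedy MMR over precomputed similarities. This is a generic sketch of the algorithm, not the Chroma/LangChain implementation; `lam` (assumed 0.5 here) trades relevance against redundancy with already-selected results.

```python
def mmr(query_sim, doc_sims, k=4, fetch_k=10, lam=0.5):
    """Greedy Maximal Marginal Relevance over precomputed similarities.

    query_sim[i]   -- similarity of candidate i to the query
    doc_sims[i][j] -- similarity between candidates i and j
    Returns the indices of up to k diverse, relevant candidates.
    """
    # Consider only the fetch_k candidates most similar to the query.
    candidates = sorted(range(len(query_sim)),
                        key=lambda i: query_sim[i], reverse=True)[:fetch_k]
    selected = []
    while candidates and len(selected) < k:
        # Score = relevance minus worst-case redundancy with picks so far.
        best = max(
            candidates,
            key=lambda i: lam * query_sim[i]
            - (1 - lam) * max((doc_sims[i][j] for j in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-duplicate top hits, plain top-k would return both; MMR keeps one and reaches for a more diverse third candidate instead.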

Troubleshooting

Port 8501 already in use
By default, Streamlit uses port 8501. If you need to use port 8080, run:
streamlit run app.py --server.port 8080
API key errors
Ensure your .env file is in the root directory (same level as app.py) and contains:
OPENAI_API_KEY="your-actual-api-key"
Make sure there are no extra spaces around the equals sign.
No documents found
Verify that:
  • The data/ directory exists in the project root
  • Your PDF files are placed directly in the data/ directory
  • The PDF files are readable and not corrupted
First run is slow
This is expected behavior. The first run involves:
  • Loading all PDF documents
  • Downloading the HuggingFace embedding model (sentence-transformers/all-MiniLM-L6-v2)
  • Generating embeddings for all chunks
  • Initializing the ChromaDB vector store
Subsequent runs will be much faster due to caching.

Next Steps

Architecture

Learn about the technical architecture and how RAG works in DeenPAL.

Configuration

Customize the retrieval parameters, models, and prompts.
