Installation

This guide explains how to install and configure RAG Chat in your local environment.

System Requirements

Python

Python 3.9 or higher required (the pinned Streamlit 1.49.1 and LangChain 0.3.x releases listed below do not support Python 3.8)

Memory

At least 4GB RAM recommended

Storage

At least 500MB for dependencies and vector database

API Key

Valid OpenAI API key with access to GPT and embedding models
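
The Python and storage requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library; check_requirements and its parameters are hypothetical names, and the 3.9 floor is an assumption based on the lowest Python version the pinned Streamlit and LangChain releases support.

```python
import shutil
import sys

# Assumption: 3.9 is the lowest Python supported by the pinned packages.
MIN_PYTHON = (3, 9)
# The 500MB storage figure from the requirements above.
MIN_FREE_BYTES = 500 * 1024 * 1024


def check_requirements(path=".", min_python=MIN_PYTHON, min_free_bytes=MIN_FREE_BYTES):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    # disk_usage reports free space on the filesystem that contains `path`.
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"Only {free // (1024 * 1024)}MB free at {path!r}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

Running the script prints nothing when both checks pass. Memory and API key access are not covered by this sketch.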

Installation Steps

Clone the repository

git clone <your-repository-url>
cd rag-chat

Create a virtual environment

Using Python’s built-in venv module (recommended):

python -m venv venv
source venv/bin/activate  # macOS/Linux; on Windows run venv\Scripts\activate instead

Using a virtual environment isolates project dependencies and prevents conflicts with other Python projects.
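
To confirm from inside Python that the environment is actually active, one option is to compare sys.prefix with sys.base_prefix. This is a small sketch; in_virtualenv is a hypothetical helper name, not part of RAG Chat.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the first line prints False, activate the environment and run the check again before installing dependencies.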

Install dependencies

With your virtual environment activated, install all required packages:

pip install -r requirements.txt

This installs the following key dependencies:

Core Framework

streamlit==1.49.1 - Web application framework
python-dotenv==1.1.1 - Environment variable management

LangChain Stack

langchain==0.3.27 - RAG orchestration framework
langchain-core==0.3.75 - Core abstractions
langchain-openai==0.3.32 - OpenAI integration
langchain-chroma==0.2.5 - ChromaDB vector store
langchain-community==0.3.29 - Community integrations
langchain-text-splitters==0.3.11 - Document chunking

AI/ML Services

openai==1.106.1 - OpenAI API client
chromadb==1.0.20 - Vector database
tiktoken==0.11.0 - Token counting for OpenAI models

Document Processing

pypdf - PDF text extraction (used by the langchain-community PDF loader)

Installation may take 2-5 minutes depending on your connection speed. The total size is approximately 200-300MB.
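
After pip finishes, the installed versions can be compared against the pins listed above. The sketch below uses importlib.metadata from the standard library; PINS is copied from this guide, while find_mismatches and its lookup parameter are hypothetical names introduced for illustration.

```python
from importlib import metadata

# Pins copied from the dependency list in this guide.
PINS = {
    "streamlit": "1.49.1",
    "python-dotenv": "1.1.1",
    "langchain": "0.3.27",
    "langchain-core": "0.3.75",
    "langchain-openai": "0.3.32",
    "langchain-chroma": "0.2.5",
    "langchain-community": "0.3.29",
    "langchain-text-splitters": "0.3.11",
    "openai": "1.106.1",
    "chromadb": "1.0.20",
    "tiktoken": "0.11.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map package name to (expected, installed) for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        installed = lookup(name)
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is present at the expected version. A None in the second position means the package is not installed in the active environment.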

Configure environment variables

Create your environment configuration:

cp .env.example .env

Edit the .env file with your OpenAI API key:

.env

OPENAI_API_KEY='sk-proj-xxxxxxxxxxxxxxxxxx'

Security Best Practices:

Never commit .env files to version control
Don’t share your API key publicly
Rotate keys regularly, and immediately if they may have been exposed
Use separate keys for development and production
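
One way to guard against committing .env by accident is to check that .gitignore mentions it. The following is a deliberately simplified sketch: env_is_ignored is a hypothetical helper that only looks for a few literal patterns and does not implement full gitignore semantics (negation rules, nested .gitignore files). Running git check-ignore .env in the repository remains the authoritative check.

```python
from pathlib import Path

# Literal patterns that would cover a top-level .env file.
ENV_PATTERNS = {".env", "/.env", ".env*", "*.env"}


def env_is_ignored(gitignore_text: str) -> bool:
    """Return True if any pattern line literally targets .env."""
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        if line in ENV_PATTERNS:
            return True
    return False


if __name__ == "__main__":
    gitignore = Path(".gitignore")
    text = gitignore.read_text(encoding="utf-8") if gitignore.is_file() else ""
    print(".env ignored:", env_is_ignored(text))
```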

The application loads these variables at startup:

app.py:10-13

from dotenv import load_dotenv

load_dotenv()

Verify installation

Check that Streamlit is properly installed:

streamlit --version

You should see output like:

Streamlit, version 1.49.1

Running the Application

Start the Streamlit server:

streamlit run app.py

You should see output similar to:

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.1.x:8501

The application will automatically open in your default browser. If it doesn’t, navigate to http://localhost:8501.

Configuration Options

The app is configured with the following settings:

app.py:88-91

st.set_page_config(
    page_title='RAG Chat',
    page_icon='🤖'
)

You can customize these values in app.py to change the browser tab title and icon.

Vector Database Setup

RAG Chat uses ChromaDB for persistent vector storage. On first run:

A db/ directory is automatically created in your project root
When you upload documents, embeddings are stored in this directory
The vector store persists across application restarts

The initialization code:

app.py:16-23

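# Excerpt: os, Chroma, OpenAIEmbeddings, and persistant_directory (spelled
# this way in app.py) are imported or defined earlier in the file.
def load_existing_vector_store():
    # Reopen the persisted store only if its directory already exists.
    if os.path.exists(persistant_directory):
        vector_store = Chroma(
            persist_directory=persistant_directory,
            embedding_function=OpenAIEmbeddings()
        )
        return vector_store
    # No database on disk yet.
    return None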

If you want to start fresh, simply delete the db/ directory. It will be recreated when you upload new documents.
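
Before deleting anything, it can help to see whether a database exists and how much it holds. This is a rough sketch using only the standard library; vector_store_status is a hypothetical helper, and the db directory name is taken from the description above rather than read from app.py.

```python
from pathlib import Path


def vector_store_status(db_dir="db"):
    """Report whether the persisted directory exists and its size on disk."""
    path = Path(db_dir)
    if not path.is_dir():
        return {"exists": False, "files": 0, "bytes": 0}
    # Count every regular file below the directory, including subfolders.
    files = [p for p in path.rglob("*") if p.is_file()]
    return {
        "exists": True,
        "files": len(files),
        "bytes": sum(p.stat().st_size for p in files),
    }


if __name__ == "__main__":
    print(vector_store_status())
```

The sketch only inspects the filesystem. It does not open the Chroma collection or count stored embeddings.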

Document Processing Configuration

The application processes PDFs with these default settings:

app.py:33-36

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=400
)

Parameters:

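chunk_size=1000 - Each document chunk contains up to 1000 characters
chunk_overlap=400 - Consecutive chunks share up to 400 characters, so text near a chunk boundary appears in both chunks

An overlap of 400 on 1000-character chunks is fairly high (40%): it lowers the chance that a relevant passage is split across two chunks, at the cost of embedding and storing more repeated text. You can adjust both values in app.py based on your document types.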
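
The effect of these two parameters can be illustrated with a plain sliding window. Note that this is a simplified stand-in, not RecursiveCharacterTextSplitter: the real splitter prefers to break on separators such as paragraphs and sentences, so its chunk counts will differ. sliding_window and estimate_chunks are hypothetical helpers for illustration only.

```python
import math


def _stride(chunk_size, chunk_overlap):
    """Characters the window advances between consecutive chunks."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    return chunk_size - chunk_overlap


def estimate_chunks(text_length, chunk_size=1000, chunk_overlap=400):
    """Number of fixed-size windows needed to cover text_length characters."""
    stride = _stride(chunk_size, chunk_overlap)
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    return math.ceil((text_length - chunk_size) / stride) + 1


def sliding_window(text, chunk_size=1000, chunk_overlap=400):
    """Split text into fixed-size windows that overlap by chunk_overlap."""
    stride = _stride(chunk_size, chunk_overlap)
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += stride
    return chunks


if __name__ == "__main__":
    print(estimate_chunks(10_000))
```

With the defaults, each window advances by 600 characters, so a 10,000-character document needs 16 windows. Lowering chunk_overlap to 200 raises the stride to 800 and reduces the count, with less shared context between neighbors.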

Troubleshooting

Import Errors

If you see ModuleNotFoundError, ensure your virtual environment is activated:

# Check if venv is active (should show a path inside venv/)
which python              # macOS/Linux
where python              # Windows

# If not active, activate it (run only the line for your platform)
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate     # Windows
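
If the environment is active and the error persists, it may help to find out which packages the interpreter cannot see. The sketch below uses importlib.util.find_spec, which locates a module without importing it; missing_modules is a hypothetical helper, and the list holds the import names (not the pip names) of the packages this guide installs.

```python
from importlib import util

# Import names for the packages listed in this guide,
# for example python-dotenv is imported as "dotenv".
REQUIRED_MODULES = [
    "streamlit",
    "dotenv",
    "langchain",
    "langchain_core",
    "langchain_openai",
    "langchain_chroma",
    "langchain_community",
    "langchain_text_splitters",
    "openai",
    "chromadb",
    "tiktoken",
    "pypdf",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

Any name reported as missing points to a package that is not installed in the interpreter currently on your PATH.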

API Key Issues

If you get authentication errors:

Verify your API key is correctly set in .env
Check that .env is in the project root directory
Restart the application after editing .env
Verify your OpenAI account has available credits
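
The first two checks in this list can be automated. The following is a rough sketch with a deliberately minimal .env reader; it is not python-dotenv's parser and ignores features such as export prefixes and multi-line values. diagnose_env is a hypothetical helper, and the sk- prefix check is an assumption based on the example key format shown in this guide.

```python
from pathlib import Path


def diagnose_env(project_root=".", key="OPENAI_API_KEY"):
    """Return a list of problems found with the key in <project_root>/.env."""
    env_path = Path(project_root) / ".env"
    if not env_path.is_file():
        return [f"{env_path} not found"]

    value = None
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip comments and lines that are not NAME=value pairs.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, rest = line.partition("=")
        if name.strip() == key:
            # Drop surrounding whitespace and optional quotes.
            value = rest.strip().strip("'\"")

    if value is None:
        return [f"{key} is not defined in {env_path}"]
    problems = []
    if not value:
        problems.append(f"{key} is empty")
    elif "xxxx" in value:
        problems.append(f"{key} still holds the placeholder value")
    elif not value.startswith("sk-"):
        # Assumption: keys start with "sk-", as in the example above.
        problems.append(f"{key} does not start with 'sk-'")
    return problems


if __name__ == "__main__":
    for problem in diagnose_env():
        print("PROBLEM:", problem)
```

The sketch never prints the key itself. It cannot tell whether the key is valid or whether the account has credits; those still need to be checked in the OpenAI dashboard.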

ChromaDB Errors

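If you encounter database corruption, remove the database directory and restart. This deletes all stored embeddings, so you will need to upload your documents again: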

# Remove the database directory
rm -rf db/

# Restart the application
streamlit run app.py
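
A less destructive alternative to rm -rf is to move the directory aside so it can be inspected or restored later. This is a minimal sketch; archive_vector_store and the .bak-<timestamp> naming are hypothetical choices, not something RAG Chat provides. Stop the Streamlit server before running it so no process holds the database files open.

```python
import shutil
import time
from pathlib import Path


def archive_vector_store(db_dir="db"):
    """Rename db/ to db.bak-<timestamp>/ and return the new path, or None."""
    src = Path(db_dir)
    if not src.is_dir():
        # Nothing to archive.
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.bak-{stamp}")
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    shutil.move(str(src), str(dest))
    return dest
```

After archiving, start the application again and upload documents to build a fresh database. Delete the archived copy once you no longer need it.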

Port Already in Use

If port 8501 is already in use:

streamlit run app.py --server.port 8502
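
If 8502 may be taken as well, a short script can look for the next free port. This is a sketch using the standard socket module; find_free_port is a hypothetical helper, and a port it reports as free could still be claimed by another process before Streamlit starts.

```python
import socket


def find_free_port(start=8501, attempts=20, host="127.0.0.1"):
    """Return the first port at or after `start` that can be bound."""
    last = min(start + attempts, 65536)
    for port in range(start, last):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                # Binding fails with OSError when the port is already in use.
                sock.bind((host, port))
            except OSError:
                continue
            return port
    raise RuntimeError(f"no free port found in {start}-{last - 1}")


if __name__ == "__main__":
    port = find_free_port()
    print(f"streamlit run app.py --server.port {port}")
```

The script prints a ready-to-run command using the first available port.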

Upgrading Dependencies

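To update dependencies, first raise the pinned versions in requirements.txt, then run the command below. Packages pinned with == stay at the pinned version even with --upgrade; only unpinned packages move to their latest release: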

pip install --upgrade -r requirements.txt

Major version upgrades of LangChain or OpenAI packages may introduce breaking changes. Test thoroughly after upgrading.
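
A quick way to flag risky version jumps before editing a pin is to compare version numbers. The sketch below is an assumption-laden heuristic, not a guarantee: it handles only plain numeric versions such as 0.3.27, and it treats a minor bump in a 0.x package as potentially breaking, following the common convention that 0.x releases make no stability promises. is_breaking_upgrade is a hypothetical helper.

```python
def parse_version(text):
    """Parse a plain numeric version such as '0.3.27' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_breaking_upgrade(old, new):
    """Heuristic: does moving from old to new cross a compatibility boundary?"""
    old_v, new_v = parse_version(old), parse_version(new)
    if old_v[0] == 0 and new_v[0] == 0:
        # For 0.x packages, treat a minor bump as potentially breaking.
        return new_v[:2] > old_v[:2]
    # Otherwise only a major bump counts.
    return new_v[0] > old_v[0]


if __name__ == "__main__":
    print(is_breaking_upgrade("0.3.27", "0.3.30"))
    print(is_breaking_upgrade("0.3.27", "0.4.0"))
```

Whatever the heuristic reports, read the package changelog and run the application against a test document set before relying on the upgraded versions.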

Development Setup

For development work, you may want additional tools:

pip install pytest black flake8 mypy

Next Steps

Quickstart

Start asking questions about your documents

Configuration

Customize chunking, models, and embeddings

RAG Overview

Learn how the RAG pipeline works

API Reference

Detailed function documentation
