## Prerequisites
Before you begin, ensure you have:

- Python 3.12+ installed
- uv package manager (recommended) or pip
- OpenAI API key for embeddings and LLM
- Unstructured API key for document parsing
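You can confirm the Python requirement from a short script; `meets_requirement` below is a helper written for this guide, not part of the project:

```python
import sys

def meets_requirement(version=None, minimum=(3, 12)):
    """Return True when the given (or running) interpreter meets the minimum."""
    version = version or sys.version_info
    return tuple(version[:2]) >= minimum

if __name__ == "__main__":
    print("Python OK" if meets_requirement() else "Python too old")
```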
If you don’t have uv installed, get it with:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

## Step 1: Clone and install
## Step 2: Configure environment variables
Create a `.env` file in the project root with your API keys:
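A minimal `.env` might look like the sketch below. `OPENAI_API_KEY` is referenced later in this guide; the Unstructured variable name is an assumption and may differ in your project:

```
OPENAI_API_KEY=sk-...
UNSTRUCTURED_API_KEY=your-unstructured-key
```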
## Step 3: Ingest knowledge base documents
Before the RAG system can answer questions, you need to ingest documentation into the vector store. Document ingestion chunks your markdown files using the Unstructured API, generates embeddings with OpenAI’s `text-embedding-3-small`, and stores them in Chroma at `./chroma_db`.

## Step 4: Start the API server
Launch the FastAPI development server. It runs at http://localhost:8000 with these endpoints:
- `GET /api/v1/health` — Health check
- `POST /api/v1/ingest` — Ingest documents
- `POST /api/v1/answer` — Submit questions
- `POST /api/v1/triage` — Run triage models
Visit http://localhost:8000/api/docs for interactive API documentation.
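With the server running, you can hit the health endpoint from Python using only the standard library; the base URL assumes the default port above:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def check_health(base_url: str = BASE_URL) -> dict:
    """GET the health endpoint and return its decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/health") as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(check_health())
```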
## Step 5: Make your first RAG query
Now that documents are ingested and the API is running, submit a test question.

### Expected response
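The response body has roughly the shape below; the values are illustrative placeholders, not actual output:

```json
{
  "draft_reply": "You can update your billing details under Settings > Billing.",
  "internal_next_steps": ["Verify the customer's last invoice before sending."],
  "citations": [{"source": "billing.md", "snippet": "..."}],
  "needs_human_review": false,
  "predicted_category": "Billing & Payments",
  "predicted_priority": "P1"
}
```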
The response includes:
- draft_reply — Customer-facing answer
- internal_next_steps — Actions for support agents
- citations — Source documents with snippets
- needs_human_review — Flag for low-confidence predictions
- predicted_category and predicted_priority — Triage model outputs
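A client might branch on these fields like so; the sample payload here is illustrative, not actual API output:

```python
sample = {
    "draft_reply": "You can update your billing details under Settings > Billing.",
    "internal_next_steps": ["Verify the customer's last invoice before sending."],
    "citations": [{"source": "billing.md", "snippet": "..."}],
    "needs_human_review": False,
    "predicted_category": "Billing & Payments",
    "predicted_priority": "P1",
}

def route(response: dict) -> str:
    """Decide whether a draft reply can be sent without agent review."""
    if response["needs_human_review"]:
        return "queue_for_agent"
    return "auto_send_draft"
```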
## Understanding the response
Let’s break down what just happened.

### Triage prediction
The system ran ML models to predict:

- Category: `Billing & Payments` (confidence: 0.92)
- Priority: `P1` (confidence: 0.87)
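The `needs_human_review` flag is typically derived from these confidences via a threshold; here is a minimal sketch, assuming a 0.7 cutoff (the project’s actual threshold may differ):

```python
def needs_human_review(confidences, threshold=0.7):
    """Flag the ticket for a human when any model falls below the threshold."""
    return any(c < threshold for c in confidences)

# With the predictions above (0.92, 0.87), nothing falls below 0.7,
# so the ticket would not be flagged for review.
```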
### Semantic retrieval

The RAG agent:

- Embedded your question with `text-embedding-3-small`
- Searched Chroma for the top 5 similar chunks, filtered by predicted category
- Retrieved relevant context from `billing.md`
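Chroma performs this search internally; conceptually, the category-filtered top-k step looks like the toy version below, where 2-D vectors stand in for the 1536-dimensional embeddings:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, category, k=5):
    """chunks: list of (vector, metadata) pairs; metadata holds 'category'."""
    candidates = [c for c in chunks if c[1]["category"] == category]
    candidates.sort(key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return candidates[:k]
```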
### Answer generation
The LLM (GPT-4.1) generated a grounded answer using:
- Retrieved context from the knowledge base
- Category-specific system prompt
- Priority-aware tone adjustments
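How those three inputs combine is project-specific; a plausible sketch of the prompt assembly, where the function name and prompt wording are assumptions, looks like this:

```python
def build_messages(question, context_chunks, category, priority):
    """Assemble a chat prompt from retrieved context and triage outputs."""
    # Assumed tone rule: P1 tickets get an urgent register (illustrative only).
    tone = "Respond urgently and concisely." if priority == "P1" else "Respond helpfully."
    system = (
        f"You are a support assistant for {category} questions. {tone}\n"
        "Answer only from the provided context."
    )
    context = "\n\n".join(context_chunks)
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```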
## What’s happening under the hood?

Here’s the code that powers your query, from `src/rag/retriever.py:201-260`:
## Next steps

Now that you’ve made your first query, explore more:

- **Train triage models**: train custom ML models on your support tickets
- **Run evaluations**: test answer quality with offline metrics
- **Explore the architecture**: learn how system components work together
- **Read API docs**: dive into endpoint specifications and request models
## Troubleshooting

### Import errors or missing modules

Ensure you’ve activated your virtual environment and run `uv sync`.

### OpenAI authentication errors

Verify your `.env` file contains a valid `OPENAI_API_KEY` and is in the project root directory.

### No chunks retrieved / empty results

Make sure you’ve ingested documents before querying. Check that the `./chroma_db` directory exists and contains data.

### Triage model errors

Train triage models before making queries. Trained models are saved to `artifacts/` and are required for the `/answer` endpoint.