Quickstart
This guide will help you set up the RAG Support System and make your first query. You’ll install dependencies, ingest knowledge base documents, start the API server, and submit a test question.

Prerequisites

Before you begin, ensure you have:
  • Python 3.12+ installed
  • uv package manager (recommended) or pip
  • OpenAI API key for embeddings and LLM
  • Unstructured API key for document parsing
If you don’t have uv installed, get it with: curl -LsSf https://astral.sh/uv/install.sh | sh

Step 1: Clone and install

1. Clone the repository

git clone https://github.com/JoAmps/rgt-assignment.git
cd rgt-assignment

2. Create a virtual environment

python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate  # Windows

3. Install dependencies

uv sync

This installs all required packages from pyproject.toml, including FastAPI, LangChain, Chroma, and OpenAI.

Step 2: Configure environment variables

Create a .env file in the project root with your API keys:
.env
OPENAI_API_KEY=your_openai_api_key
UNSTRUCTURED_API_KEY=your_unstructured_api_key
Keep your .env file out of version control. Never commit API keys to your repository.
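At runtime these variables need to end up in the process environment. A stdlib-only sketch of a .env loader (the project may rely on python-dotenv instead; `load_env` is a hypothetical helper, not the project's code):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: copies KEY=VALUE pairs into os.environ.

    Illustrative sketch only; python-dotenv is the usual choice.
    Existing environment variables are never overwritten.
    """
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue  # skip blanks, comments, and malformed lines
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env file; rely on the existing environment
```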

Step 3: Ingest knowledge base documents

Before the RAG system can answer questions, you need to ingest documentation into the vector store.
# Ingest all .md files from kb_docs/ folder
uv run -m src.rag.ingest
Document ingestion chunks your markdown files with the Unstructured API, generates embeddings with OpenAI’s text-embedding-3-small, and stores them in Chroma at ./chroma_db.
You should see output like:
Processing: kb_docs/billing.md
Chunked into 12 segments
Stored in Chroma collection: docs_collection
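The real pipeline delegates chunking to the Unstructured API, but the idea can be illustrated with a naive header-based markdown splitter (hypothetical code, not the project's implementation):

```python
def chunk_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Naive splitter: break on H2 headings, then cap each chunk's size.

    Illustrative only; the project uses the Unstructured API, which
    understands document structure far better than this sketch.
    """
    chunks: list[str] = []
    for section in text.split("\n## "):
        section = section.strip()
        # Split oversized sections at the last newline before the cap
        while len(section) > max_chars:
            cut = section.rfind("\n", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no newline found: hard cut
            chunks.append(section[:cut].strip())
            section = section[cut:].strip()
        if section:
            chunks.append(section)
    return chunks
```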

Step 4: Start the API server

Launch the FastAPI development server:
uv run main.py
The server starts at http://localhost:8000 with these endpoints:
  • GET /api/v1/health — Health check
  • POST /api/v1/ingest — Ingest documents
  • POST /api/v1/answer — Submit questions
  • POST /api/v1/triage — Run triage models
Visit http://localhost:8000/api/docs for interactive API documentation.

Step 5: Make your first RAG query

Now that documents are ingested and the API is running, submit a test question:
curl -X POST "http://localhost:8000/api/v1/answer" \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "Refund issue",
    "body": "I was charged twice for my subscription",
    "user_question": "How long does a refund take?"
  }'
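The same request can be made from Python with only the standard library (a sketch; `ask` and `build_payload` are illustrative helpers, not part of the project):

```python
import json
import urllib.request

def build_payload(subject: str, body: str, user_question: str) -> bytes:
    """Encode the /answer request body as UTF-8 JSON."""
    return json.dumps({
        "subject": subject,
        "body": body,
        "user_question": user_question,
    }).encode("utf-8")

def ask(subject: str, body: str, user_question: str,
        url: str = "http://localhost:8000/api/v1/answer") -> dict:
    """POST a support ticket to the running API and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=build_payload(subject, body, user_question),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```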

Expected response

{
  "draft_reply": "Refunds are typically processed within 5-7 business days. Once approved, the amount will be credited back to your original payment method. You'll receive an email confirmation when the refund is complete.",
  "internal_next_steps": [
    "Verify duplicate charge in billing system",
    "Initiate refund for duplicate transaction",
    "Follow up with customer in 7 days"
  ],
  "citations": [
    {
      "document_name": "billing.md",
      "chunk_id": "element-12",
      "snippet": "Refunds are typically processed...",
      "full_content": "Refunds are typically processed within 5-7 business days..."
    }
  ],
  "needs_human_review": false,
  "predicted_category": "Billing & Payments",
  "predicted_priority": "P1",
  "confidence": {
    "category": 0.92,
    "priority": 0.87
  }
}
The response includes:
  • draft_reply — Customer-facing answer
  • internal_next_steps — Actions for support agents
  • citations — Source documents with snippets
  • needs_human_review — Flag for low-confidence predictions
  • predicted_category and predicted_priority — Triage model outputs
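If you consume this endpoint from typed Python code, the response shape can be modeled roughly as follows (a sketch inferred from the example above, not the project's own response models):

```python
from typing import TypedDict

class Citation(TypedDict):
    document_name: str
    chunk_id: str
    snippet: str
    full_content: str

class Confidence(TypedDict):
    category: float
    priority: float

class AnswerResponse(TypedDict):
    draft_reply: str
    internal_next_steps: list[str]
    citations: list[Citation]
    needs_human_review: bool
    predicted_category: str
    predicted_priority: str
    confidence: Confidence
```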

Understanding the response

Let’s break down what just happened:
1. Triage prediction

The system ran ML models to predict:
  • Category: Billing & Payments (confidence: 0.92)
  • Priority: P1 (confidence: 0.87)
High confidence scores mean the prediction is reliable.

2. Semantic retrieval

The RAG agent:
  1. Embedded your question with text-embedding-3-small
  2. Searched Chroma for the top 5 most similar chunks, filtered by the predicted category
  3. Retrieved relevant context from billing.md

3. Answer generation

The LLM (GPT-4.1) generated a grounded answer using:
  • Retrieved context from the knowledge base
  • A category-specific system prompt
  • Priority-aware tone adjustments
The answer is constrained to use only information from the retrieved documents.

4. Structured outputs

The system generated:
  • Citations with source documents and snippets
  • Internal next steps for support agents
  • A review flag based on confidence thresholds (0.5 for both category and priority)
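The review flag reduces to a simple threshold check; a minimal sketch, assuming the 0.5 thresholds mentioned above (names are illustrative):

```python
CATEGORY_CONF_THRESHOLD = 0.5
PRIORITY_CONF_THRESHOLD = 0.5

def needs_review(confidence: dict[str, float]) -> bool:
    """Flag for human review when either triage score falls below threshold.

    Missing scores default to 0.0, so an absent prediction is always flagged.
    """
    return (
        confidence.get("category", 0.0) < CATEGORY_CONF_THRESHOLD
        or confidence.get("priority", 0.0) < PRIORITY_CONF_THRESHOLD
    )
```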

What’s happening under the hood?

Here’s the code that powers your query, from src/rag/retriever.py:201-260:
retriever.py
def answer(
    self,
    query: str,
    predicted_category: str,
    priority: str,
    confidence: Dict[str, float],
    k: int = 5,
) -> Dict:
    """
    End-to-end RAG pipeline.
    """
    # Retrieve top-k chunks filtered by category
    chunks = self.retrieve(
        query,
        predicted_category=predicted_category,
        k=k,
    )

    if not chunks:
        return {
            "draft_reply": "Insufficient context. Please clarify your request.",
            "internal_next_steps": [],
            "citations": [],
            "needs_human_review": True,
        }

    # Assemble context from retrieved chunks
    context_text = "\n\n".join(c["content"] for c in chunks)

    # Generate grounded answer
    answer = self.generate_answer(
        context=context_text,
        query=query,
        predicted_category=predicted_category,
        priority=priority,
    )

    # Generate internal next steps
    internal_next_steps = generate_internal_next_steps(
        context=context_text,
        query=query,
    )

    # Flag for human review if confidence is low
    needs_human_review = (
        confidence.get("category", 0) < CATEGORY_CONF_THRESHOLD
        or confidence.get("priority", 0) < PRIORITY_CONF_THRESHOLD
    )

    return self.format_response(
        answer=answer,
        internal_next_steps=internal_next_steps,
        chunks=chunks,
        needs_human_review=needs_human_review,
    )

Next steps

Now that you’ve made your first query, explore more:

Train triage models

Train custom ML models on your support tickets:
uv run -m src.ml.train

Run evaluations

Test answer quality with offline metrics:
python -m src.rag.evals

Explore the architecture

Learn how system components work together

Read API docs

Dive into endpoint specifications and request models

Troubleshooting

  • Import or dependency errors: ensure you’ve activated your virtual environment and run uv sync:
    source .venv/bin/activate
    uv sync
  • Authentication errors: verify that your .env file contains a valid OPENAI_API_KEY and is in the project root directory.
  • Empty or missing answers: make sure you’ve ingested documents before querying:
    uv run -m src.rag.ingest
    Check that the ./chroma_db directory exists and contains data.
  • Triage model errors: train the triage models before making queries:
    uv run -m src.ml.train
    Trained models are saved to artifacts/ and are required by the /answer endpoint.
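A quick preflight check covering the failure modes above (a hedged sketch; the paths follow the defaults used in this guide):

```python
import os

def preflight(root: str = ".") -> list[str]:
    """Return a list of setup problems; empty means everything looks ready."""
    problems: list[str] = []
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (check your .env)")
    chroma = os.path.join(root, "chroma_db")
    if not os.path.isdir(chroma) or not os.listdir(chroma):
        problems.append("./chroma_db is missing or empty; run: uv run -m src.rag.ingest")
    artifacts = os.path.join(root, "artifacts")
    if not os.path.isdir(artifacts) or not os.listdir(artifacts):
        problems.append("artifacts/ is missing or empty; run: uv run -m src.ml.train")
    return problems
```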
