RAG Support System

Welcome to the RAG Support System documentation. This system combines semantic retrieval over an indexed knowledge base with LLM-based generation to produce grounded, cited answers while mitigating hallucinations and adversarial inputs.

What is the RAG Support System?

The RAG Support System is an AI-powered customer support platform that:

Retrieves relevant documentation from a vector store (Chroma) using semantic search
Triages incoming tickets with ML models to predict category and priority
Generates grounded, cited answers using large language models
Evaluates answer quality with offline faithfulness and relevance metrics
Flags low-confidence cases for human review

Built with Python, FastAPI, LangChain, Chroma, and OpenAI, the system prioritizes correctness, modularity, and production readiness.

Core capabilities

Semantic retrieval

Vector-based search over your knowledge base using OpenAI embeddings and Chroma for fast, relevant results

ML-powered triage

Automatic classification of support tickets by category and priority with confidence scoring

Grounded answers

LLM-generated responses with citations and internal next steps, backed by retrieved context

Production safeguards

Prompt injection protection, adversarial testing, and human-in-the-loop workflows for uncertain cases

Key features

Document ingestion — Chunk and embed markdown files into Chroma with configurable chunking strategies
RAG agent — Retrieval-augmented generation pipeline with category-aware filtering and low-latency responses
Triage models — TF-IDF + Logistic Regression models for category and priority prediction
Structured outputs — JSON-formatted citations, internal next steps, and review flags
Offline evaluation — Relevance, faithfulness, and adversarial robustness testing with audit-ready reports
FastAPI endpoints — Production-ready HTTP API for ingestion, question answering, and triage

Get started

Quickstart

Go from zero to your first RAG query in under 5 minutes

Installation

Set up Python, dependencies, and environment variables

Architecture

Understand system components and request flow

API Reference

Explore endpoints, request models, and examples

Architecture overview

The system follows a modular architecture with clear separation of concerns:

Client Request
      ↓
  FastAPI Layer (validation, routing)
      ↓
  Triage Service (category + priority prediction)
      ↓
  RAG Service (embed query → retrieve chunks → generate answer)
      ↓
  Response (draft_reply, citations, internal_next_steps, needs_human_review)

See the Architecture page for detailed component descriptions and request flow diagrams.

Design principles

The RAG Support System is built on these core principles:

Correctness first — Answers must be supported by retrieved knowledge; hallucinations are unacceptable
Modularity — Retrieval, generation, and evaluation are independently testable
Cost awareness — Predictable and controllable LLM usage with bounded retrieval and caching
Security — Resilience against prompt injection and misuse with explicit refusal behavior
Production readiness — Observable, scalable, and maintainable with structured logging and metrics

This system prioritizes faithfulness over creativity. Lower temperature and constrained prompts reduce expressive freedom but eliminate hallucinations in support contexts.

Technology stack

Python 3.12+ — Core language with type hints and async support
FastAPI — High-performance API framework with automatic OpenAPI docs
LangChain — LLM orchestration and document processing
Chroma — Vector database for semantic search
OpenAI — Embeddings (text-embedding-3-small) and LLM (GPT-4.1)
scikit-learn — ML models for triage classification
uv — Fast Python package installer and dependency manager

Next steps

Follow the quickstart

Install dependencies, ingest documents, and make your first RAG query in minutes

Read the architecture guide

Understand how the system components work together

Explore the API reference

Learn about available endpoints and request/response models

Run evaluations

Test answer quality with offline metrics and adversarial robustness checks

Getting Started

Core Concepts

Guides

Deployment

What is the RAG Support System?

Core capabilities

Semantic retrieval

ML-powered triage

Grounded answers

Production safeguards

Key features

Get started

Quickstart

Installation

Architecture

API Reference

Architecture overview

Design principles

Technology stack

Next steps

Build docs developers (and LLMs) love

Getting Started

Core Concepts

Guides

Deployment

​What is the RAG Support System?

​Core capabilities

Semantic retrieval

ML-powered triage

Grounded answers

Production safeguards

​Key features

​Get started

Quickstart

Installation

Architecture

API Reference

​Architecture overview

​Design principles

​Technology stack

​Next steps

Build docs developers (and LLMs) love

What is the RAG Support System?

Core capabilities

Key features

Get started

Architecture overview

Design principles

Technology stack

Next steps