Skip to main content
Welcome to the RAG Support System documentation. This system combines semantic retrieval over an indexed knowledge base with LLM-based generation to produce grounded, cited answers while mitigating hallucinations and adversarial inputs.

What is the RAG Support System?

The RAG Support System is an AI-powered customer support platform that:
  • Retrieves relevant documentation from a vector store (Chroma) using semantic search
  • Triages incoming tickets with ML models to predict category and priority
  • Generates grounded, cited answers using large language models
  • Evaluates answer quality with offline faithfulness and relevance metrics
  • Flags low-confidence cases for human review
Built with Python, FastAPI, LangChain, Chroma, and OpenAI, the system prioritizes correctness, modularity, and production readiness.

Core capabilities

Semantic retrieval

Vector-based search over your knowledge base using OpenAI embeddings and Chroma for fast, relevant results

ML-powered triage

Automatic classification of support tickets by category and priority with confidence scoring

Grounded answers

LLM-generated responses with citations and internal next steps, backed by retrieved context

Production safeguards

Prompt injection protection, adversarial testing, and human-in-the-loop workflows for uncertain cases

Key features

  • Document ingestion — Chunk and embed markdown files into Chroma with configurable chunking strategies
  • RAG agent — Retrieval-augmented generation pipeline with category-aware filtering and low-latency responses
  • Triage models — TF-IDF + Logistic Regression models for category and priority prediction
  • Structured outputs — JSON-formatted citations, internal next steps, and review flags
  • Offline evaluation — Relevance, faithfulness, and adversarial robustness testing with audit-ready reports
  • FastAPI endpoints — Production-ready HTTP API for ingestion, question answering, and triage

Get started

Quickstart

Go from zero to your first RAG query in under 5 minutes

Installation

Set up Python, dependencies, and environment variables

Architecture

Understand system components and request flow

API Reference

Explore endpoints, request models, and examples

Architecture overview

The system follows a modular architecture with clear separation of concerns:
Client Request

  FastAPI Layer (validation, routing)

  Triage Service (category + priority prediction)

  RAG Service (embed query → retrieve chunks → generate answer)

  Response (draft_reply, citations, internal_next_steps, needs_human_review)
See the Architecture page for detailed component descriptions and request flow diagrams.

Design principles

The RAG Support System is built on these core principles:
  1. Correctness first — Answers must be supported by retrieved knowledge; hallucinations are unacceptable
  2. Modularity — Retrieval, generation, and evaluation are independently testable
  3. Cost awareness — Predictable and controllable LLM usage with bounded retrieval and caching
  4. Security — Resilience against prompt injection and misuse with explicit refusal behavior
  5. Production readiness — Observable, scalable, and maintainable with structured logging and metrics
This system prioritizes faithfulness over creativity. Lower temperature and constrained prompts reduce expressive freedom but eliminate hallucinations in support contexts.

Technology stack

  • Python 3.12+ — Core language with type hints and async support
  • FastAPI — High-performance API framework with automatic OpenAPI docs
  • LangChain — LLM orchestration and document processing
  • Chroma — Vector database for semantic search
  • OpenAI — Embeddings (text-embedding-3-small) and LLM (GPT-4.1)
  • scikit-learn — ML models for triage classification
  • uv — Fast Python package installer and dependency manager

Next steps

1

Follow the quickstart

Install dependencies, ingest documents, and make your first RAG query in minutes
2

Read the architecture guide

Understand how the system components work together
3

Explore the API reference

Learn about available endpoints and request/response models
4

Run evaluations

Test answer quality with offline metrics and adversarial robustness checks

Build docs developers (and LLMs) love