What is the RAG Support System?
The RAG Support System is an AI-powered customer support platform that:
- Retrieves relevant documentation from a vector store (Chroma) using semantic search
- Triages incoming tickets with ML models to predict category and priority
- Generates grounded, cited answers using large language models
- Evaluates answer quality with offline faithfulness and relevance metrics
- Flags low-confidence cases for human review
Core capabilities
Semantic retrieval
Vector-based search over your knowledge base using OpenAI embeddings and Chroma for fast, relevant results
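In production this retrieval runs on Chroma with OpenAI embeddings; the dependency-free sketch below illustrates the underlying idea with a toy bag-of-words "embedding" and cosine similarity (the `embed`, `cosine`, and `retrieve` names are illustrative, not the system's API):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(docs[d])), reverse=True)
    return ranked[:k]

docs = {
    "billing.md": "how to update your billing address and payment method",
    "login.md": "reset your password if you cannot log in",
    "api.md": "generate an api key for programmatic access",
}
print(retrieve("I forgot my password and can't log in", docs, k=1))
```

A vector database like Chroma does the same ranking, but over dense learned embeddings with an approximate-nearest-neighbor index instead of a linear scan.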
ML-powered triage
Automatic classification of support tickets by category and priority with confidence scoring
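The triage models are TF-IDF + Logistic Regression (see Key features below). A minimal sketch of that setup, using made-up tickets and category labels for illustration; the real system trains on labeled historical tickets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-made training set; categories here are illustrative.
tickets = [
    "card was charged twice this month",
    "invoice total looks wrong",
    "cannot log in after password reset",
    "two factor code never arrives",
    "api returns 500 on every request",
    "webhook deliveries stopped working",
]
categories = ["billing", "billing", "account", "account", "api", "api"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(tickets, categories)

# predict_proba gives the confidence score used to flag uncertain cases.
probs = clf.predict_proba(["my invoice total seems wrong"])[0]
label = clf.classes_[probs.argmax()]
confidence = float(probs.max())
print(label, round(confidence, 2))
```

When `confidence` falls below a threshold, the ticket can be routed to human review rather than auto-triaged.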
Grounded answers
LLM-generated responses with citations and internal next steps, backed by retrieved context
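The structured-output features below describe JSON citations, internal next steps, and review flags; the shape sketched here is illustrative (field names are hypothetical, not the system's actual schema):

```python
import json

# Illustrative agent response; field names are hypothetical.
raw = """
{
  "answer": "Reset your password from the login page; see the linked guide.",
  "citations": [{"source": "login.md", "snippet": "reset your password"}],
  "internal_next_steps": ["Confirm the user's email address is verified"],
  "needs_human_review": false
}
"""

response = json.loads(raw)

# Validate the shape before acting on it.
assert {"answer", "citations", "internal_next_steps", "needs_human_review"} <= response.keys()
for citation in response["citations"]:
    assert {"source", "snippet"} <= citation.keys()

print(response["citations"][0]["source"])
```

Because each citation carries its source and snippet, answers can be checked against the retrieved context during offline faithfulness evaluation.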
Production safeguards
Prompt injection protection, adversarial testing, and human-in-the-loop workflows for uncertain cases
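As one layer of injection protection, a cheap pattern check can run before any retrieval or generation. This is a hedged sketch of a first-pass heuristic only (the pattern list and function names are illustrative); real defenses layer model-based checks and adversarial testing on top:

```python
# Illustrative pattern list; a real deny-list would be broader and maintained over time.
INJECTION_PATTERNS = (
    "ignore previous instructions",
    "ignore all prior instructions",
    "system prompt",
    "you are now",
)

def looks_like_injection(user_text: str) -> bool:
    """Cheap first-pass heuristic for prompt-injection attempts."""
    lowered = user_text.lower()
    return any(pattern in lowered for pattern in INJECTION_PATTERNS)

def handle(user_text: str) -> str:
    if looks_like_injection(user_text):
        # Explicit refusal rather than silently answering (see design principles).
        return "I can't help with that request."
    return "OK, routing to the RAG pipeline."
```

Flagged inputs can be refused outright or escalated to the human-in-the-loop queue alongside other low-confidence cases.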
Key features
- Document ingestion — Chunk and embed markdown files into Chroma with configurable chunking strategies
- RAG agent — Retrieval-augmented generation pipeline with category-aware filtering and low-latency responses
- Triage models — TF-IDF + Logistic Regression models for category and priority prediction
- Structured outputs — JSON-formatted citations, internal next steps, and review flags
- Offline evaluation — Relevance, faithfulness, and adversarial robustness testing with audit-ready reports
- FastAPI endpoints — Production-ready HTTP API for ingestion, question answering, and triage
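The ingestion feature above mentions configurable chunking strategies. The simplest strategy, fixed-size character chunks with overlap, can be sketched as follows (a minimal illustration, not the system's actual chunker):

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size character chunking with overlap between consecutive chunks."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

doc = "# Billing FAQ\n" + "How do refunds work? " * 40
pieces = chunk(doc, size=200, overlap=50)
print(len(pieces), len(pieces[0]))
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides; each chunk is then embedded and stored in Chroma.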
Get started
Quickstart
Go from zero to your first RAG query in under 5 minutes
Installation
Set up Python, dependencies, and environment variables
Architecture
Understand system components and request flow
API Reference
Explore endpoints, request models, and examples
Architecture overview
The system follows a modular architecture with clear separation of concerns.
Design principles
The RAG Support System is built on these core principles:
- Correctness first — Answers must be supported by retrieved knowledge; hallucinations are unacceptable
- Modularity — Retrieval, generation, and evaluation are independently testable
- Cost awareness — Predictable and controllable LLM usage with bounded retrieval and caching
- Security — Resilience against prompt injection and misuse with explicit refusal behavior
- Production readiness — Observable, scalable, and maintainable with structured logging and metrics
This system prioritizes faithfulness over creativity. Lower temperature and constrained prompts reduce expressive freedom but sharply reduce hallucinations in support contexts.
Technology stack
- Python 3.12+ — Core language with type hints and async support
- FastAPI — High-performance API framework with automatic OpenAPI docs
- LangChain — LLM orchestration and document processing
- Chroma — Vector database for semantic search
- OpenAI — Embeddings (text-embedding-3-small) and LLM (GPT-4.1)
- scikit-learn — ML models for triage classification
- uv — Fast Python package installer and dependency manager
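Under uv, the stack above maps to a `pyproject.toml` along these lines (package names are the published distributions; version pins are omitted and the block is an illustrative sketch, not the project's actual file):

```toml
[project]
name = "rag-support-system"
requires-python = ">=3.12"
dependencies = [
    "fastapi",
    "langchain",
    "chromadb",
    "openai",
    "scikit-learn",
]
```

With this in place, `uv sync` resolves, locks, and installs the dependencies in one step.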
Next steps
Follow the quickstart
Install dependencies, ingest documents, and make your first RAG query in minutes