Overview
The QA API is a FastAPI service that provides intelligent question-answering for a Tableau course using Retrieval-Augmented Generation (RAG). The service embeds course transcripts, retrieves relevant context, and generates answers with citation tracking and hallucination detection.

Architecture
- Framework: FastAPI 1.1.0
- LLM Provider: OpenAI (configurable model)
- Vector Store: ChromaDB with OpenAI embeddings
- Orchestration: LangChain for RAG pipeline
- Document Processing: Markdown header splitting + token chunking
RAG Pipeline
- Document Loading: PDF transcripts loaded via `PyPDFLoader`
- Splitting: Markdown header splitter (section/lecture) + token splitter (350 tokens, 50 overlap)
- Embedding: OpenAI embeddings (`text-embedding-3-small` default)
- Storage: ChromaDB collection (`tableau_qa_collection`)
- Retrieval: Top-k=4 most relevant chunks
- Generation: ChatOpenAI with zero temperature for deterministic answers
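The retrieval step can be sketched in plain Python. This is a simplified stand-in for ChromaDB's similarity search, using toy 3-dimensional vectors in place of real OpenAI embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunks, k=4):
    """Return the k chunk texts most similar to the query vector.

    `chunks` is a list of (text, embedding) pairs; in the real service the
    embeddings come from OpenAI and the search happens inside ChromaDB.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy corpus with hand-made "embeddings".
chunks = [
    ("intro to dashboards", [1.0, 0.0, 0.0]),
    ("calculated fields", [0.0, 1.0, 0.0]),
    ("joins and blends", [0.7, 0.7, 0.0]),
    ("LOD expressions", [0.0, 0.0, 1.0]),
    ("filters", [0.9, 0.1, 0.0]),
]
result = top_k([1.0, 0.0, 0.0], chunks, k=4)
```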
Request/Response Schemas
QARequest
QAResponse
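The schema bodies are not reproduced here. The sketch below reconstructs plausible shapes from the citation, confidence, and monitoring sections of this document — every field name is an assumption — using stdlib dataclasses in place of the Pydantic models a FastAPI service would actually define:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QARequest:
    # The question to answer; the field name is an assumption.
    question: str

@dataclass
class QAResponse:
    # Fields inferred from the citation-validation and confidence sections.
    answer: str
    citations: List[str] = field(default_factory=list)
    retrieval_accuracy: float = 1.0
    hallucination_flag: bool = False
    confidence: float = 0.0

resp = QAResponse(answer="Use a calculated field.", citations=["lecture 3"])
```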
API Endpoints
POST /qa
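A sketch of building a request to this endpoint — the `question` field name, host, and port are all assumptions about the deployment:

```python
import json
import urllib.request

# Payload field name and the localhost:8000 address are assumptions.
payload = {"question": "How do I build a dual-axis chart?"}
req = urllib.request.Request(
    "http://localhost:8000/qa",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request and return the QAResponse JSON.
```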
Synchronous question-answering endpoint.

Request Example:

POST /qa/stream
Streaming question-answering endpoint using Server-Sent Events (SSE).

Request Example:

GET /health
Health check endpoint.

Response Schema:

GET /monitoring
Returns aggregated QA performance metrics.

Response Schema:

Configuration
Environment variables:
- `OPENAI_API_KEY`: Required. OpenAI API key for embeddings and chat
- `OPENAI_CHAT_MODEL`: Chat model name (default: `gpt-4o-mini`)
- `OPENAI_EMBEDDING_MODEL`: Embedding model (default: `text-embedding-3-small`)
- `QA_TRANSCRIPT_PDF`: Path to course transcript PDF (default: `tableau_course_transcript.pdf`)
- `QA_CHROMA_COLLECTION`: ChromaDB collection name (default: `tableau_qa_collection`)
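A sketch of reading these variables with their documented defaults (the helper function is illustrative, not the service's actual code):

```python
import os

def load_config():
    """Read QA API settings from the environment, applying the documented defaults."""
    return {
        "api_key": os.getenv("OPENAI_API_KEY"),  # required; no default
        "chat_model": os.getenv("OPENAI_CHAT_MODEL", "gpt-4o-mini"),
        "embedding_model": os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small"),
        "transcript_pdf": os.getenv("QA_TRANSCRIPT_PDF", "tableau_course_transcript.pdf"),
        "chroma_collection": os.getenv("QA_CHROMA_COLLECTION", "tableau_qa_collection"),
    }

cfg = load_config()
```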
Document Processing
Transcript processing pipeline (src/qa_api.py:163-184):
- Load PDF using `PyPDFLoader`
- Split by Markdown headers:
  - `#` → section
  - `##` → lecture
  - `###` → topic
- Token-based chunking:
  - Chunk size: 350 tokens
  - Overlap: 50 tokens
- Embed chunks with OpenAI embeddings
- Store in ChromaDB
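The token-chunking step can be sketched over a pre-tokenized list (the real service presumably counts model tokens via a tokenizer inside LangChain's splitter, not plain list slices):

```python
def chunk_tokens(tokens, chunk_size=350, overlap=50):
    """Split a token list into overlapping chunks.

    Each chunk starts (chunk_size - overlap) tokens after the previous one,
    so consecutive chunks share `overlap` tokens of context.
    """
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Toy "document" of 1000 tokens.
tokens = [f"tok{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
```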
Prompt Engineering
System prompt (src/qa_api.py:78-88):

Citation Validation
The service extracts citations from generated answers using regex pattern matching (src/qa_api.py:104-106) and validates them against the retrieved documents to produce `retrieval_accuracy`:
- 1.0: All citations match retrieved documents
- < 1.0: Some citations are hallucinated (triggers `hallucination_flag`)
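The actual regex is not reproduced in this document; the sketch below shows the general shape, assuming citations appear as bracketed tags like `[source: lecture 4]` (the pattern in src/qa_api.py may differ):

```python
import re

CITATION_RE = re.compile(r"\[source:\s*([^\]]+)\]")  # assumed citation format

def retrieval_accuracy(answer, retrieved_ids):
    """Fraction of cited sources that actually appear in the retrieved set."""
    cited = CITATION_RE.findall(answer)
    if not cited:
        return 1.0  # nothing cited, nothing to hallucinate
    valid = sum(1 for c in cited if c.strip() in retrieved_ids)
    return valid / len(cited)

answer = "Use a dual axis [source: lecture 4]. See also [source: lecture 9]."
accuracy = retrieval_accuracy(answer, {"lecture 4", "lecture 5"})
hallucination_flag = accuracy < 1.0  # lecture 9 was never retrieved
```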
Confidence Scoring
Confidence is computed using a weighted formula (src/qa_api.py:126-131):
- 40%: Coverage (number of retrieved documents)
- 40%: Retrieval accuracy (citation validity)
- 20%: Non-empty answer check
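Interpreting the weights literally, the score might be computed as follows. This is a sketch; in particular, normalizing coverage by the top-k of 4 is an assumption:

```python
def confidence(num_retrieved, retrieval_accuracy, answer, top_k=4):
    """Weighted confidence: 40% coverage, 40% citation validity, 20% non-empty answer."""
    coverage = min(num_retrieved / top_k, 1.0)  # normalization is an assumption
    non_empty = 1.0 if answer.strip() else 0.0
    return 0.4 * coverage + 0.4 * retrieval_accuracy + 0.2 * non_empty

score = confidence(4, 1.0, "Use a calculated field.")
```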
Fallback Behavior
If `OPENAI_API_KEY` is not set or retrieval fails:
Starting the Service
Start the QA API:

Monitoring Metrics
The service tracks in-memory metrics for all requests:
- `requests_total`: Total QA requests processed
- `latency_ms_total`: Cumulative latency (sum)
- `retrieval_accuracy_total`: Cumulative retrieval accuracy (sum)
- `hallucination_count`: Number of requests with invalid citations
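A minimal in-memory accumulator matching these counters (they are running sums, per the field names; the derived averages in the snapshot are an assumption about what GET /monitoring reports):

```python
class QAMetrics:
    """In-memory counters mirroring the fields listed above."""

    def __init__(self):
        self.requests_total = 0
        self.latency_ms_total = 0.0
        self.retrieval_accuracy_total = 0.0
        self.hallucination_count = 0

    def record(self, latency_ms, retrieval_accuracy, hallucinated):
        """Accumulate one request's outcome."""
        self.requests_total += 1
        self.latency_ms_total += latency_ms
        self.retrieval_accuracy_total += retrieval_accuracy
        self.hallucination_count += int(hallucinated)

    def snapshot(self):
        """Aggregates as the monitoring endpoint might report them."""
        n = self.requests_total or 1  # avoid division by zero before any requests
        return {
            "requests_total": self.requests_total,
            "avg_latency_ms": self.latency_ms_total / n,
            "avg_retrieval_accuracy": self.retrieval_accuracy_total / n,
            "hallucination_count": self.hallucination_count,
        }

m = QAMetrics()
m.record(120.0, 1.0, False)
m.record(80.0, 0.5, True)
```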
Error Handling
- 503 Service Unavailable: LLM not configured (missing API key)
- 422 Unprocessable Entity: Invalid request fields
- 500 Internal Server Error: Unexpected errors during retrieval or generation
Related
- Prediction API - Student purchase prediction service
- Monitoring - Drift detection and system health