System Architecture
The Interview Preparation Platform is a hybrid AI application built on a client-server architecture that combines static document analysis with real-time audio processing.

High-Level Architecture Diagram
Parallel Processing Architecture
The platform's most innovative feature is its dual-stream parallel processing system, which analyzes both how you speak (signal) and what you say (semantics).

Audio Processing Fork
When audio data arrives from the frontend, the backend splits it into two independent streams:

Stream A: Signal Stream (Local Processing)
- Latency: < 5ms per chunk
- Processing: Synchronous, local
- Output: Numeric metrics (pitch, volume, pauses)
- Location: interview_analyzer.py:analyze_audio_chunk_fast()
Stream B: Semantic Stream (External Processing)
- Latency: 200-500ms per chunk
- Processing: Asynchronous, external APIs
- Output: Transcribed text, semantic analysis, AI feedback
- Location: assemblyai_websocket_stream.py, rag.py
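The fork can be sketched as follows. The name `analyze_audio_chunk_fast` comes from the code location above, but its body here is a hypothetical simplification, and the queue hand-off to the semantic stream is an assumption about how the backend feeds the ASR worker.

```python
import queue

import numpy as np

def analyze_audio_chunk_fast(chunk: np.ndarray) -> dict:
    """Stream A: synchronous local metrics (hypothetical simplification)."""
    return {
        "volume_rms": float(np.sqrt(np.mean(chunk ** 2))),
        "is_silence": bool(np.max(np.abs(chunk)) < 0.01),
    }

# Stream B consumer (ASR worker) drains this queue asynchronously.
semantic_queue: "queue.Queue[np.ndarray]" = queue.Queue()

def on_audio_chunk(chunk: np.ndarray) -> dict:
    """Fork point: compute signal metrics inline (< 5ms), hand the raw
    samples to the semantic stream without blocking on it."""
    metrics = analyze_audio_chunk_fast(chunk)  # Stream A, local
    semantic_queue.put(chunk)                  # Stream B, external APIs
    return metrics
```

The key property is that Stream A never waits on Stream B: signal metrics return immediately while transcription proceeds at its own 200-500ms pace.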
Data Flow Diagram
AI Models & Technologies
1. Google Gemini (LLM)
Role: The Reasoning Engine
Implementation:
- Location: rag.py, app.py
- API: Google Generative AI SDK
Responsibilities:
- Generates context-aware interview questions
- Provides qualitative feedback on answers
- Synthesizes signal + semantic data into human-readable reports
- Powers the RAG (Retrieval-Augmented Generation) pipeline
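As a rough sketch of the RAG step, the snippet below assembles retrieved context into a prompt; the function name and template are hypothetical stand-ins for whatever rag.py actually does, and the model call through the Google Generative AI SDK is shown only as a comment.

```python
def build_interview_prompt(question_topic: str, retrieved_chunks: list) -> str:
    """Hypothetical RAG prompt assembly: retrieved resume/JD context
    plus an instruction. The real template in rag.py will differ."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "You are a technical interviewer.\n"
        f"Candidate background (retrieved context):\n{context}\n"
        f"Ask one interview question about: {question_topic}"
    )

# The assembled prompt is then sent to Gemini, e.g.:
#   import google.generativeai as genai
#   genai.configure(api_key=GEMINI_API_KEY)
#   model = genai.GenerativeModel("gemini-pro")
#   question = model.generate_content(prompt).text
```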
2. Sentence Transformers (all-MiniLM-L6-v2)
Role: The Matchmaker
Implementation:
- Location: interview_analyzer.py, resume_processor.py
- Model: sentence-transformers/all-MiniLM-L6-v2
Responsibilities:
- Converts text (resume, job descriptions, answers) into 384-dimensional vectors
- Enables semantic similarity calculations
- Powers the matching between user answers and ideal responses
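The similarity calculation itself reduces to cosine similarity over the 384-dimensional vectors. The sketch below assumes the embeddings already exist; producing them with the real model is shown only as a comment so the example stays self-contained.

```python
import numpy as np

# In the real pipeline the vectors come from all-MiniLM-L6-v2, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
#   vec = model.encode("candidate answer text")   # shape (384,)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Semantic match score in [-1, 1]; higher means closer in meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```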
3. Faster-Whisper (Local ASR)
Role: Local Speech-to-Text Engine
Implementation:
- Location: interview_analyzer.py (WhisperModelManager)
- Model: OpenAI Whisper (optimized)
Responsibilities:
- Provides offline transcription capabilities
- Fallback when AssemblyAI is unavailable
- No external API dependency
4. AssemblyAI
Role: Real-time Streaming Transcription
Implementation:
- Location: assemblyai_websocket_stream.py
- Protocol: WebSocket streaming
Responsibilities:
- Low-latency real-time transcription (200-500ms)
- Streams text back to frontend for live captions
- Production-grade accuracy
Signal Processing (Physics Layer)
Unlike standard chatbots, this platform implements research-grade signal processing to analyze vocal characteristics.

Implemented Algorithms
1. YIN Pitch Detection Algorithm
Purpose: Track the fundamental frequency (F0) of the voice
Outputs:
- Pitch Stability (coefficient of variation)
- Pitch Range (max - min)
- Confidence indicator
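A minimal version of the YIN idea (difference function plus cumulative mean normalization, without the parabolic-interpolation refinement of the full algorithm) looks like this; the function name and parameters are illustrative, not the platform's actual implementation.

```python
import numpy as np

def yin_pitch(frame: np.ndarray, sr: int, fmin: float = 60.0,
              fmax: float = 500.0, threshold: float = 0.15) -> float:
    """Simplified YIN pitch estimate in Hz; returns 0.0 if unvoiced."""
    tau_min, tau_max = int(sr / fmax), int(sr / fmin)
    n = len(frame)
    # Difference function d(tau) = sum_j (x[j] - x[j+tau])^2
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = frame[: n - tau] - frame[tau:]
        d[tau] = np.dot(diff, diff)
    # Cumulative mean normalized difference function
    cmndf = np.ones(tau_max + 1)
    running_sum = 0.0
    for tau in range(1, tau_max + 1):
        running_sum += d[tau]
        cmndf[tau] = d[tau] * tau / running_sum if running_sum > 0 else 1.0
    # First dip below threshold inside the search band, then descend
    # to the local minimum of that dip.
    for tau in range(tau_min, tau_max + 1):
        if cmndf[tau] < threshold:
            while tau + 1 <= tau_max and cmndf[tau + 1] < cmndf[tau]:
                tau += 1
            return sr / tau
    return 0.0
```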
2. Welford’s Algorithm (Running Statistics)
Purpose: Calculate mean/variance on streaming data without storing all samples
Implementation:
- O(1) memory usage
- Real-time statistics on unlimited audio streams
- No disk I/O required
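Welford's update is compact enough to show in full. This is the standard algorithm; the class name is illustrative rather than the one used in interview_analyzer.py.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the *updated* mean

    @property
    def variance(self) -> float:
        """Population variance of everything seen so far."""
        return self.m2 / self.n if self.n > 0 else 0.0
```

Each audio metric (pitch, volume) can keep its own `RunningStats` instance and fold in every chunk as it arrives, which is why no samples need to be buffered or written to disk.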
3. Voice Quality Metrics
Shimmer (Amplitude Perturbation):
- Calculated from cycle-to-cycle amplitude variations
- Indicates voice steadiness
- Correlates with confidence
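One common way to express local shimmer, sketched here as an assumption about the metric rather than the platform's exact formula, is the mean absolute difference between consecutive cycle amplitudes relative to the mean amplitude:

```python
import numpy as np

def shimmer_percent(peak_amps: np.ndarray) -> float:
    """Local shimmer (%): mean absolute difference between consecutive
    cycle peak amplitudes, relative to the mean amplitude."""
    if len(peak_amps) < 2:
        return 0.0
    diffs = np.abs(np.diff(peak_amps))
    return float(100.0 * diffs.mean() / peak_amps.mean())
```

A perfectly steady voice yields 0%; larger values indicate a shakier amplitude envelope.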
Technology Stack
Backend Stack
| Technology | Version | Purpose |
|---|---|---|
| Python | 3.9+ | Core language |
| Flask | 2.x | Web framework |
| Flask-SocketIO | 5.x | WebSocket server |
| SQLAlchemy | 1.4+ | ORM |
| SQLite | 3.x | Database |
| NumPy | 1.24+ | Numerical computing |
| Librosa | 0.10+ | Audio analysis |
| FAISS | 1.7+ | Vector similarity search |
| Sentence-Transformers | 2.2+ | Text embeddings |
| Faster-Whisper | 0.9+ | Local ASR |
| AssemblyAI SDK | 0.17+ | Streaming ASR |
| Google Generative AI | 0.3+ | LLM integration |
Frontend Stack
| Technology | Version | Purpose |
|---|---|---|
| React | 19.2.0 | UI framework |
| React Router | 7.9.4 | Client-side routing |
| Socket.IO Client | 4.8.3 | WebSocket client |
| Axios | 1.12.2 | HTTP client |
| Three.js | 0.183.1 | 3D graphics (avatars) |
| @react-three/fiber | 9.5.0 | React Three.js renderer |
| Lucide React | 0.575.0 | Icon library |
| React Markdown | 10.1.0 | Markdown rendering |
Audio Capture Technology
MediaRecorder API:
- Sample Rate: 16kHz
- Bit Depth: 16-bit
- Channels: Mono
- Chunk Size: 4096 bytes
- Frequency: ~100ms intervals
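The parameters above are mutually consistent, which a quick calculation confirms (these constants restate the list; nothing new is assumed):

```python
SAMPLE_RATE = 16_000    # Hz
BYTES_PER_SAMPLE = 2    # 16-bit PCM
CHANNELS = 1            # mono
CHUNK_BYTES = 4096

# Duration of one chunk in milliseconds.
chunk_ms = 1000 * CHUNK_BYTES / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)
# 4096 bytes of 16-bit mono PCM at 16kHz covers 128 ms of audio,
# on the same order as the ~100ms emission interval stated above.
```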
Database Architecture
Database: SQLite (with a potential PostgreSQL migration path)

Key Tables:
- user: User accounts and profiles
- interview_session: Interview history and results
- user_mastery: Topic-level skill tracking
- subtopic_mastery: Concept-level mastery data
- question_history: Question-answer pairs
- study_action_plan: Personalized study recommendations
Storage Locations:
- Database: /instance/interview_prep.db
- Uploads (resumes): /uploads/
- Processed data: /data/processed/
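A hypothetical minimal schema for two of the tables above, using the stdlib sqlite3 module; the real column set lives in the SQLAlchemy models and will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id    INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE interview_session (
    id            INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL REFERENCES user(id),
    started_at    TEXT,
    overall_score REAL
);
""")
conn.execute("INSERT INTO user (email) VALUES (?)", ("demo@example.com",))
conn.execute(
    "INSERT INTO interview_session (user_id, overall_score) VALUES (1, 0.82)"
)
row = conn.execute(
    "SELECT u.email, s.overall_score"
    " FROM interview_session s JOIN user u ON u.id = s.user_id"
).fetchone()
```

Keeping the schema in portable SQL like this is what makes the PostgreSQL migration path realistic: only the connection layer changes.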
Deployment Architecture
Development:
- CORS configured for cross-origin requests
- WebSocket requires sticky sessions
- Database migrations via SQLAlchemy
- Environment variables for API keys
- Audio processing requires sufficient CPU
- FAISS indices stored on disk
Performance Characteristics
Latency Targets:
- Signal Processing: < 5ms per chunk
- Transcription (AssemblyAI): 200-500ms
- Vector Search (FAISS): < 10ms
- LLM Response (Gemini): 1-3 seconds
Scalability:
- A single server supports ~10 concurrent interviews
- Database handles 1000+ user profiles
- FAISS indices scale to 100K+ documents
- WebSocket connections pooled per user
Next Steps
- Backend Structure: Deep dive into Flask modules and API routes
- Frontend Structure: React components and state management