System Architecture
LangShazam is a real-time spoken language detection service built on a scalable architecture that processes audio streams and identifies the spoken language using OpenAI’s Whisper model.
Architecture Diagram
Component Interaction Flow
Audio Capture
The React frontend captures audio from the user’s microphone using the Web Audio API and MediaRecorder API
Audio Processing
The backend’s AudioProcessor receives audio chunks and sends them to OpenAI’s Whisper API
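The transcription step above can be sketched with the OpenAI Python SDK (v1.x), using Whisper's `verbose_json` response format, which includes a detected `language` field. The function names here are illustrative, not the actual AudioProcessor API:

```python
# Hedged sketch of the backend transcription step; assumes the OpenAI
# Python SDK (v1.x). Function names are illustrative.

def extract_language(verbose_json: dict) -> str:
    """Pull the detected language out of a Whisper verbose_json payload."""
    return verbose_json.get("language", "unknown")

def detect_language(audio_path: str) -> str:
    # Requires `pip install openai` and OPENAI_API_KEY in the environment.
    from openai import OpenAI
    client = OpenAI()
    with open(audio_path, "rb") as f:
        resp = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",  # includes a "language" field
        )
    return extract_language(resp.model_dump())

# Pure-Python demonstration with a canned response:
print(extract_language({"language": "spanish", "text": "hola"}))  # spanish
```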
Technology Stack
Backend
FastAPI
High-performance async web framework for Python
Version: 0.109.0
Uvicorn
Lightning-fast ASGI server
Version: 0.27.0
OpenAI SDK
Integration with the Whisper model for transcription
Version: 1.66.3
WebSockets
Real-time bidirectional communication
Version: 12.0
Frontend
React
Component-based UI framework
Version: 18.2.0
React Router
Client-side routing for SPA navigation
Version: 6.22.3
Web Audio API
Native browser audio processing and visualization
MediaRecorder API
Audio capture and encoding in the browser
Additional Dependencies
- psutil (5.9.8) - System and process monitoring for metrics
- python-multipart (0.0.9) - Multipart form data parsing
- country-flag-icons (1.5.18) - Language flag icons for UI
Communication Patterns
WebSocket Protocol
LangShazam uses WebSocket for real-time, bidirectional communication between the frontend and backend.
- Client establishes a WebSocket connection to the /ws endpoint
- Client streams audio data as binary chunks
- Server processes audio when minimum threshold is reached (20KB)
- Server responds with JSON containing detection results
- Connection closes after result delivery
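The flow above can be sketched from the client side. This assumes the third-party `websockets` package, and the JSON result shape shown is an assumption, not the service's documented schema:

```python
import asyncio
import json

def parse_result(message: str) -> dict:
    """Decode the JSON detection result the server sends back."""
    return json.loads(message)

async def detect(uri: str, chunks) -> dict:
    # Requires `pip install websockets`.
    import websockets
    async with websockets.connect(uri) as ws:
        for chunk in chunks:                    # step 2: stream binary chunks
            await ws.send(chunk)
        result = parse_result(await ws.recv())  # step 4: JSON detection result
    return result                               # step 5: connection closes

# Pure-Python demonstration of the result parsing (hypothetical payload):
print(parse_result('{"language": "french"}'))  # {'language': 'french'}
```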
REST API Endpoints
GET /
Purpose: Health check endpoint
GET /metrics
Purpose: Retrieve server performance metrics
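A minimal sketch of what the /metrics handler might assemble. The field names here are assumptions, not the service's documented response; in the real service, psutil (listed as a dependency) would supply the CPU and memory figures:

```python
# Hypothetical /metrics payload assembly; field names are illustrative.
import time

START_TIME = time.time()

def build_metrics(active_connections: int, processing_times: list) -> dict:
    avg = sum(processing_times) / len(processing_times) if processing_times else 0.0
    return {
        "uptime_seconds": round(time.time() - START_TIME, 1),
        "active_connections": active_connections,
        "avg_processing_time": round(avg, 3),
        # In the real service: psutil.cpu_percent(), psutil.virtual_memory().percent
    }

print(build_metrics(2, [0.8, 1.2]))
```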
Deployment Architecture
Infrastructure
Kubernetes on AWS
The application is deployed on AWS using Kubernetes for container orchestration.
WebSocket Endpoint: wss://3.149.10.154.nip.io/ws
Configuration
Backend Configuration (backend/src/config/settings.py:6-11):
CORS Configuration
Allowed origins include:
- Production: langshazam.com (HTTP/HTTPS)
- Development: localhost:3000, localhost:5173, 127.0.0.1:3000, 127.0.0.1:5173
backend/src/config/settings.py:14-23
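The allow-list above can be expressed as a simple membership check. The full-URL forms below are assumptions (the real service presumably registers these origins with FastAPI's CORSMiddleware):

```python
# Documented CORS origins, expanded to full-URL form (an assumption).
ALLOWED_ORIGINS = {
    "http://langshazam.com",
    "https://langshazam.com",
    "http://localhost:3000",
    "http://localhost:5173",
    "http://127.0.0.1:3000",
    "http://127.0.0.1:5173",
}

def is_allowed(origin: str) -> bool:
    """True if a request's Origin header is on the allow-list."""
    return origin in ALLOWED_ORIGINS

print(is_allowed("https://langshazam.com"))  # True
```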
Key Features
Connection Tracing
Each WebSocket connection receives a unique 8-character ID for request tracking and debugging
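One common way to mint such an ID is to truncate a UUID; the real implementation may derive it differently:

```python
import uuid

def new_connection_id() -> str:
    """8-character ID for tracing a WebSocket connection (illustrative)."""
    return uuid.uuid4().hex[:8]

cid = new_connection_id()
print(cid, len(cid))  # e.g. '3f9a12bc' 8
```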
Rate Limiting
API calls are limited to 3 concurrent requests using asyncio semaphores
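The semaphore pattern described above can be sketched as follows; `asyncio.sleep` stands in for the real Whisper call, and the instrumentation simply records that concurrency never exceeds the limit:

```python
import asyncio

async def call_api(i: int, sem: asyncio.Semaphore, state: dict) -> None:
    # Acquire one of the 3 slots before making the (simulated) API call.
    async with sem:
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.01)  # stand-in for the real Whisper request
        state["in_flight"] -= 1

async def run_batch(n: int = 10) -> int:
    sem = asyncio.Semaphore(3)  # the documented limit of 3 concurrent calls
    state = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(call_api(i, sem, state) for i in range(n)))
    return state["peak"]

print(asyncio.run(run_batch()))  # 3: ten tasks, never more than 3 in flight
```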
Real-time Metrics
Continuous monitoring of connections, processing times, CPU usage, and memory
Adaptive Buffering
Collects a minimum of 20KB of audio data before processing to improve detection accuracy
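The buffering rule can be sketched as: hold chunks until at least 20,000 bytes have arrived, then flush the accumulated audio for transcription. The class and method names are illustrative, not the real AudioProcessor API:

```python
from typing import Optional

MIN_AUDIO_SIZE = 20_000  # bytes, per the documented threshold

class AudioBuffer:
    """Illustrative sketch of the adaptive buffering rule."""

    def __init__(self) -> None:
        self._chunks: list = []
        self._size = 0

    def feed(self, chunk: bytes) -> Optional[bytes]:
        """Buffer a chunk; return the joined audio once the threshold is met."""
        self._chunks.append(chunk)
        self._size += len(chunk)
        if self._size >= MIN_AUDIO_SIZE:
            audio = b"".join(self._chunks)
            self._chunks, self._size = [], 0
            return audio
        return None

buf = AudioBuffer()
print(buf.feed(b"\x00" * 8_000) is None)  # True: still below 20KB
print(len(buf.feed(b"\x00" * 12_000)))    # 20000: threshold reached, flushed
```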
Audio Processing Parameters
From backend/src/config/settings.py:26-32:
| Parameter | Value | Description |
|---|---|---|
| Min Audio Size | 20,000 bytes | Minimum buffer before processing |
| Chunk Size | 128 KB | Audio data chunk size |
| Min Audio Length | 4 seconds | Minimum recording duration |
| Max Audio Length | 15 seconds | Maximum recording duration |
| Bits Per Second | 16,000 | Audio encoding bitrate |
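The original settings.py snippet isn't reproduced on this page; a hypothetical reconstruction, based only on the values in the table above and the documented concurrency limit, might look like:

```python
# Hypothetical reconstruction of the audio settings; the real field
# names in backend/src/config/settings.py may differ.
from dataclasses import dataclass

@dataclass(frozen=True)
class AudioSettings:
    min_audio_size: int = 20_000      # bytes buffered before processing
    chunk_size: int = 128 * 1024      # 128 KB audio data chunks
    min_audio_length: int = 4         # seconds, minimum recording duration
    max_audio_length: int = 15        # seconds, maximum recording duration
    bits_per_second: int = 16_000     # audio encoding bitrate
    max_concurrent_requests: int = 3  # asyncio semaphore limit

settings = AudioSettings()
print(settings.min_audio_size, settings.chunk_size)  # 20000 131072
```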
Next Steps
Backend Architecture
Dive deep into the FastAPI backend implementation
Frontend Architecture
Explore the React frontend structure and patterns

