
System Architecture

LangShazam is a real-time spoken language detection service built with a modern, scalable architecture that processes audio streams and identifies the spoken language using OpenAI’s Whisper model.

Architecture Diagram

Component Interaction Flow

  1. Audio Capture: the React frontend captures audio from the user’s microphone using the Web Audio API and MediaRecorder API.
  2. WebSocket Connection: audio data is streamed to the backend via a WebSocket connection at the /ws endpoint.
  3. Audio Processing: the backend’s AudioProcessor receives audio chunks and sends them to OpenAI’s Whisper API.
  4. Language Detection: OpenAI’s Whisper model analyzes the audio and returns the detected language.
  5. Result Delivery: the detection result is sent back to the client through the WebSocket connection.
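The flow above can be sketched end to end. This is an illustrative, framework-agnostic sketch rather than the actual implementation: `detect_language`, `handle_connection`, and the 4 KB chunk size are hypothetical stand-ins, while the 20 KB threshold, the 8-character connection ID, and the response shape follow the descriptions elsewhere on this page.

```python
import asyncio
import uuid
from datetime import datetime, timezone

MIN_AUDIO_SIZE = 20_000  # bytes; the buffering threshold described in step 3


async def detect_language(audio: bytes) -> dict:
    """Hypothetical stand-in for the OpenAI Whisper API call (step 4)."""
    return {"language": "english", "confidence": 0.9}


async def handle_connection(receive_bytes, send_json) -> None:
    """Sketch of steps 2-5: buffer incoming chunks, detect, reply."""
    connection_id = uuid.uuid4().hex[:8]  # 8-character trace ID
    buffer = bytearray()
    while len(buffer) < MIN_AUDIO_SIZE:   # step 3: accumulate audio chunks
        buffer.extend(await receive_bytes())
    result = await detect_language(bytes(buffer))  # step 4
    await send_json({                     # step 5: deliver the result
        "status": "success",
        "data": {**result, "connection_id": connection_id},
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "connection_id": connection_id,
    })


async def demo() -> dict:
    chunks = iter([b"\x00" * 4096] * 6)  # a client streaming 4 KB chunks
    sent = []

    async def receive_bytes() -> bytes:
        return next(chunks)

    async def send_json(msg: dict) -> None:
        sent.append(msg)

    await handle_connection(receive_bytes, send_json)
    return sent[0]


message = asyncio.run(demo())
print(message["status"], message["data"]["language"])
```

In a real FastAPI handler, `receive_bytes` and `send_json` would be the WebSocket's `receive_bytes()` and `send_json()` methods.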

Technology Stack

Backend

FastAPI

High-performance async web framework for Python. Version: 0.109.0

Uvicorn

Lightning-fast ASGI server. Version: 0.27.0

OpenAI SDK

Integration with the Whisper model for transcription. Version: 1.66.3

WebSockets

Real-time bidirectional communication. Version: 12.0

Frontend

React

Component-based UI framework. Version: 18.2.0

React Router

Client-side routing for SPA navigation. Version: 6.22.3

Web Audio API

Native browser audio processing and visualization

MediaRecorder API

Audio capture and encoding in the browser

Additional Dependencies

  • psutil (5.9.8) - System and process monitoring for metrics
  • python-multipart (0.0.9) - Multipart form data parsing
  • country-flag-icons (1.5.18) - Language flag icons for UI

Communication Patterns

WebSocket Protocol

LangShazam uses WebSocket for real-time, bidirectional communication between the frontend and backend.
Connection Flow:
  1. Client establishes WebSocket connection to /ws endpoint
  2. Client streams audio data as binary chunks
  3. Server processes audio when minimum threshold is reached (20KB)
  4. Server responds with JSON containing detection results
  5. Connection closes after result delivery
Message Format (Server → Client):
{
  "status": "success",
  "data": {
    "language": "english",
    "confidence": 0.9,
    "processing_time": 1.23,
    "connection_id": "a1b2c3d4"
  },
  "timestamp": "2026-03-08T12:34:56.789Z",
  "connection_id": "a1b2c3d4"
}
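A client consuming this format might unpack it into a typed structure. `DetectionResult` and `parse_detection` below are illustrative names; the fields are taken from the example message above.

```python
import json
from dataclasses import dataclass


@dataclass
class DetectionResult:
    language: str
    confidence: float
    processing_time: float
    connection_id: str


def parse_detection(raw: str) -> DetectionResult:
    """Parse a server-to-client WebSocket message into a typed result."""
    msg = json.loads(raw)
    if msg.get("status") != "success":
        raise ValueError(f"detection failed: {msg}")
    d = msg["data"]
    return DetectionResult(
        language=d["language"],
        confidence=d["confidence"],
        processing_time=d["processing_time"],
        connection_id=d["connection_id"],
    )


raw = '''{
  "status": "success",
  "data": {"language": "english", "confidence": 0.9,
           "processing_time": 1.23, "connection_id": "a1b2c3d4"},
  "timestamp": "2026-03-08T12:34:56.789Z",
  "connection_id": "a1b2c3d4"
}'''
result = parse_detection(raw)
print(result.language, result.confidence)
```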

REST API Endpoints

Purpose: Health check endpoint. Response:
{
  "message": "Server is running!"
}
Purpose: Retrieve server performance metrics. Response:
{
  "active_connections": 2,
  "total_requests": 150,
  "errors": 0,
  "avg_processing_time": 1.45,
  "memory_usage_mb": 256.5,
  "cpu_total_percent": 15.2,
  "cpu_per_core": [12.3, 18.1, 14.5, 16.0],
  "total_cpu_cores": 4,
  "effective_cores_used": 0.608
}
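The effective_cores_used field follows from the other CPU figures: the total CPU percentage spread across all cores, so 15.2% of 4 cores is 0.608 cores (the per-core figures would typically come from psutil’s cpu_percent(percpu=True)). A quick check of that arithmetic, assuming this is how the metric is derived:

```python
def effective_cores_used(cpu_total_percent: float, total_cores: int) -> float:
    """Express total CPU utilisation as the number of cores' worth of work:
    cpu_total_percent is a percentage of the whole machine."""
    return cpu_total_percent / 100 * total_cores


# Values from the sample /metrics response above
print(round(effective_cores_used(15.2, 4), 3))  # 0.608
```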

Deployment Architecture

Infrastructure

Kubernetes on AWS

The application is deployed on AWS using Kubernetes for container orchestration.
WebSocket Endpoint: wss://3.149.10.154.nip.io/ws

Configuration

Backend Configuration (backend/src/config/settings.py:6-11):
SERVER_CONFIG = {
    "host": "0.0.0.0",
    "port": int(os.getenv("PORT", "10000")),
    "debug": os.getenv("DEBUG", "false").lower() == "true"
}

CORS Configuration

The backend is configured to accept requests from specific origins for security.
Allowed origins include:
  • Production: langshazam.com (HTTP/HTTPS)
  • Development: localhost:3000, localhost:5173, 127.0.0.1:3000, 127.0.0.1:5173
Configuration reference: backend/src/config/settings.py:14-23
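With FastAPI, such an allow-list is typically registered via CORSMiddleware; conceptually, the check is an exact match of the request’s Origin header against the configured origins. A minimal sketch of that check (the scheme variants are assumptions based on the HTTP/HTTPS note above, and the real middleware also handles preflight requests and response headers):

```python
# Allow-list mirroring the origins listed above
ALLOWED_ORIGINS = {
    "http://langshazam.com",
    "https://langshazam.com",
    "http://localhost:3000",
    "http://localhost:5173",
    "http://127.0.0.1:3000",
    "http://127.0.0.1:5173",
}


def is_allowed_origin(origin: str) -> bool:
    """Exact-match check of a request's Origin header against the allow-list."""
    return origin in ALLOWED_ORIGINS


print(is_allowed_origin("https://langshazam.com"))    # True
print(is_allowed_origin("https://evil.example.com"))  # False
```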

Key Features

Connection Tracing

Each WebSocket connection receives a unique 8-character ID for request tracking and debugging
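The page doesn’t specify how the ID is generated; one common approach is truncating a UUID4’s hex form, sketched here:

```python
import uuid


def new_connection_id() -> str:
    """Return a compact 8-character trace ID, e.g. 'a1b2c3d4'.
    Truncating a random UUID keeps collisions unlikely for tracing purposes."""
    return uuid.uuid4().hex[:8]


cid = new_connection_id()
print(len(cid))  # 8
```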

Rate Limiting

API calls are limited to 3 concurrent requests using asyncio semaphores
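This limit maps naturally onto an asyncio.Semaphore. In the sketch below (call_whisper is a hypothetical stand-in for the real API call), ten queued calls never exceed three in flight:

```python
import asyncio


async def call_whisper(task_id: int, sem: asyncio.Semaphore, state: dict) -> None:
    """Stand-in for one Whisper API call; sleep replaces the real round-trip."""
    async with sem:  # block here if 3 calls are already in flight
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)
        state["active"] -= 1


async def main() -> int:
    sem = asyncio.Semaphore(3)  # at most 3 concurrent API calls
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(call_whisper(i, sem, state) for i in range(10)))
    return state["peak"]


peak = asyncio.run(main())
print(peak)  # 3: ten tasks issued, never more than three in flight
```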

Real-time Metrics

Continuous monitoring of connections, processing times, CPU usage, and memory

Adaptive Buffering

Collects a minimum of 20 KB of audio data before processing, for better accuracy
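A minimal sketch of such a buffer (AudioBuffer is a hypothetical helper; the real AudioProcessor’s interface isn’t shown on this page):

```python
MIN_AUDIO_SIZE = 20_000  # bytes, per the buffering threshold above


class AudioBuffer:
    """Accumulate incoming chunks; report when enough audio has arrived."""

    def __init__(self, min_size: int = MIN_AUDIO_SIZE):
        self._buf = bytearray()
        self._min_size = min_size

    def add(self, chunk: bytes) -> bool:
        """Append a chunk; return True once the buffer is ready to process."""
        self._buf.extend(chunk)
        return len(self._buf) >= self._min_size

    def drain(self) -> bytes:
        """Hand the accumulated audio to the processor and reset."""
        data = bytes(self._buf)
        self._buf.clear()
        return data


buf = AudioBuffer()
ready = False
while not ready:
    ready = buf.add(b"\x00" * 4096)  # simulated 4 KB WebSocket chunks
audio = buf.drain()
print(len(audio))  # 20480: five 4 KB chunks cross the 20,000-byte threshold
```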

Audio Processing Parameters

From backend/src/config/settings.py:26-32:
| Parameter | Value | Description |
| --- | --- | --- |
| Min Audio Size | 20,000 bytes | Minimum buffer before processing |
| Chunk Size | 128 KB | Audio data chunk size |
| Min Audio Length | 4 seconds | Minimum recording duration |
| Max Audio Length | 15 seconds | Maximum recording duration |
| Bits Per Second | 16,000 | Audio encoding bitrate |
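These parameters also imply how much audio the minimum buffer holds: assuming the buffered audio is encoded at the stated bitrate, 20,000 bytes is 160,000 bits, i.e. 10 seconds of audio, comfortably above the 4-second minimum recording length. As a quick check:

```python
MIN_AUDIO_SIZE_BYTES = 20_000
BITS_PER_SECOND = 16_000


def buffer_duration_seconds(size_bytes: int, bitrate_bps: int) -> float:
    """Seconds of audio represented by size_bytes at the given bitrate."""
    return size_bytes * 8 / bitrate_bps


print(buffer_duration_seconds(MIN_AUDIO_SIZE_BYTES, BITS_PER_SECOND))  # 10.0
```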

Next Steps

Backend Architecture

Dive deep into the FastAPI backend implementation

Frontend Architecture

Explore the React frontend structure and patterns
