
System Architecture

LangShazam is a real-time spoken language detection service built with a modern, scalable architecture that processes audio streams and identifies the spoken language using OpenAI’s Whisper model.

Architecture Diagram

Component Interaction Flow

  1. Audio Capture: the React frontend captures audio from the user’s microphone using the Web Audio API and MediaRecorder API.
  2. WebSocket Connection: audio data is streamed to the backend via a WebSocket connection at the /ws endpoint.
  3. Audio Processing: the backend’s AudioProcessor receives audio chunks and sends them to OpenAI’s Whisper API.
  4. Language Detection: OpenAI’s Whisper model analyzes the audio and returns the detected language.
  5. Result Delivery: the detection result is sent back to the client through the WebSocket connection.
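The flow above can be sketched end to end. This is an illustrative, framework-agnostic sketch rather than the actual implementation: `detect_language`, `handle_connection`, and the 4 KB chunk size are hypothetical stand-ins, while the 20 KB threshold, the 8-character connection ID, and the response shape follow the descriptions elsewhere on this page.

```python
import asyncio
import uuid
from datetime import datetime, timezone

MIN_AUDIO_SIZE = 20_000  # bytes; the buffering threshold described in step 3


async def detect_language(audio: bytes) -> dict:
    """Hypothetical stand-in for the OpenAI Whisper API call (step 4)."""
    return {"language": "english", "confidence": 0.9}


async def handle_connection(receive_bytes, send_json) -> None:
    """Sketch of steps 2-5: buffer incoming chunks, detect, reply."""
    connection_id = uuid.uuid4().hex[:8]  # 8-character trace ID
    buffer = bytearray()
    while len(buffer) < MIN_AUDIO_SIZE:   # step 3: accumulate audio chunks
        buffer.extend(await receive_bytes())
    result = await detect_language(bytes(buffer))  # step 4
    await send_json({                     # step 5: deliver the result
        "status": "success",
        "data": {**result, "connection_id": connection_id},
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "connection_id": connection_id,
    })


async def demo() -> dict:
    chunks = iter([b"\x00" * 4096] * 6)  # a client streaming 4 KB chunks
    sent = []

    async def receive_bytes() -> bytes:
        return next(chunks)

    async def send_json(msg: dict) -> None:
        sent.append(msg)

    await handle_connection(receive_bytes, send_json)
    return sent[0]


message = asyncio.run(demo())
print(message["status"], message["data"]["language"])
```

In a real FastAPI handler, `receive_bytes` and `send_json` would be the WebSocket's `receive_bytes()` and `send_json()` methods.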

Technology Stack

Backend

FastAPI

High-performance async web framework for Python. Version: 0.109.0

Uvicorn

Lightning-fast ASGI server. Version: 0.27.0

OpenAI SDK

Integration with the Whisper model for transcription. Version: 1.66.3

WebSockets

Real-time bidirectional communication. Version: 12.0

Frontend

React

Component-based UI framework. Version: 18.2.0

React Router

Client-side routing for SPA navigation. Version: 6.22.3

Web Audio API

Native browser audio processing and visualization

MediaRecorder API

Audio capture and encoding in the browser

Additional Dependencies

  • psutil (5.9.8) - System and process monitoring for metrics
  • python-multipart (0.0.9) - Multipart form data parsing
  • country-flag-icons (1.5.18) - Language flag icons for UI

Communication Patterns

WebSocket Protocol

LangShazam uses WebSocket for real-time, bidirectional communication between the frontend and backend.
Connection Flow:
  1. Client establishes WebSocket connection to /ws endpoint
  2. Client streams audio data as binary chunks
  3. Server processes audio when minimum threshold is reached (20KB)
  4. Server responds with JSON containing detection results
  5. Connection closes after result delivery
Message Format (Server → Client):
{
  "status": "success",
  "data": {
    "language": "english",
    "confidence": 0.9,
    "processing_time": 1.23,
    "connection_id": "a1b2c3d4"
  },
  "timestamp": "2026-03-08T12:34:56.789Z",
  "connection_id": "a1b2c3d4"
}
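A client consuming this format might unpack it into a typed structure. `DetectionResult` and `parse_detection` below are illustrative names; the fields are taken from the example message above.

```python
import json
from dataclasses import dataclass


@dataclass
class DetectionResult:
    language: str
    confidence: float
    processing_time: float
    connection_id: str


def parse_detection(raw: str) -> DetectionResult:
    """Parse a server-to-client WebSocket message into a typed result."""
    msg = json.loads(raw)
    if msg.get("status") != "success":
        raise ValueError(f"detection failed: {msg}")
    d = msg["data"]
    return DetectionResult(
        language=d["language"],
        confidence=d["confidence"],
        processing_time=d["processing_time"],
        connection_id=d["connection_id"],
    )


raw = '''{
  "status": "success",
  "data": {"language": "english", "confidence": 0.9,
           "processing_time": 1.23, "connection_id": "a1b2c3d4"},
  "timestamp": "2026-03-08T12:34:56.789Z",
  "connection_id": "a1b2c3d4"
}'''
result = parse_detection(raw)
print(result.language, result.confidence)
```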

REST API Endpoints

Purpose: Health check endpoint. Response:
{
  "message": "Server is running!"
}
Purpose: Retrieve server performance metrics. Response:
{
  "active_connections": 2,
  "total_requests": 150,
  "errors": 0,
  "avg_processing_time": 1.45,
  "memory_usage_mb": 256.5,
  "cpu_total_percent": 15.2,
  "cpu_per_core": [12.3, 18.1, 14.5, 16.0],
  "total_cpu_cores": 4,
  "effective_cores_used": 0.608
}
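The effective_cores_used field follows from the other CPU figures: the total CPU percentage spread across all cores, so 15.2% of 4 cores is 0.608 cores (the per-core figures would typically come from psutil’s cpu_percent(percpu=True)). A quick check of that arithmetic, assuming this is how the metric is derived:

```python
def effective_cores_used(cpu_total_percent: float, total_cores: int) -> float:
    """Express total CPU utilisation as the number of cores' worth of work:
    cpu_total_percent is a percentage of the whole machine."""
    return cpu_total_percent / 100 * total_cores


# Values from the sample /metrics response above
print(round(effective_cores_used(15.2, 4), 3))  # 0.608
```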

Deployment Architecture

Infrastructure

Kubernetes on AWS

The application is deployed on AWS using Kubernetes for container orchestration.
WebSocket Endpoint: wss://3.149.10.154.nip.io/ws

Configuration

Backend Configuration (backend/src/config/settings.py:6-11):
SERVER_CONFIG = {
    "host": "0.0.0.0",
    "port": int(os.getenv("PORT", "10000")),
    "debug": os.getenv("DEBUG", "false").lower() == "true"
}

CORS Configuration

The backend is configured to accept requests from specific origins for security.
Allowed origins include:
  • Production: langshazam.com (HTTP/HTTPS)
  • Development: localhost:3000, localhost:5173, 127.0.0.1:3000, 127.0.0.1:5173
Configuration reference: backend/src/config/settings.py:14-23
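With FastAPI, such an allow-list is typically registered via CORSMiddleware; conceptually, the check is an exact match of the request’s Origin header against the configured origins. A minimal sketch of that check (the scheme variants are assumptions based on the HTTP/HTTPS note above, and the real middleware also handles preflight requests and response headers):

```python
# Allow-list mirroring the origins listed above
ALLOWED_ORIGINS = {
    "http://langshazam.com",
    "https://langshazam.com",
    "http://localhost:3000",
    "http://localhost:5173",
    "http://127.0.0.1:3000",
    "http://127.0.0.1:5173",
}


def is_allowed_origin(origin: str) -> bool:
    """Exact-match check of a request's Origin header against the allow-list."""
    return origin in ALLOWED_ORIGINS


print(is_allowed_origin("https://langshazam.com"))    # True
print(is_allowed_origin("https://evil.example.com"))  # False
```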

Key Features

Connection Tracing

Each WebSocket connection receives a unique 8-character ID for request tracking and debugging
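The page doesn’t specify how the ID is generated; one common approach is truncating a UUID4’s hex form, sketched here:

```python
import uuid


def new_connection_id() -> str:
    """Return a compact 8-character trace ID, e.g. 'a1b2c3d4'.
    Truncating a random UUID keeps collisions unlikely for tracing purposes."""
    return uuid.uuid4().hex[:8]


cid = new_connection_id()
print(len(cid))  # 8
```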

Rate Limiting

API calls are limited to 3 concurrent requests using asyncio semaphores
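This limit maps naturally onto an asyncio.Semaphore. In the sketch below (call_whisper is a hypothetical stand-in for the real API call), ten queued calls never exceed three in flight:

```python
import asyncio


async def call_whisper(task_id: int, sem: asyncio.Semaphore, state: dict) -> None:
    """Stand-in for one Whisper API call; sleep replaces the real round-trip."""
    async with sem:  # block here if 3 calls are already in flight
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)
        state["active"] -= 1


async def main() -> int:
    sem = asyncio.Semaphore(3)  # at most 3 concurrent API calls
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(call_whisper(i, sem, state) for i in range(10)))
    return state["peak"]


peak = asyncio.run(main())
print(peak)  # 3: ten tasks issued, never more than three in flight
```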

Real-time Metrics

Continuous monitoring of connections, processing times, CPU usage, and memory

Adaptive Buffering

Collects a minimum of 20 KB of audio data before processing, for better accuracy
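A minimal sketch of such a buffer (AudioBuffer is a hypothetical helper; the real AudioProcessor’s interface isn’t shown on this page):

```python
MIN_AUDIO_SIZE = 20_000  # bytes, per the buffering threshold above


class AudioBuffer:
    """Accumulate incoming chunks; report when enough audio has arrived."""

    def __init__(self, min_size: int = MIN_AUDIO_SIZE):
        self._buf = bytearray()
        self._min_size = min_size

    def add(self, chunk: bytes) -> bool:
        """Append a chunk; return True once the buffer is ready to process."""
        self._buf.extend(chunk)
        return len(self._buf) >= self._min_size

    def drain(self) -> bytes:
        """Hand the accumulated audio to the processor and reset."""
        data = bytes(self._buf)
        self._buf.clear()
        return data


buf = AudioBuffer()
ready = False
while not ready:
    ready = buf.add(b"\x00" * 4096)  # simulated 4 KB WebSocket chunks
audio = buf.drain()
print(len(audio))  # 20480: five 4 KB chunks cross the 20,000-byte threshold
```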

Audio Processing Parameters

From backend/src/config/settings.py:26-32:
| Parameter | Value | Description |
| --- | --- | --- |
| Min Audio Size | 20,000 bytes | Minimum buffer before processing |
| Chunk Size | 128 KB | Audio data chunk size |
| Min Audio Length | 4 seconds | Minimum recording duration |
| Max Audio Length | 15 seconds | Maximum recording duration |
| Bits Per Second | 16,000 | Audio encoding bitrate |
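These parameters also imply how much audio the minimum buffer holds: assuming the buffered audio is encoded at the stated bitrate, 20,000 bytes is 160,000 bits, i.e. 10 seconds of audio, comfortably above the 4-second minimum recording length. As a quick check:

```python
MIN_AUDIO_SIZE_BYTES = 20_000
BITS_PER_SECOND = 16_000


def buffer_duration_seconds(size_bytes: int, bitrate_bps: int) -> float:
    """Seconds of audio represented by size_bytes at the given bitrate."""
    return size_bytes * 8 / bitrate_bps


print(buffer_duration_seconds(MIN_AUDIO_SIZE_BYTES, BITS_PER_SECOND))  # 10.0
```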

Next Steps

Backend Architecture

Dive deep into the FastAPI backend implementation

Frontend Architecture

Explore the React frontend structure and patterns
