Overview
LangShazam’s audio processing pipeline handles everything from microphone capture to API submission. The system is designed for cross-platform compatibility, particularly ensuring iOS support through careful format selection.

Audio Pipeline
Audio Format Specifications
LangShazam uses specific audio settings optimized for language detection:

- Format: MP4 - Widely supported, especially on iOS devices
- Bitrate: 16,000 bps - Balanced quality and file size
- Chunk Interval: 4 seconds - Optimal for real-time transmission
- Minimum Size: 20,000 bytes - Ensures enough data for processing
Frontend Audio Capture
MediaRecorder Setup
The frontend configures MediaRecorder with specific parameters (App.js, lines 158-162).
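The referenced code is not reproduced here, but the setup can be sketched roughly as follows. The function name `startRecording` and the WebM fallback are assumptions; the `audio/mp4` type, 16,000 bps bitrate, and 4-second timeslice come from the format table above:

```javascript
// Hypothetical sketch of the MediaRecorder setup; names are illustrative.
async function startRecording() {
  // getUserMedia requires HTTPS, and iOS requires a user gesture
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = MediaRecorder.isTypeSupported('audio/mp4')
    ? 'audio/mp4'   // required for iOS Safari
    : 'audio/webm'; // assumed fallback for other browsers
  const recorder = new MediaRecorder(stream, {
    mimeType,
    audioBitsPerSecond: 16000, // documented bitrate
  });
  recorder.start(4000); // emit a chunk every 4 seconds
  return recorder;
}
```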
Audio Analysis for Visual Feedback
While recording, the system analyzes audio levels for UI feedback (App.js, lines 81-98).
The audio analysis runs separately from recording and doesn’t affect the audio data sent to the server.
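A level meter of this kind is typically built with a Web Audio `AnalyserNode` tapped off the same stream; the sketch below assumes that approach, and the names `setupLevelMeter` and `onLevel` are hypothetical:

```javascript
// Hypothetical sketch: meter audio levels for the UI without touching
// the data that MediaRecorder captures.
function setupLevelMeter(stream, onLevel) {
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaStreamSource(stream);
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 256;
  source.connect(analyser); // analysis tap only; recording is unaffected
  const data = new Uint8Array(analyser.frequencyBinCount);
  function updateLevel() {
    analyser.getByteFrequencyData(data);
    const avg = data.reduce((sum, v) => sum + v, 0) / data.length;
    onLevel(avg / 255); // normalized 0..1 level for the UI
    requestAnimationFrame(updateLevel);
  }
  updateLevel();
}
```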
Chunk Collection and Transmission
Audio chunks are collected and sent as they become available (App.js, lines 167-172 and line 189).
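The transmission step can be sketched with MediaRecorder's `ondataavailable` event; the function name `wireChunkTransmission` and the use of a raw WebSocket `send` are assumptions based on the buffering behavior described later:

```javascript
// Hypothetical sketch: forward each recorded chunk over the WebSocket.
function wireChunkTransmission(recorder, socket) {
  recorder.ondataavailable = async (event) => {
    // Skip empty chunks and closed sockets
    if (event.data && event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
      // The backend buffers these bytes until the minimum size is reached
      socket.send(await event.data.arrayBuffer());
    }
  };
}
```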
Audio Configuration Constants
The frontend defines these audio parameters (App.js, lines 24-26).
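The constant names below are assumptions; the values mirror the format table above:

```javascript
// Documented audio parameters (constant names are hypothetical)
const AUDIO_MIME_TYPE = 'audio/mp4'; // iOS-compatible container
const AUDIO_BITRATE = 16000;         // bps
const CHUNK_INTERVAL_MS = 4000;      // MediaRecorder timeslice
```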
Backend Audio Processing
AudioProcessor Class
The AudioProcessor class handles all server-side audio processing (audio_processor.py, lines 10-17).
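A minimal skeleton of the class, with attribute names assumed from the behavior described in this document (semaphore-based rate limiting, per-request metrics, error counting):

```python
# Hypothetical AudioProcessor skeleton; attribute names are assumptions.
import asyncio


class AudioProcessor:
    def __init__(self, api_key: str, max_concurrent_calls: int = 3):
        self.api_key = api_key
        # Semaphore caps concurrent Whisper API calls (see rate limiting below)
        self.semaphore = asyncio.Semaphore(max_concurrent_calls)
        self.processing_times: list[float] = []  # per-request metrics
        self.error_count = 0                     # incremented on failures
```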
Rate-Limited API Calls
API calls are rate-limited using an asyncio semaphore (audio_processor.py, lines 19-42).
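The pattern looks roughly like this (a sketch, not the actual code; `rate_limited_call` and `call_api` are illustrative names):

```python
# Sketch of asyncio-semaphore rate limiting: at most 3 calls in flight.
import asyncio

_semaphore = asyncio.Semaphore(3)


async def rate_limited_call(call_api, audio_bytes: bytes):
    # Waits here whenever 3 calls are already in progress
    async with _semaphore:
        return await call_api(audio_bytes)
```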
Audio Data Preparation
Audio bytes are wrapped in a BytesIO object with the correct filename (audio_processor.py, lines 25-26).
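A minimal sketch of this step (the helper name `prepare_audio_file` is an assumption):

```python
# Wrap raw bytes in a file-like object and hint the container format.
import io


def prepare_audio_file(audio_bytes: bytes) -> io.BytesIO:
    audio_file = io.BytesIO(audio_bytes)
    audio_file.name = "audio.mp4"  # format hint read by the API client
    return audio_file
```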
The .name attribute is crucial - it tells the API what format to expect. Setting it to “audio.mp4” ensures proper processing.

Processing Pipeline
The main processing method coordinates the entire flow (audio_processor.py, lines 49-71).
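An outline of that flow under the assumptions above; the method body and the helper `call_whisper_api` are illustrative, not the actual code:

```python
# Hypothetical outline of the main processing method.
import gc
import io
import time


async def process(self, audio_bytes: bytes, connection_id: str):
    start = time.monotonic()
    try:
        audio_file = io.BytesIO(audio_bytes)
        audio_file.name = "audio.mp4"          # format hint
        async with self.semaphore:             # rate-limited section
            result = await self.call_whisper_api(audio_file)
        self.processing_times.append(time.monotonic() - start)
        return result
    except Exception:
        self.error_count += 1
        raise
    finally:
        gc.collect()  # explicit cleanup after each request (see below)
```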
Memory Management
Garbage collection is explicitly called after processing to free memory (audio_processor.py, line 72).
Buffering Strategy
The WebSocket manager buffers incoming audio chunks (websocket_manager.py, lines 29-47).
Why Buffer Instead of Processing Immediately?
Buffering ensures:
- Sufficient audio context for accurate detection
- Reduced API calls (cost savings)
- Better accuracy with longer audio samples
- Prevention of processing incomplete or corrupted chunks
Configuration Settings
All audio processing parameters are centralized in the backend configuration (settings.py, lines 25-32).
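A sketch of what that centralized configuration might look like; the dataclass structure and field names are assumptions, while the values mirror those documented in this page (including `max_concurrent_calls`, which is referenced in the troubleshooting section):

```python
# Hypothetical centralized audio settings.
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSettings:
    audio_format: str = "mp4"
    bitrate: int = 16_000           # bps
    chunk_interval: int = 4         # seconds
    min_audio_size: int = 20_000    # bytes
    max_concurrent_calls: int = 3   # semaphore limit
```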
OpenAI API Configuration
The Whisper model configuration is defined in settings.py (lines 34-38) and used in audio_processor.py (lines 32-36).
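The API call can be sketched as below. The `client` is assumed to be an `openai.AsyncOpenAI` instance (taken as a parameter here so the sketch stays self-contained), and the model name and response format are assumptions; `verbose_json` responses include a detected `language` field:

```python
# Hypothetical sketch of the Whisper transcription call.
import io


async def detect_language(client, audio_bytes: bytes) -> str:
    # client: assumed to be an openai.AsyncOpenAI instance
    audio_file = io.BytesIO(audio_bytes)
    audio_file.name = "audio.mp4"  # format hint for the API
    result = await client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",  # includes detected language
    )
    return result.language
```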
Performance Considerations
- Concurrent Processing: A semaphore limits concurrent API calls to 3 to prevent rate limiting
- Async I/O: All processing is async to handle multiple connections efficiently
- Memory Cleanup: Explicit garbage collection after each request prevents memory leaks
- Chunked Streaming: 4-second chunks balance latency and audio quality
Error Handling
The audio processor includes comprehensive error handling (audio_processor.py, lines 43-47 and 67-70).
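One common shape for this, consistent with the error counting and connection tracing described below, is a wrapper that logs failures with the connection ID and returns a sentinel instead of crashing the WebSocket loop. This is an assumed design, and `safe_process` is a hypothetical name:

```python
# Hypothetical error-handling wrapper around the processor.
import logging

logger = logging.getLogger(__name__)


async def safe_process(processor, audio_bytes: bytes, connection_id: str):
    try:
        return await processor.process(audio_bytes, connection_id)
    except Exception as exc:
        processor.error_count += 1  # incremented on failures
        logger.error("[%s] audio processing failed: %s", connection_id, exc)
        return None  # caller treats None as "no detection this round"
```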
Metrics and Monitoring
The system tracks processing performance:

- Processing Times: Recorded for each request
- Error Count: Incremented on failures
- API Call Duration: Logged separately from total processing time
- Connection Tracing: Every log includes connection ID
audio_processor.py (lines 56-58)
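The timing part of this can be sketched with a context manager; the name `track_processing_time` is an assumption:

```python
# Hypothetical per-request timing helper.
import time
from contextlib import contextmanager


@contextmanager
def track_processing_time(times: list, connection_id: str):
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        times.append(elapsed)  # recorded for each request
        # every log line carries the connection ID for tracing
        print(f"[{connection_id}] processed in {elapsed:.3f}s")
```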
Best Practices
- Implement Minimum Thresholds: Require a minimum audio duration/size before processing to ensure accuracy
Troubleshooting
Audio not being captured
- Check microphone permissions in browser
- Verify HTTPS connection (required for getUserMedia)
- Check browser console for MediaDevices errors
Detection takes too long
- Check network connectivity
- Verify audio chunks are being sent (check WebSocket messages)
- Review backend logs for API delays
- Consider reducing max_concurrent_calls if rate limited
Format not supported errors
- Ensure MP4 format is being used
- Check MediaRecorder.isTypeSupported('audio/mp4') in the browser
- Verify audio_file.name is set to "audio.mp4"
iOS-specific issues
- MP4 format is required for iOS
- Ensure user gesture initiated the recording (iOS requirement)
- Test on actual device, not just simulator

