WebSocket Endpoint
Real-time audio streaming endpoint for Telnyx telephony integration
Endpoint
WebSocket /ws
Description
WebSocket endpoint that receives real-time audio streams from Telnyx during active calls. The endpoint performs live transcription using Deepgram, analyzes audio for distress signals, and maintains real-time call state.
Connection
Telnyx initiates the WebSocket connection when the system calls the streaming_start action on a call.
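The streaming_start action is a Telnyx Call Control command. The sketch below builds the request that asks Telnyx to open the WebSocket, but does not send it. It assumes the Call Control v2 route POST /v2/calls/{call_control_id}/actions/streaming_start with stream_url and stream_track body fields. The function name and the choice of inbound_track are illustrative and do not come from this codebase.

```python
import json
import urllib.request

TELNYX_API_BASE = "https://api.telnyx.com/v2"  # Telnyx Call Control v2 base URL


def build_streaming_start_request(
    call_control_id: str, api_key: str, stream_url: str
) -> urllib.request.Request:
    """Build (but do not send) a streaming_start request for an active call.

    Hypothetical helper: the stream_track value below is an assumption;
    Telnyx also offers outbound and both-track options.
    """
    body = {
        "stream_url": stream_url,         # public wss:// URL of this service's /ws endpoint
        "stream_track": "inbound_track",  # assumed: caller audio only
    }
    return urllib.request.Request(
        url=f"{TELNYX_API_BASE}/calls/{call_control_id}/actions/streaming_start",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Telnyx then connects to stream_url and begins sending the events described below.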
Connection Example (JavaScript)
Message Protocol
All messages are JSON-formatted text frames.
Client → Server Messages
Event: connected
Initial connection acknowledgment from Telnyx.
Event: start
Streaming session started. Contains call metadata and audio configuration.
event: always "start"
On receipt, the server:
- Resolves call_control_id to call_session_id using the CALL_CONTROL_TO_CALL_ID map
- Initializes call-specific state in the LIVE_SIGNALS dictionary
- Starts a Deepgram real-time transcription session
- Prepares audio processing buffers
Event: media
Audio data packet (sent repeatedly during the call).
event: always "media"
Audio payload: Base64-encoded audio data (PCMU/μ-law format, typically 80 or 160 bytes)
Each media frame passes through the following stages.
Audio Decoding
- Base64-decode the payload
- Convert μ-law to PCM16 little-endian format
- Typical frame: 80-160 μ-law bytes = 10-20 ms of audio at 8 kHz (160-320 bytes after conversion to PCM16)
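The decoding step can be sketched with the standard G.711 μ-law expansion. Python's audioop.ulaw2lin once did this, but audioop was removed in Python 3.13, so this sketch uses a 256-entry lookup table instead. The function names are hypothetical and may not match the handler's.

```python
import base64
import struct


def ulaw_byte_to_linear(u: int) -> int:
    """Decode one G.711 μ-law byte to a signed 16-bit sample."""
    u = ~u & 0xFF                      # μ-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = ((mantissa << 3) + 0x84) << exponent  # 0x84 is the G.711 bias
    return 0x84 - magnitude if sign else magnitude - 0x84


# Precompute all 256 values so decoding a frame is a table lookup.
ULAW_TABLE = [ulaw_byte_to_linear(b) for b in range(256)]


def decode_media_payload(payload_b64: str) -> bytes:
    """Base64 PCMU payload -> PCM16 little-endian bytes (2 bytes per sample)."""
    ulaw = base64.b64decode(payload_b64)
    samples = [ULAW_TABLE[b] for b in ulaw]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A 160-byte PCMU frame therefore becomes 320 bytes of PCM16, which is still 20 ms of audio.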
Transcription
- Stream PCM16 data to the Deepgram WebSocket
- Update LIVE_SIGNALS[call_id]["transcript_live"] with partial results
- Store finalized transcript segments
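To stream raw PCM16 to Deepgram, the session has to declare the audio format. The sketch below builds the URL and auth header for Deepgram's live endpoint (wss://api.deepgram.com/v1/listen) using the encoding, sample_rate, channels and interim_results query parameters. It assumes a plain WebSocket connection. The handler may use Deepgram's SDK instead, and the function name is hypothetical.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def deepgram_stream_config(api_key: str, sample_rate: int = 8000) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for a raw PCM16 mono Deepgram streaming session."""
    params = {
        "encoding": "linear16",          # PCM16 after μ-law conversion
        "sample_rate": str(sample_rate),  # Telnyx PCMU audio is 8 kHz
        "channels": "1",
        "interim_results": "true",       # partial results feed transcript_live
    }
    headers = {"Authorization": f"Token {api_key}"}
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}", headers
```

Deepgram also accepts encoding=mulaw, so the raw PCMU bytes could be forwarded as-is. Converting first means the same PCM16 buffer can serve transcription, analysis and the WAV file.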
Audio Analysis
- Buffer audio into 160ms chunks (2560 bytes @ 8kHz)
- Calculate RMS (root mean square) for voice activity detection
- Apply exponential moving average (EMA) for baseline
- Compute distress score from deviation above baseline
- Update LIVE_SIGNALS[call_id] with these metrics:
  - chunks: total audio chunks processed
  - voiced_chunks: chunks with voice activity
  - voiced_seconds: total voiced duration
  - distress: current distress score (0.0-1.0)
  - max_distress: peak distress score
  - ema: rolling-average baseline
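Telnyx frames arrive in 10-20 ms pieces, so analysis first regroups them into fixed 160 ms chunks. The buffering step can be sketched like this. ChunkBuffer is a hypothetical name, and only the 2560-byte size comes from this page.

```python
CHUNK_BYTES = 2560  # 160 ms of 8 kHz mono PCM16 (1280 samples x 2 bytes)


class ChunkBuffer:
    """Accumulate decoded PCM16 frames and emit fixed 160 ms chunks."""

    def __init__(self, chunk_bytes: int = CHUNK_BYTES) -> None:
        self.chunk_bytes = chunk_bytes
        self._pending = bytearray()

    def push(self, pcm16: bytes) -> list[bytes]:
        """Add one decoded frame; return any complete chunks now available."""
        self._pending.extend(pcm16)
        chunks = []
        while len(self._pending) >= self.chunk_bytes:
            chunks.append(bytes(self._pending[: self.chunk_bytes]))
            del self._pending[: self.chunk_bytes]  # keep the remainder for next time
        return chunks
```

With 20 ms frames (320 PCM16 bytes each), every eighth push yields one chunk.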
WAV Recording
- Append PCM16 data to a buffer for offline processing
- Save the buffer to data/calls/{timestamp}.wav on disconnect
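Saving the buffer can be sketched with the standard-library wave module, which writes 8 kHz, mono, 16-bit audio. The exact {timestamp} format is not specified here, so this sketch assumes whole Unix-epoch seconds. The function name is hypothetical.

```python
import time
import wave
from pathlib import Path


def save_call_wav(pcm16: bytes, out_dir: str = "data/calls", sample_rate: int = 8000) -> Path:
    """Write buffered 8 kHz mono PCM16 audio to {out_dir}/{timestamp}.wav."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: epoch-seconds timestamp; the real filename format may differ.
    path = directory / f"{int(time.time())}.wav"
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm16)
    return path
```

Two calls ending within the same second would collide under this naming. A finer timestamp or the call ID in the name would avoid that.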
Event: stop
Streaming session ended. On receipt, the server:
- Finalizes the Deepgram transcription
- Writes the WAV file to disk
- Updates the final transcript in LIVE_SIGNALS
- Closes the WebSocket connection
Audio Processing Details
Voice Activity Detection (VAD)
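Voice activity detection here is based on per-chunk RMS. One simple form is a fixed RMS threshold, sketched below. The threshold value and function names are hypothetical, and the real handler may compare against its EMA baseline instead. The sketch also updates the chunks, voiced_chunks and voiced_seconds counters listed under Audio Analysis.

```python
import math
import struct

CHUNK_SECONDS = 0.16         # 160 ms chunks
VOICE_RMS_THRESHOLD = 500.0  # hypothetical threshold on the 16-bit sample scale


def chunk_rms(chunk: bytes) -> float:
    """Root mean square of a PCM16 little-endian chunk."""
    n = len(chunk) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", chunk[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def update_vad(state: dict, chunk: bytes) -> bool:
    """Count the chunk and, if its RMS clears the threshold, record it as voiced."""
    state["chunks"] = state.get("chunks", 0) + 1
    voiced = chunk_rms(chunk) >= VOICE_RMS_THRESHOLD
    if voiced:
        state["voiced_chunks"] = state.get("voiced_chunks", 0) + 1
        state["voiced_seconds"] = state.get("voiced_seconds", 0.0) + CHUNK_SECONDS
    return voiced
```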
Distress Score Computation
- Tracks sudden increases in volume/intensity
- Uses EMA baseline to adapt to call dynamics
- Decays slowly (0.9 multiplier) when intensity drops
- Ranges from 0.0 (calm) to 1.0 (high distress)
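The properties above can be combined into one plausible update rule. This is not the handler's actual formula. The score measures relative deviation of RMS above an EMA baseline and rises immediately on a spike. When intensity drops, it decays by the 0.9 multiplier and is clamped to 0.0-1.0. EMA_ALPHA and SCALE are hypothetical tuning values.

```python
EMA_ALPHA = 0.05  # hypothetical smoothing factor for the baseline
DECAY = 0.9       # slow-decay multiplier described above
SCALE = 2.0       # hypothetical: RMS at 3x the baseline saturates the score at 1.0


def update_distress(state: dict, rms: float) -> float:
    """Update the EMA baseline, distress, and max_distress for one chunk's RMS."""
    ema = state.get("ema") or rms  # seed the baseline with the first chunk
    deviation = (rms - ema) / ema if ema > 0 else 0.0
    candidate = min(1.0, max(0.0, deviation / SCALE))
    previous = state.get("distress", 0.0)
    if candidate >= previous:
        distress = candidate  # jump up immediately on a sudden increase
    else:
        distress = max(candidate, previous * DECAY)  # otherwise decay slowly
    state["ema"] = (1 - EMA_ALPHA) * ema + EMA_ALPHA * rms
    state["distress"] = distress
    state["max_distress"] = max(state.get("max_distress", 0.0), distress)
    return distress
```

The baseline is updated after scoring, so a single loud chunk is judged against the calmer history before it shifts the baseline.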
Audio Format Conversion
Telnyx sends PCMU (μ-law) encoded audio, which the system converts to PCM16 little-endian before transcription and analysis.
Live State Management
The LIVE_SIGNALS dictionary maintains per-call state, keyed by call_id.
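The shape of one entry can be sketched as a TypedDict built from the fields this page names. The class and factory names are hypothetical. The handler may store extra keys, or use a plain dict without type annotations.

```python
from typing import TypedDict


class CallSignals(TypedDict):
    """Assumed shape of one LIVE_SIGNALS[call_id] entry."""
    chunks: int            # total 160 ms chunks processed
    voiced_chunks: int     # chunks that passed voice activity detection
    voiced_seconds: float  # total voiced duration in seconds
    distress: float        # current distress score, 0.0-1.0
    max_distress: float    # peak distress score seen on this call
    ema: float             # rolling RMS baseline
    transcript_live: str   # latest interim (partial) Deepgram text
    transcript: str        # finalized transcript segments


def new_call_signals() -> CallSignals:
    """Fresh state for a call whose stream has just started."""
    return CallSignals(chunks=0, voiced_chunks=0, voiced_seconds=0.0, distress=0.0,
                       max_distress=0.0, ema=0.0, transcript_live="", transcript="")


LIVE_SIGNALS: dict[str, CallSignals] = {}
```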
Real-time Transcription
The system uses Deepgram’s streaming API:
- Connection: opens a WebSocket to Deepgram when the start event is received
- Streaming: forwards PCM16 audio chunks to Deepgram
- Callbacks:
  - _on_partial(text, call_id): updates transcript_live with interim results
  - _on_final(text, call_id): updates transcript with finalized segments
- Finalization: on the stop event, closes the Deepgram connection and retrieves the final transcript
Implementation
Location: app/api/ws/handler.py:344-549
Configuration
Set in environment variables:
- Public WebSocket URL that Telnyx can reach (e.g., wss://your-domain.com/ws)
- Deepgram API key for speech-to-text
Error Handling
- Call ID Not Found: if call_control_id doesn't map to a call_session_id, processing is skipped
- WebSocket Disconnect: closes gracefully, saves audio, and finalizes the transcript
- Deepgram Errors: logged without crashing the connection; the system falls back to audio-only analysis
- Missing Audio Data: empty payloads are skipped silently
Performance Characteristics
- Latency: 100-300ms for partial transcripts
- Throughput: Handles multiple concurrent calls (one WebSocket per call)
- Buffer Size: 160ms chunks (2560 bytes @ 8kHz PCM16)
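- Memory: ~2MB per minute of audio buffered (the raw 8 kHz mono PCM16 WAV buffer alone accounts for about 960 KB per minute: 8,000 samples/s × 2 bytes × 60 s)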
Example Debug Output
Integration with Call Lifecycle
- Incoming Call (call.initiated webhook)
  - System answers and calls the streaming_start action
  - Telnyx opens a WebSocket to /ws
- Active Call (call.answered webhook)
  - Audio streams via WebSocket
  - Real-time processing updates LIVE_SIGNALS
  - UI polls /api/v1/live_queue for updates
- Call End (call.hangup webhook)
  - WebSocket receives the stop event
  - Final audio is saved and the transcript finalized
  - Webhook handler performs full analysis
  - Queue item is created
Security Considerations
- No Authentication: the WebSocket accepts any connection (intended for internal use only)
- Production: deployments should add:
  - Token-based authentication
  - Rate limiting per IP
  - Connection timeout enforcement
  - Input validation on all events
- Data Privacy: audio files are stored locally in data/calls/
- Phone Numbers: automatically masked (last 4 digits only)
Debugging
To view live calls in a browser, open the UI, which polls /api/v1/live_queue and displays:
- Real-time transcript
- Distress scores
- Risk levels
- Call metadata