## Overview

The WebSocket API enables real-time voice interviews with live transcription, using Socket.IO for bidirectional communication between client and server. Connections use the Socket.IO protocol, so install `socket.io-client` in your frontend.
## Connection Setup

### Installation

```bash
npm install socket.io-client
```

### Basic Connection

```javascript
import io from 'socket.io-client';

const socket = io('http://localhost:5000', {
  transports: ['websocket'],
  reconnection: true
});

socket.on('connect', () => {
  console.log('Connected to server:', socket.id);
});

socket.on('disconnect', () => {
  console.log('Disconnected from server');
});
```
## Events Reference

### Client → Server Events

#### start_interview

Initialize a live interview session with adaptive question generation.

```javascript
socket.emit('start_interview', {
  user_id: 123,
  stress_test: false  // Set true for curveball questions
});
```

Parameters:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `user_id` | integer | Yes | Authenticated user ID |
| `stress_test` | boolean | No | Enable curveball/stress mode (default: `false`) |

Response Events:

- `agent_next_question` - First interview question
- `backend_ready` - Server ready to receive audio
#### interviewer_done

Signal that the interviewer has finished speaking and the user's turn begins.

```javascript
socket.emit('interviewer_done', {
  user_id: 123
});
```

Response Events:

- `user_turn_started` - User can now speak; starts the silence detection timer
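A minimal sketch of reacting to the `user_turn_started` response event. The `ui.setTurn` hook here is a hypothetical placeholder for your own UI code, not part of this API:

```javascript
// Hypothetical wiring for the user_turn_started response event.
// `ui.setTurn` is an assumed UI hook, not part of this API.
function wireUserTurn(socket, ui) {
  socket.on('user_turn_started', () => {
    ui.setTurn('user'); // e.g., show a "your turn" microphone indicator
  });
}
```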
#### audio_chunk

Stream audio data in real time for transcription and analysis.

```javascript
// Send audio chunks at 1-second intervals (recommended)
socket.emit('audio_chunk', {
  user_id: 123,
  audio_data: base64AudioData,  // Base64-encoded PCM
  sample_rate: 16000
});
```

Parameters:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `user_id` | integer | Yes | User ID |
| `audio_data` | string | Yes | Base64-encoded audio (16-bit PCM) |
| `sample_rate` | integer | Yes | Sample rate (16000 Hz recommended) |

Audio Format:

- Encoding: 16-bit PCM
- Sample Rate: 16 kHz
- Channels: Mono (1 channel)
- Chunk Duration: 1 second (16,000 samples)
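Since the server expects base64-encoded 16-bit PCM, microphone samples captured via the Web Audio API (Float32, range -1..1) need to be converted first. A minimal conversion sketch (an assumed approach; capture details vary by setup):

```javascript
// Convert Float32 samples (-1..1) from the Web Audio API to 16-bit PCM,
// ready for base64 encoding before emitting as `audio_data`.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i])); // clamp to valid range
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // scale to the int16 range
  }
  return pcm;
}
```

The resulting `Int16Array` buffer can then be base64-encoded (e.g., over its bytes in the browser) and sent as `audio_data`.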
Response Events:

- `live_transcript` - Real-time partial transcription
- Voice activity and metrics updates
#### stop_interview

End the interview session and receive the final analysis.

```javascript
socket.emit('stop_interview', {
  user_id: 123
});
```

Response Events:

- `interview_complete` - Final results and metrics

#### leave_interview

Clean up and leave the interview room.

```javascript
socket.emit('leave_interview', {
  user_id: 123
});
```
### Server → Client Events

#### connected

Connection confirmation. The payload contains a `status` field.

```javascript
socket.on('connected', (data) => {
  console.log('Status:', data.status);
});
```
#### backend_ready

Server is ready to receive audio chunks.

```javascript
socket.on('backend_ready', (data) => {
  console.log('Backend ready at:', data.timestamp);
  // Start sending audio chunks
});
```

Payload:

```json
{
  "status": "ready",
  "timestamp": 1709478123.456
}
```
#### agent_next_question

The next interview question from the adaptive agent.

```javascript
socket.on('agent_next_question', (data) => {
  console.log('Question:', data.question);
});
```

Payload:

```json
{
  "question": "Can you explain the difference between synchronous and asynchronous programming?"
}
```
#### live_transcript

Real-time partial transcription updates.

```javascript
socket.on('live_transcript', (data) => {
  updateTranscriptUI(data.text);
});
```

Payload:

```json
{
  "text": "So asynchronous programming allows..."
}
```

Live transcription uses the browser SpeechRecognition API for ultra-low latency (<200 ms).
#### user_answer_complete

The user's answer has been processed and evaluated.

```javascript
socket.on('user_answer_complete', (data) => {
  console.log('Answer:', data.answer);
  console.log('Next question:', data.next_question);
});
```

Payload:

```json
{
  "answer": "Asynchronous programming allows multiple tasks...",
  "question": "Can you explain the difference between sync and async?",
  "next_question": "How would you handle async errors in Python?"
}
```
#### timer_update

Interview timer and turn-time updates.

```javascript
socket.on('timer_update', (data) => {
  updateTimer(data.time_remaining);
  updateTurnTimer(data.turn_time_elapsed);
});
```

Payload:

```json
{
  "time_remaining": 1680,
  "turn_time_elapsed": 8.3
}
```
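For display, timer values can be formatted client-side. A small hypothetical helper, assuming `time_remaining` is expressed in seconds:

```javascript
// Format a seconds count as M:SS for the timer UI (assumes seconds input).
function formatTimer(seconds) {
  const m = Math.floor(seconds / 60);
  const s = Math.floor(seconds % 60);
  return `${m}:${String(s).padStart(2, '0')}`;
}
```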
#### interview_complete

Final interview results with comprehensive metrics.

```javascript
socket.on('interview_complete', (data) => {
  displayResults(data);
});
```

Payload:

```json
{
  "success": true,
  "processing_method": "research_grade_event_driven",
  "transcript": "Full interview transcript...",
  "conversation": "Q: Question 1\n\nA: Answer 1\n\n...",
  "semantic_similarity": 0.78,
  "analysis_valid": true,
  "total_duration": 1245.6,
  "speaking_time": 812.3,
  "total_words": 987,
  "metrics": {
    "session_duration": 1245.6,
    "effective_duration": 1200.0,
    "speaking_time": 812.3,
    "silence_time": 387.7,
    "forced_silence_time": 45.6,
    "speaking_ratio": 0.652,
    "wpm": 145.2,
    "total_words": 987,
    "avg_pause_duration": 1.8,
    "pause_count": 34,
    "long_pause_count": 5,
    "hesitation_rate": 0.12,
    "articulation_rate": 3.2,
    "fluency_score": 0.82,
    "avg_response_latency": 2.1,
    "avg_semantic_similarity": 0.78,
    "avg_keyword_coverage": 0.65,
    "questions_answered": 8
  },
  "qa_pairs": [
    {
      "question": "Explain SOLID principles",
      "answer": "SOLID is an acronym...",
      "expected_answer": "SOLID principles include...",
      "similarity": 0.85,
      "keyword_coverage": 0.72
    }
  ]
}
```
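As one way to consume this payload, each entry in `qa_pairs` can be reduced to a per-question summary line. This is an illustrative client-side helper, not part of the API:

```javascript
// Summarize per-question results from the interview_complete payload.
function summarizeQaPairs(qaPairs) {
  return qaPairs.map((p) =>
    `${p.question}: similarity ${p.similarity.toFixed(2)}, ` +
    `keyword coverage ${(p.keyword_coverage * 100).toFixed(0)}%`
  );
}
```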
#### interview_error

An error occurred during interview processing.

```javascript
socket.on('interview_error', (data) => {
  console.error('Interview error:', data.error);
});
```

Payload:

```json
{
  "error": "Transcription service unavailable"
}
```
## Real-Time Transcription Flow

### Architecture

Live transcription pipeline:

1. Browser Capture (Client)
   - Audio captured at 16 kHz, 16-bit PCM
   - 1-second chunks sent via WebSocket
2. Real-Time Processing (Server)
   - Voice Activity Detection (VAD)
   - Browser SpeechRecognition for live captions (<200 ms)
   - Metrics tracking (speech time, pauses, WPM)
3. Final Analysis (Server)
   - Whisper transcription for accuracy
   - Semantic similarity scoring
   - Comprehensive speech metrics
Complete Example
import io from 'socket.io-client' ;
class LiveInterview {
constructor ( userId ) {
this . userId = userId ;
this . socket = io ( 'http://localhost:5000' );
this . setupListeners ();
}
setupListeners () {
this . socket . on ( 'connected' , () => {
console . log ( 'Connected to interview server' );
});
this . socket . on ( 'backend_ready' , () => {
console . log ( 'Server ready - start speaking' );
this . startAudioCapture ();
});
this . socket . on ( 'agent_next_question' , ( data ) => {
console . log ( 'Question:' , data . question );
this . displayQuestion ( data . question );
});
this . socket . on ( 'live_transcript' , ( data ) => {
this . updateTranscript ( data . text );
});
this . socket . on ( 'user_answer_complete' , ( data ) => {
console . log ( 'Answer evaluated:' , data . answer );
});
this . socket . on ( 'interview_complete' , ( data ) => {
console . log ( 'Final score:' , data . metrics . avg_semantic_similarity );
this . displayResults ( data );
});
}
startInterview ( stressMode = false ) {
this . socket . emit ( 'start_interview' , {
user_id: this . userId ,
stress_test: stressMode
});
}
notifyInterviewerDone () {
this . socket . emit ( 'interviewer_done' , {
user_id: this . userId
});
}
async startAudioCapture () {
const stream = await navigator . mediaDevices . getUserMedia ({ audio: true });
const mediaRecorder = new MediaRecorder ( stream );
mediaRecorder . ondataavailable = ( event ) => {
const reader = new FileReader ();
reader . onloadend = () => {
const base64Audio = reader . result . split ( ',' )[ 1 ];
this . socket . emit ( 'audio_chunk' , {
user_id: this . userId ,
audio_data: base64Audio ,
sample_rate: 16000
});
};
reader . readAsDataURL ( event . data );
};
mediaRecorder . start ( 1000 ); // 1-second chunks
this . mediaRecorder = mediaRecorder ;
}
stopInterview () {
if ( this . mediaRecorder ) {
this . mediaRecorder . stop ();
}
this . socket . emit ( 'stop_interview' , {
user_id: this . userId
});
}
displayQuestion ( question ) {
// Update UI with question
}
updateTranscript ( text ) {
// Update live transcript UI
}
displayResults ( data ) {
// Show final interview results
}
}
// Usage
const interview = new LiveInterview ( 123 );
interview . startInterview ();
## Speech Metrics Explained

| Metric | Description | Good Range |
|--------|-------------|------------|
| WPM | Words per minute | 120-160 |
| Speaking Ratio | % of time speaking vs. total | 0.5-0.7 |
| Fluency Score | Overall fluency (0-1) | >0.75 |
| Articulation Rate | Syllables per second | 3-5 |
| Long Pause Count | Pauses >5 seconds | <5 |
| Response Latency | Avg. time to start answering | <3 s |
| Semantic Similarity | Answer quality vs. expected | >0.7 |
| Keyword Coverage | Key term usage | >0.6 |
## Error Handling

```javascript
socket.on('interview_error', (data) => {
  console.error('Error:', data.error);
  // Handle the error gracefully
});

socket.on('disconnect', () => {
  console.log('Disconnected - attempting reconnect');
});

socket.on('connect_error', (error) => {
  console.error('Connection error:', error);
});
```
## Best Practices

### Audio Quality

- Use a 16 kHz sample rate
- Send 1-second chunks
- Use 16-bit PCM encoding
- Mono audio only

### Connection Management

- Enable reconnection
- Handle disconnect events
- Clean up on unmount
- Use `transports: ['websocket']`

### User Experience

- Show live transcription
- Display timer updates
- Indicate turn status
- Provide audio feedback

### Error Recovery

- Retry on connection loss
- Fall back to batch mode
- Validate audio before sending
- Log errors for debugging
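For the retry-on-connection-loss practice, a capped exponential backoff is one common approach. This is a sketch; Socket.IO's built-in reconnection options (`reconnectionDelay`, `reconnectionDelayMax`) implement a similar strategy for you:

```javascript
// Capped exponential backoff: 1 s, 2 s, 4 s, ... up to maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```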