WebSocket /ws/audio/

Bidirectional WebSocket for streaming audio from glasses microphone to speech recognition and command processing.

Connection

Establish WebSocket connection:
const ws = new WebSocket('wss://api.jarvis.local/ws/audio/room123');

Path Parameters

room_code
string
required
Unique room identifier for this audio session (client-generated).

Authentication

No authentication required (for hackathon demo).

Client → Server Messages

Audio Chunk (Binary)

Send raw audio bytes (PCM or WebM format):
// Send audio buffer
ws.send(audioBuffer); // ArrayBuffer or Blob
Format requirements:
  • Sample rate: 16kHz recommended
  • Channels: Mono (1 channel)
  • Encoding: PCM16 or WebM/Opus
  • Chunk size: 1-5 seconds of audio
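If you capture raw Float32 samples via the Web Audio API (for example from an AudioWorklet), converting them to the PCM16 format listed above can be sketched as follows. This is illustrative; how you obtain the Float32Array is up to your capture pipeline.

```javascript
// Convert Web Audio Float32 samples (range -1..1) to 16-bit
// little-endian PCM, suitable for sending as a binary chunk.
function floatTo16BitPCM(float32Samples) {
  const buffer = new ArrayBuffer(float32Samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1] before scaling to the int16 range
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer; // ws.send(buffer) once the socket is OPEN
}
```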

Server → Client Messages

Transcript Event

Sent when speech is recognized:
{
  "type": "transcript",
  "text": "identify John Smith"
}
type
string
Always "transcript" for transcription events.
text
string
Transcribed text from the audio chunk.

Command Event

Sent when a command is matched:
{
  "type": "command",
  "command": "IDENTIFY",
  "argument": "John Smith"
}
type
string
Always "command" for command events.
command
string
Matched command type:
  • IDENTIFY - Identify a person by name
  • RESEARCH - Research a topic or person
  • CAPTURE - Capture current frame
  • NONE - No command matched
argument
string
Extracted argument (e.g., person name, research query).
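The two message types above can be dispatched with a small helper. This is a sketch; the handler names (`onTranscript`, `onCommand`) are illustrative, not part of the API.

```javascript
// Parse a server message and route it to the matching handler.
// Returns the message type so callers can log or ignore unknowns.
function handleServerMessage(raw, handlers) {
  const data = JSON.parse(raw);
  if (data.type === 'transcript') {
    handlers.onTranscript(data.text);
  } else if (data.type === 'command') {
    handlers.onCommand(data.command, data.argument);
  }
  return data.type;
}
```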

Supported Commands

The audio processor matches these voice commands:
| Command Pattern | Command | Example |
|---|---|---|
| "identify [name]" | IDENTIFY | "identify John Smith" |
| "who is [name]" | IDENTIFY | "who is Jane Doe" |
| "research [query]" | RESEARCH | "research Tesla stock" |
| "look up [query]" | RESEARCH | "look up machine learning" |
| "capture" | CAPTURE | "capture this" |
| "take a picture" | CAPTURE | "take a picture" |

Connection Lifecycle

Example Implementation (JavaScript)

class AudioStreamer {
  constructor(roomCode) {
    this.ws = new WebSocket(`wss://api.jarvis.local/ws/audio/${roomCode}`);
    this.setupHandlers();
  }
  
  setupHandlers() {
    this.ws.onopen = () => {
      console.log('Audio WebSocket connected');
      this.startMicrophone();
    };
    
    this.ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      
      if (data.type === 'transcript') {
        console.log('Transcript:', data.text);
        this.onTranscript(data.text);
      } else if (data.type === 'command') {
        console.log('Command:', data.command, data.argument);
        this.onCommand(data.command, data.argument);
      }
    };
    
    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
    };
    
    this.ws.onclose = () => {
      console.log('WebSocket closed');
      this.stopMicrophone();
    };
  }
  
  async startMicrophone() {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const mediaRecorder = new MediaRecorder(stream, {
      mimeType: 'audio/webm',
      audioBitsPerSecond: 16000 // 16 kbps target bitrate (not the sample rate)
    });
    
    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0 && this.ws.readyState === WebSocket.OPEN) {
        this.ws.send(event.data);
      }
    };
    
    // Send chunks every 2 seconds
    mediaRecorder.start(2000);
    this.mediaRecorder = mediaRecorder;
  }
  
  stopMicrophone() {
    if (this.mediaRecorder) {
      this.mediaRecorder.stop();
    }
  }
  
  onTranscript(text) {
    // Update UI with transcript
    document.getElementById('transcript').textContent = text;
  }
  
  onCommand(command, argument) {
    // Handle commands
    if (command === 'IDENTIFY') {
      this.identifyPerson(argument);
    } else if (command === 'RESEARCH') {
      this.research(argument);
    } else if (command === 'CAPTURE') {
      this.captureFrame();
    }
  }
  
  identifyPerson(name) {
    console.log('Identifying:', name);
    // Trigger identification flow
  }
  
  research(query) {
    console.log('Researching:', query);
    // Trigger research flow
  }
  
  captureFrame() {
    console.log('Capturing frame');
    // Capture current video frame
  }
  
  close() {
    this.ws.close();
  }
}

// Usage
const streamer = new AudioStreamer('room_' + Date.now());

Close Codes

| Code | Reason | Description |
|---|---|---|
| 1000 | Normal closure | Client closed connection cleanly |
| 1008 | Policy violation | OpenAI API key not configured |
| 1011 | Server error | Unexpected server error |

Error Handling

let retryCount = 0;

ws.onclose = (event) => {
  if (event.code === 1008) {
    console.error('Audio API not configured');
    alert('Voice commands are not available');
  } else if (event.code !== 1000) {
    console.error('Connection closed unexpectedly:', event.code, event.reason);
    // Attempt reconnection with exponential backoff, capped at 30 seconds
    const delay = Math.min(1000 * Math.pow(2, retryCount), 30000);
    retryCount++;
    setTimeout(() => reconnect(), delay); // reconnect() re-creates the WebSocket
  }
};
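The backoff delay used above can be factored into a small pure function, which keeps it easy to unit-test. This is a sketch; jitter, often added in practice to avoid reconnection stampedes, is omitted for clarity.

```javascript
// Exponential backoff: baseMs * 2^retryCount, capped at capMs.
function backoffDelay(retryCount, baseMs = 1000, capMs = 30000) {
  return Math.min(baseMs * Math.pow(2, retryCount), capMs);
}
```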

Performance

  • Transcription latency: 500-2000ms per chunk
  • Command matching: Less than 10ms
  • Max connection duration: No limit
  • Recommended chunk size: 2-3 seconds

Best Practices

  • Use WebM/Opus encoding for better compression
  • Send audio chunks every 2-3 seconds for responsive transcription
  • Handle reconnection with exponential backoff
  • Mute audio input when not needed to save API costs

Cost note: transcription uses the OpenAI Whisper API (~$0.006 per minute of audio), so long-running connections can accumulate significant costs.
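At the quoted Whisper rate (~$0.006 per minute of transcribed audio), a quick cost estimate:

```javascript
// Rough streaming cost at the quoted Whisper rate.
function whisperCostUSD(minutes, ratePerMinute = 0.006) {
  return minutes * ratePerMinute;
}
```

One hour of continuous audio works out to roughly $0.36, which is why muting idle input is worthwhile.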

Debugging

Enable verbose logging:
ws.onmessage = (event) => {
  console.log('[WS] Received:', event.data);
  const data = JSON.parse(event.data);
  // ... handle message
};

const originalSend = ws.send.bind(ws);
ws.send = (data) => {
  // ArrayBuffers expose byteLength; Blobs (from MediaRecorder) expose size
  console.log('[WS] Sending:', data.byteLength ?? data.size, 'bytes');
  originalSend(data);
};
Monitor connection state:
setInterval(() => {
  console.log('WebSocket state:', [
    'CONNECTING',
    'OPEN',
    'CLOSING',
    'CLOSED'
  ][ws.readyState]);
}, 5000);