
Overview

TrackVADEmitter connects a JitsiLocalTrack to a VAD (Voice Activity Detection) processor using the Web Audio API’s ScriptProcessorNode. It processes raw PCM audio data and emits VAD scores via events, enabling real-time voice activity detection for features like noise suppression, active speaker detection, and audio visualization.

Constructor

new TrackVADEmitter(
  procNodeSampleRate: number,
  vadProcessor: IVadProcessor,
  jitsiLocalTrack: JitsiLocalTrack
)
  • procNodeSampleRate (number, required) - Sample rate of the ScriptProcessorNode. Valid values: 256, 512, 1024, 2048, 4096, 8192, 16384; other values default to the closest neighbor.
  • vadProcessor (IVadProcessor, required) - VAD processor implementing the IVadProcessor interface for calculating voice activity scores.
  • jitsiLocalTrack (JitsiLocalTrack, required) - The audio JitsiLocalTrack to analyze.
Use the TrackVADEmitter.create() factory method instead of calling the constructor directly.

Factory Method

create

Factory method that sets up all necessary components and creates a TrackVADEmitter instance.
static create(
  micDeviceId: string,
  procNodeSampleRate: number,
  vadProcessor: IVadProcessor
): Promise<TrackVADEmitter>
  • micDeviceId (string, required) - Target microphone device ID to capture audio from.
  • procNodeSampleRate (number, required) - Sample rate for the ScriptProcessorNode (256, 512, 1024, 2048, 4096, 8192, or 16384).
  • vadProcessor (IVadProcessor, required) - VAD processor that implements:
      • getSampleLength() - Returns the required PCM sample size
      • getRequiredPCMFrequency() - Returns the required PCM frequency
      • calculateAudioFrameVAD(pcmSample) - Calculates the VAD score for a PCM sample
Returns: Promise resolving to a new TrackVADEmitter instance
Example:
import TrackVADEmitter from '@jitsi/lib-jitsi-meet/modules/detection/TrackVADEmitter';

// Create a custom VAD processor
const vadProcessor = {
  getSampleLength() {
    return 480; // RNNoise typical sample size
  },
  
  getRequiredPCMFrequency() {
    return 48000; // 48kHz
  },
  
  calculateAudioFrameVAD(pcmSample) {
    // Process PCM sample and return VAD score (0-1)
    // This would typically use a library like RNNoise
    return myVADLibrary.process(pcmSample);
  }
};

// Create emitter
const vadEmitter = await TrackVADEmitter.create(
  'default', // Microphone device ID
  2048,      // ScriptProcessorNode sample rate
  vadProcessor
);

// Listen for VAD scores
vadEmitter.on('vad-score-published', (data) => {
  console.log('VAD score:', data.score);
  console.log('Device:', data.deviceId);
  console.log('Timestamp:', data.timestamp);
});

// Start processing
vadEmitter.start();

Methods

start

Starts the VAD emitter by connecting the audio graph. Audio data begins flowing through the ScriptProcessorNode.
start(): void
Example:
const vadEmitter = await TrackVADEmitter.create('default', 2048, vadProcessor);
vadEmitter.start();

console.log('VAD detection started');

stop

Stops the VAD emitter by disconnecting the audio graph and clearing internal buffers.
stop(): void
Example:
// Temporarily stop VAD processing
vadEmitter.stop();

// Can restart later
vadEmitter.start();

destroy

Performs complete cleanup: disconnects audio graph, stops the underlying track, and releases all resources.
destroy(): void
Always call destroy() when done to prevent memory leaks. After calling destroy(), the emitter cannot be reused.
Example:
// Clean up when done
vadEmitter.destroy();

Events

VAD_SCORE_PUBLISHED

Emitted whenever a VAD score is calculated for an audio frame. The emission rate depends on the processor sample size and the node sample rate.
Event Data:
  • deviceId (string) - The microphone device ID being analyzed
  • score (number) - Voice activity detection score (typically 0-1; higher means more likely voice)
  • pcmData (Float32Array) - The raw PCM audio sample that was analyzed
  • timestamp (number) - Timestamp when the sample was processed (Date.now())
Example:
import { DetectionEvents } from '@jitsi/lib-jitsi-meet/modules/detection/DetectionEvents';

vadEmitter.on(DetectionEvents.VAD_SCORE_PUBLISHED, (data) => {
  const { deviceId, score, pcmData, timestamp } = data;
  
  if (score > 0.7) {
    console.log(`Voice detected on ${deviceId} at ${timestamp}`);
  }
  
  // Could process PCM data further
  // e.g., apply noise suppression, audio visualization
});

Complete Examples

Basic VAD Detection with RNNoise

import TrackVADEmitter from '@jitsi/lib-jitsi-meet/modules/detection/TrackVADEmitter';
import { DetectionEvents } from '@jitsi/lib-jitsi-meet/modules/detection/DetectionEvents';
import RNNoise from 'rnnoise-wasm'; // Hypothetical RNNoise wrapper

class VoiceActivityDetector {
  constructor(deviceId) {
    this.deviceId = deviceId;
    this.vadEmitter = null;
    this.isVoiceActive = false;
  }

  async initialize() {
    // Initialize RNNoise
    const rnnoise = await RNNoise.create();

    // Create VAD processor
    const vadProcessor = {
      getSampleLength: () => 480, // RNNoise sample size
      getRequiredPCMFrequency: () => 48000,
      calculateAudioFrameVAD: (sample) => rnnoise.processFrame(sample)
    };

    // Create VAD emitter
    this.vadEmitter = await TrackVADEmitter.create(
      this.deviceId,
      4096, // ScriptProcessorNode buffer size
      vadProcessor
    );

    // Listen for VAD events
    this.vadEmitter.on(DetectionEvents.VAD_SCORE_PUBLISHED, (data) => {
      this.handleVADScore(data);
    });

    // Start detection
    this.vadEmitter.start();
  }

  handleVADScore({ score, timestamp }) {
    const wasActive = this.isVoiceActive;
    this.isVoiceActive = score > 0.5;

    // Detect state changes
    if (!wasActive && this.isVoiceActive) {
      console.log('Voice started at', timestamp);
      this.onVoiceStart();
    } else if (wasActive && !this.isVoiceActive) {
      console.log('Voice stopped at', timestamp);
      this.onVoiceStop();
    }
  }

  onVoiceStart() {
    // Handle voice start (e.g., show indicator)
    document.getElementById('mic-indicator').classList.add('active');
  }

  onVoiceStop() {
    // Handle voice stop
    document.getElementById('mic-indicator').classList.remove('active');
  }

  destroy() {
    if (this.vadEmitter) {
      this.vadEmitter.destroy();
      this.vadEmitter = null;
    }
  }
}

// Usage
const detector = new VoiceActivityDetector('default');
await detector.initialize();

// Later...
detector.destroy();

Advanced: VAD with Audio Visualization

import TrackVADEmitter from '@jitsi/lib-jitsi-meet/modules/detection/TrackVADEmitter';
import { DetectionEvents } from '@jitsi/lib-jitsi-meet/modules/detection/DetectionEvents';

class VADVisualizer {
  constructor(canvasId, deviceId) {
    this.canvas = document.getElementById(canvasId);
    this.ctx = this.canvas.getContext('2d');
    this.deviceId = deviceId;
    this.vadEmitter = null;
    this.scoreHistory = [];
    this.maxHistory = 100;
    // Initialize so draw() has valid values before the first VAD event
    this.currentLevel = 0;
    this.currentScore = 0;
  }

  async start(vadProcessor) {
    this.vadEmitter = await TrackVADEmitter.create(
      this.deviceId,
      2048,
      vadProcessor
    );

    this.vadEmitter.on(DetectionEvents.VAD_SCORE_PUBLISHED, (data) => {
      this.updateVisualization(data);
    });

    this.vadEmitter.start();
    this.draw();
  }

  updateVisualization({ score, pcmData }) {
    // Store score history
    this.scoreHistory.push(score);
    if (this.scoreHistory.length > this.maxHistory) {
      this.scoreHistory.shift();
    }

    // Calculate audio level from PCM
    const rms = Math.sqrt(
      pcmData.reduce((sum, val) => sum + val * val, 0) / pcmData.length
    );
    
    this.currentLevel = rms;
    this.currentScore = score;
  }

  draw() {
    const { width, height } = this.canvas;
    this.ctx.clearRect(0, 0, width, height);

    // Draw VAD score history
    this.ctx.strokeStyle = '#00ff00';
    this.ctx.beginPath();
    this.scoreHistory.forEach((score, i) => {
      const x = (i / this.maxHistory) * width;
      const y = height - (score * height);
      if (i === 0) {
        this.ctx.moveTo(x, y);
      } else {
        this.ctx.lineTo(x, y);
      }
    });
    this.ctx.stroke();

    // Draw current level meter
    const meterWidth = 20;
    // Clamp so a loud signal doesn't draw past the top of the canvas
    const meterHeight = Math.min(height, height * this.currentLevel * 10);
    this.ctx.fillStyle = this.currentScore > 0.5 ? '#00ff00' : '#ff0000';
    this.ctx.fillRect(width - meterWidth, height - meterHeight, meterWidth, meterHeight);

    requestAnimationFrame(() => this.draw());
  }

  stop() {
    if (this.vadEmitter) {
      this.vadEmitter.destroy();
      this.vadEmitter = null;
    }
  }
}

// Usage
const visualizer = new VADVisualizer('vad-canvas', 'default');
await visualizer.start(myVadProcessor);

VAD Processor Interface

Implement the IVadProcessor interface for custom VAD algorithms:
interface IVadProcessor {
  // Return the PCM sample size required by the processor
  getSampleLength(): number;
  
  // Return the required PCM frequency (e.g., 48000 for 48kHz)
  getRequiredPCMFrequency(): number;
  
  // Calculate VAD score for a PCM sample
  // Returns a number (typically 0-1) indicating voice probability
  calculateAudioFrameVAD(pcmSample: Float32Array | number[]): number;
}
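As a concrete illustration of this contract, here is a naive energy-threshold processor. Real deployments typically use a trained model (such as RNNoise); this sketch only demonstrates the interface shape, and the threshold constant is an assumed tuning value, not something defined by the library.

```javascript
// A minimal energy-based processor satisfying the IVadProcessor shape.
// THRESHOLD_RMS is an illustrative tuning value (assumption, not from the API).
const THRESHOLD_RMS = 0.02;

const energyVadProcessor = {
  // PCM sample size this processor expects per call
  getSampleLength() {
    return 480;
  },

  // PCM frequency this processor expects (48 kHz)
  getRequiredPCMFrequency() {
    return 48000;
  },

  // Map frame energy (RMS) to a 0-1 score, saturating well above the threshold
  calculateAudioFrameVAD(pcmSample) {
    let sumSquares = 0;
    for (const v of pcmSample) {
      sumSquares += v * v;
    }
    const rms = Math.sqrt(sumSquares / pcmSample.length);
    return Math.min(1, rms / (THRESHOLD_RMS * 4));
  }
};
```

An energy threshold cannot distinguish voice from other loud sounds, which is why model-based processors are preferred in practice, but the object above can be passed to TrackVADEmitter.create() anywhere an IVadProcessor is expected.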

Performance Considerations

ScriptProcessorNode is deprecated in favor of AudioWorklet. However, at the time of implementation, AudioWorklet had limited browser support. Consider migrating to AudioWorklet when browser support improves.
The procNodeSampleRate determines how often the ScriptProcessorNode callback fires, and therefore how VAD scores are batched. Lower values (256, 512) deliver scores in smaller, more frequent batches at the cost of higher callback overhead; higher values (8192, 16384) reduce CPU overhead but batch scores less frequently.
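Some back-of-the-envelope math helps when choosing a value. Assuming a 48 kHz processor with 480-sample frames (RNNoise-style numbers, used here only for illustration), the average score rate is fixed by the PCM frequency and frame size, while procNodeSampleRate controls how those scores are grouped per callback:

```javascript
// Rough emission-rate math for choosing procNodeSampleRate.
// The 48 kHz / 480-sample figures are illustrative assumptions.
function vadEmissionStats(procNodeSampleRate, pcmFrequency, vadSampleLength) {
  // How often the ScriptProcessorNode callback fires per second
  const buffersPerSecond = pcmFrequency / procNodeSampleRate;
  // How many VAD frames fit in one node buffer (may be fractional;
  // the emitter buffers the residue across callbacks)
  const vadFramesPerBuffer = procNodeSampleRate / vadSampleLength;
  return {
    buffersPerSecond,
    // Average VAD scores per second; note this simplifies to
    // pcmFrequency / vadSampleLength, independent of the node rate
    vadEventsPerSecond: buffersPerSecond * vadFramesPerBuffer
  };
}

console.log(vadEmissionStats(4096, 48000, 480));
```

With these assumptions, a 4096-sample node fires about 11.7 callbacks per second and emits roughly 100 scores per second on average; a 512-sample node emits the same average rate but in much smaller, lower-latency batches.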

Buffer Handling

The emitter handles sample size mismatches automatically:
  • If procNodeSampleRate is not an exact multiple of vadProcessor.getSampleLength(), the leftover PCM data is buffered
  • The residue is prepended to the next batch to ensure no audio data is lost
  • This allows flexible combinations of processor sample sizes and node buffer sizes
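The buffering idea above can be sketched as follows. This is not the actual lib-jitsi-meet implementation, just a standalone illustration of slicing incoming PCM buffers into fixed-size VAD frames while carrying leftover samples forward:

```javascript
// Sketch of residue buffering: slice PCM chunks into fixed-size frames,
// prepending any leftover samples to the next chunk so nothing is lost.
class PcmFramer {
  constructor(frameSize) {
    this.frameSize = frameSize;
    this.residue = new Float32Array(0);
  }

  // Returns the complete frames available after this chunk;
  // any remainder is kept for the next push() call.
  push(pcmChunk) {
    const combined = new Float32Array(this.residue.length + pcmChunk.length);
    combined.set(this.residue, 0);
    combined.set(pcmChunk, this.residue.length);

    const frames = [];
    let offset = 0;
    while (offset + this.frameSize <= combined.length) {
      frames.push(combined.slice(offset, offset + this.frameSize));
      offset += this.frameSize;
    }
    this.residue = combined.slice(offset); // buffered for the next batch
    return frames;
  }
}
```

For example, with 480-sample frames and 2048-sample node buffers, the first buffer yields 4 frames plus a 128-sample residue; the next buffer then yields 4 more frames with a 256-sample residue, and so on.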

Browser Support

  • Chrome/Edge 14+
  • Firefox 25+
  • Safari 6+
  • Opera 15+
Requires Web Audio API support. The AudioContext is created at the sample rate required by your VAD processor (getRequiredPCMFrequency()).

Common Use Cases

  1. Active speaker detection - Identify who is speaking in a conference
  2. Noise suppression - Apply processing only when voice is detected
  3. Audio gating - Mute audio below a voice activity threshold
  4. Transcription optimization - Send audio to speech-to-text only when voice is present
  5. Audio visualization - Display real-time voice activity indicators
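As a sketch of use case 3 (audio gating), the snippet below drives a gate from published VAD scores with hysteresis, so the gate does not flap when scores hover around a single threshold. The open/close thresholds are illustrative tuning values, not library defaults:

```javascript
// Score-driven audio gate with hysteresis: opens above one threshold,
// closes below a lower one, so borderline scores don't cause flapping.
class VadGate {
  constructor(openAt = 0.6, closeAt = 0.3) {
    this.openAt = openAt;   // score above which the gate opens
    this.closeAt = closeAt; // score below which the gate closes
    this.open = false;
  }

  // Feed each published VAD score; returns whether audio should pass.
  update(score) {
    if (!this.open && score > this.openAt) {
      this.open = true;
    } else if (this.open && score < this.closeAt) {
      this.open = false;
    }
    return this.open;
  }
}
```

Wired into the emitter, the gate's output could drive muting logic in a VAD_SCORE_PUBLISHED handler, e.g. `vadEmitter.on(DetectionEvents.VAD_SCORE_PUBLISHED, ({ score }) => setMuted(!gate.update(score)))`, where `setMuted` is a hypothetical application function.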
