The useVAD hook provides Voice Activity Detection (VAD), identifying when speech is present in an audio stream. It returns start and end timestamps for each detected speech segment, which is useful for voice-activated features and for skipping non-speech audio during processing.
Basic Usage
```tsx
import { useVAD } from 'react-native-executorch';

function VoiceDetector() {
  const { forward, isReady, error } = useVAD({
    model: {
      modelSource: require('./models/vad-model.pte'),
    },
  });

  const detectSpeech = async (audioBuffer: Float32Array) => {
    if (!isReady) return;
    const segments = await forward(audioBuffer);
    console.log(`Found ${segments.length} speech segments`);
    segments.forEach((segment) => {
      console.log(`Speech from ${segment.start}s to ${segment.end}s`);
      console.log(`Duration: ${segment.end - segment.start}s`);
    });
  };

  return (
    <View>
      {error && <Text>Error: {error.message}</Text>}
      <Button
        // audioBuffer is obtained elsewhere, e.g. from a recorder
        onPress={() => detectSpeech(audioBuffer)}
        title="Detect Speech"
        disabled={!isReady}
      />
    </View>
  );
}
```
Hook Signature
useVAD(props)
```typescript
function useVAD(props: VADProps): VADType;
```
Parameters

- `model` — Model configuration object.
  - `modelSource` — Location of the VAD model `.pte` file. Can be a URL (string), a local file (`require`), or a resource ID (number).
- `preventLoad` — Prevents automatic model loading on mount. Useful for lazy-loading scenarios.

Returns

- `error` — Contains error details if model loading or inference fails.
- `isReady` — Indicates whether the model has loaded successfully and is ready for detection.
- `isGenerating` — Indicates whether a detection is currently in progress.
- `downloadProgress` — Download progress as a value between 0 and 1.
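The `downloadProgress` value can drive a simple loading label while the model downloads; a minimal sketch (the `formatDownloadProgress` helper is ours, not part of the library):

```typescript
// Hypothetical helper: format a 0..1 downloadProgress value as a percentage
// string for display, clamping out-of-range values.
function formatDownloadProgress(progress: number): string {
  const clamped = Math.max(0, Math.min(1, progress));
  return `${Math.round(clamped * 100)}%`;
}
```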
forward

```typescript
(waveform: Float32Array) => Promise<Segment[]>
```

Detects speech segments in the provided audio waveform. Returns an array of segments with start and end times in seconds.
Detection Method
Detect speech activity in audio:
```typescript
const { forward, isReady } = useVAD({ model });

// Audio must be 16 kHz mono
const segments = await forward(audioBuffer);
segments.forEach((segment) => {
  console.log(`Speech: ${segment.start}s - ${segment.end}s`);
});
```
Types
Segment
Represents a detected speech segment:
```typescript
interface Segment {
  start: number; // Start time in seconds
  end: number; // End time in seconds
}
```
VADProps
Configuration for the VAD hook:
```typescript
interface VADProps {
  model: {
    modelSource: ResourceSource;
  };
  preventLoad?: boolean;
}
```
VADType
Return type of the useVAD hook:
```typescript
interface VADType {
  error: RnExecutorchError | null;
  isReady: boolean;
  isGenerating: boolean;
  downloadProgress: number;
  forward(waveform: Float32Array): Promise<Segment[]>;
}
```
Audio Format

Audio must be in the correct format or detection will fail.

- Sample rate: 16 kHz (16,000 samples per second)
- Channels: mono (single channel)
- Data type: Float32Array
- Value range: -1.0 to 1.0 (normalized)
- Buffer layout: contiguous samples in time order
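Many recorders deliver 16-bit signed PCM rather than normalized floats. Assuming that input format, a conversion to the Float32Array layout listed above might look like this (`pcm16ToFloat32` is an illustrative helper name, not a library API):

```typescript
// Sketch: convert 16-bit signed PCM samples to a normalized Float32Array.
// Dividing by 32768 maps the full Int16 range into [-1.0, 1.0).
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768;
  }
  return out;
}
```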
Converting Audio
Example of preparing audio for VAD:
```typescript
// Note: resampleAudio is a placeholder for your resampling utility.
function prepareAudioForVAD(audioBuffer: AudioBuffer): Float32Array {
  // Resample to 16 kHz
  const targetSampleRate = 16000;
  const resampled = resampleAudio(audioBuffer, targetSampleRate);

  // Convert to mono by averaging channels
  let mono: Float32Array;
  if (resampled.numberOfChannels === 1) {
    mono = resampled.getChannelData(0);
  } else {
    const left = resampled.getChannelData(0);
    const right = resampled.getChannelData(1);
    mono = new Float32Array(left.length);
    for (let i = 0; i < left.length; i++) {
      mono[i] = (left[i] + right[i]) / 2;
    }
  }

  // Clamp to [-1.0, 1.0]
  const clamped = new Float32Array(mono.length);
  for (let i = 0; i < mono.length; i++) {
    clamped[i] = Math.max(-1, Math.min(1, mono[i]));
  }
  return clamped;
}
```
Advanced Usage
Processing Long Audio Files
For long recordings, process in chunks to manage memory:
```typescript
const { forward } = useVAD({ model });

const detectInLongAudio = async (
  audioBuffer: Float32Array,
  chunkDuration: number = 30 // seconds per chunk
) => {
  const sampleRate = 16000;
  const chunkSize = chunkDuration * sampleRate;
  const allSegments: Segment[] = [];

  for (let i = 0; i < audioBuffer.length; i += chunkSize) {
    const chunk = audioBuffer.slice(i, Math.min(i + chunkSize, audioBuffer.length));
    const segments = await forward(chunk);

    // Adjust timestamps to account for the chunk offset.
    // Note: speech spanning a chunk boundary is split into two segments;
    // merging adjacent segments afterwards can rejoin them.
    const offset = i / sampleRate;
    const adjustedSegments = segments.map((seg) => ({
      start: seg.start + offset,
      end: seg.end + offset,
    }));
    allSegments.push(...adjustedSegments);
  }
  return allSegments;
};
```
Real-Time Stream Processing
Detect speech in live audio streams:
```tsx
function LiveVAD() {
  const { forward, isReady } = useVAD({ model });
  const [activeSpeech, setActiveSpeech] = useState(false);
  const bufferRef = useRef<Float32Array[]>([]);

  const processAudioChunk = async (chunk: Float32Array) => {
    if (!isReady) return;

    // Accumulate chunks
    bufferRef.current.push(chunk);

    // Process once at least one second of 16 kHz audio has accumulated
    const totalSamples = bufferRef.current.reduce((sum, buf) => sum + buf.length, 0);
    if (totalSamples >= 16000) {
      // Concatenate buffers
      const combined = new Float32Array(totalSamples);
      let offset = 0;
      for (const buf of bufferRef.current) {
        combined.set(buf, offset);
        offset += buf.length;
      }

      // Detect speech
      const segments = await forward(combined);
      setActiveSpeech(segments.length > 0);

      // Clear buffer
      bufferRef.current = [];
    }
  };

  return (
    <View>
      <Text>Speech Active: {activeSpeech ? 'Yes' : 'No'}</Text>
    </View>
  );
}
```
Extracting Speech Audio

Extract only the speech portions from audio:
```typescript
const extractSpeechSegments = async (
  audioBuffer: Float32Array,
  segments: Segment[]
): Promise<Float32Array[]> => {
  const sampleRate = 16000;
  const speechChunks: Float32Array[] = [];

  for (const segment of segments) {
    const startSample = Math.floor(segment.start * sampleRate);
    const endSample = Math.floor(segment.end * sampleRate);
    const chunk = audioBuffer.slice(startSample, endSample);
    speechChunks.push(chunk);
  }
  return speechChunks;
};

// Usage
const { forward } = useVAD({ model });
const segments = await forward(audioBuffer);
const speechOnly = await extractSpeechSegments(audioBuffer, segments);

// Process only the speech segments
for (const speechChunk of speechOnly) {
  await transcribe(speechChunk);
}
```
Filtering Short Segments
Remove brief noise detections:
```typescript
const filterShortSegments = (
  segments: Segment[],
  minDuration: number = 0.3 // 300 ms minimum
): Segment[] => {
  return segments.filter((seg) => seg.end - seg.start >= minDuration);
};

// Usage
const { forward } = useVAD({ model });
const allSegments = await forward(audioBuffer);
const speechSegments = filterShortSegments(allSegments, 0.5); // only segments >= 500 ms
```
Merging Adjacent Segments
Combine segments with small gaps:
```typescript
const mergeAdjacentSegments = (
  segments: Segment[],
  maxGap: number = 0.5 // 500 ms maximum gap
): Segment[] => {
  if (segments.length === 0) return [];

  const merged: Segment[] = [];
  let current = { ...segments[0] };

  for (let i = 1; i < segments.length; i++) {
    const gap = segments[i].start - current.end;
    if (gap <= maxGap) {
      // Merge with the current segment
      current.end = segments[i].end;
    } else {
      // Save the current segment and start a new one
      merged.push(current);
      current = { ...segments[i] };
    }
  }
  merged.push(current);
  return merged;
};

// Usage
const segments = await forward(audioBuffer);
const cleanSegments = mergeAdjacentSegments(filterShortSegments(segments, 0.3), 0.5);
```
Integration Examples
VAD + Speech to Text
Optimize transcription by processing only speech:
```tsx
import { useVAD, useSpeechToText } from 'react-native-executorch';

function SmartTranscription() {
  const vad = useVAD({ model: vadModel });
  const stt = useSpeechToText({ model: sttModel });

  const transcribeWithVAD = async (audioBuffer: Float32Array) => {
    // Detect speech segments
    const segments = await vad.forward(audioBuffer);
    console.log(`Found ${segments.length} speech segments`);

    // Extract and transcribe only the speech portions
    const sampleRate = 16000;
    const transcriptions: string[] = [];
    for (const segment of segments) {
      const startSample = Math.floor(segment.start * sampleRate);
      const endSample = Math.floor(segment.end * sampleRate);
      const speechChunk = audioBuffer.slice(startSample, endSample);
      const result = await stt.transcribe(speechChunk);
      transcriptions.push(result.text);
    }
    return transcriptions.join(' ');
  };

  return <TranscriptionUI onTranscribe={transcribeWithVAD} />;
}
```
Voice Command Detection
Trigger actions when speech is detected:
```tsx
function VoiceCommandListener() {
  const { forward, isReady } = useVAD({ model });
  const [listening, setListening] = useState(false);

  const startListening = async () => {
    setListening(true);

    // Continuously monitor audio
    const audioStream = await startAudioCapture();
    for await (const chunk of audioStream) {
      const segments = await forward(chunk);
      if (segments.length > 0) {
        // Speech detected - process the command
        await handleVoiceCommand(chunk);
      }
    }
  };

  return (
    <Button
      onPress={startListening}
      title={listening ? 'Listening...' : 'Start Listening'}
      disabled={!isReady}
    />
  );
}
```
Audio Visualization
Visualize speech activity:
```tsx
function SpeechVisualizer({ audioBuffer }: { audioBuffer: Float32Array }) {
  const { forward } = useVAD({ model });
  const [segments, setSegments] = useState<Segment[]>([]);

  useEffect(() => {
    const detectSegments = async () => {
      const detected = await forward(audioBuffer);
      setSegments(detected);
    };
    detectSegments();
  }, [audioBuffer]);

  const duration = audioBuffer.length / 16000; // total duration in seconds

  return (
    <View style={{ flexDirection: 'row', height: 50 }}>
      {segments.map((seg, idx) => {
        const left = (seg.start / duration) * 100;
        const width = ((seg.end - seg.start) / duration) * 100;
        return (
          <View
            key={idx}
            style={{
              position: 'absolute',
              left: `${left}%`,
              width: `${width}%`,
              height: '100%',
              backgroundColor: 'green',
              opacity: 0.5,
            }}
          />
        );
      })}
    </View>
  );
}
```
Error Handling
```typescript
const { forward, error, isReady } = useVAD({ model });

if (error) {
  console.error('VAD Error:', error.message);
}

try {
  const segments = await forward(audioBuffer);
} catch (err) {
  if (err.code === 'MODULE_NOT_LOADED') {
    console.error('Model not ready yet');
  } else if (err.code === 'MODEL_GENERATING') {
    console.error('Already processing audio');
  } else {
    console.error('Detection failed:', err.message);
  }
}
```
Best Practices
- Audio Quality: Clean audio with minimal background noise produces better results.
- Segment Filtering: Filter out very short segments (under ~300 ms), which are often noise.
- Segment Merging: Merge segments with small gaps to avoid fragmenting continuous speech.
- Buffer Size: Process at least 1-2 seconds of audio for reliable detection.
- Memory Management: For long recordings, process in chunks and clear buffers regularly.
- Real-Time Processing: Accumulate small chunks (100-200 ms) before running VAD to reduce overhead.
- Combined Workflows: Run VAD before speech to text to reduce computational cost and improve accuracy.
- Batch Processing: Process multiple seconds at once rather than very small chunks.
- Async Processing: Run VAD asynchronously to avoid blocking the UI thread.
- Model Caching: The model is cached after the first load, making subsequent loads faster.
- Threshold Tuning: Experiment with the minimum segment duration for your use case.
Common Use Cases
Meeting Recorder
```tsx
function MeetingRecorder() {
  const { forward } = useVAD({ model });
  const [speechSegments, setSpeechSegments] = useState<Segment[]>([]);

  const analyzeMeeting = async (recording: Float32Array) => {
    const segments = await forward(recording);
    const filtered = filterShortSegments(segments, 1.0); // 1 s minimum
    const merged = mergeAdjacentSegments(filtered, 2.0); // 2 s max gap
    setSpeechSegments(merged);
    return merged;
  };

  // Note: VAD detects speech activity, not speaker identity,
  // so segments are labeled by index rather than by speaker.
  return (
    <View>
      <Text>Speech Segments: {speechSegments.length}</Text>
      {speechSegments.map((seg, idx) => (
        <Text key={idx}>
          Segment {idx + 1}: {seg.start.toFixed(1)}s - {seg.end.toFixed(1)}s
        </Text>
      ))}
    </View>
  );
}
```
Silence Detection
```typescript
const detectSilence = async (
  audioBuffer: Float32Array,
  vad: VADType
): Promise<Segment[]> => {
  const segments = await vad.forward(audioBuffer);
  const duration = audioBuffer.length / 16000;
  const silenceSegments: Segment[] = [];

  // No speech at all: the entire buffer is silence
  if (segments.length === 0) {
    return [{ start: 0, end: duration }];
  }

  // Before the first speech segment
  if (segments[0].start > 0) {
    silenceSegments.push({ start: 0, end: segments[0].start });
  }

  // Between speech segments
  for (let i = 0; i < segments.length - 1; i++) {
    silenceSegments.push({
      start: segments[i].end,
      end: segments[i + 1].start,
    });
  }

  // After the last speech segment
  if (segments[segments.length - 1].end < duration) {
    silenceSegments.push({
      start: segments[segments.length - 1].end,
      end: duration,
    });
  }
  return silenceSegments;
};
```
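Building on silence detection, it can be handy to summarize how much of a recording contains speech; a small sketch under our own assumptions (the `speechRatio` helper is illustrative and assumes non-overlapping segments):

```typescript
interface Segment {
  start: number; // seconds
  end: number; // seconds
}

// Hypothetical helper: fraction of the total duration covered by speech.
// Assumes segments do not overlap each other.
function speechRatio(segments: Segment[], totalDuration: number): number {
  if (totalDuration <= 0) return 0;
  const speech = segments.reduce((sum, s) => sum + (s.end - s.start), 0);
  return speech / totalDuration;
}
```

A low ratio can signal a mostly-silent recording worth trimming before further processing.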