Coming in v0.5.0: This feature is planned for a future release. The API is subject to change.

Overview

Speech Enhancement improves audio quality by reducing background noise, echo, and other distortions while preserving speech clarity. This is useful for:
  • Preprocessing noisy recordings before transcription
  • Improving call quality in VoIP applications
  • Cleaning up field recordings
  • Enhancing podcast and video audio
  • Removing echo and reverberation

Installation

Enhancement will be included in the main package:
npm install react-native-sherpa-onnx

Basic Usage

import { initializeEnhancement, enhanceAudio } from 'react-native-sherpa-onnx/enhancement';

// Initialize enhancement with model
await initializeEnhancement({
  modelPath: {
    type: 'auto',
    path: 'models/enhancement-model'
  }
});

// Enhance noisy audio
const result = await enhanceAudio('path/to/noisy-audio.wav');

console.log('Enhanced audio saved to:', result.outputPath);

API Reference

initializeEnhancement()

Initialize the Speech Enhancement model.
initializeEnhancement(options: EnhancementInitializeOptions): Promise<void>

Parameters

options
EnhancementInitializeOptions
required
Configuration options for enhancement initialization

Returns

Promise that resolves when enhancement is initialized.

Example

await initializeEnhancement({
  modelPath: {
    type: 'auto',
    path: 'models/speech-enhancement'
  }
});

enhanceAudio()

Enhance speech quality in an audio file.
enhanceAudio(filePath: string): Promise<EnhancementResult>

Parameters

filePath
string
required
Path to the audio file to enhance

Returns

Promise that resolves to the enhancement result.
EnhancementResult
object

Example

const result = await enhanceAudio('/path/to/noisy-recording.wav');

console.log('Enhanced audio:', result.outputPath);

// Use enhanced audio for transcription
// (transcribeRecognizer is imported from 'react-native-sherpa-onnx')
await transcribeRecognizer({ filePath: result.outputPath });

unloadEnhancement()

Release enhancement model resources.
unloadEnhancement(): Promise<void>

Returns

Promise that resolves when resources are released.

Example

// When done with enhancement
await unloadEnhancement();
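
Because the model holds resources until unloadEnhancement() is called, it can help to pair initialization and unloading in one place. The following is a minimal sketch, not part of the planned API: the `EnhancementApi` interface and the `withEnhancement` helper are hypothetical names, and the functions are passed in so the sketch does not depend on the final v0.5.0 module layout.

```typescript
// Hypothetical interface mirroring the two lifecycle functions documented above.
interface EnhancementApi {
  initializeEnhancement(options: {
    modelPath: { type: 'auto' | 'file'; path: string };
  }): Promise<void>;
  unloadEnhancement(): Promise<void>;
}

// Runs `work` between initialize and unload.
// Unload runs even if `work` throws; it is skipped only if initialization itself fails.
async function withEnhancement<T>(
  api: EnhancementApi,
  modelDir: string,
  work: () => Promise<T>
): Promise<T> {
  await api.initializeEnhancement({ modelPath: { type: 'auto', path: modelDir } });
  try {
    return await work();
  } finally {
    await api.unloadEnhancement();
  }
}
```

A call might look like `await withEnhancement(api, 'models/enhancement', () => enhanceAudio('noisy.wav'))`, where `api` is an object holding the two imported lifecycle functions.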

Types

EnhancementInitializeOptions

interface EnhancementInitializeOptions {
  modelPath: ModelPathConfig;
  // Additional options will be added in v0.5.0
}

EnhancementResult

interface EnhancementResult {
  outputPath: string;  // Path to enhanced audio file
  // Additional fields will be added in v0.5.0
}

ModelPathConfig

interface ModelPathConfig {
  type: 'auto' | 'file';
  path: string;
}

Best Practices

Different enhancement models target different noise types:
  • Speech denoising: Removes steady-state background noise (air conditioning, traffic)
  • Dereverberation: Reduces echo and room reflections
  • Bandwidth extension: Extends narrowband audio to wideband
  • General-purpose: Handles several noise types at once
Select a model based on your primary audio quality issue.

Speech enhancement involves quality trade-offs:
  • Over-processing: Can introduce artifacts or a “robotic” sound
  • Under-processing: May not sufficiently improve quality
  • Processing time: More aggressive enhancement takes longer
Balance enhancement strength against naturalness for your use case.

Enhancement works best as part of a pipeline:
// 1. Enhance audio quality
const enhanced = await enhanceAudio('noisy.wav');

// 2. Detect speech segments
const segments = await detectVoiceActivity(enhanced.outputPath);

// 3. Transcribe clean audio
const text = await transcribeRecognizer({ 
  filePath: enhanced.outputPath 
});

Common Use Cases

Preprocessing for Transcription

import { initializeEnhancement, enhanceAudio } from 'react-native-sherpa-onnx/enhancement';
import { transcribeRecognizer } from 'react-native-sherpa-onnx';

async function transcribeNoisyAudio(audioPath: string) {
  // Initialize enhancement
  await initializeEnhancement({
    modelPath: { type: 'auto', path: 'models/enhancement' }
  });
  
  // Enhance audio first
  console.log('Enhancing audio...');
  const enhanced = await enhanceAudio(audioPath);
  
  // Transcribe enhanced audio
  console.log('Transcribing...');
  const result = await transcribeRecognizer({ 
    filePath: enhanced.outputPath 
  });
  
  return result.text;
}

Batch Processing

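import { enhanceAudio } from 'react-native-sherpa-onnx/enhancement';
// Node's fs and path modules are not available in React Native;
// this example assumes react-native-fs is installed instead.
import * as RNFS from 'react-native-fs';

// Assumes initializeEnhancement() has already been called
async function enhanceDirectory(inputDir: string, outputDir: string) {
  const entries = await RNFS.readDir(inputDir);
  const audioFiles = entries.filter(
    e => e.isFile() && e.name.toLowerCase().endsWith('.wav')
  );
  
  console.log(`Enhancing ${audioFiles.length} files...`);
  await RNFS.mkdir(outputDir);
  
  for (const file of audioFiles) {
    const result = await enhanceAudio(file.path);
    // Move the enhanced file into outputDir under its original name
    await RNFS.moveFile(result.outputPath, `${outputDir}/${file.name}`);
    console.log(`Enhanced: ${file.name}`);
  }
  
  console.log('Batch enhancement complete');
}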
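
The loop above stops at the first file that fails. For larger batches it may be preferable to record the failure and continue. The following is a rough sketch under that assumption: `BatchItem` and `enhanceAll` are hypothetical names, and `enhance` stands in for enhanceAudio() so the sketch stays independent of the final API.

```typescript
// Outcome for one file in the batch; `error` is set when enhancement failed.
interface BatchItem {
  input: string;
  outputPath?: string;
  error?: string;
}

// `enhance` stands in for enhanceAudio(); it is injected rather than imported.
async function enhanceAll(
  inputs: string[],
  enhance: (filePath: string) => Promise<{ outputPath: string }>
): Promise<BatchItem[]> {
  const results: BatchItem[] = [];
  for (const input of inputs) {
    try {
      // Files are processed sequentially, so only one enhancement runs at a time.
      const { outputPath } = await enhance(input);
      results.push({ input, outputPath });
    } catch (err) {
      // Record the failure and continue with the next file.
      results.push({ input, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return results;
}
```

The returned list can then be filtered on `error` to report or retry the files that did not enhance.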

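Real-time Audio Cleaning

This example enhances a clip as soon as recording finishes; the file-based enhanceAudio() call does not process a live stream.

import { enhanceAudio } from 'react-native-sherpa-onnx/enhancement';
// The recorder import and record() call are illustrative; adapt them to your recording library's API
import { AudioRecorder } from 'react-native-audio-recorder';

// Assumes initializeEnhancement() has already been called
async function recordAndEnhance() {
  // Record audio
  const recording = await AudioRecorder.record({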
    duration: 10000, // 10 seconds
    sampleRate: 16000
  });
  
  // Enhance immediately after recording
  const enhanced = await enhanceAudio(recording.path);
  
  // Use enhanced audio
  console.log('Clean audio ready:', enhanced.outputPath);
  
  return enhanced.outputPath;
}

Error Handling

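import {
  initializeEnhancement,
  enhanceAudio,
  unloadEnhancement
} from 'react-native-sherpa-onnx/enhancement';

try {
  await initializeEnhancement({
    modelPath: {
      type: 'auto',
      path: 'models/enhancement'
    }
  });
  
  const result = await enhanceAudio('noisy-audio.wav');
  
  console.log('Enhancement successful:', result.outputPath);
  
} catch (error) {
  // A caught value is not guaranteed to be an Error, so normalize it first
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('model not found')) {
    console.error('Enhancement model not found. Please download the model.');
  } else if (message.includes('unsupported format')) {
    console.error('Audio format not supported. Convert to WAV format.');
  } else {
    console.error('Enhancement error:', error);
  }
} finally {
  // Release model resources whether or not enhancement succeeded
  await unloadEnhancement();
}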

Performance Considerations

Speech enhancement is computationally intensive:
  • Processing time varies with model complexity (roughly 0.1x to 1.0x the duration of the audio)
  • Memory usage increases with audio length
  • Consider processing in chunks for long files
  • GPU acceleration may improve performance
  • Cache enhanced audio to avoid reprocessing
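
The caching suggestion above could be sketched as a small in-memory wrapper. This is an illustration only: `cachedEnhancer` is a hypothetical helper, the cache is keyed by input path alone (so it will not notice a file whose contents change), and it does not persist across app restarts.

```typescript
type EnhanceFn = (filePath: string) => Promise<{ outputPath: string }>;

// Wraps an enhance function so each input path is processed at most once.
// Concurrent calls for the same path share a single in-flight promise.
function cachedEnhancer(enhance: EnhanceFn): EnhanceFn {
  const cache = new Map<string, Promise<{ outputPath: string }>>();
  return (filePath) => {
    let pending = cache.get(filePath);
    if (!pending) {
      pending = enhance(filePath).catch((err) => {
        // Failures are not cached, so a later call can retry.
        cache.delete(filePath);
        throw err;
      });
      cache.set(filePath, pending);
    }
    return pending;
  };
}
```

Wrapping the library call once, for example `const enhanceOnce = cachedEnhancer(enhanceAudio)`, lets the rest of the app call `enhanceOnce(path)` without tracking which files were already processed.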

Limitations

Enhancement has limitations:
  • Cannot recover severely degraded audio
  • May introduce artifacts if audio quality is extremely poor
  • Works best with consistent noise patterns
  • Cannot separate overlapping speakers
  • Requires an adequate sample rate (typically 16 kHz or higher)

Technical Details

Supported Audio Formats

Enhancement will support common audio formats:
  • WAV (PCM, 16-bit, 16 kHz or 48 kHz recommended)
  • Additional formats may be added in v0.5.0
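
Checking a file against these recommendations before calling enhanceAudio() can give a clearer error than a failed enhancement. The sketch below reads the standard RIFF/WAVE "fmt " chunk from raw bytes; `readWavFormat` and `looksEnhanceable` are hypothetical helpers, not library functions, and loading the bytes from disk is left to whichever file system library the app uses.

```typescript
interface WavFormat {
  audioFormat: number;   // 1 = uncompressed PCM
  channels: number;
  sampleRate: number;    // in Hz
  bitsPerSample: number;
}

// Reads the "fmt " chunk of a RIFF/WAVE file.
// Returns null if the bytes do not look like a WAV file.
function readWavFormat(bytes: Uint8Array): WavFormat | null {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o: number): string =>
    String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);

  if (bytes.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') return null;

  // Walk the chunk list until the "fmt " chunk is found.
  let offset = 12;
  while (offset + 8 <= bytes.length) {
    const size = view.getUint32(offset + 4, true); // little-endian chunk size
    if (tag(offset) === 'fmt ') {
      if (size < 16 || offset + 8 + 16 > bytes.length) return null;
      return {
        audioFormat: view.getUint16(offset + 8, true),
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  return null;
}

// True for 16-bit PCM at 16 kHz or above, matching the recommendation above.
function looksEnhanceable(fmt: WavFormat): boolean {
  return fmt.audioFormat === 1 && fmt.bitsPerSample === 16 && fmt.sampleRate >= 16000;
}
```

Only the first few dozen bytes of the file are normally needed, since the "fmt " chunk usually sits directly after the 12-byte RIFF header.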

Processing Pipeline

  1. Load audio: Read input audio file
  2. Preprocess: Normalize and resample if needed
  3. Enhancement: Apply deep learning noise reduction
  4. Postprocess: Normalize output levels
  5. Save: Write enhanced audio to output file
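
To make the normalization steps above more concrete, here is one common way to normalize levels: peak normalization on floating-point samples. This is an illustrative sketch only. The source does not say which normalization the library will use, and `normalizePeak` and its `targetPeak` parameter are hypothetical.

```typescript
// Scales samples so the loudest one reaches `targetPeak` (1.0 = full scale).
// Silent input is returned as an unchanged copy to avoid dividing by zero.
function normalizePeak(samples: Float32Array, targetPeak = 0.95): Float32Array {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  if (peak === 0) return samples.slice();
  const gain = targetPeak / peak;
  return samples.map((s) => s * gain);
}
```

A target slightly below full scale leaves headroom so that later conversion to 16-bit PCM does not clip.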

Related Features

  • Voice Activity Detection: Detect speech after enhancement
  • Speech Recognition: Transcribe enhanced audio
  • Speaker Diarization: Separate speakers after enhancement
  • Source Separation: Advanced audio separation
