
Overview

Speech synthesizer drivers enable NVDA to produce speech output through various text-to-speech engines. Each driver implements the interface between NVDA and a specific speech synthesis technology (SAPI5, eSpeak, OneCore, etc.).

Driver Architecture

Base Class

All synthesizer drivers inherit from synthDriverHandler.SynthDriver, which provides:
  • Voice and parameter management
  • Speech command processing
  • Audio output coordination
  • Configuration persistence

Core Responsibilities

  1. Speech Output: Convert text and commands to synthesized speech
  2. Voice Management: Enumerate and switch between available voices
  3. Parameter Control: Handle rate, pitch, volume, and other settings
  4. Event Notifications: Signal when speech starts, ends, or reaches bookmarks

Creating a Basic Driver

Minimal Driver Structure

synthDrivers/silence.py
from synthDriverHandler import SynthDriver
from speech.commands import IndexCommand

class SynthDriver(SynthDriver):
    """A dummy synth driver that produces no speech."""
    
    name = "silence"
    description = "No speech"
    
    # Must support at least IndexCommand
    supportedCommands = {IndexCommand}
    supportedNotifications = set()  # note: {} would create a dict, not a set
    
    @classmethod
    def check(cls):
        """Check if driver can be loaded."""
        return True
    
    def speak(self, speechSequence):
        """Process speech sequence (but produce no audio)."""
        pass
    
    def cancel(self):
        """Stop speech immediately."""
        pass
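
Because the NVDA modules above only exist inside a running NVDA, a quick way to sanity-check the driver's shape outside NVDA is to mirror it with local stand-ins. Here IndexCommand and SilenceSynth are hypothetical stubs, not the real speech.commands and synthDriverHandler classes:

```python
from dataclasses import dataclass


@dataclass
class IndexCommand:
    """Local stand-in for speech.commands.IndexCommand."""
    index: int


class SilenceSynth:
    """Mirrors the structure of the silence driver above, outside NVDA."""
    name = "silence"
    description = "No speech"
    supportedCommands = {IndexCommand}
    supportedNotifications = set()

    @classmethod
    def check(cls):
        # A real driver would probe for its engine here.
        return True

    def speak(self, speechSequence):
        # A real driver would synthesize here; silence does nothing.
        pass

    def cancel(self):
        pass


synth = SilenceSynth()
synth.speak(["Hello", IndexCommand(index=1), "world"])
```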

Required Components

  • name (str, required): Unique identifier for the driver (should match the module filename)
  • description (str, required): Human-readable name shown in NVDA’s speech settings
  • supportedCommands (set[type], required): Speech commands the synth supports; must include IndexCommand
  • supportedNotifications (set, required): Notifications the synth provides: synthIndexReached and synthDoneSpeaking

Full SAPI5 Implementation

The SAPI5 driver demonstrates a complete, production-quality implementation:

Driver Initialization

import comtypes.client
from comInterfaces.SpeechLib import ISpVoice
from synthDriverHandler import SynthDriver, VoiceInfo, synthIndexReached, synthDoneSpeaking
from speech.commands import (
    IndexCommand,
    CharacterModeCommand,
    LangChangeCommand,
    BreakCommand,
    PitchCommand,
    RateCommand,
    VolumeCommand,
    PhonemeCommand,
)
from collections import OrderedDict
import nvwave
import audioDucking

class SynthDriver(SynthDriver):
    name = "sapi5"
    description = "Microsoft Speech API version 5"
    
    supportedCommands = {
        IndexCommand,
        CharacterModeCommand,
        LangChangeCommand,
        BreakCommand,
        PitchCommand,
        RateCommand,
        VolumeCommand,
        PhonemeCommand,
    }
    
    supportedNotifications = {synthIndexReached, synthDoneSpeaking}
    
    def __init__(self):
        # Create SAPI voice object
        self._voice = comtypes.client.CreateObject("SAPI.SpVoice")
        self._voiceToken = None
        
        # Set up audio output
        self._initializeAudioOutput()
        
        # Load default voice
        self._voice.Voice = self._getVoiceTokens()[0]
        
        # Enable event notifications
        self._voice.EventInterests = (
            SpeechVoiceEvents.StartInputStream |
            SpeechVoiceEvents.EndInputStream |
            SpeechVoiceEvents.Bookmark
        )

Voice Management

def _getAvailableVoices(self) -> OrderedDict[str, VoiceInfo]:
    """Enumerate available SAPI5 voices."""
    voices = OrderedDict()
    
    for token in self._getVoiceTokens():
        try:
            ID = token.Id
            name = token.GetAttribute("Name")
            language = token.GetAttribute("Language")
            
            # Convert hex language ID to locale name
            try:
                language = languageHandler.windowsLCIDToLocaleName(int(language, 16))
            except ValueError:
                language = None
            
            voices[ID] = VoiceInfo(ID, name, language)
        except COMError:
            log.warning(f"Could not get info for voice {token.Id}")
    
    return voices

def _get_voice(self):
    """Get current voice ID."""
    if self._voiceToken:
        return self._voiceToken.Id
    return ""

def _set_voice(self, id):
    """Set current voice by ID."""
    for token in self._getVoiceTokens():
        if token.Id == id:
            self._voice.Voice = token
            self._voiceToken = token
            break
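
The Language attribute SAPI returns is a hexadecimal LCID string, which is why the enumeration code above parses it with int(language, 16). Multi-language voices may also list several LCIDs separated by semicolons, in which case taking the first entry before parsing is a reasonable precaution:

```python
# SAPI reports languages as hex LCID strings, e.g. "409" for English (US).
raw = "409"
lcid = int(raw, 16)  # 0x409 == 1033, the Windows LCID for en-US
print(lcid)

# Some voice tokens may report multiple LCIDs separated by semicolons;
# parse only the first (primary) entry.
raw_multi = "409;809"
primary = int(raw_multi.split(";")[0], 16)
print(primary)
```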

Speech Parameters

# Define supported settings
supportedSettings = [
    SynthDriver.VoiceSetting(),
    SynthDriver.RateSetting(),
    SynthDriver.PitchSetting(),
    SynthDriver.VolumeSetting(),
]

def _get_rate(self) -> int:
    """Get speech rate (0-100)."""
    # SAPI range: -10 to 10
    return int((self._voice.Rate + 10) * 5)

def _set_rate(self, rate: int):
    """Set speech rate (0-100)."""
    # Convert NVDA range (0-100) to SAPI range (-10 to 10)
    self._voice.Rate = int((rate / 5.0) - 10)

def _get_pitch(self) -> int:
    """Get speech pitch (0-100)."""
    return self._currentPitch

def _set_pitch(self, pitch: int):
    """Set speech pitch (0-100)."""
    self._currentPitch = pitch
    # Pitch is applied per-utterance via XML

def _get_volume(self) -> int:
    """Get speech volume (0-100)."""
    return self._voice.Volume

def _set_volume(self, volume: int):
    """Set speech volume (0-100)."""
    self._voice.Volume = volume
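
The rate conversion above is a linear mapping between NVDA's 0-100 scale and SAPI's -10..10 scale. Pulled out into plain functions (hypothetical helper names, not NVDA API), the mapping can be checked in isolation:

```python
def nvdaRateToSapi(rate: int) -> int:
    """Map NVDA's 0-100 rate onto SAPI's -10..10 range."""
    return int((rate / 5.0) - 10)


def sapiRateToNvda(sapiRate: int) -> int:
    """Map SAPI's -10..10 rate back onto NVDA's 0-100 range."""
    return int((sapiRate + 10) * 5)


# The two mappings agree at the extremes and the midpoint:
print(nvdaRateToSapi(0), nvdaRateToSapi(50), nvdaRateToSapi(100))   # -10 0 10
print(sapiRateToNvda(-10), sapiRateToNvda(0), sapiRateToNvda(10))   # 0 50 100
```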

Speech Command Processing

Command Types

import collections

import languageHandler
from speech.commands import (
    IndexCommand,
    CharacterModeCommand,
    LangChangeCommand,
    BreakCommand,
    PitchCommand,
    RateCommand,
    VolumeCommand,
    PhonemeCommand,
)

def speak(self, speechSequence):
    """Process a speech sequence.
    
    @param speechSequence: List of text strings and SynthCommand objects
    """
    textList = []
    bookmarks = collections.deque()
    
    for item in speechSequence:
        if isinstance(item, str):
            textList.append(item)
        elif isinstance(item, IndexCommand):
            # Add bookmark for index command
            bookmarks.append(item.index)
            textList.append(f'<Bookmark Mark="{item.index}" />')
        elif isinstance(item, BreakCommand):
            textList.append(f'<silence msec="{item.time}" />')
        elif isinstance(item, PitchCommand):
            textList.append(f'<pitch absmiddle="{item.pitch}" />')
        elif isinstance(item, RateCommand):
            textList.append(f'<rate absspeed="{item.rate}" />')
        elif isinstance(item, VolumeCommand):
            textList.append(f'<volume level="{item.volume}" />')
        elif isinstance(item, LangChangeCommand):
            if item.lang:
                lcid = languageHandler.localeNameToWindowsLCID(item.lang)
                textList.append(f'<lang langid="{lcid}" />')
        elif isinstance(item, CharacterModeCommand):
            # Handle character mode
            pass
    
    # Build final XML
    text = "".join(textList)
    xml = f'<pitch absmiddle="{self._currentPitch}">{text}</pitch>'
    
    # Speak with bookmarks
    self._speakRequest(xml, bookmarks)
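
The core of speak() — turning a mixed sequence of strings and commands into SAPI XML while collecting bookmark indices — can be sketched without any NVDA or COM dependency. IndexCommand and BreakCommand here are local stand-ins for the speech.commands classes, and a production driver would also need to XML-escape the text items:

```python
import collections
from dataclasses import dataclass


@dataclass
class IndexCommand:
    index: int


@dataclass
class BreakCommand:
    time: int  # pause length in milliseconds


def buildSapiXml(speechSequence):
    """Convert a speech sequence into SAPI XML plus a FIFO of bookmark indices."""
    textList = []
    bookmarks = collections.deque()
    for item in speechSequence:
        if isinstance(item, str):
            textList.append(item)
        elif isinstance(item, IndexCommand):
            bookmarks.append(item.index)
            textList.append(f'<Bookmark Mark="{item.index}" />')
        elif isinstance(item, BreakCommand):
            textList.append(f'<silence msec="{item.time}" />')
    return "".join(textList), bookmarks


xml, bookmarks = buildSapiXml(
    ["Hello", IndexCommand(1), BreakCommand(250), "world", IndexCommand(2)]
)
print(xml)
print(list(bookmarks))
```

The bookmark indices come out in speaking order, which is what lets the event handler match them back to TTS_BOOKMARK events with a simple popleft().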

Bookmark Processing

def _speakRequest(self, xml, bookmarks):
    """Speak XML with bookmark tracking."""
    self._currentBookmarks = bookmarks
    
    # Speak asynchronously
    flags = SpeechVoiceSpeakFlags.Async | SpeechVoiceSpeakFlags.IsXML
    self._voice.Speak(xml, flags)

def _handleEvents(self):
    """Process SAPI events from the event queue."""
    while True:
        event = self._getNextEvent()
        if not event:
            break
        
        if event.eEventId == _SPEventEnum.TTS_BOOKMARK:
            # Get bookmark ID from event
            bookmarkId = event.lParam
            
            # Find and signal corresponding index
            if self._currentBookmarks:
                index = self._currentBookmarks.popleft()
                synthIndexReached.notify(synth=self, index=index)
        
        elif event.eEventId == _SPEventEnum.END_INPUT_STREAM:
            # Speech finished
            synthDoneSpeaking.notify(synth=self)

Audio Output Management

Custom Audio Stream

For fine-grained control, implement a custom audio stream:
from comtypes import COMObject, hresult
from comInterfaces.SpeechLib import ISpAudio, ISpEventSource, ISpEventSink

class SynthDriverAudioStream(COMObject):
    """Custom audio stream for speech synthesis."""
    
    _com_interfaces_ = [ISpAudio, ISpEventSource, ISpEventSink]
    
    def __init__(self, synthRef):
        # synthRef is a weak reference to the owning SynthDriver,
        # avoiding a reference cycle between the COM object and the driver.
        self.synthRef = synthRef
        self.waveFormat = WAVEFORMATEX()
        self._initWaveFormat()
        
    def ISequentialStream_RemoteWrite(self, this, pv, cb, pcbWritten):
        """Called when SAPI wants to write audio data.
        
        @param pv: Pointer to audio data
        @param cb: Number of bytes to write
        @param pcbWritten: Pointer to bytes actually written
        @return: HRESULT
        """
        synth = self.synthRef()
        if not synth:
            return hresult.E_UNEXPECTED
        
        # Process audio through rate/pitch modification
        synth.sonicStream.writeShort(pv, cb // 2)
        audioData = synth.sonicStream.readShort()
        
        # Send to audio player
        synth.player.feed(audioData, len(audioData) * 2)
        
        if pcbWritten:
            pcbWritten[0] = cb
        
        return hresult.S_OK
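
The cb // 2 in RemoteWrite converts SAPI's byte count into a 16-bit sample count, which is what sonicStream.writeShort expects. That arithmetic can be illustrated with the standard struct module, no COM involved:

```python
import struct

# Pretend SAPI handed the stream 4 bytes of 16-bit mono PCM (two samples).
pcmBytes = struct.pack("<2h", 1000, -1000)
cb = len(pcmBytes)      # byte count, as SAPI reports it
sampleCount = cb // 2   # number of 16-bit samples in the buffer

# Reinterpret the raw buffer as signed 16-bit little-endian samples.
samples = struct.unpack(f"<{sampleCount}h", pcmBytes)
print(cb, sampleCount)  # 4 2
print(samples)          # (1000, -1000)
```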

Audio Ducking

import audioDucking

class SynthDriver(SynthDriver):
    def __init__(self):
        # ...
        # Enable audio ducking (lower other app volumes during speech)
        if config.conf["audio"]["audioDuckingMode"] == audioDucking.AudioDuckingMode.AUDIO_DUCKING:
            audioDucking.initialize()

Advanced Features

Rate Boost with Sonic

Implement rate boost using the Sonic time-stretching library:
from ._sonic import SonicStream

class SynthDriver(SynthDriver):
    def __init__(self):
        # ...
        self.sonicStream = SonicStream(
            sampleRate=22050,
            channels=1,
        )
        self._rateBoost = False
    
    def _set_rateBoost(self, enable: bool):
        """Enable rate boost for faster speech."""
        self._rateBoost = enable
        if enable:
            # Speed up by 1.5x without pitch change
            self.sonicStream.speed = 1.5
        else:
            self.sonicStream.speed = 1.0
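
Sonic performs pitch-preserving time stretching, so the details are in the library; what can be illustrated directly is the effect on buffer sizes. The naive linear-interpolation resampler below (a hypothetical illustration, not Sonic's algorithm) shows the length arithmetic, but unlike Sonic it would also shift the pitch:

```python
def naiveSpeedChange(samples: list[float], speed: float) -> list[float]:
    """Resample by linear interpolation. Unlike Sonic, this shifts pitch too;
    it is shown only to illustrate the output-length arithmetic."""
    outLength = int(len(samples) / speed)
    out = []
    for i in range(outLength):
        pos = i * speed
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


audio = [float(i) for i in range(300)]
boosted = naiveSpeedChange(audio, 1.5)  # 1.5x faster -> roughly 2/3 the samples
print(len(audio), len(boosted))
```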

Voice Variants

def _getAvailableVariants(self):
    """Get voice variants for current voice."""
    variants = OrderedDict()
    
    # Some synths support voice variants
    # For example, eSpeak supports different voice qualities
    variants["default"] = VoiceInfo("default", _("Default"))
    variants["whisper"] = VoiceInfo("whisper", _("Whisper"))
    variants["robot"] = VoiceInfo("robot", _("Robot"))
    
    return variants

def _get_variant(self):
    return self._currentVariant

def _set_variant(self, id):
    self._currentVariant = id
    # Apply variant settings to synth

Language Support

def _get_language(self) -> str | None:
    """Get language of current voice."""
    return self.availableVoices[self.voice].language

def _set_language(self, language: str):
    """Switch to a voice for the specified language."""
    # Find first voice with matching language
    for voiceId, voice in self.availableVoices.items():
        if voice.language == language:
            self.voice = voiceId
            break

def _get_availableLanguages(self) -> set[str | None]:
    """Get set of all languages available in voices."""
    return {voice.language for voice in self.availableVoices.values()}
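
The language switching above simply scans availableVoices for the first match. With a plain dict of VoiceInfo-like tuples (local stand-ins for synthDriverHandler.VoiceInfo and hypothetical voice data), it behaves like this:

```python
from typing import NamedTuple, Optional


class VoiceInfo(NamedTuple):
    """Local stand-in for synthDriverHandler.VoiceInfo."""
    id: str
    displayName: str
    language: Optional[str]


availableVoices = {
    "v1": VoiceInfo("v1", "Alice", "en_US"),
    "v2": VoiceInfo("v2", "Hans", "de_DE"),
    "v3": VoiceInfo("v3", "Pierre", "fr_FR"),
}


def voiceForLanguage(language):
    """Return the first voice ID matching the language, like _set_language."""
    for voiceId, voice in availableVoices.items():
        if voice.language == language:
            return voiceId
    return None


print(voiceForLanguage("de_DE"))                       # first German voice
print({v.language for v in availableVoices.values()})  # the availableLanguages set
```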

Testing Your Driver

Debug Output

from logHandler import log

class SynthDriver(SynthDriver):
    def speak(self, speechSequence):
        log.debug(f"Speaking {len(speechSequence)} items")
        for item in speechSequence:
            log.debug(f"  {type(item).__name__}: {item}")
        # Process speech...

Manual Testing

  1. Install the driver: place your driver in source/synthDrivers/yourdriver.py
  2. Restart NVDA: restart NVDA, or reload plugins
  3. Select the driver: open NVDA Settings > Speech and select your synthesizer
  4. Test features:
      • Verify basic speech output
      • Test rate, pitch, volume controls
      • Test voice switching
      • Verify index markers work
      • Test pausing and canceling

Best Practices

Asynchronous Speech

Always speak asynchronously to avoid blocking NVDA’s main thread

Index Commands

Always support IndexCommand - it’s essential for coordinating speech with braille and other features

Clean Cancellation

Implement cancel() to immediately stop speech without artifacts

Resource Cleanup

Release audio resources and COM objects in terminate()

Speech Completion

Use the synthDoneSpeaking notification to support features like “say all” that need to know when an utterance completes

Common Issues

No Speech Output

  • Verify audio output is initialized
  • Check that speak() is actually called
  • Ensure no exceptions are silently caught
  • Test with NVDA’s speech viewer enabled

Choppy or Garbled Speech

  • Check audio buffer sizes
  • Verify sample rate matches synth output
  • Ensure proper synchronization between audio threads

Index Markers Not Working

  • Verify synthIndexReached is in supportedNotifications
  • Ensure bookmarks are correctly parsed from synth events
  • Check that indices match between speak() and notifications

Voice Selection Issues

  • Ensure voice IDs are unique and consistent
  • Verify availableVoices returns correct information
  • Check that voice tokens are properly validated

Reference

Key Modules

  • synthDriverHandler: Base synthesizer driver framework
  • speech.commands: Speech command classes (IndexCommand, etc.)
  • nvwave: Audio output and wave player
  • audioDucking: Audio ducking (reducing other app volumes)
  • languageHandler: Language and locale handling

Speech Commands

Command               | Purpose                        | Parameters
IndexCommand          | Bookmark for synchronization   | index: int
PitchCommand          | Change pitch                   | pitch: int (0-100)
RateCommand           | Change rate                    | rate: int (0-100)
VolumeCommand         | Change volume                  | volume: int (0-100)
BreakCommand          | Insert pause                   | time: int (ms)
CharacterModeCommand  | Enable/disable character mode  | enable: bool
LangChangeCommand     | Change language                | lang: str (locale name)
PhonemeCommand        | Speak phoneme                  | phoneme: str (IPA)

Example Drivers

Study these drivers in source/synthDrivers/ for reference:
  • sapi5.py: Full-featured SAPI5 implementation
  • espeak.py: eSpeak NG integration
  • oneCore.py: Windows OneCore voices
  • silence.py: Minimal driver structure

Developer Guide

See the NVDA Developer Guide for complete plugin development information
