## Overview

Speech synthesizer drivers enable NVDA to produce speech output through various text-to-speech engines. Each driver implements the interface between NVDA and a specific speech synthesis technology (SAPI5, eSpeak NG, OneCore, etc.).
## Driver Architecture

### Base Class

All synthesizer drivers inherit from `synthDriverHandler.SynthDriver`, which provides:

- Voice and parameter management
- Speech command processing
- Audio output coordination
- Configuration persistence
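Parameter management relies on NVDA's auto-property convention: reading `synth.rate` calls `_get_rate()`, and assigning `synth.rate = value` calls `_set_rate(value)`. NVDA implements this in `baseObject.AutoPropertyObject`; the following is a simplified, illustrative sketch of the mechanism, not NVDA's actual implementation:

```python
class AutoPropertyObject:
    """Simplified sketch of NVDA's auto-property pattern:
    reading obj.rate calls obj._get_rate(); assigning calls obj._set_rate(value).
    """

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails
        try:
            getter = object.__getattribute__(self, f"_get_{name}")
        except AttributeError:
            raise AttributeError(name) from None
        return getter()

    def __setattr__(self, name, value):
        setter = getattr(type(self), f"_set_{name}", None)
        if setter is not None:
            setter(self, value)
        else:
            object.__setattr__(self, name, value)


class DemoSynth(AutoPropertyObject):
    """Hypothetical driver using the pattern."""

    def __init__(self):
        self._rate = 50

    def _get_rate(self):
        return self._rate

    def _set_rate(self, rate):
        self._rate = rate
```

This is why drivers below define `_get_rate`/`_set_rate` pairs rather than Python `property` objects.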
### Core Responsibilities

- **Speech Output**: Convert text and commands to synthesized speech
- **Voice Management**: Enumerate and switch between available voices
- **Parameter Control**: Handle rate, pitch, volume, and other settings
- **Event Notifications**: Signal when speech starts, ends, or reaches bookmarks
## Creating a Basic Driver

### Minimal Driver Structure
```python
from synthDriverHandler import SynthDriver
from speech.commands import IndexCommand


class SynthDriver(SynthDriver):
    """A dummy synth driver that produces no speech."""

    name = "silence"
    description = "No speech"
    # Must support at least IndexCommand
    supportedCommands = {IndexCommand}
    supportedNotifications = set()  # note: {} would create a dict, not a set

    @classmethod
    def check(cls):
        """Return True if this driver can be loaded on the current system."""
        return True

    def speak(self, speechSequence):
        """Process a speech sequence (but produce no audio)."""
        pass

    def cancel(self):
        """Stop speech immediately."""
        pass
```
### Required Components

- `name`: Unique identifier for the driver (should match the module filename)
- `description`: Human-readable name shown in NVDA's speech settings
- `supportedCommands`: Speech commands the synth supports; must include `IndexCommand`
- `supportedNotifications`: Notifications the synth provides: `synthIndexReached`, `synthDoneSpeaking`
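A real `check()` usually probes whether the underlying engine is usable, e.g. whether a COM class can be instantiated for SAPI5. As a sketch, the factory is injected here so the logic can be exercised without COM; the function name and `createObject` parameter are illustrative assumptions, not NVDA API:

```python
def checkSynthAvailable(createObject, progId="SAPI.SpVoice"):
    """Return True if the given COM class can be instantiated.

    `createObject` would be comtypes.client.CreateObject in a real driver;
    it is passed in so this logic can be tested without COM.
    """
    try:
        createObject(progId)
    except Exception:
        # Engine not installed or not registered
        return False
    return True
```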
## Full SAPI5 Implementation

The SAPI5 driver demonstrates a complete, production-quality implementation.

### Driver Initialization
```python
import comtypes.client
from collections import OrderedDict

import audioDucking
import nvwave
from comInterfaces.SpeechLib import ISpVoice
from speech.commands import (
    IndexCommand,
    CharacterModeCommand,
    LangChangeCommand,
    BreakCommand,
    PitchCommand,
    RateCommand,
    VolumeCommand,
    PhonemeCommand,
)
from synthDriverHandler import SynthDriver, VoiceInfo, synthIndexReached, synthDoneSpeaking


class SynthDriver(SynthDriver):
    name = "sapi5"
    description = "Microsoft Speech API version 5"
    supportedCommands = {
        IndexCommand,
        CharacterModeCommand,
        LangChangeCommand,
        BreakCommand,
        PitchCommand,
        RateCommand,
        VolumeCommand,
        PhonemeCommand,
    }
    supportedNotifications = {synthIndexReached, synthDoneSpeaking}

    def __init__(self):
        # Create the SAPI voice object
        self._voice = comtypes.client.CreateObject("SAPI.SpVoice")
        self._voiceToken = None
        # Set up audio output
        self._initializeAudioOutput()
        # Load the default voice
        self._voice.Voice = self._getVoiceTokens()[0]
        # Enable event notifications
        self._voice.EventInterests = (
            SpeechVoiceEvents.StartInputStream
            | SpeechVoiceEvents.EndInputStream
            | SpeechVoiceEvents.Bookmark
        )
```
### Voice Management
```python
# Additional imports used by these methods:
from comtypes import COMError
import languageHandler
from logHandler import log


def _getAvailableVoices(self) -> OrderedDict[str, VoiceInfo]:
    """Enumerate available SAPI5 voices."""
    voices = OrderedDict()
    for token in self._getVoiceTokens():
        try:
            ID = token.Id
            name = token.GetAttribute("Name")
            language = token.GetAttribute("Language")
            # Convert the hex language ID to a locale name
            try:
                language = languageHandler.windowsLCIDToLocaleName(int(language, 16))
            except ValueError:
                language = None
            voices[ID] = VoiceInfo(ID, name, language)
        except COMError:
            log.warning(f"Could not get info for voice {token.Id}")
    return voices


def _get_voice(self):
    """Get the current voice ID."""
    if self._voiceToken:
        return self._voiceToken.Id
    return ""


def _set_voice(self, id):
    """Set the current voice by ID."""
    for token in self._getVoiceTokens():
        if token.Id == id:
            self._voice.Voice = token
            self._voiceToken = token
            break
```
### Speech Parameters

```python
# Define the settings exposed in NVDA's speech panel
supportedSettings = [
    SynthDriver.VoiceSetting(),
    SynthDriver.RateSetting(),
    SynthDriver.PitchSetting(),
    SynthDriver.VolumeSetting(),
]


def _get_rate(self) -> int:
    """Get speech rate (0-100)."""
    # SAPI range: -10 to 10
    return int((self._voice.Rate + 10) * 5)


def _set_rate(self, rate: int):
    """Set speech rate (0-100)."""
    # Convert NVDA's 0-100 range to SAPI's -10 to 10
    self._voice.Rate = int((rate / 5.0) - 10)


def _get_pitch(self) -> int:
    """Get speech pitch (0-100)."""
    return self._currentPitch


def _set_pitch(self, pitch: int):
    """Set speech pitch (0-100)."""
    self._currentPitch = pitch
    # Pitch is applied per-utterance via XML


def _get_volume(self) -> int:
    """Get speech volume (0-100)."""
    return self._voice.Volume


def _set_volume(self, volume: int):
    """Set speech volume (0-100)."""
    self._voice.Volume = volume
```
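The rate conversion above maps NVDA's 0-100 scale onto SAPI's -10..10 scale. Pulled out as standalone functions (the function names are illustrative, not NVDA API), the mapping looks like this:

```python
def nvdaRateToSapi(rate: int) -> int:
    """Map NVDA's 0-100 rate onto SAPI's -10..10 range."""
    return int((rate / 5.0) - 10)


def sapiRateToNvda(sapiRate: int) -> int:
    """Map SAPI's -10..10 rate back onto NVDA's 0-100 range."""
    return int((sapiRate + 10) * 5)
```

NVDA 0 maps to SAPI -10, 50 to 0, and 100 to 10. The round trip is exact for multiples of 5; other values truncate, which is acceptable because SAPI only has 21 rate steps.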
## Speech Command Processing

### Command Types
```python
import collections

import languageHandler
from speech.commands import (
    IndexCommand,
    CharacterModeCommand,
    LangChangeCommand,
    BreakCommand,
    PitchCommand,
    RateCommand,
    VolumeCommand,
    PhonemeCommand,
)


def speak(self, speechSequence):
    """Process a speech sequence.

    @param speechSequence: List of text strings and SynthCommand objects
    """
    textList = []
    bookmarks = collections.deque()
    for item in speechSequence:
        if isinstance(item, str):
            textList.append(item)
        elif isinstance(item, IndexCommand):
            # Add a bookmark for the index command
            bookmarks.append(item.index)
            textList.append(f'<Bookmark Mark="{item.index}" />')
        elif isinstance(item, BreakCommand):
            textList.append(f'<silence msec="{item.time}" />')
        elif isinstance(item, PitchCommand):
            textList.append(f'<pitch absmiddle="{item.pitch}" />')
        elif isinstance(item, RateCommand):
            textList.append(f'<rate absspeed="{item.rate}" />')
        elif isinstance(item, VolumeCommand):
            textList.append(f'<volume level="{item.volume}" />')
        elif isinstance(item, LangChangeCommand):
            if item.lang:
                lcid = languageHandler.localeNameToWindowsLCID(item.lang)
                textList.append(f'<lang langid="{lcid}" />')
        elif isinstance(item, CharacterModeCommand):
            # Handle character mode (e.g. spell letters individually)
            pass
    # Build the final XML
    text = "".join(textList)
    xml = f'<pitch absmiddle="{self._currentPitch}">{text}</pitch>'
    # Speak with bookmarks
    self._speakRequest(xml, bookmarks)
```
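The text-and-command-to-XML translation can be sketched as a standalone function. The command classes below are lightweight stand-ins for NVDA's `speech.commands` classes, defined here only so the sketch is self-contained:

```python
import collections
from dataclasses import dataclass


# Stand-ins for NVDA's speech.commands classes (illustrative only)
@dataclass
class IndexCommand:
    index: int


@dataclass
class BreakCommand:
    time: int


def buildSapiXml(speechSequence, currentPitch=50):
    """Return (xml, bookmarks) for a simplified speech sequence."""
    textList = []
    bookmarks = collections.deque()
    for item in speechSequence:
        if isinstance(item, str):
            textList.append(item)
        elif isinstance(item, IndexCommand):
            # Queue the index; SAPI will fire a Bookmark event at this point
            bookmarks.append(item.index)
            textList.append(f'<Bookmark Mark="{item.index}" />')
        elif isinstance(item, BreakCommand):
            textList.append(f'<silence msec="{item.time}" />')
    text = "".join(textList)
    return f'<pitch absmiddle="{currentPitch}">{text}</pitch>', bookmarks
```

For example, `["Hello ", IndexCommand(1), "world"]` yields one bookmark tag between the two text runs, with index 1 queued for the event handler.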
### Bookmark Processing
```python
def _speakRequest(self, xml, bookmarks):
    """Speak XML with bookmark tracking."""
    self._currentBookmarks = bookmarks
    # Speak asynchronously
    flags = SpeechVoiceSpeakFlags.Async | SpeechVoiceSpeakFlags.IsXML
    self._voice.Speak(xml, flags)


def _handleEvents(self):
    """Process SAPI events from the event queue."""
    while True:
        event = self._getNextEvent()
        if not event:
            break
        if event.eEventId == _SPEventEnum.TTS_BOOKMARK:
            # Get the bookmark ID from the event
            bookmarkId = event.lParam
            # Find and signal the corresponding index
            if self._currentBookmarks:
                index = self._currentBookmarks.popleft()
                synthIndexReached.notify(synth=self, index=index)
        elif event.eEventId == _SPEventEnum.END_INPUT_STREAM:
            # Speech finished
            synthDoneSpeaking.notify(synth=self)
```
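The pairing of SAPI events with the queued indices is just first-in, first-out consumption of the deque that `speak()` filled. A sketch with string markers standing in for SAPI event IDs (the function and markers are illustrative, not NVDA API):

```python
import collections


def processSynthEvents(events, bookmarks):
    """Return (indices reached, whether speech finished).

    `events` is a list of "bookmark"/"end" markers standing in for SAPI
    event IDs; `bookmarks` is the deque that speak() queued.
    """
    reached = []
    done = False
    for event in events:
        if event == "bookmark" and bookmarks:
            # Each bookmark event consumes the oldest queued index
            reached.append(bookmarks.popleft())
        elif event == "end":
            done = True
    return reached, done
```

Because both sides are FIFO, indices are reported in the same order they appeared in the speech sequence.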
## Audio Output Management

### Custom Audio Stream

For fine-grained control, implement a custom audio stream:
```python
from comtypes import COMObject, hresult
from comInterfaces.SpeechLib import ISpAudio, ISpEventSource, ISpEventSink


class SynthDriverAudioStream(COMObject):
    """Custom audio stream for speech synthesis."""

    _com_interfaces_ = [ISpAudio, ISpEventSource, ISpEventSink]

    def __init__(self, synthRef):
        # synthRef is a weak reference to the driver (called as synthRef())
        self.synthRef = synthRef
        self.waveFormat = WAVEFORMATEX()
        self._initWaveFormat()

    def ISequentialStream_RemoteWrite(self, this, pv, cb, pcbWritten):
        """Called when SAPI wants to write audio data.

        @param pv: Pointer to audio data
        @param cb: Number of bytes to write
        @param pcbWritten: Pointer to bytes actually written
        @return: HRESULT
        """
        synth = self.synthRef()
        if not synth:
            return hresult.E_UNEXPECTED
        # Process audio through rate/pitch modification
        synth.sonicStream.writeShort(pv, cb // 2)
        audioData = synth.sonicStream.readShort()
        # Send to the audio player
        synth.player.feed(audioData, len(audioData) * 2)
        if pcbWritten:
            pcbWritten[0] = cb
        return hresult.S_OK
```
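`RemoteWrite` receives `cb` bytes, but the Sonic stream consumes 16-bit samples, which is why it is given `cb // 2` shorts. A small illustrative conversion (not NVDA code) makes the bookkeeping concrete:

```python
import array
import sys


def bytesToShorts(data: bytes) -> list:
    """Interpret a buffer of 16-bit little-endian PCM as samples.

    cb bytes of 16-bit audio yield cb // 2 samples, matching the
    cb // 2 passed to writeShort above.
    """
    samples = array.array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        # PCM wave data is little-endian; swap on big-endian hosts
        samples.byteswap()
    return samples.tolist()
```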
### Audio Ducking

```python
import audioDucking
import config


class SynthDriver(SynthDriver):
    def __init__(self):
        # ...
        # Enable audio ducking (lower other apps' volumes during speech)
        if config.conf["audio"]["audioDuckingMode"] == audioDucking.AudioDuckingMode.AUDIO_DUCKING:
            audioDucking.initialize()
```
## Advanced Features

### Rate Boost with Sonic

Implement rate boost using the Sonic time-stretching library:
```python
from ._sonic import SonicStream


class SynthDriver(SynthDriver):
    def __init__(self):
        # ...
        self.sonicStream = SonicStream(
            sampleRate=22050,
            channels=1,
        )
        self._rateBoost = False

    def _set_rateBoost(self, enable: bool):
        """Enable rate boost for faster speech."""
        self._rateBoost = enable
        if enable:
            # Speed up by 1.5x without changing pitch
            self.sonicStream.speed = 1.5
        else:
            self.sonicStream.speed = 1.0
```
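Sonic performs pitch-preserving time-stretching; a naive sample-skipping sketch (illustrative only, and it *would* shift pitch if played) still shows the key length relationship a speed factor implies, namely `len(out) ≈ len(in) / speed`:

```python
def naiveSpeedUp(samples, speed):
    """Time-compress by skipping samples: len(out) ≈ len(in) / speed.

    Unlike Sonic, this shifts pitch upward; it only illustrates the
    relationship between the speed factor and output length.
    """
    out = []
    pos = 0.0
    while int(pos) < len(samples):
        out.append(samples[int(pos)])
        pos += speed
    return out
```

At `speed = 1.5`, 150 input samples become 100 output samples, so the same utterance plays in two-thirds the time.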
### Voice Variants
```python
def _getAvailableVariants(self):
    """Get voice variants for the current voice."""
    variants = OrderedDict()
    # Some synths support voice variants;
    # for example, eSpeak offers different voice qualities
    variants["default"] = VoiceInfo("default", _("Default"))
    variants["whisper"] = VoiceInfo("whisper", _("Whisper"))
    variants["robot"] = VoiceInfo("robot", _("Robot"))
    return variants


def _get_variant(self):
    return self._currentVariant


def _set_variant(self, id):
    self._currentVariant = id
    # Apply the variant settings to the synth
```
### Language Support
```python
def _get_language(self) -> str | None:
    """Get the language of the current voice."""
    return self.availableVoices[self.voice].language


def _set_language(self, language: str):
    """Switch to a voice for the specified language."""
    # Use the first voice with a matching language
    for voiceId, voice in self.availableVoices.items():
        if voice.language == language:
            self.voice = voiceId
            break


def _get_availableLanguages(self) -> set[str | None]:
    """Get the set of all languages available across voices."""
    return {voice.language for voice in self.availableVoices.values()}
```
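The first-match strategy in `_set_language` can be isolated and tested with a stand-in for `VoiceInfo` (a namedtuple here; the helper name is illustrative, not NVDA API):

```python
from collections import OrderedDict, namedtuple

# Stand-in for synthDriverHandler.VoiceInfo (illustrative only)
VoiceInfo = namedtuple("VoiceInfo", ("id", "displayName", "language"))


def findVoiceForLanguage(availableVoices, language):
    """Return the ID of the first voice whose language matches, else None."""
    for voiceId, voice in availableVoices.items():
        if voice.language == language:
            return voiceId
    return None
```

Because `availableVoices` is an `OrderedDict`, "first" is well-defined: it is the enumeration order of the underlying engine.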
## Testing Your Driver

### Debug Output
```python
from logHandler import log


class SynthDriver(SynthDriver):
    def speak(self, speechSequence):
        log.debug(f"Speaking {len(speechSequence)} items")
        for item in speechSequence:
            log.debug(f"{type(item).__name__}: {item}")
        # Process speech...
```
### Manual Testing

1. **Install the driver**: Place your driver in `source/synthDrivers/yourdriver.py`
2. **Restart NVDA**: Restart NVDA or reload plugins
3. **Select the driver**: Open NVDA Settings > Speech and select your synthesizer
4. **Test features**:
   - Verify basic speech output
   - Test rate, pitch, and volume controls
   - Test voice switching
   - Verify that index markers work
   - Test pausing and canceling
## Best Practices

- **Asynchronous Speech**: Always speak asynchronously to avoid blocking NVDA's main thread.
- **Index Commands**: Always support `IndexCommand`; it is essential for coordinating speech with braille and other features.
- **Clean Cancellation**: Implement `cancel()` to immediately stop speech without artifacts.
- **Resource Cleanup**: Release audio resources and COM objects in `terminate()`.

Use the `synthDoneSpeaking` notification to support features like "say all" that need to know when an utterance completes.
## Common Issues

### No Speech Output

- Verify audio output is initialized
- Check that `speak()` is actually called
- Ensure no exceptions are silently caught
- Test with NVDA's Speech Viewer enabled

### Choppy or Garbled Speech

- Check audio buffer sizes
- Verify the sample rate matches the synth's output
- Ensure proper synchronization between audio threads

### Index Markers Not Working

- Verify `synthIndexReached` is in `supportedNotifications`
- Ensure bookmarks are correctly parsed from synth events
- Check that indices match between `speak()` and notifications

### Voice Selection Issues

- Ensure voice IDs are unique and consistent
- Verify `availableVoices` returns correct information
- Check that voice tokens are properly validated
## Reference

### Key Modules

- `synthDriverHandler`: Base synthesizer driver framework
- `speech.commands`: Speech command classes (`IndexCommand`, etc.)
- `nvwave`: Audio output and wave player
- `audioDucking`: Audio ducking (reducing other apps' volumes)
- `languageHandler`: Language and locale handling
### Speech Commands

| Command | Purpose | Parameters |
| --- | --- | --- |
| `IndexCommand` | Bookmark for synchronization | `index: int` |
| `PitchCommand` | Change pitch | `pitch: int` (0-100) |
| `RateCommand` | Change rate | `rate: int` (0-100) |
| `VolumeCommand` | Change volume | `volume: int` (0-100) |
| `BreakCommand` | Insert pause | `time: int` (ms) |
| `CharacterModeCommand` | Enable/disable character mode | `enable: bool` |
| `LangChangeCommand` | Change language | `lang: str` (locale name) |
| `PhonemeCommand` | Speak phoneme | `phoneme: str` (IPA) |
### Example Drivers

Study these drivers in `source/synthDrivers/` for reference:

- `sapi5.py`: Full-featured SAPI5 implementation
- `espeak.py`: eSpeak NG integration
- `oneCore.py`: Windows OneCore voices
- `silence.py`: Minimal driver structure

**Developer Guide**: See the NVDA Developer Guide for complete plugin development information.