## Overview

Speech synthesizer drivers enable NVDA to produce speech output through various text-to-speech engines. Each driver implements the interface between NVDA and a specific speech synthesis technology (SAPI5, eSpeak NG, OneCore, etc.).
## Driver Architecture

### Base Class

All synthesizer drivers inherit from `synthDriverHandler.SynthDriver`, which provides:

- Voice and parameter management
- Speech command processing
- Audio output coordination
- Configuration persistence
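Parameter management relies on NVDA's auto-property convention: reading `synth.rate` calls `_get_rate()`, and assigning `synth.rate = value` calls `_set_rate(value)`. NVDA implements this in `baseObject.AutoPropertyObject`; the following is a simplified, illustrative sketch of the mechanism, not NVDA's actual implementation:

```python
class AutoPropertyObject:
    """Simplified sketch of NVDA's auto-property pattern:
    reading obj.rate calls obj._get_rate(); assigning calls obj._set_rate(value).
    """

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails
        try:
            getter = object.__getattribute__(self, f"_get_{name}")
        except AttributeError:
            raise AttributeError(name) from None
        return getter()

    def __setattr__(self, name, value):
        setter = getattr(type(self), f"_set_{name}", None)
        if setter is not None:
            setter(self, value)
        else:
            object.__setattr__(self, name, value)


class DemoSynth(AutoPropertyObject):
    """Hypothetical driver using the pattern."""

    def __init__(self):
        self._rate = 50

    def _get_rate(self):
        return self._rate

    def _set_rate(self, rate):
        self._rate = rate
```

This is why drivers below define `_get_rate`/`_set_rate` pairs rather than Python `property` objects.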
### Core Responsibilities

- **Speech Output**: Convert text and commands to synthesized speech
- **Voice Management**: Enumerate and switch between available voices
- **Parameter Control**: Handle rate, pitch, volume, and other settings
- **Event Notifications**: Signal when speech starts, ends, or reaches bookmarks
## Creating a Basic Driver

### Minimal Driver Structure
```python
from synthDriverHandler import SynthDriver
from speech.commands import IndexCommand


class SynthDriver(SynthDriver):
    """A dummy synth driver that produces no speech."""

    name = "silence"
    description = "No speech"
    # Must support at least IndexCommand
    supportedCommands = {IndexCommand}
    supportedNotifications = set()  # note: {} would create a dict, not a set

    @classmethod
    def check(cls):
        """Return True if this driver can be loaded on the current system."""
        return True

    def speak(self, speechSequence):
        """Process a speech sequence (but produce no audio)."""
        pass

    def cancel(self):
        """Stop speech immediately."""
        pass
```
### Required Components

- `name`: Unique identifier for the driver (should match the module filename)
- `description`: Human-readable name shown in NVDA's speech settings
- `supportedCommands`: Speech commands the synth supports; must include `IndexCommand`
- `supportedNotifications`: Notifications the synth provides: `synthIndexReached`, `synthDoneSpeaking`
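A real `check()` usually probes whether the underlying engine is usable, e.g. whether a COM class can be instantiated for SAPI5. As a sketch, the factory is injected here so the logic can be exercised without COM; the function name and `createObject` parameter are illustrative assumptions, not NVDA API:

```python
def checkSynthAvailable(createObject, progId="SAPI.SpVoice"):
    """Return True if the given COM class can be instantiated.

    `createObject` would be comtypes.client.CreateObject in a real driver;
    it is passed in so this logic can be tested without COM.
    """
    try:
        createObject(progId)
    except Exception:
        # Engine not installed or not registered
        return False
    return True
```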
## Full SAPI5 Implementation

The SAPI5 driver demonstrates a complete, production-quality implementation.

### Driver Initialization
```python
import comtypes.client
from collections import OrderedDict

import audioDucking
import nvwave
from comInterfaces.SpeechLib import ISpVoice
from speech.commands import (
    IndexCommand,
    CharacterModeCommand,
    LangChangeCommand,
    BreakCommand,
    PitchCommand,
    RateCommand,
    VolumeCommand,
    PhonemeCommand,
)
from synthDriverHandler import SynthDriver, VoiceInfo, synthIndexReached, synthDoneSpeaking


class SynthDriver(SynthDriver):
    name = "sapi5"
    description = "Microsoft Speech API version 5"
    supportedCommands = {
        IndexCommand,
        CharacterModeCommand,
        LangChangeCommand,
        BreakCommand,
        PitchCommand,
        RateCommand,
        VolumeCommand,
        PhonemeCommand,
    }
    supportedNotifications = {synthIndexReached, synthDoneSpeaking}

    def __init__(self):
        # Create the SAPI voice object
        self._voice = comtypes.client.CreateObject("SAPI.SpVoice")
        self._voiceToken = None
        # Set up audio output
        self._initializeAudioOutput()
        # Load the default voice
        self._voice.Voice = self._getVoiceTokens()[0]
        # Enable event notifications
        self._voice.EventInterests = (
            SpeechVoiceEvents.StartInputStream
            | SpeechVoiceEvents.EndInputStream
            | SpeechVoiceEvents.Bookmark
        )
```
### Voice Management
```python
# Additional imports used by these methods:
from comtypes import COMError
import languageHandler
from logHandler import log


def _getAvailableVoices(self) -> OrderedDict[str, VoiceInfo]:
    """Enumerate available SAPI5 voices."""
    voices = OrderedDict()
    for token in self._getVoiceTokens():
        try:
            ID = token.Id
            name = token.GetAttribute("Name")
            language = token.GetAttribute("Language")
            # Convert the hex language ID to a locale name
            try:
                language = languageHandler.windowsLCIDToLocaleName(int(language, 16))
            except ValueError:
                language = None
            voices[ID] = VoiceInfo(ID, name, language)
        except COMError:
            log.warning(f"Could not get info for voice {token.Id}")
    return voices


def _get_voice(self):
    """Get the current voice ID."""
    if self._voiceToken:
        return self._voiceToken.Id
    return ""


def _set_voice(self, id):
    """Set the current voice by ID."""
    for token in self._getVoiceTokens():
        if token.Id == id:
            self._voice.Voice = token
            self._voiceToken = token
            break
```
### Speech Parameters

```python
# Define the settings exposed in NVDA's speech panel
supportedSettings = [
    SynthDriver.VoiceSetting(),
    SynthDriver.RateSetting(),
    SynthDriver.PitchSetting(),
    SynthDriver.VolumeSetting(),
]


def _get_rate(self) -> int:
    """Get speech rate (0-100)."""
    # SAPI range: -10 to 10
    return int((self._voice.Rate + 10) * 5)


def _set_rate(self, rate: int):
    """Set speech rate (0-100)."""
    # Convert NVDA's 0-100 range to SAPI's -10 to 10
    self._voice.Rate = int((rate / 5.0) - 10)


def _get_pitch(self) -> int:
    """Get speech pitch (0-100)."""
    return self._currentPitch


def _set_pitch(self, pitch: int):
    """Set speech pitch (0-100)."""
    self._currentPitch = pitch
    # Pitch is applied per-utterance via XML


def _get_volume(self) -> int:
    """Get speech volume (0-100)."""
    return self._voice.Volume


def _set_volume(self, volume: int):
    """Set speech volume (0-100)."""
    self._voice.Volume = volume
```
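The rate conversion above maps NVDA's 0-100 scale onto SAPI's -10..10 scale. Pulled out as standalone functions (the function names are illustrative, not NVDA API), the mapping looks like this:

```python
def nvdaRateToSapi(rate: int) -> int:
    """Map NVDA's 0-100 rate onto SAPI's -10..10 range."""
    return int((rate / 5.0) - 10)


def sapiRateToNvda(sapiRate: int) -> int:
    """Map SAPI's -10..10 rate back onto NVDA's 0-100 range."""
    return int((sapiRate + 10) * 5)
```

NVDA 0 maps to SAPI -10, 50 to 0, and 100 to 10. The round trip is exact for multiples of 5; other values truncate, which is acceptable because SAPI only has 21 rate steps.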
## Speech Command Processing

### Command Types
```python
import collections

import languageHandler
from speech.commands import (
    IndexCommand,
    CharacterModeCommand,
    LangChangeCommand,
    BreakCommand,
    PitchCommand,
    RateCommand,
    VolumeCommand,
    PhonemeCommand,
)


def speak(self, speechSequence):
    """Process a speech sequence.

    @param speechSequence: List of text strings and SynthCommand objects
    """
    textList = []
    bookmarks = collections.deque()
    for item in speechSequence:
        if isinstance(item, str):
            textList.append(item)
        elif isinstance(item, IndexCommand):
            # Add a bookmark for the index command
            bookmarks.append(item.index)
            textList.append(f'<Bookmark Mark="{item.index}" />')
        elif isinstance(item, BreakCommand):
            textList.append(f'<silence msec="{item.time}" />')
        elif isinstance(item, PitchCommand):
            textList.append(f'<pitch absmiddle="{item.pitch}" />')
        elif isinstance(item, RateCommand):
            textList.append(f'<rate absspeed="{item.rate}" />')
        elif isinstance(item, VolumeCommand):
            textList.append(f'<volume level="{item.volume}" />')
        elif isinstance(item, LangChangeCommand):
            if item.lang:
                lcid = languageHandler.localeNameToWindowsLCID(item.lang)
                textList.append(f'<lang langid="{lcid}" />')
        elif isinstance(item, CharacterModeCommand):
            # Handle character mode (e.g. spell letters individually)
            pass
    # Build the final XML
    text = "".join(textList)
    xml = f'<pitch absmiddle="{self._currentPitch}">{text}</pitch>'
    # Speak with bookmarks
    self._speakRequest(xml, bookmarks)
```
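The text-and-command-to-XML translation can be sketched as a standalone function. The command classes below are lightweight stand-ins for NVDA's `speech.commands` classes, defined here only so the sketch is self-contained:

```python
import collections
from dataclasses import dataclass


# Stand-ins for NVDA's speech.commands classes (illustrative only)
@dataclass
class IndexCommand:
    index: int


@dataclass
class BreakCommand:
    time: int


def buildSapiXml(speechSequence, currentPitch=50):
    """Return (xml, bookmarks) for a simplified speech sequence."""
    textList = []
    bookmarks = collections.deque()
    for item in speechSequence:
        if isinstance(item, str):
            textList.append(item)
        elif isinstance(item, IndexCommand):
            # Queue the index; SAPI will fire a Bookmark event at this point
            bookmarks.append(item.index)
            textList.append(f'<Bookmark Mark="{item.index}" />')
        elif isinstance(item, BreakCommand):
            textList.append(f'<silence msec="{item.time}" />')
    text = "".join(textList)
    return f'<pitch absmiddle="{currentPitch}">{text}</pitch>', bookmarks
```

For example, `["Hello ", IndexCommand(1), "world"]` yields one bookmark tag between the two text runs, with index 1 queued for the event handler.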
### Bookmark Processing
```python
def _speakRequest(self, xml, bookmarks):
    """Speak XML with bookmark tracking."""
    self._currentBookmarks = bookmarks
    # Speak asynchronously
    flags = SpeechVoiceSpeakFlags.Async | SpeechVoiceSpeakFlags.IsXML
    self._voice.Speak(xml, flags)


def _handleEvents(self):
    """Process SAPI events from the event queue."""
    while True:
        event = self._getNextEvent()
        if not event:
            break
        if event.eEventId == _SPEventEnum.TTS_BOOKMARK:
            # Get the bookmark ID from the event
            bookmarkId = event.lParam
            # Find and signal the corresponding index
            if self._currentBookmarks:
                index = self._currentBookmarks.popleft()
                synthIndexReached.notify(synth=self, index=index)
        elif event.eEventId == _SPEventEnum.END_INPUT_STREAM:
            # Speech finished
            synthDoneSpeaking.notify(synth=self)
```
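The pairing of SAPI events with the queued indices is just first-in, first-out consumption of the deque that `speak()` filled. A sketch with string markers standing in for SAPI event IDs (the function and markers are illustrative, not NVDA API):

```python
import collections


def processSynthEvents(events, bookmarks):
    """Return (indices reached, whether speech finished).

    `events` is a list of "bookmark"/"end" markers standing in for SAPI
    event IDs; `bookmarks` is the deque that speak() queued.
    """
    reached = []
    done = False
    for event in events:
        if event == "bookmark" and bookmarks:
            # Each bookmark event consumes the oldest queued index
            reached.append(bookmarks.popleft())
        elif event == "end":
            done = True
    return reached, done
```

Because both sides are FIFO, indices are reported in the same order they appeared in the speech sequence.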
## Audio Output Management

### Custom Audio Stream

For fine-grained control, implement a custom audio stream:
```python
from comtypes import COMObject, hresult
from comInterfaces.SpeechLib import ISpAudio, ISpEventSource, ISpEventSink


class SynthDriverAudioStream(COMObject):
    """Custom audio stream for speech synthesis."""

    _com_interfaces_ = [ISpAudio, ISpEventSource, ISpEventSink]

    def __init__(self, synthRef):
        # synthRef is a weak reference to the driver (called as synthRef())
        self.synthRef = synthRef
        self.waveFormat = WAVEFORMATEX()
        self._initWaveFormat()

    def ISequentialStream_RemoteWrite(self, this, pv, cb, pcbWritten):
        """Called when SAPI wants to write audio data.

        @param pv: Pointer to audio data
        @param cb: Number of bytes to write
        @param pcbWritten: Pointer to bytes actually written
        @return: HRESULT
        """
        synth = self.synthRef()
        if not synth:
            return hresult.E_UNEXPECTED
        # Process audio through rate/pitch modification
        synth.sonicStream.writeShort(pv, cb // 2)
        audioData = synth.sonicStream.readShort()
        # Send to the audio player
        synth.player.feed(audioData, len(audioData) * 2)
        if pcbWritten:
            pcbWritten[0] = cb
        return hresult.S_OK
```
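`RemoteWrite` receives `cb` bytes, but the Sonic stream consumes 16-bit samples, which is why it is given `cb // 2` shorts. A small illustrative conversion (not NVDA code) makes the bookkeeping concrete:

```python
import array
import sys


def bytesToShorts(data: bytes) -> list:
    """Interpret a buffer of 16-bit little-endian PCM as samples.

    cb bytes of 16-bit audio yield cb // 2 samples, matching the
    cb // 2 passed to writeShort above.
    """
    samples = array.array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        # PCM wave data is little-endian; swap on big-endian hosts
        samples.byteswap()
    return samples.tolist()
```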
### Audio Ducking

```python
import audioDucking
import config


class SynthDriver(SynthDriver):
    def __init__(self):
        # ...
        # Enable audio ducking (lower other apps' volumes during speech)
        if config.conf["audio"]["audioDuckingMode"] == audioDucking.AudioDuckingMode.AUDIO_DUCKING:
            audioDucking.initialize()
```
## Advanced Features

### Rate Boost with Sonic

Implement rate boost using the Sonic time-stretching library:
```python
from ._sonic import SonicStream


class SynthDriver(SynthDriver):
    def __init__(self):
        # ...
        self.sonicStream = SonicStream(
            sampleRate=22050,
            channels=1,
        )
        self._rateBoost = False

    def _set_rateBoost(self, enable: bool):
        """Enable rate boost for faster speech."""
        self._rateBoost = enable
        if enable:
            # Speed up by 1.5x without changing pitch
            self.sonicStream.speed = 1.5
        else:
            self.sonicStream.speed = 1.0
```
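Sonic performs pitch-preserving time-stretching; a naive sample-skipping sketch (illustrative only, and it *would* shift pitch if played) still shows the key length relationship a speed factor implies, namely `len(out) ≈ len(in) / speed`:

```python
def naiveSpeedUp(samples, speed):
    """Time-compress by skipping samples: len(out) ≈ len(in) / speed.

    Unlike Sonic, this shifts pitch upward; it only illustrates the
    relationship between the speed factor and output length.
    """
    out = []
    pos = 0.0
    while int(pos) < len(samples):
        out.append(samples[int(pos)])
        pos += speed
    return out
```

At `speed = 1.5`, 150 input samples become 100 output samples, so the same utterance plays in two-thirds the time.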
### Voice Variants
```python
def _getAvailableVariants(self):
    """Get voice variants for the current voice."""
    variants = OrderedDict()
    # Some synths support voice variants;
    # for example, eSpeak offers different voice qualities
    variants["default"] = VoiceInfo("default", _("Default"))
    variants["whisper"] = VoiceInfo("whisper", _("Whisper"))
    variants["robot"] = VoiceInfo("robot", _("Robot"))
    return variants


def _get_variant(self):
    return self._currentVariant


def _set_variant(self, id):
    self._currentVariant = id
    # Apply the variant settings to the synth
```
### Language Support
```python
def _get_language(self) -> str | None:
    """Get the language of the current voice."""
    return self.availableVoices[self.voice].language


def _set_language(self, language: str):
    """Switch to a voice for the specified language."""
    # Use the first voice with a matching language
    for voiceId, voice in self.availableVoices.items():
        if voice.language == language:
            self.voice = voiceId
            break


def _get_availableLanguages(self) -> set[str | None]:
    """Get the set of all languages available across voices."""
    return {voice.language for voice in self.availableVoices.values()}
```
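The first-match strategy in `_set_language` can be isolated and tested with a stand-in for `VoiceInfo` (a namedtuple here; the helper name is illustrative, not NVDA API):

```python
from collections import OrderedDict, namedtuple

# Stand-in for synthDriverHandler.VoiceInfo (illustrative only)
VoiceInfo = namedtuple("VoiceInfo", ("id", "displayName", "language"))


def findVoiceForLanguage(availableVoices, language):
    """Return the ID of the first voice whose language matches, else None."""
    for voiceId, voice in availableVoices.items():
        if voice.language == language:
            return voiceId
    return None
```

Because `availableVoices` is an `OrderedDict`, "first" is well-defined: it is the enumeration order of the underlying engine.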
## Testing Your Driver

### Debug Output
```python
from logHandler import log


class SynthDriver(SynthDriver):
    def speak(self, speechSequence):
        log.debug(f"Speaking {len(speechSequence)} items")
        for item in speechSequence:
            log.debug(f"{type(item).__name__}: {item}")
        # Process speech...
```
### Manual Testing

1. **Install the driver**: Place your driver in `source/synthDrivers/yourdriver.py`
2. **Restart NVDA**: Restart NVDA or reload plugins
3. **Select the driver**: Open NVDA Settings > Speech and select your synthesizer
4. **Test features**:
   - Verify basic speech output
   - Test rate, pitch, and volume controls
   - Test voice switching
   - Verify that index markers work
   - Test pausing and canceling
## Best Practices

- **Asynchronous Speech**: Always speak asynchronously to avoid blocking NVDA's main thread.
- **Index Commands**: Always support `IndexCommand`; it is essential for coordinating speech with braille and other features.
- **Clean Cancellation**: Implement `cancel()` to immediately stop speech without artifacts.
- **Resource Cleanup**: Release audio resources and COM objects in `terminate()`.

Use the `synthDoneSpeaking` notification to support features like "say all" that need to know when an utterance completes.
## Common Issues

### No Speech Output

- Verify audio output is initialized
- Check that `speak()` is actually called
- Ensure no exceptions are silently caught
- Test with NVDA's Speech Viewer enabled

### Choppy or Garbled Speech

- Check audio buffer sizes
- Verify the sample rate matches the synth's output
- Ensure proper synchronization between audio threads

### Index Markers Not Working

- Verify `synthIndexReached` is in `supportedNotifications`
- Ensure bookmarks are correctly parsed from synth events
- Check that indices match between `speak()` and notifications

### Voice Selection Issues

- Ensure voice IDs are unique and consistent
- Verify `availableVoices` returns correct information
- Check that voice tokens are properly validated
## Reference

### Key Modules

- `synthDriverHandler`: Base synthesizer driver framework
- `speech.commands`: Speech command classes (`IndexCommand`, etc.)
- `nvwave`: Audio output and wave player
- `audioDucking`: Audio ducking (reducing other apps' volumes)
- `languageHandler`: Language and locale handling
### Speech Commands

| Command | Purpose | Parameters |
| --- | --- | --- |
| `IndexCommand` | Bookmark for synchronization | `index: int` |
| `PitchCommand` | Change pitch | `pitch: int` (0-100) |
| `RateCommand` | Change rate | `rate: int` (0-100) |
| `VolumeCommand` | Change volume | `volume: int` (0-100) |
| `BreakCommand` | Insert pause | `time: int` (ms) |
| `CharacterModeCommand` | Enable/disable character mode | `enable: bool` |
| `LangChangeCommand` | Change language | `lang: str` (locale name) |
| `PhonemeCommand` | Speak phoneme | `phoneme: str` (IPA) |
### Example Drivers

Study these drivers in `source/synthDrivers/` for reference:

- `sapi5.py`: Full-featured SAPI5 implementation
- `espeak.py`: eSpeak NG integration
- `oneCore.py`: Windows OneCore voices
- `silence.py`: Minimal driver structure

**Developer Guide**: See the NVDA Developer Guide for complete plugin development information.