Web Speech API Integration
VozCraft leverages the browser’s native Web Speech API to provide high-quality text-to-speech synthesis without requiring external services or API keys. This page documents how the application uses SpeechSynthesis and SpeechSynthesisUtterance to generate natural-sounding voice output.
Overview
The Web Speech API provides two main interfaces for TTS:
- SpeechSynthesis: Controls speech synthesis and manages the speech queue
- SpeechSynthesisUtterance: Represents a speech request with configurable properties
The Web Speech API is supported in all modern browsers including Chrome, Firefox, Safari, and Edge. No external dependencies or API keys are required.
Core Implementation
The speak() Function
The heart of VozCraft’s TTS functionality is the speak() function in App.jsx. This function creates and configures speech utterances with customized voice parameters:
App.jsx (lines 649-685)
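A minimal sketch of the general shape such a function might take (not the actual App.jsx code). The `options` object and the injected `synth` and `Utterance` parameters are assumptions made here so the sketch can run outside a browser.

```javascript
// Hypothetical sketch. In a browser, pass window.speechSynthesis as `synth`
// and window.SpeechSynthesisUtterance as `Utterance`.
function speak(text, options, synth, Utterance) {
  // Drop anything still queued or playing before starting new speech.
  synth.cancel();

  const utterance = new Utterance(text);
  utterance.lang = options.lang;          // BCP 47 tag, e.g. "es-MX"
  utterance.pitch = options.pitch ?? 1;   // valid range 0.0 - 2.0
  utterance.rate = options.rate ?? 1;     // valid range 0.1 - 10.0
  utterance.volume = options.volume ?? 1; // valid range 0.0 - 1.0
  if (options.voice) {
    utterance.voice = options.voice;      // a SpeechSynthesisVoice, if one was chosen
  }

  synth.speak(utterance);
  return utterance;
}
```

In a browser this could be called as `speak("Hola", { lang: "es-MX" }, window.speechSynthesis, window.SpeechSynthesisUtterance)`.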
Key Components
1. SpeechSynthesisUtterance Properties
VozCraft configures four main utterance properties:
lang - Language Code
Specifies the BCP 47 language tag for the speech synthesis engine. VozCraft supports 22 languages and regional variants:
- Spanish: es-MX, es-ES, es-AR, es-CO, es-CL, es-VE
- English: en-US, en-GB, en-AU, en-IN
- Portuguese: pt-BR, pt-PT
- And 10+ other languages
pitch - Voice Pitch (0.0 - 2.0)
Controls the voice pitch. VozCraft combines gender and mood pitch multipliers:
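As a rough sketch of how two multipliers could be combined, assuming hypothetical gender and mood tables (the names and values below are illustrative, not VozCraft's actual configuration):

```javascript
// Hypothetical multiplier tables for illustration only.
const GENDER_PITCH = new Map([["female", 1.1], ["male", 0.9]]);
const MOOD_PITCH = new Map([["neutral", 1.0], ["happy", 1.2], ["sad", 0.8]]);

function combinePitch(gender, mood) {
  // Unknown keys fall back to a neutral multiplier of 1.
  const genderFactor = GENDER_PITCH.get(gender) ?? 1;
  const moodFactor = MOOD_PITCH.get(mood) ?? 1;
  // Keep the result inside the 0.0 - 2.0 range the pitch property accepts.
  return Math.min(2, Math.max(0, genderFactor * moodFactor));
}
```

The clamp matters because a product of multipliers can drift outside the range the property accepts.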
rate - Speaking Rate (0.1 - 10.0)
Controls speech speed. VozCraft applies multiple rate modifiers:
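One way to apply several modifiers in sequence is to multiply them onto a base rate and clamp the result. The function name and the idea of passing modifiers as an array are assumptions for this sketch:

```javascript
// Hypothetical helper: multiplies a base rate by each modifier in turn.
function combineRate(baseRate, modifiers) {
  const product = modifiers.reduce((acc, modifier) => acc * modifier, baseRate);
  // Keep the result inside the 0.1 - 10.0 range the rate property accepts.
  return Math.min(10, Math.max(0.1, product));
}
```

For example, `combineRate(1, [1.5, 0.8])` yields a value close to 1.2, subject to floating-point rounding.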
volume - Speech Volume (0.0 - 1.0)
Controls playback volume. Certain moods affect volume:
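A sketch of a mood-dependent volume, assuming only some moods carry a volume factor and all others leave the base volume unchanged. The mood names and factors are hypothetical:

```javascript
// Hypothetical table: only moods listed here change the volume.
const MOOD_VOLUME = new Map([["whisper", 0.4], ["calm", 0.8]]);

function volumeForMood(mood, baseVolume = 1) {
  const factor = MOOD_VOLUME.get(mood) ?? 1;
  // Keep the result inside the 0.0 - 1.0 range the volume property accepts.
  return Math.min(1, Math.max(0, baseVolume * factor));
}
```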
2. Voice Selection Algorithm
VozCraft selects a voice based on language and gender preference. The selection algorithm searches for gender-specific voice names in both English and Spanish, so that a suitable voice can be found across different operating systems.
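The approach can be sketched as follows. The keyword patterns and the fallback order are assumptions for illustration; real voice names vary by browser and operating system, and many carry no gender hint at all.

```javascript
// Hypothetical name hints in English and Spanish. Word boundaries prevent
// "male" from matching inside "female".
const GENDER_PATTERNS = {
  female: /\b(female|woman|mujer|femenina)\b/i,
  male: /\b(male|man|hombre|masculino)\b/i,
};

// Some platforms report tags with underscores (e.g. "es_MX").
const normalizeLang = (tag) => tag.replace("_", "-");

function pickVoice(voices, lang, gender) {
  const baseLang = lang.split("-")[0];
  const exact = voices.filter((v) => normalizeLang(v.lang) === lang);
  const sameLanguage = voices.filter(
    (v) => normalizeLang(v.lang).split("-")[0] === baseLang
  );
  // Prefer an exact regional match, then any variant of the same language.
  const candidates = exact.length > 0 ? exact : sameLanguage;

  const pattern = GENDER_PATTERNS[gender];
  const byGender = pattern
    ? candidates.find((v) => pattern.test(v.name))
    : undefined;

  // Fall back to the first candidate, or null if the language is unavailable.
  return byGender ?? candidates[0] ?? null;
}
```

In a browser, `voices` would come from `window.speechSynthesis.getVoices()`.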
Speech Control Functions
Starting Speech
The handleGenerar function initiates speech synthesis:
App.jsx (lines 692-711)
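A hedged sketch of what a start handler along these lines might do: validate the input, then hand it to the speech function. The signature, the `speakFn` parameter, and enforcing the 5000-character limit by truncation are assumptions; the actual handler in App.jsx may differ.

```javascript
const MAX_CHARS = 5000; // input limit mentioned on this page

// Hypothetical handler: returns true if speech was started.
function handleGenerar(texto, speakFn) {
  const trimmed = texto.trim();
  if (trimmed.length === 0) {
    return false; // nothing to read aloud
  }
  // Assumption: over-long input is truncated rather than rejected.
  speakFn(trimmed.slice(0, MAX_CHARS));
  return true;
}
```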
Stopping Speech
App.jsx (lines 687-690)
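Stopping usually comes down to one call to `cancel()` plus a state reset. A minimal sketch, assuming the `setReproduciendo` state setter mentioned on this page is passed in along with the synthesis object:

```javascript
// Hypothetical stop helper. `synth` is window.speechSynthesis in a browser.
function stopSpeech(synth, setReproduciendo) {
  synth.cancel();          // stops the current utterance and empties the queue
  setReproduciendo(false); // reflect the stopped state in the UI
}
```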
Playing from History
VozCraft allows replaying previously generated audio with the same settings:
App.jsx (lines 713-719)
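One way to replay with identical settings is to store a copy of the settings next to the text when the entry is created. The entry shape and function names below are hypothetical:

```javascript
// Hypothetical history entry: the settings are copied so later changes in
// the UI do not alter what a replay sounds like.
function makeHistoryEntry(text, settings) {
  return { text, settings: { ...settings }, createdAt: Date.now() };
}

// Replays an entry by passing its stored text and settings to `speakFn`.
function replayEntry(entry, speakFn) {
  speakFn(entry.text, entry.settings);
}
```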
Event Handling
The SpeechSynthesisUtterance interface provides lifecycle events:
- onstart
- onend
- onerror
onstart fires when speech begins. VozCraft uses this to:
- Update UI state (setReproduciendo(true))
- Show visual feedback in the audio player
- Disable the generate button
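A sketch of wiring the three handlers onto an utterance. Only the `onstart` behavior is described above; resetting the state in `onend` and `onerror` is an assumption about how the other two would typically be handled.

```javascript
// Hypothetical helper: attaches lifecycle handlers to an utterance.
function attachLifecycleHandlers(utterance, setReproduciendo) {
  utterance.onstart = () => setReproduciendo(true);

  // Assumption: playback state is cleared when speech finishes.
  utterance.onend = () => setReproduciendo(false);

  // Assumption: errors also clear the state. SpeechSynthesisErrorEvent
  // exposes the error code on `event.error`.
  utterance.onerror = (event) => {
    console.warn("Speech synthesis error:", event.error);
    setReproduciendo(false);
  };

  return utterance;
}
```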
Browser Compatibility
Checking for Support
Always check for Web Speech API support before using it.
Voice Loading
Voices may load asynchronously in some browsers, so getVoices() can return an empty list until the voiceschanged event fires.
Browser Support:
- ✅ Chrome 33+ (full support)
- ✅ Firefox 49+ (full support)
- ✅ Safari 7+ (full support)
- ✅ Edge 14+ (full support)
- ✅ Opera 21+ (full support)
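The support check and asynchronous voice loading described in this section can be sketched as follows. The function names and the timeout fallback are assumptions; the timeout exists because not every browser reliably fires `voiceschanged`.

```javascript
// Returns true when both TTS interfaces exist on the given global object.
function isSpeechSynthesisSupported(globalObject) {
  return (
    typeof globalObject === "object" &&
    globalObject !== null &&
    "speechSynthesis" in globalObject &&
    "SpeechSynthesisUtterance" in globalObject
  );
}

// Resolves with the voice list, waiting for `voiceschanged` if the list is
// initially empty. Resolves with whatever is available after `timeoutMs`.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const initial = synth.getVoices();
    if (initial.length > 0) {
      resolve(initial);
      return;
    }
    const canListen = typeof synth.addEventListener === "function";
    const finish = () => {
      clearTimeout(timer);
      if (canListen) synth.removeEventListener("voiceschanged", finish);
      resolve(synth.getVoices());
    };
    const timer = setTimeout(finish, timeoutMs);
    if (canListen) synth.addEventListener("voiceschanged", finish);
  });
}
```

In a browser, the check could be called as `isSpeechSynthesisSupported(typeof window !== "undefined" ? window : undefined)`, followed by `loadVoices(window.speechSynthesis)`.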
Voice Configuration Data
VozCraft defines voice parameters using configuration objects:
App.jsx (lines 4-53)
Advanced Features
Duration Estimation
VozCraft estimates audio duration for the progress bar:
App.jsx (lines 278-284)
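The Web Speech API does not report how long an utterance will take, so any duration has to be estimated. A simple word-count heuristic is sketched below; the 150 words-per-minute baseline and the function name are assumptions, not values taken from App.jsx.

```javascript
const WORDS_PER_MINUTE = 150; // assumed speaking pace at rate 1.0

// Estimates playback time in seconds from the word count and speaking rate.
function estimateDurationSeconds(text, rate = 1) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  if (words === 0) return 0;
  return (words / (WORDS_PER_MINUTE * rate)) * 60;
}
```

Actual playback time depends on the voice, the language, and punctuation pauses, so an estimate like this is only suitable for approximate UI feedback.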
Progress Tracking
The audio player tracks progress using interval-based estimation:
App.jsx (lines 286-300)
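Interval-based tracking can be sketched as a timer that compares elapsed time with the estimated duration. The function names and the 100 ms tick are assumptions. The returned function clears the interval, which fits a useEffect cleanup.

```javascript
// Converts elapsed time into a progress fraction between 0 and 1.
function progressFraction(elapsedMs, durationSeconds) {
  if (durationSeconds <= 0) return 1;
  return Math.min(1, elapsedMs / 1000 / durationSeconds);
}

// Calls `onProgress` on every tick until the estimate is reached.
// Returns a function that stops the timer early.
function startProgressTimer(durationSeconds, onProgress, tickMs = 100) {
  const startedAt = Date.now();
  const id = setInterval(() => {
    const fraction = progressFraction(Date.now() - startedAt, durationSeconds);
    onProgress(fraction);
    if (fraction >= 1) clearInterval(id);
  }, tickMs);
  return () => clearInterval(id);
}
```

Because the duration is an estimate, the bar may finish slightly before or after the utterance's `onend` event; treating `onend` as the authoritative completion signal avoids a stuck progress bar.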
Best Practices
Always cancel before starting
Call window.speechSynthesis.cancel() before creating new utterances to prevent queue buildup.
Limitations
Performance Considerations
- Memory management: Store only one utterance reference at a time
- Queue management: Cancel previous utterances before starting new ones
- Long text handling: VozCraft limits input to 5000 characters
- Event cleanup: Always clear intervals and listeners in useEffect cleanup
Related Resources
- MDN Web Speech API Documentation
- SpeechSynthesis Interface Reference
- SpeechSynthesisUtterance Interface Reference
Next Steps
- Audio Processing: Learn how VozCraft generates downloadable audio files
- PWA Setup: Explore the Progressive Web App configuration
