Web Speech API Integration

VozCraft leverages the browser’s native Web Speech API to provide high-quality text-to-speech synthesis without requiring external services or API keys. This page documents how the application uses SpeechSynthesis and SpeechSynthesisUtterance to generate natural-sounding voice output.

Overview

The Web Speech API provides two main interfaces for TTS:

SpeechSynthesis: Controls speech synthesis and manages the speech queue
SpeechSynthesisUtterance: Represents a speech request with configurable properties

Speech synthesis is available in current versions of Chrome, Firefox, Safari, and Edge, although the set of installed voices depends on the browser and operating system. No external dependencies or API keys are required.

Core Implementation

The speak() Function

The heart of VozCraft’s TTS functionality is the speak() function in App.jsx. This function creates and configures speech utterances with customized voice parameters:

App.jsx (lines 649-685)

const speak = useCallback((txt, vozLabel, generoLabel, velLabel, animLabel, onEnd) => {
  window.speechSynthesis.cancel();
  const u = new SpeechSynthesisUtterance(txt);
  const vd   = VOCES.find(v => v.label === vozLabel) || VOCES[0];
  const gd   = GENEROS.find(g => g.label === generoLabel) || GENEROS[0];
  const veld = VELOCIDADES.find(v => v.label === velLabel) || VELOCIDADES[2];
  const ad   = ANIMOS.find(a => a.label === animLabel) || ANIMOS[0];

  u.lang   = vd.lang;
  u.pitch  = Math.max(0.1, Math.min(2, gd.pitch * ad.pitch));
  u.rate   = Math.max(0.1, Math.min(10, (veld.rate + gd.rateAdd) * ad.rateMulti));
  u.volume = Math.max(0, Math.min(1, ad.volume));

  // Voice selection logic based on gender
  const loadedVoices = window.speechSynthesis.getVoices();
  const voicesForLang = loadedVoices.filter(v =>
    v.lang === vd.lang || v.lang.startsWith(vd.lang.split('-')[0])
  );
  const wantFemale = generoLabel === 'Voz Aguda';
  const gendered = voicesForLang.find(v => {
    const n = v.name.toLowerCase();
    return wantFemale
      ? n.includes('female') || n.includes('woman') || n.includes('paulina') || n.includes('mónica')
      : n.includes('male') || n.includes('man') || n.includes('jorge') || n.includes('carlos');
  });
  const fallback = voicesForLang[0];
  if (gendered) u.voice = gendered;
  else if (fallback) u.voice = fallback;

  u.onstart = () => setReproduciendo(true);
  u.onend   = () => { setReproduciendo(false); setPlayingId(null); if (onEnd) onEnd(); };
  u.onerror = () => { setReproduciendo(false); setPlayingId(null); };

  uttRef.current = u;
  window.speechSynthesis.speak(u);
}, []);

Key Components

1. SpeechSynthesisUtterance Properties

VozCraft configures four utterance properties:

lang - Language Code

Specifies the BCP 47 language tag for the speech synthesis engine.

u.lang = 'es-MX'; // Spanish (Mexico)
u.lang = 'en-US'; // English (US)
u.lang = 'pt-BR'; // Portuguese (Brazil)

VozCraft supports 22 languages and regional variants:

Spanish: es-MX, es-ES, es-AR, es-CO, es-CL, es-VE
English: en-US, en-GB, en-AU, en-IN
Portuguese: pt-BR, pt-PT
And 10 other languages and variants

pitch - Voice Pitch (0.0 - 2.0)

Controls the voice pitch. VozCraft combines gender and mood pitch multipliers:

// Gender pitch values
const GENEROS = [
  { label: 'Voz Normal', pitch: 0.75, rateAdd: -0.05 },
  { label: 'Voz Aguda',  pitch: 1.30, rateAdd:  0.05 },
];

// Mood pitch values
const ANIMOS = [
  { label: 'Neutral',    pitch: 1.00 },
  { label: 'Alegre',     pitch: 1.25 },
  { label: 'Serio',      pitch: 0.80 },
  { label: 'Entusiasta', pitch: 1.35 },
  { label: 'Melancólico', pitch: 0.70 },
];

// Final pitch calculation
u.pitch = Math.max(0.1, Math.min(2, gd.pitch * ad.pitch));
// Example: Normal (0.75) × Alegre (1.25) = 0.9375

The pitch value is clamped between 0.1 and 2.0 to prevent extreme distortion.
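As a worked check of the pitch formula, the sketch below multiplies every gender preset by every mood preset listed above and applies the same clamp. The helper names `clamp` and `combinedPitch` are illustrative and do not come from App.jsx, and only the moods shown on this page are included.

```javascript
// Illustrative helpers; these names are assumptions, not part of App.jsx.
const clamp = (value, min, max) => Math.max(min, Math.min(max, value));

// Same formula as in speak(): gender pitch × mood pitch, clamped to [0.1, 2].
const combinedPitch = (genderPitch, moodPitch) => clamp(genderPitch * moodPitch, 0.1, 2);

// Presets copied from the tables above (only the moods listed on this page).
const GENDER_PITCH = { 'Voz Normal': 0.75, 'Voz Aguda': 1.30 };
const MOOD_PITCH = {
  Neutral: 1.00,
  Alegre: 1.25,
  Serio: 0.80,
  Entusiasta: 1.35,
  'Melancólico': 0.70,
};

// Build one row per gender/mood combination.
const pitchTable = [];
for (const [gender, gp] of Object.entries(GENDER_PITCH)) {
  for (const [mood, mp] of Object.entries(MOOD_PITCH)) {
    pitchTable.push({ gender, mood, pitch: combinedPitch(gp, mp) });
  }
}

// Print the lowest and highest resulting pitch, rounded to three decimals.
const pitches = pitchTable.map(row => row.pitch);
console.log(Math.min(...pitches).toFixed(3), Math.max(...pitches).toFixed(3));
```

For the presets shown here the products range from about 0.525 (Voz Normal with Melancólico) to about 1.755 (Voz Aguda with Entusiasta), so the clamp acts as a safety net rather than changing any of these values.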

rate - Speaking Rate (0.1 - 10.0)

Controls speech speed. VozCraft applies multiple rate modifiers:

// Speed presets
const VELOCIDADES = [
  { label: 'Muy Lento',  rate: 0.50 },
  { label: 'Lento',      rate: 0.75 },
  { label: 'Normal',     rate: 1.00 },
  { label: 'Rápido',     rate: 1.25 },
  { label: 'Muy Rápido', rate: 1.60 },
];

// Rate calculation with gender and mood modifiers
const effectiveRate = (veld.rate + gd.rateAdd) * ad.rateMulti;
u.rate = Math.max(0.1, Math.min(10, effectiveRate));

// Example:
// (Normal speed (1.00) + Voz Aguda (+0.05)) × Enérgico (1.30) = 1.365
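The order of operations matters in the rate formula: the gender offset is added to the speed preset first, and the mood multiplier is applied to that sum. The sketch below wraps the calculation in a hypothetical `computeRate` helper (the name is an assumption, not part of App.jsx) and evaluates two combinations.

```javascript
// Hypothetical wrapper around the rate formula used in speak().
function computeRate(speedRate, genderRateAdd, moodRateMulti) {
  // Add the gender offset first, then apply the mood multiplier.
  const effectiveRate = (speedRate + genderRateAdd) * moodRateMulti;
  // Clamp to the range accepted by SpeechSynthesisUtterance.rate.
  return Math.max(0.1, Math.min(10, effectiveRate));
}

// 'Normal' speed (1.00) with 'Voz Aguda' (+0.05) and a 1.30 mood multiplier.
const fast = computeRate(1.00, 0.05, 1.30);

// 'Muy Lento' speed (0.50) with 'Voz Normal' (-0.05) and 'Serio' (0.88).
const slow = computeRate(0.50, -0.05, 0.88);

console.log(fast.toFixed(3), slow.toFixed(3));
```

These two cases come out at roughly 1.365 and 0.396, both well inside the clamped range.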

volume - Speech Volume (0.0 - 1.0)

Controls playback volume. Certain moods affect volume:

const ANIMOS = [
  { label: 'Neutral',     volume: 1.00 },
  { label: 'Serio',       volume: 0.95 },
  { label: 'Melancólico', volume: 0.88 },
  { label: 'Relajado',    volume: 0.90 },
];

u.volume = Math.max(0, Math.min(1, ad.volume));

2. Voice Selection Algorithm

VozCraft selects a voice heuristically, based on language and gender preference:

// Get all available system voices
const loadedVoices = window.speechSynthesis.getVoices();

// Filter by language
const voicesForLang = loadedVoices.filter(v =>
  v.lang === vd.lang || v.lang.startsWith(vd.lang.split('-')[0])
);

// Gender-based selection
const wantFemale = generoLabel === 'Voz Aguda';
const gendered = voicesForLang.find(v => {
  const n = v.name.toLowerCase();
  return wantFemale
    ? n.includes('female') || n.includes('woman') || n.includes('girl') ||
      n.includes('paulina') || n.includes('mónica') || n.includes('lucia') ||
      n.includes('samantha') || n.includes('karen')
    : n.includes('male') || n.includes('man') || n.includes('guy') ||
      n.includes('jorge') || n.includes('carlos') || n.includes('diego') ||
      n.includes('alex') || n.includes('daniel') || n.includes('thomas');
});

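// Fallback hierarchy: gendered match first, otherwise the first voice for the language
const fallback = voicesForLang[0];
if (gendered) u.voice = gendered;
else if (fallback) u.voice = fallback;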

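The algorithm looks for substrings of common English and Spanish voice names, which covers the default voices on several operating systems. The match is heuristic: if no name matches, the first voice for the language is used. Because the check is substring-based, includes('male') also matches names containing "female" and includes('man') also matches "woman", so the male branch can select a female-labelled voice.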
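Since `'female'.includes('male')` is true, a substring check cannot tell the two labels apart. One possible variant, sketched here as a pure function, compares whole words in the voice name instead. The names `FEMALE_HINTS`, `MALE_HINTS`, `nameTokens`, and `pickVoice` are hypothetical, the hint lists are abbreviated, and plain `{ name, lang }` objects stand in for `SpeechSynthesisVoice` instances. This is not the code in App.jsx.

```javascript
// Abbreviated hint lists (assumed for illustration).
const FEMALE_HINTS = ['female', 'woman', 'paulina', 'mónica'];
const MALE_HINTS = ['male', 'man', 'jorge', 'carlos'];

// Split a voice name into lowercase words, treating anything that is
// not a letter or digit as a separator.
function nameTokens(name) {
  return name.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Pick a voice for `lang`, preferring one whose name contains a whole-word hint.
function pickVoice(voices, lang, wantFemale) {
  const base = lang.split('-')[0];
  const forLang = voices.filter(v => v.lang === lang || v.lang.startsWith(base));
  const hints = wantFemale ? FEMALE_HINTS : MALE_HINTS;
  const gendered = forLang.find(v => nameTokens(v.name).some(t => hints.includes(t)));
  // Fall back to the first voice for the language, or null if there is none.
  return gendered || forLang[0] || null;
}

// Sample data standing in for speechSynthesis.getVoices().
const sampleVoices = [
  { name: 'Google Female Voice', lang: 'es-MX' },
  { name: 'Microsoft Jorge', lang: 'es-ES' },
];

console.log(pickVoice(sampleVoices, 'es-MX', false).name);
```

With the sample data, the male branch skips "Google Female Voice" and returns the "Microsoft Jorge" entry, whereas a substring test for 'male' would have stopped at the first voice.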

Speech Control Functions

Starting Speech

The handleGenerar function initiates speech synthesis:

App.jsx (lines 692-711)

const handleGenerar = async () => {
  if (!texto.trim()) {
    showToast(language === 'es' ? 'Por favor escribe algún texto' : 'Please enter some text', 'error');
    return;
  }
  if (generando) {
    stopSpeech();
    return;
  }

  setGenerando(true);
  const item = {
    id: Date.now().toString(),
    timestamp: Date.now(),
    texto: texto.trim(),
    nombre: '',
    voz, genero, velocidad, animo,
  };

  await new Promise(r => setTimeout(r, 400));
  speak(texto, voz, genero, velocidad, animo, () => setGenerando(false));

  const newHistory = [item, ...history].slice(0, 30);
  setHistory(newHistory);
  saveHistory(newHistory);
  showToast('✓ Audio generado correctamente');
};

Stopping Speech

App.jsx (lines 687-690)

const stopSpeech = useCallback(() => {
  window.speechSynthesis.cancel();
  setReproduciendo(false);
  setPlayingId(null);
  setGenerando(false);
}, []);

window.speechSynthesis.cancel() immediately stops all speech and clears the speech queue. Any pending utterances are discarded.

Playing from History

VozCraft replays a history item by re-synthesizing its text with the settings saved alongside it:

App.jsx (lines 713-719)

const handlePlayHistory = useCallback((item, customText) => {
  if (playingId === item.id && !customText) {
    stopSpeech();
  } else {
    stopSpeech();
    setPlayingId(item.id);
    speak(customText || item.texto, item.voz, item.genero, item.velocidad, item.animo, () => setPlayingId(null));
  }
}, [playingId, stopSpeech, speak]);

Event Handling

The SpeechSynthesisUtterance interface provides lifecycle events:

u.onstart = () => setReproduciendo(true);
u.onend   = () => {
  setReproduciendo(false);
  setPlayingId(null);
  if (onEnd) onEnd();
};
u.onerror = () => {
  setReproduciendo(false);
  setPlayingId(null);
};

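onstart

Fired when speech begins. VozCraft uses this to:

Update UI state (setReproduciendo(true))
Show visual feedback in the audio player
Disable the generate button

onend

Fired when the utterance has finished being spoken. VozCraft resets the playback state (setReproduciendo(false), setPlayingId(null)) and calls the optional onEnd callback passed to speak().

onerror

Fired when synthesis fails or is interrupted. VozCraft resets the same playback state but does not call onEnd.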
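The lifecycle events also lend themselves to a Promise wrapper, which can be convenient when speech has to be awaited. The following is a minimal sketch, not VozCraft's implementation: `speakAsync` is a hypothetical name, and `synth` and `Utterance` are injected parameters so the function can be exercised outside a browser. In a page they would be `window.speechSynthesis` and `SpeechSynthesisUtterance`.

```javascript
// Hypothetical helper: resolves when the utterance ends, rejects on error.
function speakAsync(text, { synth, Utterance, lang = 'es-MX' }) {
  return new Promise((resolve, reject) => {
    const u = new Utterance(text);
    u.lang = lang;
    u.onend = () => resolve();
    // event.error carries an error code such as 'canceled' or 'interrupted'.
    u.onerror = (event) => reject(new Error(`Speech synthesis failed: ${event.error}`));
    synth.cancel(); // same "cancel before speak" pattern used by speak()
    synth.speak(u);
  });
}

// Browser usage (commented out so the sketch also loads outside a browser):
// speakAsync('Hola', { synth: window.speechSynthesis, Utterance: SpeechSynthesisUtterance })
//   .then(() => console.log('finished'))
//   .catch((err) => console.warn(err.message));
```

Injecting the two dependencies keeps the helper testable with simple stand-in objects. A React component would still need to update its own state in the `then` and `catch` branches.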

Browser Compatibility

Checking for Support

Always check for Web Speech API support:

if ('speechSynthesis' in window) {
  // Speech synthesis is supported
  const synth = window.speechSynthesis;
  const voices = synth.getVoices();
} else {
  console.error('Web Speech API not supported');
}

Voice Loading

Voices may load asynchronously: in some browsers getVoices() returns an empty array until the voiceschanged event has fired:

window.speechSynthesis.addEventListener('voiceschanged', () => {
  const voices = window.speechSynthesis.getVoices();
  console.log(`Loaded ${voices.length} voices`);
});

Browser Support:

✅ Chrome 33+ (full support)
✅ Firefox 49+ (full support)
✅ Safari 7+ (full support)
✅ Edge 14+ (full support)
✅ Opera 21+ (full support)

Voice Configuration Data

VozCraft defines voice parameters using configuration objects:

App.jsx (lines 4-53)

// Gender presets
const GENEROS = [
  { label: 'Voz Normal', labelEn: 'Normal Voice', pitch: 0.75, rateAdd: -0.05, emoji: '🔉' },
  { label: 'Voz Aguda',  labelEn: 'High-pitched Voice', pitch: 1.30, rateAdd: 0.05, emoji: '🔊' },
];

// Language options (22 variants)
const VOCES = [
  { label: 'Español (México)', labelEn: 'Spanish (Mexico)', lang: 'es-MX', flag: '🇲🇽', group: 'es' },
  { label: 'English (US)', labelEn: 'English (US)', lang: 'en-US', flag: '🇺🇸', group: 'en' },
  // ... 20 more languages
];

// Mood presets (8 options)
const ANIMOS = [
  { label: 'Neutral', pitch: 1.00, rateMulti: 1.00, volume: 1.00, emoji: '😐' },
  { label: 'Alegre', pitch: 1.25, rateMulti: 1.15, volume: 1.00, emoji: '😄' },
  { label: 'Serio', pitch: 0.80, rateMulti: 0.88, volume: 0.95, emoji: '😠' },
  { label: 'Entusiasta', pitch: 1.35, rateMulti: 1.25, volume: 1.00, emoji: '🤩' },
  // ... 4 more moods
];

// Speed presets
const VELOCIDADES = [
  { label: 'Muy Lento', labelEn: 'Very Slow', rate: 0.50 },
  { label: 'Lento', labelEn: 'Slow', rate: 0.75 },
  { label: 'Normal', labelEn: 'Normal', rate: 1.00 },
  { label: 'Rápido', labelEn: 'Fast', rate: 1.25 },
  { label: 'Muy Rápido', labelEn: 'Very Fast', rate: 1.60 },
];

Advanced Features

Duration Estimation

VozCraft estimates audio duration for the progress bar:

App.jsx (lines 278-284)

const getEstimatedDuration = useCallback(() => {
  const velData = VELOCIDADES.find(v => v.label === item.velocidad) || VELOCIDADES[2];
  const animData = ANIMOS.find(a => a.label === item.animo) || ANIMOS[0];
  const gd = GENEROS.find(g => g.label === item.genero) || GENEROS[0];
  const effectiveRate = (velData.rate + gd.rateAdd) * animData.rateMulti;
  return Math.max(1, item.texto.length / (14 * effectiveRate));
}, [item]);

The formula texto.length / (14 * effectiveRate) assumes approximately 14 characters per second at normal speed, adjusted by the effective rate. Math.max(1, ...) keeps the estimate at a minimum of one second.
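A few worked values make the estimate concrete. The sketch below restates the formula as a standalone function. `estimateDurationSeconds` and `CHARS_PER_SECOND` are illustrative names, not identifiers from App.jsx.

```javascript
// Assumed speaking speed at rate 1.0, taken from the formula above.
const CHARS_PER_SECOND = 14;

// Estimated duration in seconds, never less than one second.
function estimateDurationSeconds(textLength, effectiveRate) {
  return Math.max(1, textLength / (CHARS_PER_SECOND * effectiveRate));
}

const atNormal = estimateDurationSeconds(140, 1.0); // 140 / 14 = 10 seconds
const atDouble = estimateDurationSeconds(140, 2.0); // 140 / 28 = 5 seconds
const tiny = estimateDurationSeconds(5, 1.0);       // 5 / 14 is below 1, so the minimum of 1 applies

console.log(atNormal, atDouble, tiny);
```

Because the figure of 14 characters per second is an approximation, the result is only suitable for display purposes such as a progress bar.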

Progress Tracking

The audio player tracks progress using interval-based estimation:

App.jsx (lines 286-300)

useEffect(() => {
  if (isPlaying) {
    startTimeRef.current = Date.now() - (currentTime * 1000);
    intervalRef.current = setInterval(() => {
      const elapsed = (Date.now() - startTimeRef.current) / 1000;
      const dur = getEstimatedDuration();
      setCurrentTime(Math.min(elapsed, dur));
      setProgress(Math.min(100, (elapsed / dur) * 100));
      if (elapsed >= dur) clearInterval(intervalRef.current);
    }, 100);
  } else {
    clearInterval(intervalRef.current);
  }
  return () => clearInterval(intervalRef.current);
}, [isPlaying, getEstimatedDuration]);
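The arithmetic inside the interval callback can be isolated as a pure function, which makes the clamping behaviour easier to see. `progressSnapshot` is a hypothetical helper written for this illustration and does not appear in App.jsx.

```javascript
// Hypothetical pure version of the calculation done on every interval tick.
function progressSnapshot(elapsedSeconds, durationSeconds) {
  return {
    // Displayed time never runs past the estimated duration.
    currentTime: Math.min(elapsedSeconds, durationSeconds),
    // Percentage is capped at 100.
    progress: Math.min(100, (elapsedSeconds / durationSeconds) * 100),
    // True once the estimate has been reached, which is when the interval is cleared.
    done: elapsedSeconds >= durationSeconds,
  };
}

const halfway = progressSnapshot(5, 10);  // currentTime 5, progress 50, done false
const overrun = progressSnapshot(12, 10); // currentTime 10, progress 100, done true

console.log(halfway, overrun);
```

If real speech runs longer than the estimate, the bar simply stays at 100% until the playing state changes.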

Best Practices

Always cancel before starting

Call window.speechSynthesis.cancel() before creating new utterances to prevent queue buildup:

window.speechSynthesis.cancel();
const utterance = new SpeechSynthesisUtterance(text);

Clamp parameter values

Always validate and clamp pitch, rate, and volume to valid ranges:

u.pitch = Math.max(0.1, Math.min(2, calculatedPitch));
u.rate = Math.max(0.1, Math.min(10, calculatedRate));
u.volume = Math.max(0, Math.min(1, calculatedVolume));

Handle voice loading

Wait for voices to load before attempting synthesis:

const loadVoices = () => {
  return new Promise((resolve) => {
    const voices = window.speechSynthesis.getVoices();
    if (voices.length) {
      resolve(voices);
    } else {
      // { once: true } removes the listener after the first voiceschanged event
      window.speechSynthesis.addEventListener('voiceschanged', () => {
        resolve(window.speechSynthesis.getVoices());
      }, { once: true });
    }
  });
};

Implement error handling

Always provide onerror handlers to gracefully handle synthesis failures:

utterance.onerror = (event) => {
  console.error('Speech synthesis error:', event.error);
  // Reset UI state
  setIsPlaying(false);
};

Limitations

Known limitations of the Web Speech API:

Voice availability varies by OS: Windows, macOS, iOS, and Android have different voice libraries
No fine-grained pause control: Cannot pause/resume mid-utterance reliably
No precise progress events: Must estimate duration and progress
Queue interruption: Calling cancel() before each speak() discards any utterance that is still queued or playing
Character limits: Some browsers impose limits on utterance length (typically 4000-5000 chars)
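One common workaround for utterance length limits is to split long text into smaller pieces and queue them one after another. The sketch below shows the idea. It is not something the VozCraft code on this page does. `chunkText` and `queueChunks` are hypothetical names, the default of 200 characters is an arbitrary assumption, and `synth` and `Utterance` are injected so the code can run outside a browser.

```javascript
// Hypothetical helper: split text into chunks of at most maxLen characters,
// preferring sentence boundaries and hard-cutting sentences that are too long.
function chunkText(text, maxLen = 200) {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  const chunks = [];
  const pushChunk = (s) => {
    const trimmed = s.trim();
    if (trimmed) chunks.push(trimmed);
  };

  let current = '';
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current && (current + sentence).length > maxLen) {
      pushChunk(current);
      current = '';
    }
    current += sentence;
    // A single sentence longer than maxLen is cut at the limit.
    while (current.length > maxLen) {
      pushChunk(current.slice(0, maxLen));
      current = current.slice(maxLen);
    }
  }
  pushChunk(current);
  return chunks;
}

// Queue every chunk; speak() appends each utterance to the synthesis queue.
function queueChunks(text, synth, Utterance, maxLen = 200) {
  synth.cancel();
  for (const chunk of chunkText(text, maxLen)) {
    synth.speak(new Utterance(chunk));
  }
}

console.log(chunkText('Uno. Dos. Tres.', 10));
```

In a browser, `queueChunks(text, window.speechSynthesis, SpeechSynthesisUtterance)` would speak the pieces in order. Progress estimation and the onend handling described earlier would then need to account for several utterances instead of one.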

Performance Considerations

Memory management: Store only one utterance reference at a time
Queue management: Cancel previous utterances before starting new ones
Long text handling: VozCraft limits input to 5000 characters
Event cleanup: Always clear intervals and listeners in useEffect cleanup

Next Steps

Audio Processing

Learn how VozCraft generates downloadable audio files

PWA Setup

Explore the Progressive Web App configuration
