
Customization Options

VozCraft's customization controls let you adjust the voice type, speed, mood, and language of your generated audio. This guide documents each parameter, its technical details, and best practices for getting good results.

Customization Overview

VozCraft’s audio is shaped by four main customization axes:

Voice Type

2 Options: Normal (0.75 pitch) and High-pitched (1.30 pitch). Controls the base pitch and gender characteristics of the voice.

Speed

5 Options: Very Slow to Very Fast (0.50x - 1.60x). Controls how quickly the text is spoken.

Mood

8 Options: Neutral, Happy, Serious, and more. Controls emotional tone through pitch, rate, and volume.

Language

22+ Options: Multiple languages and regional accents. Controls pronunciation, accent, and language.

Combined Effect: These parameters combine with each other (voice type and mood multiply for pitch; speed, voice type, and mood combine for rate), giving 2 × 5 × 8 × 22 = 1,760 or more distinct voice combinations.

Voice Type (Género de Voz)

Voice Type controls the base pitch and attempts to select appropriate system voices:

Normal Voice (Voz Normal) 🔉

{
  label: 'Voz Normal',
  pitch: 0.75,
  rateAdd: -0.05,
  emoji: '🔉',
  desc: 'Normal Voice'
}
Characteristics:
  • Pitch Multiplier: 0.75 (25% lower than baseline)
  • Rate Adjustment: -0.05 (slightly slower)
  • Gender Preference: Male voices
  • Tone: Deeper, more authoritative
  • Voice Search Keywords: “male”, “man”, “guy”, “masculin”, and specific male voice names
Technical Details:
  • Base frequency: 120 Hz * 0.75 = 90 Hz
  • Typical range: 80-100 Hz (low male voice)
  • Combined with mood pitch: finalPitch = 0.75 * moodPitch
Best For:
  • Professional content
  • Business presentations
  • Educational material
  • Audiobooks
  • Serious topics
  • Long-form content

High-pitched Voice (Voz Aguda) 🔊

{
  label: 'Voz Aguda',
  pitch: 1.30,
  rateAdd: 0.05,
  emoji: '🔊',
  desc: 'High-pitched Voice'
}
Characteristics:
  • Pitch Multiplier: 1.30 (30% higher than baseline)
  • Rate Adjustment: +0.05 (slightly faster)
  • Gender Preference: Female voices
  • Tone: Lighter, more energetic
  • Voice Search Keywords: “female”, “woman”, “girl”, “femenin”, and specific female voice names
Technical Details:
  • Base frequency: 120 Hz * 1.30 = 156 Hz
  • Typical range: 150-180 Hz (female voice)
  • Combined with mood pitch: finalPitch = 1.30 * moodPitch
Best For:
  • Children’s content
  • Marketing and advertising
  • Upbeat announcements
  • Character voices
  • Entertainment content
  • Energetic presentations

Voice Type Selection Algorithm

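VozCraft filters the system voices by language, then searches the voice names for gender keywords and falls back to the first voice for that language: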
const wantFemale = genero === 'Voz Aguda';
const availableVoices = window.speechSynthesis.getVoices();
const languageVoices = availableVoices.filter(v => 
  v.lang === selectedLang || v.lang.startsWith(selectedLang.split('-')[0])
);

// Search for gender-appropriate voice
const femaleKeywords = [
  'female', 'woman', 'girl', 'femenin',
  'paulina', 'mónica', 'lucia', 'valentina', 'rosa',
  'samantha', 'karen', 'alice', 'milena'
];

const maleKeywords = [
  'male', 'man', 'guy', 'masculin',
  'jorge', 'carlos', 'diego', 'miguel', 'alex',
  'daniel', 'thomas', 'james', 'mark'
];

const keywords = wantFemale ? femaleKeywords : maleKeywords;
const matchedVoice = languageVoices.find(v =>
  keywords.some(keyword => v.name.toLowerCase().includes(keyword))
);

// Use matched voice or fallback to first available
selectedVoice = matchedVoice || languageVoices[0];
System Dependent: Voice availability depends on your operating system. VozCraft will use the best available voice, but gender matching is not guaranteed on all systems.
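
Note that the keyword search above uses substring matching, so "male" also matches a name containing "female", and "man" matches "woman" (in JavaScript, 'female'.includes('male') is true). The following is a minimal sketch of a whole-word variant, not VozCraft's actual code. The `pickVoice` and `nameWords` helpers and the shortened keyword lists are hypothetical, and voices are assumed to be plain objects with `name` and `lang` fields, like `SpeechSynthesisVoice`.

```javascript
// Hypothetical helpers, not part of VozCraft: whole-word keyword matching.
const FEMALE_KEYWORDS = ['female', 'woman', 'girl', 'paulina', 'samantha'];
const MALE_KEYWORDS = ['male', 'man', 'guy', 'jorge', 'daniel'];

// Split a voice name into lowercase words, e.g. "Google UK English Female".
function nameWords(name) {
  return name.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Pick a voice for the language, preferring one whose name contains
// a gender keyword as a whole word; fall back to the first language match.
function pickVoice(voices, selectedLang, wantFemale) {
  const baseLang = selectedLang.split('-')[0];
  const languageVoices = voices.filter(
    v => v.lang === selectedLang || v.lang.startsWith(baseLang)
  );
  const keywords = wantFemale ? FEMALE_KEYWORDS : MALE_KEYWORDS;
  const matched = languageVoices.find(v =>
    nameWords(v.name).some(word => keywords.includes(word))
  );
  return matched || languageVoices[0] || null;
}
```

The trade-off is that whole-word matching no longer catches prefix keywords such as "femenin" or "masculin", which would need a separate prefix check.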

Speed (Velocidad) ⚡

Speed controls how fast the text is spoken, measured as a rate multiplier:

Speed Options

{ label: 'Muy Lento', rate: 0.50 }
Speed: 0.50x (Half speed)

Characteristics:
  • Extremely slow, deliberate pace
  • Maximum clarity and articulation
  • Easy to follow for non-native speakers
  • Ideal for learning and note-taking
Duration Impact: Twice as long as normal speed
  • 100 characters: ~14 seconds (vs 7 at normal)
  • 1000 characters: ~140 seconds (vs 70 at normal)
Use Cases:
  • Language learning (pronunciation practice)
  • Dictation and transcription
  • Accessibility (processing difficulties)
  • Complex technical content
  • Meditation and relaxation
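
The duration figures above imply roughly 7 seconds per 100 characters at 1.00x. As a rough sketch under that assumption, playback time can be estimated from the character count and the rate. `estimateSeconds` is a hypothetical helper, and real durations will vary with the voice, punctuation, and language.

```javascript
// Assumption taken from the figures above: ~7 seconds per 100 characters at 1.00x.
const SECONDS_PER_CHAR_AT_NORMAL = 0.07;

// Estimate playback duration in whole seconds for a given rate multiplier.
function estimateSeconds(charCount, rate) {
  if (rate <= 0) throw new RangeError('rate must be positive');
  return Math.round((charCount * SECONDS_PER_CHAR_AT_NORMAL) / rate);
}

console.log(estimateSeconds(100, 0.5));  // 14
console.log(estimateSeconds(1000, 1.0)); // 70
```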

Speed Calculation

Speed combines with voice type and mood:
// Final rate calculation
const baseRate = VELOCIDADES.find(v => v.label === velocidad).rate; // 0.50 to 1.60
const voiceRateAdd = GENEROS.find(g => g.label === genero).rateAdd;  // -0.05 or +0.05
const moodRateMulti = ANIMOS.find(a => a.label === animo).rateMulti; // 0.78 to 1.30

const effectiveRate = (baseRate + voiceRateAdd) * moodRateMulti;

// Applied to speech synthesis
utterance.rate = Math.max(0.1, Math.min(10, effectiveRate));
Example Calculations:

Base rate: 1.00 (Normal)
Voice add: -0.05 (Normal voice)
Mood multi: 1.00 (Neutral)

Effective rate = (1.00 + (-0.05)) * 1.00 = 0.95
Result: Slightly slower than baseline

Base rate: 1.60 (Very Fast)
Voice add: +0.05 (High-pitched)
Mood multi: 1.30 (Energetic)

Effective rate = (1.60 + 0.05) * 1.30 = 2.145
Result: Extremely fast (about 2.15x normal speed)

Base rate: 0.50 (Very Slow)
Voice add: -0.05 (Normal voice)
Mood multi: 0.78 (Melancholic)

Effective rate = (0.50 + (-0.05)) * 0.78 = 0.351
Result: Extremely slow (0.35x speed)
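
The three calculations above can be reproduced with a small standalone function. This is a sketch that takes the numeric values directly instead of looking them up in the VELOCIDADES, GENEROS, and ANIMOS tables; `effectiveRate` is a hypothetical name.

```javascript
// Combine speed, voice type offset, and mood multiplier, then clamp
// to the 0.1 to 10 range used for utterance.rate.
function effectiveRate(baseRate, voiceRateAdd, moodRateMulti) {
  const raw = (baseRate + voiceRateAdd) * moodRateMulti;
  return Math.max(0.1, Math.min(10, raw));
}

// toFixed(3) hides floating-point noise in the printed values.
console.log(effectiveRate(1.00, -0.05, 1.00).toFixed(3)); // "0.950"
console.log(effectiveRate(1.60, 0.05, 1.30).toFixed(3));  // "2.145"
console.log(effectiveRate(0.50, -0.05, 0.78).toFixed(3)); // "0.351"
```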

Mood (Estado de Ánimo) 💫

Mood presets modify pitch, rate, and volume to create emotional character:

Mood Options

{
  label: 'Neutral',
  pitch: 1.00,
  rateMulti: 1.00,
  volume: 1.00,
  desc: 'Balanced expression',
  emoji: '😐'
}
Parameters:
  • Pitch: 1.00 (baseline)
  • Rate: 1.00x (no change)
  • Volume: 100%
Characteristics:
  • Completely neutral emotional tone
  • Balanced, professional sound
  • No pitch or rate modifications
  • Standard reference point
Effective Pitch Examples:
  • Normal voice: 0.75 * 1.00 = 0.75
  • High-pitched: 1.30 * 1.00 = 1.30
Use Cases:
  • Professional presentations
  • News and journalism
  • Technical documentation
  • Business communications
  • When other moods are too expressive

Mood Visualization

VozCraft displays a visual mood indicator showing the relative values:
[🤩] Enthusiastic · Very energetic and expressive

Pitch      ████████████ 100%
Rate       ████████░░░░  73%
Volume     ████████████ 100%
Calculation:
const pitchPercent = Math.round((pitch - 0.70) / 0.65 * 100);
const ratePercent = Math.round((rateMulti - 0.78) / 0.52 * 100);
const volumePercent = Math.round((volume - 0.88) / 0.12 * 100);
Normalization:
  • Pitch: Scales from 0.70 (Melancholic) to 1.35 (Enthusiastic)
  • Rate: Scales from 0.78 (Melancholic) to 1.30 (Energetic)
  • Volume: Scales from 0.88 (Melancholic) to 1.00 (multiple moods)
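
As an illustration of the normalization above, the sketch below maps a mood's raw values onto 0 to 100 using the documented minimum and maximum of each range. `toPercent` and `moodPercents` are hypothetical helpers, and the extra clamp to 0 to 100 is an added assumption for out-of-range inputs.

```javascript
// Map a value from [min, max] onto an integer percentage, clamped to 0..100.
function toPercent(value, min, max) {
  const percent = Math.round(((value - min) / (max - min)) * 100);
  return Math.max(0, Math.min(100, percent));
}

// Ranges follow the documented scales: pitch 0.70 to 1.35,
// rate 0.78 to 1.30, volume 0.88 to 1.00.
function moodPercents({ pitch, rateMulti, volume }) {
  return {
    pitch: toPercent(pitch, 0.70, 1.35),
    rate: toPercent(rateMulti, 0.78, 1.30),
    volume: toPercent(volume, 0.88, 1.00),
  };
}

// Neutral mood from this guide: pitch 1.00, rateMulti 1.00, volume 1.00.
console.log(moodPercents({ pitch: 1.00, rateMulti: 1.00, volume: 1.00 }));
// { pitch: 46, rate: 42, volume: 100 }
```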

Combining Customizations

Parameter Interaction

All customization parameters work together:
// Final synthesized audio parameters
const finalPitch = voiceTypePitch * moodPitch;
const finalRate = (baseSpeed + voiceTypeRateAdd) * moodRateMulti;
const finalVolume = moodVolume;

// Applied with safety clamping
utterance.pitch = Math.max(0.1, Math.min(2, finalPitch));
utterance.rate = Math.max(0.1, Math.min(10, finalRate));
utterance.volume = Math.max(0, Math.min(1, finalVolume));
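
To show how these formulas might be wired to the Web Speech API, here is a minimal sketch. `buildParams` and `speak` are hypothetical names, the preset objects mirror the shapes shown earlier in this guide, and the browser call is guarded so the same code also loads outside a browser.

```javascript
const clamp = (value, min, max) => Math.max(min, Math.min(max, value));

// Combine a voice type preset, a base speed, and a mood preset
// into the three values applied to the utterance.
function buildParams(voiceType, baseSpeed, mood) {
  return {
    pitch: clamp(voiceType.pitch * mood.pitch, 0.1, 2),
    rate: clamp((baseSpeed + voiceType.rateAdd) * mood.rateMulti, 0.1, 10),
    volume: clamp(mood.volume, 0, 1),
  };
}

// Browser-only: returns false where the Web Speech API is unavailable.
function speak(text, params) {
  if (typeof window === 'undefined' || !('speechSynthesis' in window)) return false;
  const utterance = new SpeechSynthesisUtterance(text);
  Object.assign(utterance, params); // sets pitch, rate, and volume
  window.speechSynthesis.speak(utterance);
  return true;
}

const params = buildParams(
  { pitch: 0.75, rateAdd: -0.05 },              // Voz Normal
  1.00,                                          // Normal speed
  { pitch: 1.00, rateMulti: 1.00, volume: 1.00 } // Neutral mood
);
speak('Hola, mundo', params);
```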

Extreme Combinations

Highest Pitch Possible

Settings:
  • Voice Type: High-pitched (1.30)
  • Mood: Enthusiastic (1.35)
Result:
Pitch = 1.30 * 1.35 = 1.755
Very high, extremely energetic voice (still below the 2.0 cap applied by VozCraft)

Lowest Pitch Possible

Settings:
  • Voice Type: Normal (0.75)
  • Mood: Melancholic (0.70)
Result:
Pitch = 0.75 * 0.70 = 0.525
Very deep, somber voice

Fastest Rate Possible

Settings:
  • Speed: Very Fast (1.60)
  • Voice Type: High-pitched (+0.05)
  • Mood: Energetic (1.30x)
Result:
Rate = (1.60 + 0.05) * 1.30 = 2.145
Extremely rapid speech

Slowest Rate Possible

Settings:
  • Speed: Very Slow (0.50)
  • Voice Type: Normal (-0.05)
  • Mood: Melancholic (0.78x)
Result:
Rate = (0.50 + (-0.05)) * 0.78 = 0.351
Extremely slow, contemplative pace
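
The four extremes above can be checked by brute force. The sketch below uses only the endpoint values documented in this guide; intermediate speeds and moods fall between them, and since all values are positive they cannot widen the range. `productRange` and `round3` are hypothetical helpers.

```javascript
const round3 = x => Math.round(x * 1000) / 1000;

// Smallest and largest product over every pairing of two value lists.
function productRange(as, bs) {
  const products = as.flatMap(a => bs.map(b => a * b));
  return { min: round3(Math.min(...products)), max: round3(Math.max(...products)) };
}

// Pitch: voice type pitch * mood pitch.
const pitchRange = productRange([0.75, 1.30], [0.70, 1.35]);

// Rate: (speed + voice type offset) * mood rate multiplier.
const speedPlusVoice = [0.50, 1.60].flatMap(s => [-0.05, 0.05].map(add => s + add));
const rateRange = productRange(speedPlusVoice, [0.78, 1.30]);

console.log(pitchRange); // { min: 0.525, max: 1.755 }
console.log(rateRange);  // { min: 0.351, max: 2.145 }
```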

Best for business, presentations, formal content

Recommended:
  • Voice: Normal
  • Speed: Normal
  • Mood: Neutral or Serious
  • Language: Match audience
Result: Authoritative, clear, professional tone

Parameters:
  • Pitch: 0.75 (Neutral) or 0.60 (Serious)
  • Rate: 0.95 (Neutral) or 0.836 (Serious)
  • Volume: 100% (Neutral) or 95% (Serious)

Advanced Customization Tips

Fine-tuning Your Audio

Step 1: Start with Defaults

Begin with:
  • Voice: Normal
  • Speed: Normal
  • Mood: Neutral
Generate audio and listen critically.

Step 2: Adjust One Parameter at a Time

Make incremental changes:
  1. Try High-pitched voice if Normal is too deep
  2. Adjust speed if pacing feels off
  3. Finally, select mood for emotional tone
This helps you understand each parameter’s impact.

Step 3: Test with Representative Text

Use actual content, not “test test”:
  • Include punctuation (affects pauses)
  • Use varied sentence lengths
  • Include numbers if relevant
  • Test with typical content length

Step 4: Consider Your Audience

Customize for listeners:
  • Native speakers: Normal speed acceptable
  • Language learners: Slow or Very Slow
  • Elderly: Slower speeds, moderate pitch
  • Children: Higher pitch, moderate speed
  • Professional: Normal voice, Neutral or Serious

Step 5: Save Successful Combinations

When you find a good combination:
  • Generate the audio
  • Give it a descriptive name
  • Use history to reference settings later
  • Export transcript to document settings

Common Mistakes to Avoid

Avoid These Combinations:
  1. Too Extreme: Very Fast speed with an Energetic or Enthusiastic mood = Incomprehensible
  2. Conflicting Moods: Using “Serious” for happy content
  3. Wrong Speed for Audience: Fast speed for language learners
  4. Ignoring Content Length: Very Slow for a 5000-character text ≈ 12 minutes or more (at ~140 seconds per 1000 characters)
  5. Not Testing: Always listen before exporting/using
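
Some of these mistakes can be caught automatically before audio is generated. The following is a hypothetical sketch, not a VozCraft feature: `checkSettings`, the `audience` field, and the warning texts are all assumptions made for illustration, and the English option labels stand in for whatever labels the app uses internally.

```javascript
// Return a list of warnings for setting combinations this guide advises against.
function checkSettings({ speed, mood, audience, charCount }) {
  const warnings = [];
  if (speed === 'Very Fast' && (mood === 'Energetic' || mood === 'Enthusiastic')) {
    warnings.push('Very Fast with a high-energy mood may be incomprehensible.');
  }
  if (audience === 'language-learner' && (speed === 'Fast' || speed === 'Very Fast')) {
    warnings.push('Language learners usually need Slow or Very Slow.');
  }
  if (speed === 'Very Slow' && charCount >= 5000) {
    warnings.push('Very Slow on long text can run well over ten minutes.');
  }
  return warnings;
}

console.log(checkSettings({
  speed: 'Very Fast', mood: 'Energetic', audience: 'language-learner', charCount: 300,
}).length); // 2
```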

Language-Specific Considerations

Special Considerations:
  • Pitch changes affect meaning in tonal languages
  • Stick closer to Neutral mood (1.00 pitch)
  • Avoid Enthusiastic (1.35) and Melancholic (0.70) extremes
  • Test carefully with native speakers
Recommended Moods: Neutral, Tense, Relaxed (moderate pitch changes)

Special Considerations:
  • These languages flow naturally at various speeds
  • Speed changes generally well-tolerated
  • Mood variations work well
  • Natural rhythm preserved at most settings
Recommended: Any combination works well

Special Considerations:
  • Extreme speeds can disrupt stress patterns
  • Very Fast (1.60) may reduce clarity significantly
  • Mood variations generally effective
  • Consider slower speeds for non-native learners
Recommended: Normal to Fast speeds for best results
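
The advice for tonal languages can be expressed as a simple check on how far a mood's pitch strays from 1.00. This is only a sketch: `moodSuitsLanguage` is hypothetical, the 0.15 tolerance is an arbitrary illustrative threshold, and the language prefix list (Chinese, Vietnamese, Thai) is an assumption about which tonal languages might be in use, not a list taken from VozCraft.

```javascript
// Assumed examples of tonal languages, identified by BCP 47 primary subtag.
const TONAL_LANGUAGE_PREFIXES = ['zh', 'vi', 'th'];

// True when the mood pitch is acceptable for the language.
// Non-tonal languages accept any mood pitch.
function moodSuitsLanguage(lang, moodPitch, tolerance = 0.15) {
  const base = lang.toLowerCase().split('-')[0];
  if (!TONAL_LANGUAGE_PREFIXES.includes(base)) return true;
  return Math.abs(moodPitch - 1.0) <= tolerance;
}

console.log(moodSuitsLanguage('zh-CN', 1.00)); // true  (Neutral)
console.log(moodSuitsLanguage('zh-CN', 1.35)); // false (Enthusiastic)
console.log(moodSuitsLanguage('es-ES', 1.35)); // true  (not tonal)
```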

Browser Limitations

Web Speech API Constraints

The Web Speech API has built-in limits:
// VozCraft applies safety clamping
const safePitch = Math.max(0.1, Math.min(2, calculatedPitch));
const safeRate = Math.max(0.1, Math.min(10, calculatedRate));
const safeVolume = Math.max(0, Math.min(1, calculatedVolume));
API Limits:
  • Pitch: 0 to 2.0 (VozCraft clamps to a floor of 0.1)
  • Rate: 0.1 to 10.0
  • Volume: 0.0 to 1.0
VozCraft’s combinations stay within these limits (pitch 0.525 to 1.755, rate 0.351 to 2.145), so the clamping above acts only as a safeguard.
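
One edge case the Math.max/Math.min pattern does not cover is a non-numeric input: both functions return NaN when any argument is NaN, so a NaN would pass through the clamp unchanged. A possible defensive variant is sketched below; `safeClamp` and its fallback argument are hypothetical additions, not part of VozCraft.

```javascript
// Clamp to [min, max], substituting a fallback for NaN or infinite input.
function safeClamp(value, min, max, fallback) {
  if (!Number.isFinite(value)) return fallback;
  return Math.max(min, Math.min(max, value));
}

console.log(Math.max(0.1, Math.min(2, NaN))); // NaN
console.log(safeClamp(NaN, 0.1, 2, 1));       // 1
console.log(safeClamp(2.5, 0.1, 2, 1));       // 2
```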

Platform Differences

Characteristics:
  • Best overall support
  • Smooth parameter transitions
  • Good pitch/rate accuracy
  • Wide range support
Recommended: Primary browser for VozCraft
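
Because support varies by platform, it can help to detect the Web Speech API before offering voice controls. The sketch below takes the global object as a parameter so it can be exercised outside a browser; `detectSpeechSupport` is a hypothetical helper. Note that in some browsers `getVoices()` returns an empty list until the `voiceschanged` event has fired, so a count of 0 does not necessarily mean no voices are installed.

```javascript
// Report whether speech synthesis is available and how many voices are loaded.
function detectSpeechSupport(globalObject) {
  const synth = globalObject && globalObject.speechSynthesis;
  const supported =
    Boolean(synth) && typeof globalObject.SpeechSynthesisUtterance === 'function';
  const voiceCount =
    synth && typeof synth.getVoices === 'function' ? synth.getVoices().length : 0;
  return { supported, voiceCount };
}

// In a browser: detectSpeechSupport(window)
console.log(detectSpeechSupport({})); // { supported: false, voiceCount: 0 }
```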

Next Steps

Voice Settings Guide

Step-by-step guide for optimal voice configuration

Using VozCraft

Complete workflow guide with examples

Audio Export

Learn how to export with custom settings
