Customization Options

VozCraft offers extensive customization controls that let you fine-tune every aspect of your generated audio. This guide provides comprehensive documentation of all customization parameters, their technical details, and best practices for achieving optimal results.

Customization Overview

VozCraft’s audio is shaped by four main customization axes:

Voice Type

2 Options: Normal (0.75 pitch) and High-pitched (1.30 pitch). Controls the base pitch and gender characteristics of the voice.

Speed

5 Options: Very Slow to Very Fast (0.50x - 1.60x). Controls how quickly the text is spoken.

Mood

8 Options: Neutral, Happy, Serious, and more. Controls emotional tone through pitch, rate, and volume.

Language

22+ Options: Multiple languages and regional accents. Controls pronunciation, accent, and language.

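Combined Effect: Voice type and mood pitch multiply, while speed is first offset by the voice type and then scaled by the mood. With 2 voice types, 5 speeds, 8 moods, and 22+ languages, that gives more than 1,700 distinct combinations.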

Voice Type (Género de Voz)

Voice Type controls the base pitch and attempts to select appropriate system voices:

Normal Voice (Voz Normal) 🔉

{
  label: 'Voz Normal',
  pitch: 0.75,
  rateAdd: -0.05,
  emoji: '🔉',
  desc: 'Normal Voice'
}

Characteristics:

Pitch Multiplier: 0.75 (25% lower than baseline)
Rate Adjustment: -0.05 (slightly slower)
Gender Preference: Male voices
Tone: Deeper, more authoritative
Voice Search Keywords: “male”, “man”, “guy”, “masculin”, and specific male voice names

Technical Details:

Base frequency: 120 Hz * 0.75 = 90 Hz
Typical range: 80-100 Hz (low male voice)
Combined with mood pitch: finalPitch = 0.75 * moodPitch

Best For:

Professional content
Business presentations
Educational material
Audiobooks
Serious topics
Long-form content

High-pitched Voice (Voz Aguda) 🔊

{
  label: 'Voz Aguda',
  pitch: 1.30,
  rateAdd: 0.05,
  emoji: '🔊',
  desc: 'High-pitched Voice'
}

Characteristics:

Pitch Multiplier: 1.30 (30% higher than baseline)
Rate Adjustment: +0.05 (slightly faster)
Gender Preference: Female voices
Tone: Lighter, more energetic
Voice Search Keywords: “female”, “woman”, “girl”, “femenin”, and specific female voice names

Technical Details:

Base frequency: 120 Hz * 1.30 = 156 Hz
Typical range: 150-180 Hz (female voice)
Combined with mood pitch: finalPitch = 1.30 * moodPitch

Best For:

Children’s content
Marketing and advertising
Upbeat announcements
Character voices
Entertainment content
Energetic presentations

Voice Type Selection Algorithm

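VozCraft filters the system voices by the selected language, then looks for a voice whose name contains a gender keyword, falling back to the first voice for that language: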

const wantFemale = genero === 'Voz Aguda';
const availableVoices = window.speechSynthesis.getVoices();
const languageVoices = availableVoices.filter(v => 
  v.lang === selectedLang || v.lang.startsWith(selectedLang.split('-')[0])
);

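// Search for a gender-appropriate voice by name.
// Note: this is plain substring matching, so 'male' also matches 'female' and 'man' matches 'woman'.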
const femaleKeywords = [
  'female', 'woman', 'girl', 'femenin',
  'paulina', 'mónica', 'lucia', 'valentina', 'rosa',
  'samantha', 'karen', 'alice', 'milena'
];

const maleKeywords = [
  'male', 'man', 'guy', 'masculin',
  'jorge', 'carlos', 'diego', 'miguel', 'alex',
  'daniel', 'thomas', 'james', 'mark'
];

const keywords = wantFemale ? femaleKeywords : maleKeywords;
const matchedVoice = languageVoices.find(v =>
  keywords.some(keyword => v.name.toLowerCase().includes(keyword))
);

// Use matched voice or fallback to first available
selectedVoice = matchedVoice || languageVoices[0];

System Dependent: Voice availability depends on your operating system. VozCraft will use the best available voice, but gender matching is not guaranteed on all systems.
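
Because the selection code relies on substring matching, the keyword 'male' also matches a voice named "Female". The sketch below shows one possible stricter approach that matches whole words only. It is not VozCraft's actual implementation: `pickVoiceByKeywords` and the sample voice list are hypothetical, and voices are modeled as plain `{ name, lang }` objects so the example runs outside a browser.

```javascript
// Hypothetical helper: whole-word keyword matching on voice names.
// Voices are plain { name, lang } objects with the same fields the selection code reads.
function pickVoiceByKeywords(voices, keywords) {
  return voices.find((voice) => {
    // Split the name into words on anything that is not a letter or digit.
    const words = voice.name
      .toLowerCase()
      .split(/[^\p{L}\p{N}]+/u)
      .filter(Boolean);
    return keywords.some((keyword) => words.includes(keyword));
  });
}

// Sample data for illustration only.
const sampleVoices = [
  { name: 'Google UK English Female', lang: 'en-GB' },
  { name: 'Google UK English Male', lang: 'en-GB' },
];

// Whole-word matching skips the "Female" voice.
const maleVoice = pickVoiceByKeywords(sampleVoices, ['male', 'man', 'guy']);
console.log(maleVoice.name); // "Google UK English Male"

// For comparison: substring matching picks the first voice, because "female" contains "male".
const substringMatch = sampleVoices.find((v) => v.name.toLowerCase().includes('male'));
console.log(substringMatch.name); // "Google UK English Female"
```

The trade-off is that whole-word matching no longer catches prefixes such as 'masculin' inside a longer word, so a real implementation would need to decide per keyword which behavior it wants.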

Speed (Velocidad) ⚡

Speed controls how fast the text is spoken, measured as a rate multiplier:

Speed Options

Very Slow (Muy Lento)
Slow (Lento)
Normal
Fast (Rápido)
Very Fast (Muy Rápido)

{ label: 'Muy Lento', rate: 0.50 }

Speed: 0.50x (Half speed)

Characteristics:

Extremely slow, deliberate pace
Maximum clarity and articulation
Easy to follow for non-native speakers
Ideal for learning and note-taking

Duration Impact: 2x longer than normal

100 characters: ~14 seconds (vs 7 at normal)
1000 characters: ~140 seconds (vs 70 at normal)

Use Cases:

Language learning (pronunciation practice)
Dictation and transcription
Accessibility (processing difficulties)
Complex technical content
Meditation and relaxation

{ label: 'Lento', rate: 0.75 }

Speed: 0.75x (Three-quarters speed)

Characteristics:

Moderately slow, comfortable pace
Clear pronunciation
Easy comprehension
Natural rhythm maintained

Duration Impact: 1.33x longer than normal

100 characters: ~9.3 seconds (vs 7 at normal)
1000 characters: ~93 seconds (vs 70 at normal)

Use Cases:

Educational content
Instructional material
Elderly audience
Non-native speakers
Important information

{ label: 'Normal', rate: 1.00 }

Speed: 1.00x (Default speed)

Characteristics:

Natural conversational pace
Balanced speed and clarity
Most comfortable for extended listening
Standard for most content

Duration Impact: Baseline

100 characters: ~7 seconds
1000 characters: ~70 seconds
Formula: duration (seconds) ≈ characters / 14

Use Cases:

General content
Audiobooks
News and articles
Podcasts
Professional content
Default choice for most scenarios

{ label: 'Rápido', rate: 1.25 }

Speed: 1.25x (125% speed)

Characteristics:

Brisk, efficient pace
Slight clarity trade-off
Energetic delivery
Time-efficient

Duration Impact: 0.8x normal duration (20% shorter)

100 characters: ~5.6 seconds (vs 7 at normal)
1000 characters: ~56 seconds (vs 70 at normal)

Use Cases:

Quick reviews
Experienced listeners
Time-sensitive content
Updates and summaries
Energetic presentations

{ label: 'Muy Rápido', rate: 1.60 }

Speed: 1.60x (160% speed)

Characteristics:

Very rapid, compressed delivery
Clarity significantly reduced
Requires focused attention
Maximum time efficiency

Duration Impact: 0.625x normal duration (37.5% shorter)

100 characters: ~4.4 seconds (vs 7 at normal)
1000 characters: ~44 seconds (vs 70 at normal)

Use Cases:

Rapid information consumption
Review of familiar material
Time-critical scenarios
Experienced TTS users
Skimming content

Warning: May be difficult to understand for some listeners or languages.
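
The duration figures quoted for each speed all follow from the rule of thumb given under Normal speed: roughly 14 characters per second at 1.00x, divided by the rate. A minimal sketch, assuming that rule holds; `estimateDurationSeconds` is a hypothetical helper, and it ignores the voice-type and mood rate adjustments.

```javascript
// Approximate speaking time, assuming ~14 characters per second at rate 1.00.
const CHARS_PER_SECOND_AT_NORMAL = 14;

function estimateDurationSeconds(characterCount, rate = 1.0) {
  if (!(rate > 0)) {
    throw new RangeError('rate must be a positive number');
  }
  return characterCount / CHARS_PER_SECOND_AT_NORMAL / rate;
}

// Print an estimate for each speed preset.
for (const rate of [0.5, 0.75, 1.0, 1.25, 1.6]) {
  const seconds = estimateDurationSeconds(1000, rate);
  console.log(`${rate.toFixed(2)}x: ~${Math.round(seconds)} s per 1000 characters`);
}
```

The results land within a few percent of the rounded figures listed for each speed, which are based on 70 seconds per 1000 characters.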

Speed Calculation

Speed combines with voice type and mood:

// Final rate calculation
const baseRate = VELOCIDADES.find(v => v.label === velocidad).rate; // 0.50 to 1.60
const voiceRateAdd = GENEROS.find(g => g.label === genero).rateAdd;  // -0.05 or +0.05
const moodRateMulti = ANIMOS.find(a => a.label === animo).rateMulti; // 0.78 to 1.30

const effectiveRate = (baseRate + voiceRateAdd) * moodRateMulti;

// Applied to speech synthesis
utterance.rate = Math.max(0.1, Math.min(10, effectiveRate));

Example Calculations:

Example: Normal speed + Normal voice + Neutral

Base rate: 1.00 (Normal)
Voice add: -0.05 (Normal voice)
Mood multi: 1.00 (Neutral)

Effective rate = (1.00 + (-0.05)) * 1.00 = 0.95

Result: Slightly slower than baseline

Example: Very Fast + High-pitched + Energetic

Base rate: 1.60 (Very Fast)
Voice add: +0.05 (High-pitched)
Mood multi: 1.30 (Energetic)

Effective rate = (1.60 + 0.05) * 1.30 = 2.145

Result: Extremely fast (about 2.15x normal speed)

Example: Very Slow + Normal + Melancholic

Base rate: 0.50 (Very Slow)
Voice add: -0.05 (Normal voice)
Mood multi: 0.78 (Melancholic)

Effective rate = (0.50 + (-0.05)) * 0.78 = 0.351

Result: Extremely slow (0.35x speed)
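
The three worked examples can be reproduced with a small lookup-based function. This is a sketch that copies the preset values from this guide into plain objects; the names `SPEED_RATE`, `VOICE_RATE_ADD`, `MOOD_RATE_MULTI`, and `effectiveRate` are hypothetical and need not match VozCraft's internals.

```javascript
// Values copied from the speed, voice-type, and mood presets in this guide.
const SPEED_RATE = {
  'Muy Lento': 0.5, 'Lento': 0.75, 'Normal': 1.0, 'Rápido': 1.25, 'Muy Rápido': 1.6,
};
const VOICE_RATE_ADD = { 'Voz Normal': -0.05, 'Voz Aguda': 0.05 };
const MOOD_RATE_MULTI = {
  'Neutral': 1.0, 'Alegre': 1.15, 'Serio': 0.88, 'Entusiasta': 1.25,
  'Melancólico': 0.78, 'Enérgico': 1.3, 'Relajado': 0.82, 'Tenso': 1.18,
};

function effectiveRate(speed, voice, mood) {
  const rate = (SPEED_RATE[speed] + VOICE_RATE_ADD[voice]) * MOOD_RATE_MULTI[mood];
  // Same clamp as the utterance.rate assignment.
  return Math.max(0.1, Math.min(10, rate));
}

console.log(effectiveRate('Normal', 'Voz Normal', 'Neutral').toFixed(3));        // "0.950"
console.log(effectiveRate('Muy Rápido', 'Voz Aguda', 'Enérgico').toFixed(3));    // "2.145"
console.log(effectiveRate('Muy Lento', 'Voz Normal', 'Melancólico').toFixed(3)); // "0.351"
```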

Mood (Estado de Ánimo) 💫

Mood presets modify pitch, rate, and volume to create emotional character:

Mood Options

{
  label: 'Neutral',
  pitch: 1.00,
  rateMulti: 1.00,
  volume: 1.00,
  desc: 'Balanced expression',
  emoji: '😐'
}

Parameters:

Pitch: 1.00 (baseline)
Rate: 1.00x (no change)
Volume: 100%

Characteristics:

Completely neutral emotional tone
Balanced, professional sound
No pitch or rate modifications
Standard reference point

Effective Pitch Examples:

Normal voice: 0.75 * 1.00 = 0.75
High-pitched: 1.30 * 1.00 = 1.30

Use Cases:

Professional presentations
News and journalism
Technical documentation
Business communications
When other moods are too expressive

{
  label: 'Alegre',
  pitch: 1.25,
  rateMulti: 1.15,
  volume: 1.00,
  desc: 'High and lively tone',
  emoji: '😄'
}

Parameters:

Pitch: 1.25 (25% higher)
Rate: 1.15x (15% faster)
Volume: 100%

Characteristics:

Uplifted, cheerful tone
Brighter, more animated delivery
Slightly faster for energy
Positive, optimistic feel

Effective Pitch Examples:

Normal voice: 0.75 * 1.25 = 0.9375 (still relatively low)
High-pitched: 1.30 * 1.25 = 1.625 (very high!)

Use Cases:

Marketing and advertising
Celebrations and announcements
Children’s content
Motivational material
Positive news
Welcome messages

{
  label: 'Serio',
  pitch: 0.80,
  rateMulti: 0.88,
  volume: 0.95,
  desc: 'Deep, steady and firm',
  emoji: '😠'
}

Parameters:

Pitch: 0.80 (20% lower)
Rate: 0.88x (12% slower)
Volume: 95%

Characteristics:

Lower, more authoritative pitch
Slower, deliberate pace
Slightly reduced volume
Grave, important tone

Effective Pitch Examples:

Normal voice: 0.75 * 0.80 = 0.60 (very deep)
High-pitched: 1.30 * 0.80 = 1.04 (moderate)

Use Cases:

Formal announcements
Serious topics (health, safety)
Legal content
Official statements
Solemn occasions
Authority and credibility

{
  label: 'Entusiasta',
  pitch: 1.35,
  rateMulti: 1.25,
  volume: 1.00,
  desc: 'Very energetic and expressive',
  emoji: '🤩'
}

Parameters:

Pitch: 1.35 (35% higher)
Rate: 1.25x (25% faster)
Volume: 100%

Characteristics:

Highest pitch modifier
Fast, dynamic delivery
Maximum energy and excitement
Very expressive

Effective Pitch Examples:

Normal voice: 0.75 * 1.35 = 1.0125 (moderate-high)
High-pitched: 1.30 * 1.35 = 1.755 (extremely high!)

Use Cases:

Sports commentary
Motivational speeches
Exciting announcements
Product launches
High-energy content
Celebrations

{
  label: 'Melancólico',
  pitch: 0.70,
  rateMulti: 0.78,
  volume: 0.88,
  desc: 'Soft, slow and nostalgic',
  emoji: '😔'
}

Parameters:

Pitch: 0.70 (30% lower) - Lowest!
Rate: 0.78x (22% slower) - Slowest!
Volume: 88% - Quietest!

Characteristics:

Lowest pitch of all moods
Slowest pace
Quietest volume
Soft, contemplative tone
Nostalgic, reflective feel

Effective Pitch Examples:

Normal voice: 0.75 * 0.70 = 0.525 (very deep)
High-pitched: 1.30 * 0.70 = 0.91 (moderate-low)

Use Cases:

Poetry and literature
Memorial content
Reflective pieces
Sad or somber topics
Nostalgic narration
Bedtime stories (calming)

{
  label: 'Enérgico',
  pitch: 1.15,
  rateMulti: 1.30,
  volume: 1.00,
  desc: 'Fast, dynamic and powerful',
  emoji: '⚡'
}

Parameters:

Pitch: 1.15 (15% higher)
Rate: 1.30x (30% faster) - Fastest!
Volume: 100%

Characteristics:

Fastest mood (highest rate multiplier)
Moderate pitch elevation
Full volume
Dynamic, powerful delivery

Effective Pitch Examples:

Normal voice: 0.75 * 1.15 = 0.8625 (moderate)
High-pitched: 1.30 * 1.15 = 1.495 (very high)

Use Cases:

Workout instructions
Action content
Urgent messages
Fast-paced narration
High-intensity content
Quick announcements

{
  label: 'Relajado',
  pitch: 0.88,
  rateMulti: 0.82,
  volume: 0.90,
  desc: 'Calm and slow-paced',
  emoji: '😌'
}

Parameters:

Pitch: 0.88 (12% lower)
Rate: 0.82x (18% slower)
Volume: 90%

Characteristics:

Slightly lowered pitch
Slow, calming pace
Reduced volume
Peaceful, soothing tone

Effective Pitch Examples:

Normal voice: 0.75 * 0.88 = 0.66 (low)
High-pitched: 1.30 * 0.88 = 1.144 (moderate-high)

Use Cases:

Meditation guides
Sleep stories
ASMR content
Relaxation exercises
Calm instructions
Bedtime content

{
  label: 'Tenso',
  pitch: 1.10,
  rateMulti: 1.18,
  volume: 0.95,
  desc: 'Urgent and tense',
  emoji: '😤'
}

Parameters:

Pitch: 1.10 (10% higher)
Rate: 1.18x (18% faster)
Volume: 95%

Characteristics:

Moderately elevated pitch
Faster pace
Slightly reduced volume
Urgent, stressed tone

Effective Pitch Examples:

Normal voice: 0.75 * 1.10 = 0.825 (moderate)
High-pitched: 1.30 * 1.10 = 1.43 (high)

Use Cases:

Thriller narration
Dramatic content
Suspenseful moments
Alert messages
Tense situations
Urgent announcements

Mood Visualization

VozCraft displays a visual mood indicator showing the relative values:

[🤩] Enthusiastic · Very energetic and expressive

Pitch      ████████████ 100%
Rate       ███████████░  90%
Volume     ████████████ 100%

Calculation:

const pitchPercent = Math.round((pitch - 0.70) / 0.65 * 100);
const ratePercent = Math.round((rateMulti - 0.78) / 0.52 * 100);
const volumePercent = Math.round((volume - 0.88) / 0.12 * 100);

Normalization:

Pitch: Scales from 0.70 (Melancholic) to 1.35 (Enthusiastic)
Rate: Scales from 0.78 (Melancholic) to 1.30 (Energetic)
Volume: Scales from 0.88 (Melancholic) to 1.00 (multiple moods)
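
The normalization can be written as one generic function over the minimum and maximum of each parameter. The sketch below assumes the ranges listed above; `toPercent` and `renderBar` are hypothetical names, and the 12-cell text bar is only an illustration of how the percentage might be drawn.

```javascript
// Normalization ranges: [minimum, maximum] of each parameter across all moods.
const RANGES = {
  pitch: [0.70, 1.35],
  rateMulti: [0.78, 1.30],
  volume: [0.88, 1.00],
};

// Map a value onto 0-100 within its range.
function toPercent(value, [min, max]) {
  return Math.round(((value - min) / (max - min)) * 100);
}

// Hypothetical renderer: a fixed-width text bar; the real UI may draw this differently.
function renderBar(percent, cells = 12) {
  const filled = Math.max(0, Math.min(cells, Math.round((percent / 100) * cells)));
  return '█'.repeat(filled) + '░'.repeat(cells - filled);
}

// Entusiasta preset: pitch 100%, rate 90%, volume 100% under this formula.
const entusiasta = { pitch: 1.35, rateMulti: 1.25, volume: 1.00 };
for (const key of Object.keys(RANGES)) {
  const percent = toPercent(entusiasta[key], RANGES[key]);
  console.log(key.padEnd(10), renderBar(percent), `${percent}%`);
}
```

Under this scaling, Melancólico maps to 0% on all three bars, since it defines the minimum of every range.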

Combining Customizations

Parameter Interaction

All customization parameters work together:

// Final synthesized audio parameters
const finalPitch = voiceTypePitch * moodPitch;
const finalRate = (baseSpeed + voiceTypeRateAdd) * moodRateMulti;
const finalVolume = moodVolume;

// Applied with safety clamping
utterance.pitch = Math.max(0.1, Math.min(2, finalPitch));
utterance.rate = Math.max(0.1, Math.min(10, finalRate));
utterance.volume = Math.max(0, Math.min(1, finalVolume));

Extreme Combinations

Highest Pitch Possible

Settings:

Voice Type: High-pitched (1.30)
Mood: Enthusiastic (1.35)

Result:

Pitch = 1.30 * 1.35 = 1.755

Very high, extremely energetic voice (still below the 2.0 clamp, so the value is applied unchanged)

Lowest Pitch Possible

Settings:

Voice Type: Normal (0.75)
Mood: Melancholic (0.70)

Result:

Pitch = 0.75 * 0.70 = 0.525

Very deep, somber voice

Fastest Rate Possible

Settings:

Speed: Very Fast (1.60)
Voice Type: High-pitched (+0.05)
Mood: Energetic (1.30x)

Result:

Rate = (1.60 + 0.05) * 1.30 = 2.145

Extremely rapid speech

Slowest Rate Possible

Settings:

Speed: Very Slow (0.50)
Voice Type: Normal (-0.05)
Mood: Melancholic (0.78x)

Result:

Rate = (0.50 + (-0.05)) * 0.78 = 0.351

Extremely slow, contemplative pace
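
The four extremes can be checked by enumerating every voice type, speed, and mood combination. This sketch copies the preset values from this guide into local arrays (`VOICES`, `SPEEDS`, `MOODS` are hypothetical names) and applies the pitch and rate formulas from Parameter Interaction.

```javascript
// Preset values copied from this guide.
const VOICES = [
  { label: 'Voz Normal', pitch: 0.75, rateAdd: -0.05 },
  { label: 'Voz Aguda', pitch: 1.30, rateAdd: 0.05 },
];
const SPEEDS = [
  { label: 'Muy Lento', rate: 0.50 },
  { label: 'Lento', rate: 0.75 },
  { label: 'Normal', rate: 1.00 },
  { label: 'Rápido', rate: 1.25 },
  { label: 'Muy Rápido', rate: 1.60 },
];
const MOODS = [
  { label: 'Neutral', pitch: 1.00, rateMulti: 1.00 },
  { label: 'Alegre', pitch: 1.25, rateMulti: 1.15 },
  { label: 'Serio', pitch: 0.80, rateMulti: 0.88 },
  { label: 'Entusiasta', pitch: 1.35, rateMulti: 1.25 },
  { label: 'Melancólico', pitch: 0.70, rateMulti: 0.78 },
  { label: 'Enérgico', pitch: 1.15, rateMulti: 1.30 },
  { label: 'Relajado', pitch: 0.88, rateMulti: 0.82 },
  { label: 'Tenso', pitch: 1.10, rateMulti: 1.18 },
];

// Build every voice x speed x mood combination with its final pitch and rate.
const combos = VOICES.flatMap((voice) =>
  SPEEDS.flatMap((speed) =>
    MOODS.map((mood) => ({
      pitch: voice.pitch * mood.pitch,
      rate: (speed.rate + voice.rateAdd) * mood.rateMulti,
    }))
  )
);

const pitches = combos.map((c) => c.pitch);
const rates = combos.map((c) => c.rate);

console.log(combos.length);                   // 80 combinations per language
console.log(Math.max(...pitches).toFixed(3)); // "1.755"
console.log(Math.min(...pitches).toFixed(3)); // "0.525"
console.log(Math.max(...rates).toFixed(3));   // "2.145"
console.log(Math.min(...rates).toFixed(3));   // "0.351"
```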

Recommended Combinations

Best for business, presentations, formal content

Recommended:

Voice: Normal
Speed: Normal
Mood: Neutral or Serious
Language: Match audience

Result: Authoritative, clear, professional tone

Parameters:

Pitch: 0.75 (Neutral) or 0.60 (Serious)
Rate: 0.95 (Neutral) or 0.836 (Serious)
Volume: 100% (Neutral) or 95% (Serious)

Advanced Customization Tips

Fine-tuning Your Audio

Start with Defaults

Begin with:

Voice: Normal
Speed: Normal
Mood: Neutral

Generate audio and listen critically.

Adjust One Parameter at a Time

Make incremental changes:

Try High-pitched voice if Normal is too deep
Adjust speed if pacing feels off
Finally, select mood for emotional tone

This helps you understand each parameter’s impact.

Test with Representative Text

Use actual content, not “test test”:

Include punctuation (affects pauses)
Use varied sentence lengths
Include numbers if relevant
Test with typical content length

Consider Your Audience

Customize for listeners:

Native speakers: Normal speed acceptable
Language learners: Slow or Very Slow
Elderly: Slower speeds, moderate pitch
Children: Higher pitch, moderate speed
Professional: Normal voice, Neutral or Serious

Save Successful Combinations

When you find a good combination:

Generate the audio
Give it a descriptive name
Use history to reference settings later
Export transcript to document settings

Common Mistakes to Avoid

Avoid These Combinations:

Too Extreme: Very Fast speed combined with the Energetic or Enthusiastic mood = Incomprehensible
Conflicting Moods: Using “Serious” for happy content
Wrong Speed for Audience: Fast speed for language learners
Ignoring Content Length: Very Slow for 5000 character text = roughly 12 minutes of audio
Not Testing: Always listen before exporting/using

Language-Specific Considerations

Tonal Languages (Chinese, Vietnamese)

Special Considerations:

Pitch changes affect meaning in tonal languages
Stick closer to Neutral mood (1.00 pitch)
Avoid Enthusiastic (1.35) and Melancholic (0.70) extremes
Test carefully with native speakers

Recommended Moods: Neutral, Tense, Relaxed (moderate pitch changes)
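
One way to express "moderate pitch changes" is as a maximum deviation from the neutral pitch of 1.00. The sketch below assumes a threshold of 0.12, chosen only because it is the largest deviation among the three recommended moods; VozCraft itself does not document such a threshold, and `moodsSafeForTonalLanguages` is a hypothetical helper. It looks at mood pitch only, not the voice-type multiplier.

```javascript
// Mood pitch multipliers copied from the Mood Options section.
const MOOD_PITCH = {
  'Neutral': 1.00, 'Alegre': 1.25, 'Serio': 0.80, 'Entusiasta': 1.35,
  'Melancólico': 0.70, 'Enérgico': 1.15, 'Relajado': 0.88, 'Tenso': 1.10,
};

// Assumed threshold: the deviation of Relajado (0.88) from 1.00.
const MAX_PITCH_DEVIATION = 0.12;

function moodsSafeForTonalLanguages(maxDeviation = MAX_PITCH_DEVIATION) {
  const epsilon = 1e-9; // guards against floating-point rounding at the boundary
  return Object.entries(MOOD_PITCH)
    .filter(([, pitch]) => Math.abs(pitch - 1) <= maxDeviation + epsilon)
    .map(([label]) => label);
}

console.log(moodsSafeForTonalLanguages()); // [ 'Neutral', 'Relajado', 'Tenso' ]
```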

Syllable-timed Languages (Spanish, Italian)

Special Considerations:

These languages flow naturally at various speeds
Speed changes generally well-tolerated
Mood variations work well
Natural rhythm preserved at most settings

Recommended: Any combination works well

Stress-timed Languages (English, German)

Special Considerations:

Extreme speeds can disrupt stress patterns
Very Fast (1.60) may reduce clarity significantly
Mood variations generally effective
Consider slower speeds for non-native learners

Recommended: Normal to Fast speeds for best results

Browser Limitations

Web Speech API Constraints

The Web Speech API has built-in limits:

// VozCraft applies safety clamping
const safePitch = Math.max(0.1, Math.min(2, calculatedPitch));
const safeRate = Math.max(0.1, Math.min(10, calculatedRate));
const safeVolume = Math.max(0, Math.min(1, calculatedVolume));

API Limits:

Pitch: 0 to 2.0 (VozCraft clamps the lower end at 0.1)
Rate: 0.1 to 10.0
Volume: 0.0 to 1.0

VozCraft’s combinations stay well within these limits, but extreme values are clamped for safety.
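
The clamping shown above can be wrapped in a small helper to see how in-range and out-of-range values behave. This is a sketch using the same bounds as the VozCraft snippet; `clamp`, `LIMITS`, and `clampSpeechParams` are hypothetical names, and the out-of-range inputs are invented for illustration since no preset combination produces them.

```javascript
// Generic clamp, equivalent to Math.max(min, Math.min(max, value)).
const clamp = (value, min, max) => Math.max(min, Math.min(max, value));

// Bounds used by the safety clamping: [minimum, maximum].
const LIMITS = { pitch: [0.1, 2], rate: [0.1, 10], volume: [0, 1] };

function clampSpeechParams({ pitch, rate, volume }) {
  return {
    pitch: clamp(pitch, ...LIMITS.pitch),
    rate: clamp(rate, ...LIMITS.rate),
    volume: clamp(volume, ...LIMITS.volume),
  };
}

// The most extreme preset values pass through unchanged.
const presetMax = clampSpeechParams({ pitch: 1.755, rate: 2.145, volume: 1 });
console.log(presetMax); // { pitch: 1.755, rate: 2.145, volume: 1 }

// Invented out-of-range inputs are pulled back to the nearest bound.
const outOfRange = clampSpeechParams({ pitch: 2.4, rate: 0.01, volume: 1.2 });
console.log(outOfRange); // { pitch: 2, rate: 0.1, volume: 1 }
```

In a browser, the returned values could then be assigned to the `pitch`, `rate`, and `volume` properties of a `SpeechSynthesisUtterance`.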

Platform Differences

Chrome/Edge
Safari
Firefox

Characteristics (Chrome/Edge):

Best overall support
Smooth parameter transitions
Good pitch/rate accuracy
Wide range support

Recommended: Primary browser for VozCraft

Next Steps

Voice Settings Guide

Step-by-step guide for optimal voice configuration

Using VozCraft

Complete workflow guide with examples

Audio Export

Learn how to export with custom settings
