
Voice Settings Guide

This guide explains how to configure VozCraft’s voice settings for a given use case. Whether you’re creating content for business, education, or entertainment, it walks through each setting and shows how the choices combine.

Understanding Voice Settings

VozCraft’s audio quality is determined by four interconnected settings:

Language/Accent

Primary Factor
Determines:
  • Base pronunciation rules
  • Available system voices
  • Regional accent characteristics
  • Language-specific prosody

Voice Type

Pitch & Gender
Controls:
  • Base pitch (0.75 or 1.30)
  • Voice gender preference
  • Rate adjustment (±0.05)
  • Overall voice character

Speed

Temporal Control
Affects:
  • Speaking rate (0.50x to 1.60x)
  • Content duration
  • Clarity vs. efficiency
  • Listener comprehension

Mood

Emotional Tone
Modifies:
  • Pitch variation (0.70 to 1.35)
  • Rate multiplier (0.78x to 1.30x)
  • Volume (88% to 100%)
  • Emotional character

Step-by-Step Configuration

Step 1: Choose Your Language

Language selection is the most important decision and should be made first.
1. Identify Your Audience

Questions to ask:
  • What language do they speak?
  • Are they native speakers?
  • Which regional accent will they prefer?
  • Are there multiple target regions?
Examples:
  • US company: English (US)
  • Latin American content: Español (México)
  • European French audience: Français (France)
  • International English: English (US) for broadest understanding
2. Test Regional Variants

If your language has multiple regional options, test them.
Example: Spanish
  1. Generate with Español (México)
  2. Generate with Español (España)
  3. Listen for:
    • Pronunciation differences
    • Accent preferences
    • Listener feedback
Common Differences:
  • Spain: “th” sound for Z and for C before E or I (“gracias” = “grathias”)
  • Mexico: No “th” sound; the same letters are pronounced “s” (“gracias” = “grasias”)
  • Argentina: “sh” sound for LL/Y (“calle” = “cashe”)
3. Check Voice Availability

Some languages have better voice support on certain platforms.
Best Support (most platforms):
  • English (US, UK)
  • Spanish (Mexico, Spain)
  • French
  • German
  • Portuguese (Brazil)
Good Support (most modern systems):
  • Italian, Japanese, Chinese
  • English (AU, IN)
  • Spanish (other variants)
Variable Support (system-dependent):
  • Arabic, Hindi, Turkish, Russian
Action: If voice quality is poor, try a related language variant or a different browser.
4. Make Your Selection

Click the 🌍 Voice / Accent / Region dropdown and select your chosen language.
The dropdown shows:
  • Flag emoji
  • Language name (in interface language)
  • Organized by language family
Pro Tip: For international audiences with varying English proficiency, English (US) with Slow speed provides the best comprehension.

Step 2: Select Voice Type

Voice Type controls pitch and attempts to match appropriate system voices.
Normal 🔉
Technical Specs:
  • Pitch: 0.75 (25% lower)
  • Rate adjustment: -0.05
  • Prefers: Male voices
  • Base frequency: ~90 Hz
Best For:

Professional Content

  • Business presentations
  • Corporate communications
  • Financial reports
  • Legal content

Educational Material

  • Lectures and courses
  • Technical documentation
  • Academic content
  • Training materials

Long-Form Content

  • Audiobooks
  • Articles and blogs
  • Documentation
  • Extended narration

Authoritative Tone

  • News and journalism
  • Official announcements
  • Policy documents
  • Serious topics
When to Use:
  • Default choice for most content
  • When authority and professionalism are priorities
  • For extended listening sessions (less fatiguing)
  • When targeting professional audiences
Avoid When:
  • Creating children’s content (may sound too serious)
  • Marketing to young demographics (may lack energy)
  • When light, friendly tone is needed
Decision Matrix:
CONTENT TYPE          | RECOMMENDED VOICE TYPE | WHY
----------------------|------------------------|------------------------------
Business Presentation | Normal                 | Professional, authoritative
Product Ad            | High-pitched           | Energetic, engaging
Audiobook             | Normal                 | Comfortable for long listening
Children’s Story      | High-pitched           | Age-appropriate, friendly
News Article          | Normal                 | Credible, serious
Training Video        | Normal                 | Clear, professional
Motivational Speech   | High-pitched           | Energetic, inspiring
Technical Docs        | Normal                 | Authoritative, clear
Gender Matching: VozCraft attempts to select system voices matching the voice type, but availability depends on your operating system. Not all systems provide both male and female voices for every language.

Step 3: Set the Speed

Speed should be chosen based on audience and content complexity.
Very Slow (0.50x)
When to Use:

Language Learning

  • Pronunciation practice
  • Beginner lessons
  • Accent training
  • Dictation exercises

Accessibility

  • Cognitive processing needs
  • Elderly audiences
  • Complex technical content
  • Medical instructions

Transcription

  • Manual transcription work
  • Note-taking
  • Detailed analysis
  • Legal proceedings

Meditation

  • Guided meditation
  • Relaxation exercises
  • Sleep content
  • Breathing exercises
Duration Impact: About 2x longer than Normal speed
  • 1000 characters: ~140 seconds vs. ~70 seconds at Normal speed
Caution: May sound unnatural for native speakers familiar with the content.
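The duration figures above can be reproduced with a quick estimate. This is a minimal sketch, assuming that playback time scales inversely with the speed setting and using the ~70 seconds per 1000 characters quoted above as the baseline. The function name is hypothetical, and the estimate ignores the small rate changes introduced by Voice Type and Mood.

```python
# Baseline quoted in this guide: ~70 seconds per 1000 characters at Normal (1.00x).
NORMAL_SECONDS_PER_1000_CHARS = 70.0


def estimate_duration_seconds(char_count: int, speed: float) -> float:
    """Rough playback time, assuming duration is inversely proportional to speed."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return (char_count / 1000) * NORMAL_SECONDS_PER_1000_CHARS / speed


print(estimate_duration_seconds(1000, 0.50))  # Very Slow: 140.0
print(estimate_duration_seconds(1000, 1.00))  # Normal: 70.0
```

Actual durations depend on the system voice, punctuation, and pauses, so treat the result as a planning figure only.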
Speed Selection Flowchart:
Is content for language learners? 
  → Yes: Very Slow (0.50x)
  → No: Continue

Is audience non-native speakers?
  → Yes: Slow (0.75x)
  → No: Continue

Is content complex or technical?
  → Yes: Slow (0.75x)
  → No: Continue

Is time efficiency critical?
  → Yes: Fast (1.25x) or Very Fast (1.60x)
  → No: Normal (1.00x)
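The flowchart above can also be written as a short function, which is handy when presets are chosen in a script. This is a sketch only: the function and parameter names are hypothetical, and where the flowchart offers "Fast or Very Fast" the code returns Fast as the more conservative choice.

```python
def choose_speed(
    for_language_learners: bool,
    non_native_audience: bool,
    complex_content: bool,
    time_critical: bool,
) -> str:
    """Walk the speed flowchart top to bottom and return the first match."""
    if for_language_learners:
        return "Very Slow"  # 0.50x
    if non_native_audience or complex_content:
        return "Slow"  # 0.75x
    if time_critical:
        return "Fast"  # 1.25x; the flowchart also allows Very Fast (1.60x)
    return "Normal"  # 1.00x


print(choose_speed(False, True, False, True))  # Slow
```

Because the questions are checked in order, an earlier "Yes" always wins: a time-critical piece for non-native speakers still resolves to Slow.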

Step 4: Select the Mood

Mood shapes the emotional character of your audio through pitch, rate, and volume adjustments.
Neutral 😐
Parameters: Pitch 1.00 | Rate 1.00x | Volume 100%
Use Cases:
  • Professional presentations
  • News and journalism
  • Technical documentation
  • Business communications
  • Academic content
  • Reference material
Why Choose Neutral:
  • Most natural-sounding
  • No emotional bias
  • Professional tone
  • Widely acceptable
  • Default recommendation
Combined Examples:
  • Normal + Normal + Neutral = Professional standard
  • High-pitched + Normal + Neutral = Friendly but professional
When to Avoid: Rarely. Neutral works for almost everything.
Mood Selection Matrix:
CONTENT EMOTION    | PRIMARY MOOD | ALTERNATIVE  | AVOID
-------------------|--------------|--------------|------------------------
Professional       | Neutral      | Serious      | Enthusiastic, Happy
Upbeat/Positive    | Happy        | Enthusiastic | Melancholic, Serious
Formal/Important   | Serious      | Neutral      | Happy, Enthusiastic
Exciting/Dynamic   | Enthusiastic | Energetic    | Melancholic, Relaxed
Sad/Somber         | Melancholic  | Serious      | Happy, Enthusiastic
Fast-paced/Action  | Energetic    | Enthusiastic | Melancholic, Relaxed
Calm/Soothing      | Relaxed      | Neutral      | Energetic, Enthusiastic
Urgent/Dramatic    | Tense        | Energetic    | Relaxed, Melancholic
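If mood choices are reviewed in bulk, the matrix above can be encoded as a lookup that classifies a proposed mood. This is an illustrative sketch: the names `MOOD_MATRIX` and `classify_mood` are hypothetical, and the data simply transcribes the matrix.

```python
# content emotion -> (primary mood, alternative mood, moods to avoid)
MOOD_MATRIX = {
    "Professional": ("Neutral", "Serious", {"Enthusiastic", "Happy"}),
    "Upbeat/Positive": ("Happy", "Enthusiastic", {"Melancholic", "Serious"}),
    "Formal/Important": ("Serious", "Neutral", {"Happy", "Enthusiastic"}),
    "Exciting/Dynamic": ("Enthusiastic", "Energetic", {"Melancholic", "Relaxed"}),
    "Sad/Somber": ("Melancholic", "Serious", {"Happy", "Enthusiastic"}),
    "Fast-paced/Action": ("Energetic", "Enthusiastic", {"Melancholic", "Relaxed"}),
    "Calm/Soothing": ("Relaxed", "Neutral", {"Energetic", "Enthusiastic"}),
    "Urgent/Dramatic": ("Tense", "Energetic", {"Relaxed", "Melancholic"}),
}


def classify_mood(content_emotion: str, mood: str) -> str:
    """Return 'primary', 'alternative', 'avoid', or 'unlisted' for a mood choice."""
    primary, alternative, avoid = MOOD_MATRIX[content_emotion]
    if mood == primary:
        return "primary"
    if mood == alternative:
        return "alternative"
    if mood in avoid:
        return "avoid"
    return "unlisted"  # the matrix gives no guidance for this pairing


print(classify_mood("Calm/Soothing", "Energetic"))  # avoid
```

An "unlisted" result is not a rejection; it only means the pairing should be judged by listening to a sample.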

Configuration Examples by Use Case

Business Presentation

Goal: Professional, authoritative, clear
Configuration:
  • Language: English (US) or your business language
  • Voice Type: Normal 🔉
  • Speed: Normal
  • Mood: Neutral 😐
Why This Works:
  • Normal voice provides authority (pitch 0.75)
  • Normal speed ensures comprehension
  • Neutral mood maintains professionalism
  • Universally acceptable for business contexts
Final Parameters:
  • Pitch: 0.75 * 1.00 = 0.75 (authoritative)
  • Rate: (1.00 + -0.05) * 1.00 = 0.95 (clear)
  • Volume: 100%
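The Final Parameters arithmetic used throughout these examples can be sketched as a small function. This is a minimal sketch under stated assumptions, not VozCraft's actual implementation: the names `final_params`, `VOICE_TYPES`, `SPEEDS`, and `MOODS` are hypothetical, and the tables contain only values quoted in this guide. Moods whose numbers are not given here (Serious, Melancholic, Energetic, Tense) are left out.

```python
from dataclasses import dataclass

# voice type -> (base pitch, rate adjustment), as quoted in this guide
VOICE_TYPES = {"Normal": (0.75, -0.05), "High-pitched": (1.30, 0.05)}

# speed label -> base speaking rate
SPEEDS = {"Very Slow": 0.50, "Slow": 0.75, "Normal": 1.00, "Fast": 1.25, "Very Fast": 1.60}

# mood -> (pitch multiplier, rate multiplier, volume); only moods with published values
MOODS = {
    "Neutral": (1.00, 1.00, 1.00),
    "Happy": (1.25, 1.15, 1.00),
    "Enthusiastic": (1.35, 1.25, 1.00),
    "Relaxed": (0.88, 0.82, 0.90),
}


@dataclass(frozen=True)
class FinalParams:
    pitch: float
    rate: float
    volume: float


def final_params(voice_type: str, speed: str, mood: str) -> FinalParams:
    """pitch = voice pitch * mood pitch; rate = (speed + voice adjustment) * mood rate."""
    base_pitch, rate_adjustment = VOICE_TYPES[voice_type]
    mood_pitch, mood_rate, mood_volume = MOODS[mood]
    return FinalParams(
        pitch=round(base_pitch * mood_pitch, 4),
        rate=round((SPEEDS[speed] + rate_adjustment) * mood_rate, 4),
        volume=mood_volume,
    )


print(final_params("Normal", "Normal", "Neutral"))
```

For the Business Presentation settings this yields pitch 0.75, rate 0.95, and volume 1.0, matching the figures listed above. Rounding to four decimals only hides floating-point noise.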

Children’s Story

Goal: Engaging, friendly, age-appropriate
Configuration:
  • Language: Child’s native language
  • Voice Type: High-pitched 🔊
  • Speed: Slow or Normal
  • Mood: Happy 😄
Why This Works:
  • High-pitched voice sounds youthful (pitch 1.30)
  • Slow speed helps comprehension
  • Happy mood adds cheerfulness
  • Combination is engaging for kids
Final Parameters (Slow):
  • Pitch: 1.30 * 1.25 = 1.625 (bright, friendly)
  • Rate: (0.75 + 0.05) * 1.15 = 0.92 (comfortable)
  • Volume: 100%

Language Learning

Goal: Maximum clarity for non-native speakers
Configuration:
  • Language: Target language
  • Voice Type: Normal 🔉
  • Speed: Very Slow or Slow
  • Mood: Neutral 😐
Why This Works:
  • Normal voice provides clear pronunciation
  • Very Slow speed allows sound processing
  • Neutral mood avoids distracting emotions
  • Focus is entirely on language learning
Final Parameters (Very Slow):
  • Pitch: 0.75 * 1.00 = 0.75 (clear)
  • Rate: (0.50 + -0.05) * 1.00 = 0.45 (very clear)
  • Volume: 100%

Marketing Ad

Goal: Energetic, attention-grabbing, positive
Configuration:
  • Language: Target market language
  • Voice Type: High-pitched 🔊
  • Speed: Fast
  • Mood: Enthusiastic 🤩 or Happy 😄
Why This Works:
  • High-pitched voice sounds energetic
  • Fast speed conveys excitement
  • Enthusiastic mood maximizes energy
  • Combination captures attention
Final Parameters (Enthusiastic):
  • Pitch: 1.30 * 1.35 = 1.755 (very high energy)
  • Rate: (1.25 + 0.05) * 1.25 = 1.625 (dynamic)
  • Volume: 100%
Caution: Test with a focus group; this combination may be too intense for some listeners.

Meditation Guide

Goal: Calming, soothing, peaceful
Configuration:
  • Language: Listener’s language
  • Voice Type: Normal 🔉
  • Speed: Very Slow or Slow
  • Mood: Relaxed 😌 or Melancholic 😔
Why This Works:
  • Normal voice provides gentle tone
  • Very Slow creates calm pacing
  • Relaxed mood reduces intensity
  • Combination promotes relaxation
Final Parameters (Very Slow, Relaxed):
  • Pitch: 0.75 * 0.88 = 0.66 (gentle, low)
  • Rate: (0.50 + -0.05) * 0.82 = 0.369 (very slow, calming)
  • Volume: 90% (softer)

Technical Documentation

Goal: Clear, professional, authoritative
Configuration:
  • Language: Documentation language
  • Voice Type: Normal 🔉
  • Speed: Slow or Normal
  • Mood: Neutral 😐
Why This Works:
  • Normal voice sounds professional
  • Slow speed aids comprehension of complex terms
  • Neutral mood maintains focus on content
  • Optimal for technical material
Final Parameters (Slow):
  • Pitch: 0.75 * 1.00 = 0.75 (professional)
  • Rate: (0.75 + -0.05) * 1.00 = 0.70 (clear, measured)
  • Volume: 100%

Testing and Refinement

A/B Testing Your Settings

1. Create Test Sample

Write 100-200 characters representing typical content:
Welcome to our comprehensive guide. In this section, we'll explore
the key features and benefits. Let's begin with the fundamentals.
2. Generate Variation A

Your hypothesis for best settings:
  • Configure VozCraft
  • Generate audio
  • Name: “Test_A”
3. Generate Variation B

Alternative configuration (change ONE parameter):
  • Adjust one setting
  • Generate audio
  • Name: “Test_B”
4. Blind Listening Test

Play both without knowing which is which:
  • Have a colleague play them for you
  • Or use history (don’t look at settings)
  • Listen multiple times
5. Evaluate Objectively

Rate each on:
  • Clarity (1-10)
  • Naturalness (1-10)
  • Tone appropriateness (1-10)
  • Professional quality (1-10)
  • Overall preference
6. Select Winner & Document

Choose best configuration:
  • Record exact settings
  • Save as reference
  • Use for all future content
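The scoring in the evaluation step can be tallied with a few lines of code. This is a sketch, assuming the four 1-10 criteria are weighted equally and that "overall preference" is kept as a separate tie-breaker; the function names and dictionary keys are hypothetical.

```python
CRITERIA = ("clarity", "naturalness", "tone", "professional")


def average_score(ratings: dict) -> float:
    """Average the four 1-10 ratings, rejecting missing or out-of-range values."""
    for criterion in CRITERIA:
        if criterion not in ratings:
            raise ValueError(f"missing rating: {criterion}")
        if not 1 <= ratings[criterion] <= 10:
            raise ValueError(f"{criterion} must be between 1 and 10")
    return sum(ratings[c] for c in CRITERIA) / len(CRITERIA)


def pick_winner(test_a: dict, test_b: dict) -> str:
    """Compare the two variations by average score."""
    score_a, score_b = average_score(test_a), average_score(test_b)
    if score_a == score_b:
        return "tie"  # fall back to overall preference
    return "Test_A" if score_a > score_b else "Test_B"


test_a = {"clarity": 8, "naturalness": 7, "tone": 9, "professional": 8}
test_b = {"clarity": 7, "naturalness": 7, "tone": 7, "professional": 7}
print(pick_winner(test_a, test_b))  # Test_A
```

With several listeners, average each listener's ratings per variation first, then compare the two results in the same way.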

Gathering Feedback

Internal Team Review

Process:
  1. Share audio with 3-5 team members
  2. Provide feedback form:
    • Clarity rating
    • Professional rating
    • Suggested improvements
  3. Compile feedback
  4. Make adjustments
  5. Re-test if needed

Target Audience Testing

Process:
  1. Select 5-10 audience representatives
  2. Share audio without context
  3. Ask:
    • Is it easy to understand?
    • Does the pace feel comfortable?
    • Does the tone match expectations?
    • Would you listen to more?
  4. Iterate based on feedback

Common Configuration Mistakes

Mistake 1: Too Many Extreme Settings
Problem: Very Fast + High-pitched + Enthusiastic = Incomprehensible
Result: Pitch 1.30 * 1.35 = 1.755, Rate (1.60 + 0.05) * 1.25 = 2.0625; too extreme for most listeners
Solution: Use at most ONE extreme setting, keep others moderate
Mistake 2: Mismatched Mood and Content
Problem: Using Happy mood for serious medical information
Result: Inappropriate tone undermines the message’s credibility
Solution: Match the mood to the emotional context of the content
Mistake 3: Wrong Speed for Audience
Problem: Fast speed for an elderly audience or language learners
Result: Poor comprehension, frustrated listeners
Solution: Always consider audience capabilities, and err on the slower side
Mistake 4: Not Testing Before Production
Problem: Generating all content before listening to a sample
Result: Issues are discovered only after the time has been invested
Solution: Always generate and review test audio first
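The "at most one extreme setting" rule from Mistake 1 can be checked before generating audio. The sketch below counts how many chosen options fall in an assumed list of extremes. That list is an assumption made for illustration, since this guide does not define which options count as extreme, and all names are hypothetical.

```python
# Assumption: which options count as "extreme" is a judgment call, not a VozCraft rule.
EXTREME_OPTIONS = {
    "voice_type": {"High-pitched"},
    "speed": {"Very Slow", "Very Fast"},
    "mood": {"Enthusiastic"},
}


def extreme_settings(voice_type: str, speed: str, mood: str) -> list:
    """List the chosen options that appear in the assumed extreme sets."""
    chosen = {"voice_type": voice_type, "speed": speed, "mood": mood}
    return [
        f"{name}={value}"
        for name, value in chosen.items()
        if value in EXTREME_OPTIONS[name]
    ]


def is_balanced(voice_type: str, speed: str, mood: str) -> bool:
    """True when no more than one extreme option is selected."""
    return len(extreme_settings(voice_type, speed, mood)) <= 1


print(extreme_settings("High-pitched", "Very Fast", "Enthusiastic"))
print(is_balanced("Normal", "Very Slow", "Neutral"))  # True
```

Under this assumed list, the Marketing Ad preset (High-pitched, Fast, Enthusiastic) counts two extremes, which is consistent with the caution attached to that example.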

Quick Reference Guide

Setting Selection Cheat Sheet

CONTENT TYPE           | VOICE  | SPEED | MOOD
-----------------------|--------|-------|-------------
Business               | Normal | Normal| Neutral
Marketing              | High   | Fast  | Happy/Enthus
Education              | Normal | Slow  | Neutral
Children               | High   | Normal| Happy
Audiobook              | Either | Normal| Neutral
News                   | Normal | Normal| Neutral
Technical              | Normal | Slow  | Neutral
Motivational           | High   | Fast  | Enthusiastic
Meditation             | Normal | V.Slow| Relaxed
Language Learning      | Normal | V.Slow| Neutral
Thriller/Drama         | Either | Normal| Tense
Comedy/Entertainment   | High   | Fast  | Happy

Next Steps

Using VozCraft

Complete workflow guide with examples

Customization

Deep dive into all customization parameters

Troubleshooting

Fix common voice quality issues
