Language Options - AI Video Presentation Generator

Overview

The AI Video Presentation Generator supports 11 languages for voice narration through Sarvam AI’s text-to-speech API. The four primary languages each have a dedicated voice speaker, and tone, pitch, pace, and loudness are configurable.

Supported Languages

The system supports the following languages. Each primary language has a dedicated voice speaker:

Primary Languages

English

Language Code: en-IN
Speaker: anushka
Voice Type: Indian English Female
Use Case: International presentations, technical content

Hindi

Language Code: hi-IN
Speaker: manisha
Voice Type: Hindi Female
Use Case: North Indian audience, cultural content

Kannada

Language Code: kn-IN
Speaker: vidya
Voice Type: Kannada Female
Use Case: Karnataka region, local content

Telugu

Language Code: te-IN
Speaker: arya
Voice Type: Telugu Female
Use Case: Andhra Pradesh, Telangana regions

Additional Supported Languages

The system also supports these languages (configured in voice_generator.py:142-155):

Language	Code	Region	Status
Tamil	`ta-IN`	Tamil Nadu	Supported
Bengali	`bn-IN`	West Bengal, Bangladesh	Supported
Gujarati	`gu-IN`	Gujarat	Supported
Malayalam	`ml-IN`	Kerala	Supported
Marathi	`mr-IN`	Maharashtra	Supported
Odia	`or-IN`	Odisha	Supported
Punjabi	`pa-IN`	Punjab	Supported

Each language code combines a two-letter ISO 639-1 language code with the ISO 3166-1 country code IN for India (for example, hi-IN).

Language Configuration

Setting Language Parameter

Specify the language when making a presentation generation request:

request_data = {
    "topic": "Introduction to Machine Learning",
    "num_slides": 5,
    "language": "english",  # Language name in lowercase
    "tone": "formal"
}

Language-Speaker Mapping

The system automatically maps language names to appropriate voice speakers (from config.py:40-45):

SARVAM_SPEAKER_MAP = {
    "english": "anushka",  # Indian English female voice
    "hindi": "manisha",    # Hindi female voice
    "kannada": "vidya",    # Kannada female voice
    "telugu": "arya"       # Telugu female voice
}

If no speaker is configured for a language, the system defaults to “anushka” (English). With the map shown above, this applies to every language outside the four primary ones.
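The fallback behaviour can be sketched as a single dictionary lookup. This is a minimal illustration rather than the project's code: `resolve_speaker` and `DEFAULT_SPEAKER` are hypothetical names, and the map simply mirrors the one shown earlier.

```python
# Mirrors the SARVAM_SPEAKER_MAP shown above. resolve_speaker and
# DEFAULT_SPEAKER are hypothetical names used only for illustration.
SARVAM_SPEAKER_MAP = {
    "english": "anushka",
    "hindi": "manisha",
    "kannada": "vidya",
    "telugu": "arya",
}
DEFAULT_SPEAKER = "anushka"


def resolve_speaker(language: str) -> str:
    """Return the configured speaker, or the default when none is mapped."""
    return SARVAM_SPEAKER_MAP.get(language.strip().lower(), DEFAULT_SPEAKER)


print(resolve_speaker("hindi"))  # manisha
print(resolve_speaker("tamil"))  # anushka (no dedicated speaker configured)
```

Normalizing the name before the lookup keeps the speaker choice consistent with the case-insensitive language code mapping.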

Tone Options

Customize the narration style with different tone parameters:

Available Tones

Three tones are available: formal, casual, and enthusiastic. The formal tone is described below.

Formal

Use Case: Professional presentations, academic content, business meetings

Characteristics:

Professional and authoritative
Clear pronunciation
Structured delivery
Appropriate for corporate settings

Example Topics:

Technical documentation
Research presentations
Product launches
Investor pitches

Setting Tone

Include the tone parameter in your request:

request = GenerateRequest(
    topic="Digital Marketing Strategies",
    num_slides=5,
    language="english",
    tone="formal"  # Options: "formal", "casual", "enthusiastic"
)

Voice Customization

TTS Parameters

The voice generator uses these parameters for audio synthesis (from voice_generator.py:26-36):

pitch (number, default: 0)

Voice pitch adjustment in semitones.

Negative values: Lower pitch (deeper voice)
Zero: Natural pitch
Positive values: Higher pitch

Range: -12 to +12

pace (number, default: 1.0)

Speech rate multiplier.

0.5: Half speed (slower)
1.0: Normal speed
2.0: Double speed (faster)

Range: 0.5 to 2.0

loudness (number, default: 1.5)

Audio volume multiplier.

1.0: Standard volume
1.5: Increased volume (default)
2.0: Maximum volume

Range: 0.5 to 2.0

speech_sample_rate (number, default: 22050)

Audio sample rate in Hz. Higher values = better quality but larger files.

16000: Lower quality, smaller files
22050: Balanced (default)
44100: High quality, larger files

Example Custom Configuration

To customize voice parameters, modify the payload in voice_generator.py:26-36:

payload = {
    "inputs": [narration_text],
    "target_language_code": self._get_language_code(language),
    "speaker": speaker,
    "pitch": 0,           # Adjust for voice pitch
    "pace": 1.0,          # Adjust for speech rate
    "loudness": 1.5,      # Adjust for volume
    "speech_sample_rate": 22050,
    "enable_preprocessing": True,
    "model": Config.SARVAM_MODEL
}
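Before editing the payload, it can help to keep custom values inside the documented ranges. The sketch below is one way to do that, using the ranges and defaults listed under TTS Parameters. `clamp`, `build_voice_options`, `PARAM_RANGES`, and `DEFAULTS` are hypothetical names and are not part of voice_generator.py.

```python
# Ranges and defaults copied from the parameter descriptions above.
# clamp and build_voice_options are hypothetical helpers for illustration.
PARAM_RANGES = {
    "pitch": (-12.0, 12.0),
    "pace": (0.5, 2.0),
    "loudness": (0.5, 2.0),
}
DEFAULTS = {"pitch": 0.0, "pace": 1.0, "loudness": 1.5}


def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def build_voice_options(**overrides: float) -> dict:
    """Merge overrides into the defaults and clamp each value to its range."""
    unknown = set(overrides) - set(PARAM_RANGES)
    if unknown:
        raise ValueError(f"Unknown voice parameters: {sorted(unknown)}")
    options = dict(DEFAULTS)
    options.update(overrides)
    return {name: clamp(float(v), *PARAM_RANGES[name]) for name, v in options.items()}


print(build_voice_options(pace=0.9, loudness=3.0))
# {'pitch': 0.0, 'pace': 0.9, 'loudness': 2.0}
```

The resulting dictionary could be merged into the payload with `payload.update(...)`. Clamping is a design choice here; raising an error on out-of-range values would be an equally reasonable alternative.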

Multi-Language Support Details

How Language Processing Works

Content Generation

Gemini AI generates presentation content in the requested language. The ScriptGenerator creates narration scripts appropriate for the language and tone.

Script Formatting

Scripts are formatted with timestamps and segmented by slide. Each slide gets a dedicated narration text aligned with its content.

Language Mapping

The system maps the language name to the appropriate:

Language code (e.g., en-IN, hi-IN)
Voice speaker (e.g., anushka, manisha)
Regional variations if applicable

Audio Generation

Sarvam AI’s TTS engine synthesizes audio for each slide. Audio files are saved to backend/outputs/audio/ as WAV files.

Audio Combination

Individual slide audio files are concatenated into one complete narration. The final audio is synchronized with slide timings.
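The combination step can be illustrated with Python's standard `wave` module. This is a sketch under the assumption that every slide file shares the same channel count, sample width, and sample rate. The project itself may combine audio differently (for example with moviepy), and `concatenate_wav_files` is a hypothetical name.

```python
import wave
from pathlib import Path
from typing import Sequence


def concatenate_wav_files(input_paths: Sequence[Path], output_path: Path) -> float:
    """Join WAV files end to end and return the combined duration in seconds.

    Assumes all inputs share channel count, sample width, and sample rate,
    which holds when every slide is generated with identical TTS settings.
    """
    if not input_paths:
        raise ValueError("At least one input file is required")

    params = None
    frames = []
    for path in input_paths:
        with wave.open(str(path), "rb") as reader:
            current = reader.getparams()
            if params is None:
                params = current
            elif (current.nchannels, current.sampwidth, current.framerate) != (
                params.nchannels, params.sampwidth, params.framerate
            ):
                raise ValueError(f"{path} has a different audio format")
            frames.append(reader.readframes(reader.getnframes()))

    with wave.open(str(output_path), "wb") as writer:
        writer.setnchannels(params.nchannels)
        writer.setsampwidth(params.sampwidth)
        writer.setframerate(params.framerate)
        for chunk in frames:
            writer.writeframes(chunk)

    total_bytes = sum(len(chunk) for chunk in frames)
    return total_bytes / (params.nchannels * params.sampwidth * params.framerate)
```

The returned duration can be compared against the sum of the per-slide durations as a quick sanity check on synchronization.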

Per-Slide Audio Generation

The system generates audio per slide for better timing control (from app.py:258-286):

for idx, slide_script in enumerate(script_data['slide_scripts'], 1):
    slide_num = slide_script['slide_number']
    
    # Generate audio for this slide
    audio_path = voice_gen.generate_voice_for_slide(
        slide_script['narration_text'],
        slide_num,
        topic,
        language  # Language parameter passed here
    )
    
    # Calculate actual audio duration
    audio_clip = AudioFileClip(audio_path)
    actual_durations[slide_num] = audio_clip.duration

This approach ensures:

Accurate slide timing
Better audio quality
Easier debugging
Flexible editing
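Once the per-slide durations are measured, slide start and end times follow from a running sum. The sketch below assumes a dictionary shaped like `actual_durations` in the loop above. `build_slide_timeline` is a hypothetical helper and the sample durations are made up.

```python
from itertools import accumulate
from typing import Dict, Tuple


def build_slide_timeline(durations: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each slide number to its (start, end) time in seconds."""
    slide_numbers = sorted(durations)
    # Running total of durations gives the end time of each slide.
    end_times = list(accumulate(durations[n] for n in slide_numbers))
    # Each slide starts where the previous one ended; the first starts at 0.
    start_times = [0.0] + end_times[:-1]
    return {n: (s, e) for n, s, e in zip(slide_numbers, start_times, end_times)}


actual_durations = {1: 4.0, 2: 6.5, 3: 3.0}  # illustrative values
print(build_slide_timeline(actual_durations))
# {1: (0.0, 4.0), 2: (4.0, 10.5), 3: (10.5, 13.5)}
```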

Text Length Limitations

Sarvam AI TTS has a 500-character limit per request. The per-slide method enforces this by sending only the first 500 characters of the narration text, so anything beyond that is dropped:

def generate_voice_for_slide(self, narration_text: str, ...):
    # Limit to 500 characters per request
    payload = {
        "inputs": [narration_text[:500]],
        # ... other parameters
    }

For longer narrations, text is split into chunks (see voice_generator.py:157-183).
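One way to chunk text without cutting sentences in half is sketched below. It is not the implementation in voice_generator.py, which is not shown here. `split_into_chunks` and `MAX_CHARS` are hypothetical names, and the sentence-boundary rule is an assumption.

```python
import re
from typing import List

MAX_CHARS = 500  # per-request limit stated above


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after sentence-ending punctuation (including the Devanagari danda).
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two. Three.", max_chars=10))
# ['One. Two.', 'Three.']
```

Each chunk can then be sent as its own request and the resulting audio joined in order.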

Language Selection Best Practices

Choose Based on Audience

Consider your target audience:

International: Use English
Regional: Use local language (Hindi, Kannada, etc.)
Multilingual: Generate multiple versions

Match language to audience demographics for better engagement.
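For a multilingual audience, the same request can be repeated once per language. The sketch below only builds the request payloads, using the field names from the request example earlier. `build_requests` is a hypothetical helper, and how each payload is submitted depends on your setup.

```python
from typing import Dict, List


def build_requests(topic: str, num_slides: int, languages: List[str],
                   tone: str = "formal") -> List[Dict[str, object]]:
    """Create one request payload per language, sharing topic, slides, and tone."""
    return [
        {"topic": topic, "num_slides": num_slides, "language": lang.lower(), "tone": tone}
        for lang in languages
    ]


for request_data in build_requests("Introduction to Machine Learning", 5,
                                   ["english", "hindi", "kannada"]):
    print(request_data["language"])
```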

Match Content Type to Tone

Tone selection guidelines:

Technical/Professional: Use formal tone
Educational/Tutorial: Use casual tone
Marketing/Sales: Use enthusiastic tone

Appropriate tone enhances message delivery.

Test Different Languages

Quality assurance:

Generate test videos in target languages
Verify pronunciation and clarity
Check cultural appropriateness
Gather feedback from native speakers

Testing ensures quality across all languages.

Consider Regional Variations

Regional considerations:

Indian English has distinct pronunciation
Regional languages may have dialects
Cultural context affects word choice
Formal vs. casual varies by culture

Be mindful of regional nuances.

Code Examples

Backend: Language Processing

From voice_generator.py:140-155, the language code mapping:

def _get_language_code(self, language: str) -> str:
    """Map language name to Sarvam AI language code"""
    language_map = {
        "english": "en-IN",
        "hindi": "hi-IN",
        "kannada": "kn-IN",
        "telugu": "te-IN",
        "tamil": "ta-IN",
        "bengali": "bn-IN",
        "gujarati": "gu-IN",
        "malayalam": "ml-IN",
        "marathi": "mr-IN",
        "odia": "or-IN",
        "punjabi": "pa-IN"
    }
    return language_map.get(language.lower(), "en-IN")

Frontend: Language Selection UI

Example React component for language selection:

import React from 'react';

const languages = [
  { value: 'english', label: 'English' },
  { value: 'hindi', label: 'हिन्दी (Hindi)' },
  { value: 'kannada', label: 'ಕನ್ನಡ (Kannada)' },
  { value: 'telugu', label: 'తెలుగు (Telugu)' },
  { value: 'tamil', label: 'தமிழ் (Tamil)' },
  // ... more languages
];

const LanguageSelector = ({ value, onChange }) => {
  return (
    <select 
      value={value} 
      onChange={(e) => onChange(e.target.value)}
      className="language-selector"
    >
      {languages.map(lang => (
        <option key={lang.value} value={lang.value}>
          {lang.label}
        </option>
      ))}
    </select>
  );
};

Troubleshooting

Language Not Supported Error

Problem: Error message about unsupported language

Solutions:

Check spelling of language name (must be lowercase)
Verify language is in supported list
Use English name, not native name (“hindi” not “हिन्दी”)
Check if Sarvam AI supports the language
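Several of these checks can be automated before a request is sent. The sketch below validates a language name against the names used in the language code mapping. `normalize_language` and `SUPPORTED_LANGUAGES` are hypothetical names, not part of the project.

```python
# Names taken from the language code mapping in this guide.
# normalize_language is a hypothetical helper for illustration.
SUPPORTED_LANGUAGES = {
    "english", "hindi", "kannada", "telugu", "tamil", "bengali",
    "gujarati", "malayalam", "marathi", "odia", "punjabi",
}


def normalize_language(name: str) -> str:
    """Return the lowercase language name, or raise if it is not supported."""
    normalized = name.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {name!r}. Supported: {supported}")
    return normalized


print(normalize_language(" Hindi "))  # hindi
```

Raising early gives a clearer message than a silent fallback to English narration.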

Audio Quality Issues

Problem: Poor audio quality or robotic voice

Solutions:

Increase speech_sample_rate to 44100
Adjust pace parameter (try 0.9 for better clarity)
Check loudness setting (may cause distortion if too high)
Verify API key has access to latest models

Wrong Voice Gender/Accent

Problem: Voice doesn’t match expected speaker

Solutions:

Check speaker mapping in config.py
Verify language is correctly specified
Consult Sarvam AI docs for available speakers
Update speaker map if needed

Text Cut Off (500 Character Limit)

Problem: Narration text is truncated

Solutions:

Keep per-slide narration under 500 characters (the per-slide call sends only the first 500)
Use generate_complete_audio() for long texts (auto-chunks)
Reduce narration length in script generation
Split slides for better pacing

Performance Considerations

Audio Generation Time

Expected generation times per slide:

Language	Avg. Time	Notes
English	2-3 seconds	Fastest, most optimized
Hindi	3-4 seconds	Good performance
Regional	4-5 seconds	Slightly slower due to processing

Total presentation time = (slides × avg. time) + combination overhead
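The formula can be turned into a rough estimator. The per-slide averages below are the midpoints of the ranges in the table. The combination overhead is not documented here, so it is left as a caller-supplied assumption, and `estimate_generation_seconds` is a hypothetical name.

```python
# Midpoints of the per-slide ranges in the table above, in seconds.
AVG_SECONDS_PER_SLIDE = {"english": 2.5, "hindi": 3.5}
REGIONAL_AVG_SECONDS = 4.5  # used for every other language


def estimate_generation_seconds(language: str, num_slides: int,
                                combination_overhead: float = 0.0) -> float:
    """Estimate audio generation time as slides x avg. time + overhead."""
    per_slide = AVG_SECONDS_PER_SLIDE.get(language.lower(), REGIONAL_AVG_SECONDS)
    return num_slides * per_slide + combination_overhead


# The 2.0 second overhead is an illustrative guess, not a measured value.
print(estimate_generation_seconds("hindi", 5, combination_overhead=2.0))  # 19.5
```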

Optimization Tips

Parallel Processing

Audio generation runs sequentially by default. Consider parallel processing for faster generation with multiple slides.
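Because each slide's request is independent, a thread pool is one straightforward way to run them concurrently. This is a sketch: `generate_audio_in_parallel` is a hypothetical helper, and `fake_generate` stands in for the real TTS call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def generate_audio_in_parallel(
    slide_scripts: List[dict],
    generate_one: Callable[[dict], str],
    max_workers: int = 4,
) -> Dict[int, str]:
    """Run generate_one for each slide concurrently; return {slide_number: path}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with slide_scripts.
        paths = list(pool.map(generate_one, slide_scripts))
    return {s["slide_number"]: p for s, p in zip(slide_scripts, paths)}


# Stand-in for the real TTS call, used here only so the sketch runs offline.
def fake_generate(slide_script: dict) -> str:
    return f"slide_{slide_script['slide_number']}.wav"


scripts = [{"slide_number": n, "narration_text": f"Slide {n}"} for n in (1, 2, 3)]
print(generate_audio_in_parallel(scripts, fake_generate))
# {1: 'slide_1.wav', 2: 'slide_2.wav', 3: 'slide_3.wav'}
```

Threads suit this workload because the time is spent waiting on network responses. Keeping `max_workers` modest is a sensible precaution in case the API limits concurrent requests.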

Caching

Cache generated audio for repeated presentations. Reuse audio files if the topic and script haven’t changed.
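A content-addressed cache is one simple way to reuse audio. The sketch below derives the file name from a hash of the narration text, language, and speaker, so any change produces a new file. `cache_path_for` and `get_or_generate` are hypothetical helpers, and the `generate` callback stands in for the real TTS call.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_path_for(text: str, language: str, speaker: str, cache_dir: Path) -> Path:
    """Derive a stable file path from everything that affects the audio."""
    # The unit separator keeps field boundaries unambiguous in the key.
    key = "\x1f".join((language, speaker, text)).encode("utf-8")
    return cache_dir / f"{hashlib.sha256(key).hexdigest()}.wav"


def get_or_generate(text: str, language: str, speaker: str, cache_dir: Path,
                    generate: Callable[[str, Path], None]) -> Path:
    """Return the cached file if present; otherwise call generate(text, path)."""
    path = cache_path_for(text, language, speaker, cache_dir)
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        generate(text, path)
    return path
```

If pitch, pace, or other voice settings vary between runs, they would need to be part of the key as well.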

Batch Requests

Generate multiple slides in batches. This reduces API overhead and improves throughput.

Audio Compression

Use compressed formats (MP3) instead of WAV. This reduces storage and bandwidth requirements.

Next Steps

API Keys Setup

Learn how to obtain your Sarvam AI API key

Environment Setup

Configure your complete development environment

Generation Process

Understand how the full generation pipeline works

API Reference

View complete API documentation for generation endpoint
