
Overview

The ScriptGenerator class creates natural-sounding voice narration scripts for each slide. It takes the content structure from ContentGenerator and produces timed narration text that matches the specified language and tone.

Class Definition

from generators.script_generator import ScriptGenerator

script_gen = ScriptGenerator()

Constructor

def __init__(self)
Initializes the script generator and configures the Google Gemini model:
  • Model: Uses Config.GEMINI_MODEL
  • Response Format: JSON (application/json MIME type)
  • API Key: Configured via Config.GEMINI_API_KEY

Methods

generate_scripts

Generates voice narration scripts with timestamps for all slides.
def generate_scripts(self, content_data: Dict, language: str = "english",
                     tone: str = "formal") -> Dict
Parameters
  • content_data (Dict, required): The presentation content structure produced by ContentGenerator.
  • language (str, default "english"): Narration language. Supported values: english, hindi, kannada, telugu, tamil, bengali, gujarati, malayalam, marathi, odia, punjabi.
  • tone (str, default "formal"): Narration tone style, one of:
      ◦ formal: Academic, precise, technical language
      ◦ casual: Friendly, conversational, easy to understand
      ◦ storytelling: Narrative style with stories and examples
Returns
  • Dict: Script data with narration text and timestamps for each slide.
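The page lists the accepted values but does not say how ScriptGenerator handles anything else. As a minimal sketch, assuming you want to reject bad input before spending an API call, a caller-side check could look like this. validate_script_options and the lowercase normalization are hypothetical, not part of the documented class:

```python
from typing import Tuple

# Hypothetical constants mirroring the supported values documented above.
SUPPORTED_LANGUAGES = {
    "english", "hindi", "kannada", "telugu", "tamil", "bengali",
    "gujarati", "malayalam", "marathi", "odia", "punjabi",
}
SUPPORTED_TONES = {"formal", "casual", "storytelling"}


def validate_script_options(language: str = "english", tone: str = "formal") -> Tuple[str, str]:
    """Normalize language/tone (assumed case-insensitive) and reject unsupported values."""
    lang = language.strip().lower()
    tone_key = tone.strip().lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    if tone_key not in SUPPORTED_TONES:
        raise ValueError(f"Unsupported tone: {tone!r}")
    return lang, tone_key
```

The normalized pair can then be passed straight to script_gen.generate_scripts(content_data, lang, tone_key).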
Example return value (only the first two slide entries are shown):
{
  "topic": "Newton's Laws of Motion",
  "total_duration": 35.5,
  "language": "english",
  "slide_scripts": [
    {
      "slide_number": 1,
      "start_time": 0.0,
      "end_time": 6.0,
      "narration_text": "Welcome! Today we'll explore Newton's Laws of Motion..."
    },
    {
      "slide_number": 2,
      "start_time": 6.0,
      "end_time": 12.5,
      "narration_text": "The first law states that an object at rest stays at rest..."
    }
  ]
}
Return fields
  • topic (str): The presentation topic.
  • total_duration (float): Total video duration in seconds (the sum of all slide durations).
  • language (str): The narration language, e.g. "english".
  • slide_scripts (array): Slide script objects, each with narration text and start/end timing.
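Because each slide should start where the previous one ends, and total_duration is the sum of slide durations, a returned script can be sanity-checked in a few lines. This is a hypothetical helper (check_script_timing), assuming slides are contiguous and the first one starts at 0.0 as in the example above:

```python
from typing import Dict, List


def check_script_timing(script_data: Dict, tol: float = 1e-6) -> List[str]:
    """Return timing problems found in a generate_scripts() result (empty list if consistent)."""
    problems: List[str] = []
    expected_start = 0.0
    for s in script_data["slide_scripts"]:
        n = s["slide_number"]
        # Each slide is assumed to begin exactly where the previous one ended.
        if abs(s["start_time"] - expected_start) > tol:
            problems.append(f"slide {n}: starts at {s['start_time']}, expected {expected_start}")
        if s["end_time"] < s["start_time"]:
            problems.append(f"slide {n}: ends before it starts")
        expected_start = s["end_time"]
    # With contiguous slides, the sum of durations equals the last end_time.
    if abs(script_data["total_duration"] - expected_start) > tol:
        problems.append(
            f"total_duration {script_data['total_duration']} != last end_time {expected_start}"
        )
    return problems
```

An empty list means the timestamps and total_duration agree with each other.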

Data Models

SlideScript

Pydantic model for individual slide narration.
from typing import List

from pydantic import BaseModel, Field

class SlideScript(BaseModel):
    slide_number: int = Field(description="Slide number")
    start_time: float = Field(description="Start time in seconds from beginning")
    end_time: float = Field(description="End time in seconds")
    narration_text: str = Field(description="Voice narration script for this slide")

VideoScript

Pydantic model for complete video script.
class VideoScript(BaseModel):
    topic: str
    total_duration: float
    language: str
    slide_scripts: List[SlideScript]

Narration Guidelines

The script generator follows specific rules for different slide types:

For Animation Slides

Narration must describe what the viewer sees happening in the animation.
# Example for animation slide:
"As you can see on screen, the triangle has sides a, b, and c. 
Watch as we draw squares on each side. Notice that the area of 
the two smaller squares equals the area of the larger square."
Key phrases for animations:
  • “As you can see…”
  • “Watch as…”
  • “Notice how…”
  • “Observe that…”

For Image Slides

Reference the image naturally:
"Looking at this image, we can see Isaac Newton's original diagram. 
This illustration demonstrates the concept of gravitational force."

For Text-Only Slides

Focus on explaining the concept clearly without visual references:
"The second law of motion states that force equals mass times acceleration. 
This fundamental relationship helps us predict how objects will move."
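This page does not show any code that checks the generated narration against these guidelines. If you want a cheap post-generation check, a keyword heuristic along these lines could flag obvious misses. follows_visual_guideline and its keyword lists are assumptions drawn from the example phrases above, not part of ScriptGenerator:

```python
from typing import Dict

# Hypothetical keyword lists taken from the example phrases in this guide.
ANIMATION_CUES = ("as you can see", "watch as", "notice how", "observe that")
IMAGE_CUES = ("image", "illustration", "picture", "diagram")


def follows_visual_guideline(slide: Dict, narration: str) -> bool:
    """Heuristic: does the narration reference visuals the way this slide type requires?"""
    text = narration.lower()
    mentions_animation = any(cue in text for cue in ANIMATION_CUES)
    mentions_image = any(cue in text for cue in IMAGE_CUES)
    if slide.get("needs_animation"):
        # Animation slides should walk the viewer through what is on screen.
        return mentions_animation
    if slide.get("needs_image"):
        # Image slides should refer to the image naturally.
        return mentions_image
    # Text-only slides should avoid visual references altogether.
    return not (mentions_animation or mentions_image)
```

Substring matching will produce false positives and negatives, so treat a failed check as a prompt to review the script, not as proof of a bad one.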

Tone Configuration

From backend/generators/script_generator.py:35-39:
tone_instructions = {
    "formal": "Use formal, academic language. Be precise and technical.",
    "casual": "Use casual, friendly language. Make it conversational and easy to understand.",
    "storytelling": "Use narrative style, build engagement with stories and examples."
}
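The lines shown do not reveal what happens when the tone argument is not one of these keys. A sketch of the lookup, assuming a fallback to "formal" (an illustrative choice, not confirmed by the source); tone_instruction is a hypothetical name:

```python
# Same mapping as script_generator.py; the fallback below is an assumption.
TONE_INSTRUCTIONS = {
    "formal": "Use formal, academic language. Be precise and technical.",
    "casual": "Use casual, friendly language. Make it conversational and easy to understand.",
    "storytelling": "Use narrative style, build engagement with stories and examples.",
}


def tone_instruction(tone: str) -> str:
    """Return the prompt sentence for `tone`, defaulting to the formal style if unknown."""
    return TONE_INSTRUCTIONS.get(tone.strip().lower(), TONE_INSTRUCTIONS["formal"])
```

Falling back silently keeps generation running; raising an error instead would surface typos in the request earlier.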

Usage Example

From backend/app.py:246-249:
# Step 2: Generate narration scripts with timestamps
update_progress(generation_id, 20, "generating_scripts", 
                "📜 Generating voice scripts...")

script_gen = ScriptGenerator()
script_data = script_gen.generate_scripts(content_data, request.language, request.tone)

Timestamp Synchronization

The timestamps returned by generate_scripts are AI estimates. After audio generation, they are recomputed from the measured duration of each slide's audio:
# From app.py:288-299
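# Update script timestamps based on actual audio durations.
# actual_durations maps slide_number -> measured audio length in seconds;
# slides without an entry keep their estimated duration (end_time - start_time).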
current_time = 0
for slide_script in script_data['slide_scripts']:
    slide_num = slide_script['slide_number']
    actual_duration = actual_durations.get(slide_num, 
                                          slide_script['end_time'] - slide_script['start_time'])
    
    slide_script['start_time'] = current_time
    slide_script['end_time'] = current_time + actual_duration
    current_time += actual_duration

script_data['total_duration'] = current_time
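Wrapping the loop above in a function makes it easy to test in isolation. This sketch keeps the same logic; the function name resync_timestamps is hypothetical:

```python
from typing import Dict


def resync_timestamps(script_data: Dict, actual_durations: Dict[int, float]) -> Dict:
    """Rewrite start/end times from measured audio lengths (modifies script_data in place)."""
    current_time = 0.0
    for slide_script in script_data["slide_scripts"]:
        estimated = slide_script["end_time"] - slide_script["start_time"]
        # Fall back to the AI's estimate when no audio duration was measured for this slide.
        duration = actual_durations.get(slide_script["slide_number"], estimated)
        slide_script["start_time"] = current_time
        slide_script["end_time"] = current_time + duration
        current_time += duration
    script_data["total_duration"] = current_time
    return script_data
```

For example, if slide 1 (estimated 0.0 to 6.0) is measured at 7.25 s and slide 2 (estimated 6.0 to 12.5) has no measurement, slide 2 is shifted to 7.25 to 13.75 s and total_duration becomes 13.75.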

Prompt Engineering

The generator builds a detailed prompt including:
  1. Topic and settings: Language, tone, and instructions
  2. Slide information: Title, content, duration, and visual flags
  3. Narration requirements: Pacing (~150 words/minute), natural speech
  4. Animation-specific instructions: Describe visual elements step-by-step
  5. Timing guidance: Sequential timestamps matching slide durations
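The ~150 words/minute target works out to 2.5 words per second, so a 6-second slide has room for roughly 15 words. A small sketch of that arithmetic; the function names are hypothetical, and the source only states the pacing target:

```python
import math

WORDS_PER_MINUTE = 150  # pacing target stated in the prompt requirements


def estimate_narration_seconds(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Rough spoken length of `text`, counting whitespace-separated words."""
    return len(text.split()) * 60.0 / wpm


def word_budget(duration_s: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Approximate number of words that fit in a slide lasting `duration_s` seconds."""
    return math.floor(duration_s * wpm / 60.0)
```

Word counts are only a proxy: speech rate differs across the supported languages, which is one reason timestamps are later corrected from the real audio.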
The per-slide section of the prompt is built like this:
slides_info = "\n".join([
    f"Slide {slide['slide_number']}: {slide['title']}\n"
    f"  Content: {slide['content_text']}\n"
    f"  Duration: {slide['duration']}s\n"
    f"  Has Animation: {slide['needs_animation']}\n"
    f"  Animation Description: {slide.get('animation_description', 'N/A')}\n"
    f"  Has Image: {slide['needs_image']}\n"
    for slide in content_data['slides']
])

File Persistence

Generated scripts are automatically saved to:
Config.SCRIPTS_DIR / "{topic_sanitized}_script.json"
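The page does not show how topic_sanitized is derived. A minimal sketch, assuming a simple rule that lowercases the topic and collapses runs of non-word characters to underscores; sanitize_topic and save_script are hypothetical names, and scripts_dir stands in for Config.SCRIPTS_DIR:

```python
import json
import re
from pathlib import Path
from typing import Dict


def sanitize_topic(topic: str) -> str:
    """Hypothetical rule: replace runs of non-word characters with '_' and lowercase."""
    return re.sub(r"\W+", "_", topic).strip("_").lower() or "untitled"


def save_script(script_data: Dict, scripts_dir: Path) -> Path:
    """Write script_data to <scripts_dir>/<topic_sanitized>_script.json and return the path."""
    scripts_dir.mkdir(parents=True, exist_ok=True)
    path = scripts_dir / f"{sanitize_topic(script_data['topic'])}_script.json"
    # ensure_ascii=False keeps non-Latin narration (Hindi, Tamil, ...) readable in the file.
    path.write_text(json.dumps(script_data, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Under this rule, the topic "Newton's Laws of Motion" would be saved as newton_s_laws_of_motion_script.json.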

Response Processing

Before parsing, the generator strips any Markdown code fences the model wraps around its JSON output:
response = self.model.generate_content(prompt)
text = response.text.strip()

# Remove markdown code blocks
if text.startswith('```json'):
    text = text[7:]
if text.startswith('```'):
    text = text[3:]
if text.endswith('```'):
    text = text[:-3]
text = text.strip()

script_data = json.loads(text)
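The slicing above handles the common ```json ... ``` wrapper. A regex-based variant, offered as a sketch rather than the project's code, also tolerates an uppercase language tag and keeps the cleanup in one testable function (parse_model_json is a hypothetical name):

```python
import json
import re
from typing import Any

# Matches an optional ```json (or bare ```) fence wrapped around the whole response.
_FENCE_RE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL | re.IGNORECASE)


def parse_model_json(raw: str) -> Any:
    """Strip a surrounding Markdown code fence, if present, then parse the JSON payload."""
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1)
    return json.loads(text)
```

Unfenced responses pass through unchanged, and malformed JSON still raises json.JSONDecodeError, so callers can handle both cases the same way as with json.loads.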
