ScriptGenerator

Overview

The ScriptGenerator class creates natural-sounding voice narration scripts for each slide. It takes the content structure from ContentGenerator and produces timed narration text that matches the specified language and tone.

Class Definition

from generators.script_generator import ScriptGenerator

script_gen = ScriptGenerator()

Constructor

def __init__(self)

Initializes the script generator with Google Gemini AI configuration.

Configuration:

Model: Uses Config.GEMINI_MODEL
Response Format: JSON (application/json MIME type)
API Key: Configured via Config.GEMINI_API_KEY

Methods

generate_scripts

Generates voice narration scripts with timestamps for all slides.

def generate_scripts(self, content_data: Dict, language: str = "english",
                     tone: str = "formal") -> Dict

Parameters:

content_data (Dict, required): The presentation content structure from ContentGenerator

language (string, default: "english"): Language for narration. Supported: english, hindi, kannada, telugu, tamil, bengali, gujarati, malayalam, marathi, odia, punjabi

tone (string, default: "formal"): Narration tone style:

formal: Academic, precise, technical language
casual: Friendly, conversational, easy to understand
storytelling: Narrative style with stories and examples

Returns (Dict): Script data with narration text and timestamps for each slide
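
The supported languages and tones listed above lend themselves to a small pre-flight check before calling generate_scripts. The sketch below is an illustration only: the helper name validate_script_options and the choice to raise ValueError are assumptions, and the source does not show whether ScriptGenerator performs any such validation itself.

```python
from typing import Tuple

# Values copied from the parameter descriptions above.
SUPPORTED_LANGUAGES = {
    "english", "hindi", "kannada", "telugu", "tamil", "bengali",
    "gujarati", "malayalam", "marathi", "odia", "punjabi",
}
SUPPORTED_TONES = {"formal", "casual", "storytelling"}


def validate_script_options(language: str = "english",
                            tone: str = "formal") -> Tuple[str, str]:
    """Normalize and check language/tone (hypothetical helper)."""
    language = language.strip().lower()
    tone = tone.strip().lower()
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    if tone not in SUPPORTED_TONES:
        raise ValueError(f"Unsupported tone: {tone!r}")
    return language, tone
```

The normalized pair can then be passed on as the language and tone arguments.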

Returns structure:

{
  "topic": "Newton's Laws of Motion",
  "total_duration": 35.5,
  "language": "english",
  "slide_scripts": [
    {
      "slide_number": 1,
      "start_time": 0.0,
      "end_time": 6.0,
      "narration_text": "Welcome! Today we'll explore Newton's Laws of Motion..."
    },
    {
      "slide_number": 2,
      "start_time": 6.0,
      "end_time": 12.5,
      "narration_text": "The first law states that an object at rest stays at rest..."
    }
  ]
}

topic (string): The presentation topic

total_duration (float): Total video duration in seconds (sum of all slide durations)

language (string): Narration language (for example, "english")

slide_scripts (array): Array of slide script objects with narration and timing

Slide script properties:

slide_number (int): Corresponding slide number

start_time (float): Start time in seconds from video beginning

end_time (float): End time in seconds from video beginning

narration_text (string): Natural spoken narration text for this slide (conversational, ~150 words/minute)
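
The ~150 words/minute pace implies a rough word budget for each slide: duration in seconds times 150, divided by 60. The sketch below works this out; estimate_word_budget is a hypothetical helper, not part of ScriptGenerator.

```python
def estimate_word_budget(start_time: float, end_time: float,
                         words_per_minute: int = 150) -> int:
    """Approximate how many words fit between start_time and end_time (seconds)."""
    duration = end_time - start_time
    if duration < 0:
        raise ValueError("end_time must not be earlier than start_time")
    # words = seconds * (words per minute / 60 seconds per minute)
    return round(duration * words_per_minute / 60)


# Slide 1 in the example above spans 0.0-6.0 s, slide 2 spans 6.0-12.5 s.
slide_1_words = estimate_word_budget(0.0, 6.0)    # 6.0 s * 2.5 words/s = 15
slide_2_words = estimate_word_budget(6.0, 12.5)   # 6.5 s * 2.5 words/s = 16.25, rounded to 16
```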

Data Models

SlideScript

Pydantic model for individual slide narration.

class SlideScript(BaseModel):
    slide_number: int = Field(description="Slide number")
    start_time: float = Field(description="Start time in seconds from beginning")
    end_time: float = Field(description="End time in seconds")
    narration_text: str = Field(description="Voice narration script for this slide")

VideoScript

Pydantic model for complete video script.

class VideoScript(BaseModel):
    topic: str
    total_duration: float
    language: str
    slide_scripts: List[SlideScript]
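
As shown, the models declare field types only, so a script can pass validation while its timeline has gaps or overlaps. The sketch below checks that each slide starts where the previous one ended. It works on the plain dict returned by generate_scripts; find_timeline_gaps and its tolerance parameter are assumptions made for illustration.

```python
from typing import Dict, List


def find_timeline_gaps(script_data: Dict, tolerance: float = 1e-6) -> List[int]:
    """Return slide numbers whose start_time differs from the previous end_time."""
    problems = []
    expected_start = 0.0  # the first slide is expected to start at 0 seconds
    for slide in script_data["slide_scripts"]:
        if abs(slide["start_time"] - expected_start) > tolerance:
            problems.append(slide["slide_number"])
        expected_start = slide["end_time"]
    return problems


contiguous = {
    "slide_scripts": [
        {"slide_number": 1, "start_time": 0.0, "end_time": 6.0},
        {"slide_number": 2, "start_time": 6.0, "end_time": 12.5},
    ]
}
with_gap = {
    "slide_scripts": [
        {"slide_number": 1, "start_time": 0.0, "end_time": 6.0},
        {"slide_number": 2, "start_time": 7.0, "end_time": 12.5},
    ]
}
```

An empty list means the timeline is contiguous; otherwise the list names the slides that need attention.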

Narration Guidelines

The script generator follows specific rules for different slide types:

For Animation Slides

Narration must describe what the viewer sees happening in the animation.

# Example for animation slide:
"As you can see on screen, the triangle has sides a, b, and c. 
Watch as we draw squares on each side. Notice that the area of 
the two smaller squares equals the area of the larger square."

Key phrases for animations:

“As you can see…”
“Watch as…”
“Notice how…”
“Observe that…”

For Image Slides

Reference the image naturally:

"Looking at this image, we can see Isaac Newton's original diagram. 
This illustration demonstrates the concept of gravitational force."

For Text-Only Slides

Focus on explaining the concept clearly without visual references:

"The second law of motion states that force equals mass times acceleration. 
This fundamental relationship helps us predict how objects will move."

Tone Configuration

From backend/generators/script_generator.py:35-39:

tone_instructions = {
    "formal": "Use formal, academic language. Be precise and technical.",
    "casual": "Use casual, friendly language. Make it conversational and easy to understand.",
    "storytelling": "Use narrative style, build engagement with stories and examples."
}
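
A dictionary like this is typically read with a lookup on the requested tone. The source does not show how an unrecognized tone is handled, so the fallback to "formal" below is an assumption, as is the helper name get_tone_instruction.

```python
tone_instructions = {
    "formal": "Use formal, academic language. Be precise and technical.",
    "casual": "Use casual, friendly language. Make it conversational and easy to understand.",
    "storytelling": "Use narrative style, build engagement with stories and examples.",
}


def get_tone_instruction(tone: str) -> str:
    """Return the prompt instruction for a tone (hypothetical helper)."""
    # Assumption: unknown tones fall back to "formal", the documented default.
    return tone_instructions.get(tone, tone_instructions["formal"])
```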

Usage Example

From backend/app.py:246-249:

# Step 2: Generate narration scripts with timestamps
update_progress(generation_id, 20, "generating_scripts", 
                "📜 Generating voice scripts...")

script_gen = ScriptGenerator()
script_data = script_gen.generate_scripts(content_data, request.language, request.tone)

Timestamp Synchronization

Initial timestamps are estimates. They are corrected after audio generation based on actual audio durations:

# From app.py:288-299
# Update script timestamps based on actual audio durations
current_time = 0
for slide_script in script_data['slide_scripts']:
    slide_num = slide_script['slide_number']
    actual_duration = actual_durations.get(slide_num, 
                                          slide_script['end_time'] - slide_script['start_time'])
    
    slide_script['start_time'] = current_time
    slide_script['end_time'] = current_time + actual_duration
    current_time += actual_duration

script_data['total_duration'] = current_time
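
To make the effect concrete, the same loop can be wrapped in a function and run on sample data. This is a sketch: the function name resync_timestamps and the sample durations are made up for illustration. Slide 3 has no measured audio duration, so it keeps its estimated length.

```python
from typing import Dict


def resync_timestamps(script_data: Dict, actual_durations: Dict[int, float]) -> Dict:
    """Rewrite start/end times in place so slides follow the measured audio lengths."""
    current_time = 0.0
    for slide_script in script_data["slide_scripts"]:
        # Fall back to the estimated duration when no audio length is known.
        estimated = slide_script["end_time"] - slide_script["start_time"]
        actual = actual_durations.get(slide_script["slide_number"], estimated)
        slide_script["start_time"] = current_time
        slide_script["end_time"] = current_time + actual
        current_time += actual
    script_data["total_duration"] = current_time
    return script_data


script_data = {
    "slide_scripts": [
        {"slide_number": 1, "start_time": 0.0, "end_time": 6.0},
        {"slide_number": 2, "start_time": 6.0, "end_time": 12.5},
        {"slide_number": 3, "start_time": 12.5, "end_time": 20.0},
    ]
}
# Measured audio: slide 1 ran 5.5 s, slide 2 ran 7.25 s, slide 3 is unknown.
resynced = resync_timestamps(script_data, {1: 5.5, 2: 7.25})
# Slide 1: 0.0-5.5, slide 2: 5.5-12.75, slide 3: 12.75-20.25, total 20.25
```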

Prompt Engineering

The generator builds a detailed prompt including:

Topic and settings: Language, tone, and instructions
Slide information: Title, content, duration, and visual flags
Narration requirements: Pacing (~150 words/minute), natural speech
Animation-specific instructions: Describe visual elements step-by-step
Timing guidance: Sequential timestamps matching slide durations

Example slide info passed to AI:

slides_info = "\n".join([
    f"Slide {slide['slide_number']}: {slide['title']}\n"
    f"  Content: {slide['content_text']}\n"
    f"  Duration: {slide['duration']}s\n"
    f"  Has Animation: {slide['needs_animation']}\n"
    f"  Animation Description: {slide.get('animation_description', 'N/A')}\n"
    f"  Has Image: {slide['needs_image']}\n"
    for slide in content_data['slides']
])

File Persistence

Generated scripts are automatically saved to:

Config.SCRIPTS_DIR / "{topic_sanitized}_script.json"
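
The sanitization rule behind {topic_sanitized} is not shown in the source. The sketch below assumes one plausible rule (runs of characters other than letters, digits, "-" and "_" become a single underscore) and takes the target directory as a parameter in place of Config.SCRIPTS_DIR. The function name save_script is hypothetical.

```python
import json
import re
from pathlib import Path


def save_script(script_data: dict, scripts_dir: Path) -> Path:
    """Write script_data as JSON to <scripts_dir>/<topic_sanitized>_script.json."""
    # Assumed rule: collapse anything outside [A-Za-z0-9_-] into one underscore.
    topic_sanitized = re.sub(r"[^A-Za-z0-9_-]+", "_", script_data["topic"]).strip("_")
    scripts_dir.mkdir(parents=True, exist_ok=True)
    path = scripts_dir / f"{topic_sanitized}_script.json"
    # ensure_ascii=False keeps non-English narration text readable in the file.
    path.write_text(json.dumps(script_data, indent=2, ensure_ascii=False),
                    encoding="utf-8")
    return path
```

Under this rule, the topic "Newton's Laws of Motion" would be saved as Newton_s_Laws_of_Motion_script.json.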

Response Processing

The generator cleans markdown formatting from AI responses:

response = self.model.generate_content(prompt)
text = response.text.strip()

# Remove markdown code blocks
if text.startswith('```json'):
    text = text[7:]
if text.startswith('```'):
    text = text[3:]
if text.endswith('```'):
    text = text[:-3]
text = text.strip()

script_data = json.loads(text)
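
The same cleanup can be tried in isolation, without a model call. The sketch below is a standalone variant of the logic above, not the generator's own code: parse_model_json is a hypothetical name, and the fence marker is built from single backticks only so the example is easy to embed.

```python
import json

FENCE = "`" * 3  # the three-backtick marker that wraps markdown code blocks


def parse_model_json(raw: str):
    """Strip an optional markdown code fence from raw text, then parse it as JSON."""
    text = raw.strip()
    if text.startswith(FENCE + "json"):
        text = text[len(FENCE) + len("json"):]
    elif text.startswith(FENCE):
        text = text[len(FENCE):]
    if text.endswith(FENCE):
        text = text[:-len(FENCE)]
    return json.loads(text.strip())


fenced = FENCE + 'json\n{"topic": "Newton\'s Laws of Motion"}\n' + FENCE
parsed = parse_model_json(fenced)
```

Input that is already plain JSON passes through unchanged, and malformed JSON still raises json.JSONDecodeError, so callers may want to catch that.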

Related Components

ContentGenerator - Provides input content structure
VoiceGenerator - Converts narration text to audio
VideoComposer - Uses timestamps for video synchronization
