
Overview

The backend is a FastAPI application that orchestrates multiple AI services and video processing tools to generate complete video presentations. It follows a modular generator pattern with clear separation of concerns.

Tech Stack

FastAPI

Modern Python web framework with async support

MoviePy

Video editing and composition

Manim

Mathematical animation engine

Pillow

Image processing and slide rendering

Application Structure

Main Entry Point: app.py

Location: backend/app.py (538 lines)
Key Components:
app = FastAPI(title="Combined Video PPT Generator")

# CORS for frontend
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_methods=["*"],
    allow_headers=["*"],
)

Core Endpoints

| Endpoint | Method | Purpose | Line Reference |
| --- | --- | --- | --- |
| /api/generate | POST | Start video generation | app.py:223 |
| /api/progress/{id} | GET (SSE) | Real-time progress | app.py:129 |
| /api/video/{filename} | GET | Stream video with range requests | app.py:489 |
| /api/status/{id} | GET | Check generation status | app.py:480 |
| /health | GET | Health check | app.py:530 |

Generator Pattern

Each content generation step is encapsulated in a dedicated generator class following the Single Responsibility Principle.

1. ContentGenerator

Location: backend/generators/content_generator.py
Purpose: Generate structured presentation content using Google Gemini
Output Structure:
{
  "topic": "Newton's Third Law",
  "total_slides": 5,
  "slides": [
    {
      "slide_number": 1,
      "title": "Introduction",
      "content_text": "Newton's Third Law states...",
      "needs_image": false,
      "image_keyword": "",
      "needs_animation": true,
      "animation_description": "Show rocket propulsion with force vectors",
      "duration": 6.0
    }
  ]
}
Key Features:
  • Uses Gemini with JSON output mode
  • Enforces mutual exclusivity: slides have EITHER animation OR image
  • Validates slide structure
  • Auto-corrects conflicts in metadata
  • Saves to outputs/slides/{topic}_content.json
Code Reference: lines 163-258
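
The mutual-exclusivity rule and the auto-correction step listed above could look roughly like the following. This is a minimal sketch: the function name `normalize_slide` and the choice to keep the animation when both flags are set are assumptions for illustration, not behaviour confirmed by content_generator.py.

```python
from typing import Any


def normalize_slide(slide: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the slide with at most one of animation/image enabled.

    Assumption: when both flags are set, the animation wins and the image is dropped.
    """
    fixed = dict(slide)
    if fixed.get("needs_animation") and fixed.get("needs_image"):
        fixed["needs_image"] = False
    # Clear metadata that belongs to a disabled visual.
    if not fixed.get("needs_image"):
        fixed["image_keyword"] = ""
    if not fixed.get("needs_animation"):
        fixed["animation_description"] = ""
    return fixed


slide = {
    "slide_number": 1,
    "needs_image": True,
    "image_keyword": "rocket",
    "needs_animation": True,
    "animation_description": "Show rocket propulsion with force vectors",
}
fixed = normalize_slide(slide)
```

Returning a copy keeps the raw model output available for logging if a conflict needs to be investigated later.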

2. ScriptGenerator

Location: backend/generators/script_generator.py
Purpose: Generate voice narration scripts with timestamps
Input: Content data from ContentGenerator
Output Structure:
{
  "topic": "Newton's Third Law",
  "total_duration": 30.5,
  "language": "english",
  "slide_scripts": [
    {
      "slide_number": 1,
      "start_time": 0.0,
      "end_time": 6.0,
      "narration_text": "Today we'll explore Newton's Third Law..."
    }
  ]
}
Special Handling:
  • Animation slides: Scripts include visual descriptions (“As you can see…”, “Watch as…”)
  • Image slides: Scripts reference images naturally
  • Text slides: Focus on conceptual explanation
  • Timing adjusted based on actual audio duration
Code Reference: lines 1-100+
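
To illustrate the last point, timing adjustment based on actual audio duration, here is a hedged sketch. The helper `retime_scripts` is hypothetical; it assumes the measured audio length for each slide is already known and that slides play back to back with no gaps.

```python
def retime_scripts(slide_scripts: list[dict], audio_durations: list[float]) -> float:
    """Rewrite start_time/end_time in place so slides play back to back.

    audio_durations[i] is the measured narration length (seconds) for
    slide_scripts[i]. Returns the new total duration.
    """
    if len(slide_scripts) != len(audio_durations):
        raise ValueError("one audio duration is required per slide script")
    cursor = 0.0
    for script, duration in zip(slide_scripts, audio_durations):
        script["start_time"] = round(cursor, 2)
        cursor += duration
        script["end_time"] = round(cursor, 2)
    return round(cursor, 2)


scripts = [
    {"slide_number": 1, "start_time": 0.0, "end_time": 6.0},
    {"slide_number": 2, "start_time": 6.0, "end_time": 12.0},
]
total = retime_scripts(scripts, [7.5, 5.25])
# Slide 1 now spans 0.0-7.5 s, slide 2 spans 7.5-12.75 s.
```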

3. VoiceGenerator

Location: backend/generators/voice_generator.py
Purpose: Generate voice narration using Sarvam TTS API
Key Methods:
# Generate audio for single slide
audio_path = voice_gen.generate_voice_for_slide(
    narration_text,
    slide_number,
    topic,
    language
)
# Returns: outputs/audio/{topic}_slide_{num}.mp3
Language Support:
  • English (bulbul:v1 model)
  • Hindi, Kannada, Telugu (Sarvam multilingual models)
Audio Format:
  • Format: MP3
  • Quality: High (192 kbps)
  • Sample Rate: 24 kHz

4. ImageFetcher

Location: backend/generators/image_fetcher.py
Purpose: Fetch relevant images from Unsplash API
Usage:
image_fetcher = ImageFetcher()
image_path = image_fetcher.fetch_image(
    keyword="physics force diagram",
    slide_number=2,
    topic="Newton's Laws"
)
# Returns: outputs/images/{topic}_slide_{num}.jpg
Features:
  • Searches Unsplash with keyword
  • Downloads highest quality available
  • Fallback to generic image if search fails
  • Respects API rate limits

5. ManimGenerator

Location: backend/generators/manim_generator.py
Purpose: Generate Manim animation code and coordinate rendering
Two-Step Process:

1. Code Generation

animation_code = manim_gen.generate_animation_code(
    slide_data,
    duration
)
# Returns Python code string for Manim scene
2. Code Saving

code_path = manim_gen.save_animation_code(
    animation_code,
    slide_number,
    topic
)
# Saves to: outputs/manim_code/{topic}_slide_{num}.py
Generated Code Structure:
from manim import *

class SlideAnimation(Scene):
    def construct(self):
        # Gemini-generated animation logic
        # Example: Draw triangle, add squares, show areas
        pass
Note: Actual rendering is handled by the VideoRenderer utility.

Utility Modules

SlideRenderer

Location: backend/utils/slide_renderer.py
Purpose: Render PPT-style slides as PNG images using Pillow
Key Methods:
Creates a text-only slide with title and content. Output: 1920x1080 PNG with:
  • Gradient background
  • Accent bars (top + left)
  • Centered title
  • Bullet point content
  • Slide number footer
Creates a slide with the image on the right side. Layout:
  • Left 50%: Title + content text
  • Right 50%: Image (scaled to fit)
Creates a base slide for the animation overlay. Layout:
  • Left 50%: Title + content
  • Right 50%: Dark placeholder (animation will be composited here)
Font Handling:
  • Windows: Arial fonts from C:/Windows/Fonts/
  • Linux: DejaVu fonts
  • Fallback: PIL default fonts
Output Directory: outputs/slides/

VideoRenderer

Location: backend/utils/video_renderer.py
Purpose: Render Manim animations to MP4
Method:
video_path = video_renderer.render_manim_animation(
    code_path,      # Path to .py file
    scene_name      # e.g., "SlideAnimation"
)
# Executes: manim -qh code_path.py SlideAnimation
# Returns: outputs/manim_output/{scene_name}.mp4
Manim Configuration:
  • Quality: High (-qh)
  • Resolution: 1920x1080
  • FPS: 30
  • Format: MP4 (H.264)
Error Handling:
  • Captures stdout/stderr from Manim process
  • Raises an exception if rendering fails
  • Logs full output for debugging
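
The error-handling behaviour above (capture output, log it, raise on failure) can be sketched with the standard `subprocess` module. The function name `run_renderer` is hypothetical, and a harmless Python one-liner stands in for the Manim command so the sketch runs without Manim installed; for a real render the list would hold the Manim executable, its flags, the code path, and the scene name.

```python
import subprocess
import sys


def run_renderer(cmd: list[str]) -> str:
    """Run a render command, capture its output, and raise if it fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Log everything so a failed render can be debugged from the server log.
    print(result.stdout)
    print(result.stderr, file=sys.stderr)
    if result.returncode != 0:
        raise RuntimeError(
            f"Render failed with exit code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


# Stand-in command (assumption): replace with the Manim invocation in practice.
output = run_renderer([sys.executable, "-c", "print('rendered')"])
```

Passing the command as a list avoids shell quoting problems when a topic-derived file path contains unusual characters.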

VideoComposer

Location: backend/utils/video_composer.py (374 lines)
Purpose: Compose final video from all slides and audio
Main Method: compose_final_video()
Process:

1. Load Slide Visuals
For each slide, load the PNG (text/image) or MP4 (animation).

2. Match Durations
Adjust each slide clip to match the narration duration from the script.

3. Composite Animations
If a slide has an animation, composite the MP4 onto the base slide PNG:
  • Base slide as background (ImageClip)
  • Animation resized to 850x700 and positioned at (1010, 250)
  • Loop/trim animation to match narration length

4. Concatenate Slides
Use MoviePy’s concatenate_videoclips() to join all slides sequentially.

5. Add Audio Track
Attach the combined audio file to the video.

6. Render Final Video
Write to outputs/final/{topic}_final.mp4
  • Codec: H.264 (libx264)
  • Audio: AAC 192 kbps
  • Bitrate: 5000 kbps
Key Code (app.py:267-271):
slide_clip = self.composite_animation_on_slide(
    slide_data['base_slide'],    # PNG background
    slide_data['animation'],     # MP4 animation
    duration                     # Target duration
)
Animation Positioning (video_composer.py:361-362):
animation_final = animation_adjusted.resized(new_size=(850, 700))
animation_final = animation_final.with_position((1010, 250))
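
The loop/trim step and the fixed overlay geometry come down to simple arithmetic, sketched below without MoviePy. Both helpers (`plan_loop`, `fits_in_frame`) are hypothetical names introduced for illustration; how video_composer.py actually loops or trims the clip is not shown in this document.

```python
def plan_loop(animation_duration: float, target_duration: float) -> tuple[int, float]:
    """Return (full_loops, tail_seconds) needed to fill target_duration.

    The animation plays full_loops times in full, then its first
    tail_seconds play once more. If the target is shorter than the
    animation, full_loops is 0 and the clip is simply trimmed.
    """
    if animation_duration <= 0 or target_duration < 0:
        raise ValueError("animation_duration must be > 0 and target_duration >= 0")
    full_loops = int(target_duration // animation_duration)
    tail = target_duration - full_loops * animation_duration
    return full_loops, round(tail, 3)


def fits_in_frame(size: tuple[int, int], position: tuple[int, int],
                  frame: tuple[int, int] = (1920, 1080)) -> bool:
    """Check that an overlay of the given size at position stays inside the frame."""
    right = position[0] + size[0]
    bottom = position[1] + size[1]
    return position[0] >= 0 and position[1] >= 0 and right <= frame[0] and bottom <= frame[1]


loop_plan = plan_loop(4.0, 10.0)   # 4 s animation under 10 s of narration
trim_plan = plan_loop(8.0, 6.0)    # 8 s animation under 6 s of narration
overlay_ok = fits_in_frame((850, 700), (1010, 250))
```

With the documented values, the overlay's right edge lands at x = 1860 and its bottom edge at y = 950, so it stays inside the right half of the 1920x1080 slide.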

File Output Structure

All generated files are organized under backend/outputs/:
outputs/
├── audio/
│   ├── {topic}_slide_1.mp3
│   ├── {topic}_slide_2.mp3
│   └── {topic}_combined.mp3

├── images/
│   └── {topic}_slide_{num}.jpg

├── manim_code/
│   └── {topic}_slide_{num}.py

├── manim_output/
│   └── {scene_name}.mp4

├── slides/
│   ├── {topic}_slide_1.png
│   └── {topic}_slide_2.png

├── scripts/
│   └── {topic}_script.json

└── final/
    └── {topic}_final.mp4
File Naming:
  • Topics sanitized: spaces → underscores, special chars removed
  • Max length: 30 characters
  • Example: Explain_Newtons_Third_Law_final.mp4
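
The naming rules above could be implemented along these lines. `sanitize_topic` is the name used in the endpoint code in this document, but this body is an assumption reconstructed from the three rules, not the project's actual implementation.

```python
import re


def sanitize_topic(topic: str, max_length: int = 30) -> str:
    """Turn a free-text topic into a filename-safe stem."""
    # Drop everything except word characters (letters, digits, underscore) and whitespace.
    cleaned = re.sub(r"[^\w\s]", "", topic)
    # Collapse runs of whitespace into single underscores.
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    # Enforce the maximum length.
    return cleaned[:max_length]


stem = sanitize_topic("Explain Newton's Third Law")
final_name = f"{stem}_final.mp4"
```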

Background Task Processing

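The main generation endpoint (/api/generate) runs every generation step inside the request handler itself, not in a separate background task. After each step it records progress in a shared in-memory dictionary, which the SSE endpoint polls and streams to the client: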
@app.post("/api/generate")
async def generate_presentation(request: GenerateRequest):
    # Synchronous processing
    generation_id = sanitize_topic(request.topic)
    
    update_progress(generation_id, 10, "generating_content", "📝 Generating content...")
    content_data = content_gen.generate_content(topic, num_slides)
    
    update_progress(generation_id, 20, "generating_scripts", "📜 Generating scripts...")
    script_data = script_gen.generate_scripts(content_data, language, tone)
    
    # ... continue through all steps
    
    update_progress(generation_id, 100, "completed", "✅ Video ready!")
    return GenerateResponse(...)
Progress Updates (app.py:52-61):
def update_progress(generation_id: str, progress: int, status: str, message: str):
    timestamp = datetime.now().strftime("%H:%M:%S")
    generation_status[generation_id] = {
        "status": status,
        "progress": progress,
        "message": message,
        "timestamp": timestamp
    }
    print(f"[{timestamp}] {message}")
SSE Streaming (app.py:129-196):
@app.get("/api/progress/{generation_id}")
async def get_progress(generation_id: str):
    async def event_generator():
        while retry_count < max_retries:
            if generation_id in generation_status:
                status_data = generation_status[generation_id]
                yield f"data: {json.dumps(status_data)}\n\n"
                
                if status_data["status"] in ["completed", "error"]:
                    break
            
            await asyncio.sleep(0.5)
    
    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", ...}
    )
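
The excerpt above omits how `retry_count` and `max_retries` are maintained. A self-contained sketch of the same polling pattern, without FastAPI, might look like this. The name `progress_events`, the default of 20 retries, and the rule that only polls for an unknown id count as retries are all assumptions for illustration.

```python
import asyncio
import json

generation_status: dict[str, dict] = {}


async def progress_events(generation_id: str, max_retries: int = 20,
                          poll_seconds: float = 0.5):
    """Yield SSE-formatted frames until the job finishes or never shows up."""
    retry_count = 0
    while retry_count < max_retries:
        status_data = generation_status.get(generation_id)
        if status_data is None:
            # Unknown id so far: count the miss and poll again.
            retry_count += 1
        else:
            retry_count = 0
            # An SSE frame is a "data:" line followed by a blank line.
            yield f"data: {json.dumps(status_data)}\n\n"
            if status_data["status"] in ("completed", "error"):
                break
        await asyncio.sleep(poll_seconds)


async def demo() -> list[str]:
    generation_status["demo"] = {"status": "completed", "progress": 100}
    return [frame async for frame in progress_events("demo", poll_seconds=0)]


frames = asyncio.run(demo())
```

An async generator like this can be handed to a streaming response with the text/event-stream media type, as in the excerpt above.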

Video Streaming with Range Requests

The /api/video/{filename} endpoint supports HTTP range requests for smooth video seeking.
Implementation (app.py:63-125):
def _get_range_header(range_header: str, file_size: int) -> tuple[int, int]:
    # Parse "bytes=0-1023" → (0, 1023)
    h = range_header.replace("bytes=", "").split("-")
    start = int(h[0]) if h[0] else 0
    end = int(h[1]) if h[1] else file_size - 1
    return start, end
Benefits:
  • Enables video seeking in browser
  • Reduces initial load time
  • Supports resume on connection drop
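
The parser shown above handles "bytes=start-end" and "bytes=start-". A slightly more defensive variant, offered as a sketch and not as the project's code, also covers the suffix form "bytes=-N" (the last N bytes) and clamps the end offset to the file size. The name `parse_range` is hypothetical.

```python
def parse_range(range_header: str, file_size: int) -> tuple[int, int]:
    """Parse a single-range 'bytes=' header into inclusive (start, end) offsets."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is supported")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the file.
        length = int(last)
        start, end = max(file_size - length, 0), file_size - 1
    else:
        start = int(first)
        end = int(last) if last else file_size - 1
        end = min(end, file_size - 1)  # clamp to the end of the file
    if start > end or start >= file_size:
        raise ValueError("range not satisfiable")
    return start, end


start, end = parse_range("bytes=-500", 5000)
content_range = f"bytes {start}-{end}/5000"  # value for the Content-Range header
```

A handler would answer a satisfiable range with status 206 and the Content-Range header, and an unsatisfiable one with status 416.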

Configuration Management

Location: backend/config.py
Environment Variables (from .env):
class Config:
    # API Keys
    GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
    SARVAM_API_KEY = os.getenv("SARVAM_API_KEY")
    UNSPLASH_ACCESS_KEY = os.getenv("UNSPLASH_ACCESS_KEY")
    
    # API URLs
    SARVAM_TTS_URL = os.getenv("SARVAM_TTS_URL")
    
    # Directories
    BASE_DIR = Path(__file__).parent
    OUTPUTS_DIR = BASE_DIR / "outputs"
    AUDIO_DIR = OUTPUTS_DIR / "audio"
    IMAGES_DIR = OUTPUTS_DIR / "images"
    # ... etc
    
    # Rendering Settings
    MANIM_FPS = 30
    MANIM_QUALITY = "high"

Error Handling

# Request handler: mark the job as failed and return HTTP 500 to the client
try:
    content_data = content_gen.generate_content(topic, num_slides)
except Exception as e:
    update_progress(generation_id, 0, "error", f"❌ {str(e)}")
    raise HTTPException(status_code=500, detail=str(e))

# Video composition: substitute a solid-colour clip when a slide file is missing
if not Path(slide_path).exists():
    print(f"⚠️ Slide not found: {slide_path}")
    return ColorClip(size=(1920, 1080), color=(20, 20, 40), duration=duration)
Each generator has fallback logic:
  • Unsplash: Use placeholder image
  • Manim: Fall back to text slide
  • Sarvam: Retry with different voice

Next Steps

Pipeline Flow

See the complete step-by-step generation process

API Reference

Explore all API endpoints in detail

Dependencies

Learn about the generator dependencies
