Overview
The backend is a FastAPI application that orchestrates multiple AI services and video processing tools to generate complete video presentations. It follows a modular generator pattern with clear separation of concerns.
Tech Stack
- FastAPI: Modern Python web framework with async support
- MoviePy: Video editing and composition
- Manim: Mathematical animation engine
- Pillow: Image processing and slide rendering
Application Structure
Main Entry Point: app.py
Location: backend/app.py (538 lines)
Key Components:
- FastAPI Setup
- Request Model
- Response Model
Core Endpoints
| Endpoint | Method | Purpose | Line Reference |
|---|---|---|---|
| /api/generate | POST | Start video generation | app.py:223 |
| /api/progress/{id} | GET (SSE) | Real-time progress | app.py:129 |
| /api/video/{filename} | GET | Stream video with range requests | app.py:489 |
| /api/status/{id} | GET | Check generation status | app.py:480 |
| /health | GET | Health check | app.py:530 |
Generator Pattern
Each content generation step is encapsulated in a dedicated generator class following the Single Responsibility Principle.
1. ContentGenerator
Location: backend/generators/content_generator.py
Purpose: Generate structured presentation content using Google Gemini
Output Structure:
- Uses Gemini with JSON output mode
- Enforces mutual exclusivity: slides have EITHER animation OR image
- Validates slide structure
- Auto-corrects conflicts in metadata
- Saves to outputs/slides/{topic}_content.json
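The mutual-exclusivity rule and the auto-correction step can be illustrated with a minimal sketch. The field names (`title`, `content`, `animation`, `image`) and the choice to keep the animation when both are present are assumptions for illustration; the source does not state the real schema or which visual wins.

```python
from copy import deepcopy


def enforce_exclusive_visual(slide: dict) -> dict:
    """Return a copy of the slide with at most one visual: animation OR image."""
    fixed = deepcopy(slide)
    if fixed.get("animation") and fixed.get("image"):
        # Assumption: the animation is kept on conflict; the real rule may differ.
        fixed["image"] = None
    return fixed


def validate_slide(slide: dict) -> list[str]:
    """Collect readable problems instead of raising on the first one."""
    problems = []
    for key in ("title", "content"):
        if not slide.get(key):
            problems.append(f"missing {key}")
    if slide.get("animation") and slide.get("image"):
        problems.append("slide has both animation and image")
    return problems


# Hypothetical slide that violates the rule.
slide = {
    "title": "Newton's Third Law",
    "content": ["Action and reaction"],
    "animation": "forces_pair",
    "image": "rocket launch",
}
fixed = enforce_exclusive_visual(slide)
```

Returning a corrected copy rather than mutating the input keeps the raw model output available for logging.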
2. ScriptGenerator
Location: backend/generators/script_generator.py
Purpose: Generate voice narration scripts with timestamps
Input: Content data from ContentGenerator
Output Structure:
- Animation slides: Scripts include visual descriptions (“As you can see…”, “Watch as…”)
- Image slides: Scripts reference images naturally
- Text slides: Focus on conceptual explanation
- Timing adjusted based on actual audio duration
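Adjusting timing to the actual audio duration could look like the sketch below. It assumes segments carry estimated `start` and `end` times in seconds and that the correction is a simple proportional rescale; both are illustrative assumptions, since the source does not describe the adjustment method.

```python
def rescale_timestamps(segments: list[dict], actual_duration: float) -> list[dict]:
    """Scale estimated segment times so the last segment ends at actual_duration."""
    if not segments:
        return []
    estimated_total = segments[-1]["end"]
    if estimated_total <= 0:
        raise ValueError("estimated duration must be positive")
    factor = actual_duration / estimated_total
    return [
        {
            **seg,
            "start": round(seg["start"] * factor, 3),
            "end": round(seg["end"] * factor, 3),
        }
        for seg in segments
    ]


# Hypothetical script estimated at 10 s whose generated audio lasts 12.5 s.
estimated = [
    {"text": "Intro", "start": 0.0, "end": 4.0},
    {"text": "Main point", "start": 4.0, "end": 10.0},
]
adjusted = rescale_timestamps(estimated, actual_duration=12.5)
```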
3. VoiceGenerator
Location: backend/generators/voice_generator.py
Purpose: Generate voice narration using Sarvam TTS API
Key Methods:
- English (bulbul:v1 model)
- Hindi, Kannada, Telugu (Sarvam multilingual models)
- Format: MP3
- Quality: High (192 kbps)
- Sample Rate: 24kHz
4. ImageFetcher
Location: backend/generators/image_fetcher.py
Purpose: Fetch relevant images from Unsplash API
Usage:
- Searches Unsplash by keyword
- Downloads the highest quality available
- Falls back to a generic image if the search fails
- Respects API rate limits
5. ManimGenerator
Location: backend/generators/manim_generator.py
Purpose: Generate Manim animation code and coordinate rendering
Two-Step Process:
Generated Code Structure:
Utility Modules
SlideRenderer
Location: backend/utils/slide_renderer.py
Purpose: Render PPT-style slides as PNG images using Pillow
Key Methods:
create_text_slide()
Creates a text-only slide with title and content.
Output: 1920x1080 PNG with:
- Gradient background
- Accent bars (top + left)
- Centered title
- Bullet point content
- Slide number footer
create_slide_with_image()
Creates a slide with an image on the right side.
Layout:
- Left 50%: Title + content text
- Right 50%: Image (scaled to fit)
create_slide_with_animation_placeholder()
Creates the base slide for an animation overlay.
Layout:
- Left 50%: Title + content
- Right 50%: Dark placeholder (animation will be composited here)
Font loading:
- Windows: Arial fonts from C:/Windows/Fonts/
- Linux: DejaVu fonts
- Fallback: PIL default fonts
Output location: outputs/slides/
VideoRenderer
Location: backend/utils/video_renderer.py
Purpose: Render Manim animations to MP4
Method:
- Quality: High (-qh)
- Resolution: 1920x1080
- FPS: 30
- Format: MP4 (H.264)
- Captures stdout/stderr from Manim process
- Raises an exception if rendering fails
- Logs full output for debugging
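The capture-and-raise behavior described above can be sketched with the standard library. The helper names `build_manim_command` and `run_and_capture` are hypothetical, and only the `-qh` flag comes from this document; the real renderer may pass more options.

```python
import subprocess
import sys


def build_manim_command(scene_file: str, scene_name: str) -> list[str]:
    """Assemble a high-quality (-qh) Manim CLI call for one scene."""
    return ["manim", "-qh", scene_file, scene_name]


def run_and_capture(command: list[str]) -> str:
    """Run a command, capture stdout and stderr, and raise if it exits non-zero."""
    result = subprocess.run(command, capture_output=True, text=True)
    combined = result.stdout + result.stderr
    if result.returncode != 0:
        # Include the full output so a failed render can be debugged from the log.
        raise RuntimeError(f"render failed ({result.returncode}):\n{combined}")
    return combined


# Demonstrated with the Python interpreter so the sketch runs without Manim installed.
output = run_and_capture([sys.executable, "-c", "print('rendered')"])
```

In real use the command would come from `build_manim_command("scene.py", "SceneName")`, with both arguments supplied by the generator.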
VideoComposer
Location: backend/utils/video_composer.py (374 lines)
Purpose: Compose final video from all slides and audio
Main Method: compose_final_video()
Process:
Composite Animations
If a slide has an animation, the MP4 is composited onto the base slide PNG:
- Base slide as background (ImageClip)
- Animation resized to 850x700 and positioned at (1010, 250)
- Loop/trim animation to match narration length
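The geometry and the loop/trim decision can be checked with plain arithmetic before any MoviePy call. The sketch below uses the sizes and position stated above; the helper names and the idea of expressing the result as "full plays plus a trimmed tail" are assumptions, not the composer's actual code.

```python
FRAME_W, FRAME_H = 1920, 1080   # slide size stated in this document
ANIM_W, ANIM_H = 850, 700       # resized animation
ANIM_X, ANIM_Y = 1010, 250      # top-left corner of the animation


def fits_in_frame() -> bool:
    """Check that the positioned animation stays inside the slide."""
    return ANIM_X + ANIM_W <= FRAME_W and ANIM_Y + ANIM_H <= FRAME_H


def loop_plan(animation_s: float, narration_s: float) -> tuple[int, float]:
    """Return (full_plays, tail_seconds) that together cover the narration exactly."""
    if animation_s <= 0 or narration_s < 0:
        raise ValueError("durations must be positive")
    full_plays = int(narration_s // animation_s)
    tail = round(narration_s - full_plays * animation_s, 3)
    return full_plays, tail


looped = loop_plan(animation_s=4.0, narration_s=10.0)   # animation shorter: loop it
trimmed = loop_plan(animation_s=8.0, narration_s=5.0)   # animation longer: trim it
```

With these numbers the animation occupies x from 1010 to 1860 and y from 250 to 950, so it sits inside the right half of the 1920x1080 slide with a margin.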
File Output Structure
All generated files are organized under backend/outputs/. File names are derived from the topic:
- Topics sanitized: spaces → underscores, special chars removed
- Max length: 30 characters
- Example: Explain_Newtons_Third_Law_final.mp4
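The naming rules above can be sketched as a small helper. The function name `sanitize_topic`, the exact set of allowed characters, and applying the 30-character limit before the suffix is appended are assumptions for illustration.

```python
import re


def sanitize_topic(topic: str, max_length: int = 30) -> str:
    """Drop special characters, turn spaces into underscores, cap the length."""
    # Assumption: only ASCII letters, digits, spaces, and underscores are kept.
    cleaned = re.sub(r"[^A-Za-z0-9 _]", "", topic).strip()
    return cleaned.replace(" ", "_")[:max_length]


final_name = f"{sanitize_topic('Explain Newton' + chr(39) + 's Third Law')}_final.mp4"
```

Here `chr(39)` is the apostrophe in "Newton's", which the helper removes, so `final_name` matches the example file name above.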
Background Task Processing
The main generation endpoint (/api/generate) runs synchronously but reports progress asynchronously via Server-Sent Events (SSE).
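As a rough illustration of the SSE side, each progress update is a text message with a `data:` line followed by a blank line. The payload keys (`step`, `progress`) and the helper names below are hypothetical; the real progress schema is not given in this document.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one SSE message: optional event line, data line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def progress_stream(steps: Iterable[tuple]) -> Iterator[str]:
    """Yield one SSE message per (step_name, percent) pair."""
    for name, percent in steps:
        yield format_sse({"step": name, "progress": percent})


messages = list(progress_stream([("content", 20), ("voice", 60), ("compose", 100)]))
```

A generator like `progress_stream` is the kind of object a streaming response with media type `text/event-stream` would consume.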
Video Streaming with Range Requests
The /api/video/{filename} endpoint supports HTTP range requests for smooth video seeking.
Implementation: app.py:63-125. Range support provides the following:
- Enables video seeking in browser
- Reduces initial load time
- Supports resume on connection drop
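The core of range handling is turning a `Range` header into inclusive byte offsets and answering with matching headers on a 206 response. The following is a minimal sketch for single-range requests only; the function names are hypothetical and this is not the code at app.py:63-125.

```python
import re


def parse_range(header: str, file_size: int) -> tuple[int, int]:
    """Parse a single 'bytes=start-end' header into inclusive (start, end) offsets."""
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not match or file_size <= 0:
        raise ValueError("unsupported or malformed Range header")
    start_text, end_text = match.groups()
    if not start_text and not end_text:
        raise ValueError("unsupported or malformed Range header")
    if not start_text:
        # Suffix form 'bytes=-N': the last N bytes of the file.
        length = int(end_text)
        if length == 0:
            raise ValueError("range not satisfiable")
        return max(file_size - length, 0), file_size - 1
    start = int(start_text)
    end = min(int(end_text), file_size - 1) if end_text else file_size - 1
    if start > end:
        raise ValueError("range not satisfiable")
    return start, end


def range_headers(start: int, end: int, file_size: int) -> dict:
    """Headers that accompany a 206 Partial Content response."""
    return {
        "Content-Range": f"bytes {start}-{end}/{file_size}",
        "Accept-Ranges": "bytes",
        "Content-Length": str(end - start + 1),
    }


first_chunk = parse_range("bytes=0-1023", file_size=5000)
headers = range_headers(*first_chunk, 5000)
```

A request that starts past the end of the file raises `ValueError` here; an HTTP handler would typically translate that into a 416 response.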
Configuration Management
Location: backend/config.py
Environment Variables (from .env):
Error Handling
Generation Errors
Missing Files
API Failures
Each generator has fallback logic:
- Unsplash: Use placeholder image
- Manim: Fall back to text slide
- Sarvam: Retry with different voice
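The fallback pattern in the list above can be sketched generically. The helper `with_fallback`, the two stand-in functions, and the slide path are hypothetical; the real generators may structure their error handling differently.

```python
import logging

logger = logging.getLogger("generators")


def with_fallback(primary, fallback, label: str):
    """Try the primary step; on any failure, log it and use the fallback instead."""
    try:
        return primary()
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        logger.warning("%s failed (%s); using fallback", label, exc)
        return fallback()


def render_animation() -> dict:
    """Stand-in for a Manim render that fails."""
    raise RuntimeError("Manim render failed")


def render_text_slide() -> dict:
    """Stand-in for the text-slide fallback."""
    return {"type": "text", "path": "outputs/slides/slide_3.png"}


result = with_fallback(render_animation, render_text_slide, label="Manim")
```

Keeping the fallback as a separate callable means each generator can choose its own recovery, such as a placeholder image for Unsplash or a different voice for Sarvam.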
Next Steps
- Pipeline Flow: See the complete step-by-step generation process
- API Reference: Explore all API endpoints in detail
- Dependencies: Learn about the generator dependencies