Overview
The AI Video Presentation Generator is a full-stack application that transforms text prompts into complete video presentations with voice narration, visual slides, and animations. The system uses a modern web architecture with a React frontend and a FastAPI backend, integrated with multiple AI services.
High-Level Architecture
Technology Stack
Frontend
- Framework: React 18 with Vite
- Styling: TailwindCSS for modern UI components
- State Management: React Hooks (useState, useEffect)
- Real-time Updates: Server-Sent Events (SSE)
- HTTP Client: Axios
Backend
- Framework: FastAPI (Python)
- API Type: REST with SSE support
- Video Processing: MoviePy, FFmpeg
- Animation: Manim Community Edition
- Image Processing: Pillow (PIL)
- AI Integration: Google Gemini, Sarvam AI
External Services
Google Gemini
Generates structured presentation content and voice narration scripts
Sarvam AI
Multi-language text-to-speech for voice narration (English, Hindi, Kannada, Telugu)
Unsplash API
Fetches relevant images based on content keywords
Manim
Generates mathematical and scientific animations for complex concepts
Directory Structure
Data Flow: Topic → Video
Content Generation
Backend calls Gemini API to generate structured presentation content with slide metadata
Visual Generation
For each slide (mutually exclusive):
- Text-only: Rendered with Pillow (PNG)
- With Image: Unsplash fetch + composite with text
- With Animation: Manim generates code → renders MP4 → composites with base slide
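The three visual paths above can be sketched as a small dispatcher. This is only an illustration: the `Slide` fields and the `choose_visual` name are hypothetical, not the project's actual schema or API, and the function returns a label rather than rendering anything. It assumes each slide sets at most one of the two optional fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slide:
    # Hypothetical slide metadata; the real schema is not shown in this document.
    title: str
    image_query: Optional[str] = None       # keyword for an Unsplash lookup
    animation_prompt: Optional[str] = None  # description handed to the Manim step


def choose_visual(slide: Slide) -> str:
    """Return the visual path for a slide: 'animation', 'image', or 'text'."""
    if slide.animation_prompt:
        return "animation"  # Manim code -> MP4 -> composite with base slide
    if slide.image_query:
        return "image"      # Unsplash fetch + composite with text
    return "text"           # Pillow-rendered PNG
```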
Communication Patterns
REST API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /api/generate | POST | Start video generation |
| /api/progress/{id} | GET (SSE) | Real-time progress updates |
| /api/video/{filename} | GET | Stream final video |
| /api/status/{id} | GET | Check generation status |
| /health | GET | Health check |
Server-Sent Events (SSE)
The system uses SSE for real-time progress tracking.
Design Principles
Modular Generator Pattern
Each content type (scripts, audio, images, animations) has a dedicated generator class with a single responsibility.
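One way to picture this pattern is a shared base class with one subclass per content type. The sketch below is an assumption about the shape, not the project's real classes: `Generator`, `ScriptGenerator`, and the `generate` signature are hypothetical names, and the Gemini call is replaced by a plain file write.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Generator(ABC):
    """Hypothetical base class: one generator per content type."""

    @abstractmethod
    def generate(self, slide_text: str, out_dir: Path) -> Path:
        """Produce one artifact for a slide and return its path."""


class ScriptGenerator(Generator):
    def generate(self, slide_text: str, out_dir: Path) -> Path:
        # Stand-in for a Gemini call: store the text as the narration script.
        out_dir.mkdir(parents=True, exist_ok=True)
        path = out_dir / "script.txt"
        path.write_text(slide_text, encoding="utf-8")
        return path
```

Audio, image, and animation generators would follow the same interface, so the pipeline can treat them uniformly.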
Mutual Exclusivity
A slide can have either an animation or an image, never both; validation enforces this.
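A minimal sketch of such a check, assuming the rule is applied when a slide object is constructed. The `SlideSpec` class and its field names are hypothetical; the project's real validation may live elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlideSpec:
    # Hypothetical fields; the project's real slide schema may differ.
    title: str
    image_query: Optional[str] = None
    animation_prompt: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject slides that request both visual types.
        if self.image_query and self.animation_prompt:
            raise ValueError(
                f"slide {self.title!r} has both an image and an animation"
            )
```

Failing at construction time keeps invalid slides from reaching the rendering steps.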
Real-time Feedback
SSE provides continuous progress updates without polling, improving UX
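For reference, an SSE message on the wire is a block of `field: value` lines terminated by a blank line. The helper below is a sketch of that encoding; the `format_sse` name and the payload keys are assumptions, since the project's actual event schema is not shown here.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events message, ending with a blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"
```

A progress update would then be sent as, for example, `format_sse({"progress": 40}, event="progress")`.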
File-based Storage
All intermediate outputs are saved to disk for debugging, caching, and recovery.
Async Processing
Long-running generation happens in the background while the UI remains responsive.
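The idea can be illustrated with plain `asyncio`: the request handler schedules the job and returns at once, and the job updates a status record as it runs. This is a sketch under assumptions; the `jobs` table, `start_job`, and `generate_video` are hypothetical names, and the backend may use a different mechanism (for example FastAPI background tasks).

```python
import asyncio

# In-memory job table; hypothetical stand-in for the real status store.
jobs: dict[str, str] = {}


async def generate_video(job_id: str) -> None:
    jobs[job_id] = "running"
    await asyncio.sleep(0.01)  # placeholder for the real generation pipeline
    jobs[job_id] = "done"


async def start_job(job_id: str) -> asyncio.Task:
    # Schedule the work and return immediately so the caller is not blocked.
    jobs[job_id] = "queued"
    return asyncio.create_task(generate_video(job_id))


async def main() -> str:
    task = await start_job("demo")
    status_after_start = jobs["demo"]  # the task has been scheduled but not run
    await task                         # a real server would not wait here
    return status_after_start
```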
Performance Characteristics
- Generation Time: 2-5 minutes for a 5-slide video
- Video Resolution: 1920x1080 (Full HD)
- Frame Rate: 30 FPS
- Audio Quality: 192 kbps AAC
- Video Bitrate: 5000 kbps
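As an illustration of how these output settings map onto FFmpeg options, the sketch below builds a command line without running it. The `ffmpeg_encode_args` helper is hypothetical, and the project may set these values through MoviePy rather than by calling FFmpeg directly; no video codec is specified because the document does not name one.

```python
def ffmpeg_encode_args(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command line matching the output settings listed above."""
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-i", src,
        "-s", "1920x1080",   # Full HD frame size
        "-r", "30",          # 30 frames per second
        "-b:v", "5000k",     # video bitrate
        "-c:a", "aac",       # AAC audio codec
        "-b:a", "192k",      # audio bitrate
        dst,
    ]
```

The resulting list can be passed to `subprocess.run` on a machine where FFmpeg is installed.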
Security Considerations
CORS is configured for localhost:5173 (the Vite dev server). Update the allowed origins for production deployment.
Next Steps
Frontend Architecture
Dive into React components and state management
Backend Architecture
Explore FastAPI structure and generator pattern
Pipeline Flow
Understand the complete generation pipeline
Installation
Set up your development environment