Overview

The AI Video Presentation Generator is a full-stack application that transforms text prompts into complete video presentations with voice narration, visual slides, and animations. The system uses a modern web architecture with React frontend and FastAPI backend, integrated with multiple AI services.

High-Level Architecture

Technology Stack

Frontend

  • Framework: React 18 with Vite
  • Styling: TailwindCSS for modern UI components
  • State Management: React Hooks (useState, useEffect)
  • Real-time Updates: Server-Sent Events (SSE)
  • HTTP Client: Axios

Backend

  • Framework: FastAPI (Python)
  • API Type: REST with SSE support
  • Video Processing: MoviePy, FFmpeg
  • Animation: Manim Community Edition
  • Image Processing: Pillow (PIL)
  • AI Integration: Google Gemini, Sarvam AI

External Services

Google Gemini

Generates structured presentation content and voice narration scripts

Sarvam AI

Multi-language text-to-speech for voice narration (English, Hindi, Kannada, Telugu)

Unsplash API

Fetches relevant images based on content keywords

Manim

Generates mathematical and scientific animations for complex concepts

Directory Structure

AI-VIDEO-GEN/
├── backend/
│   ├── generators/              # Content generation modules
│   │   ├── content_generator.py # Gemini: PPT structure
│   │   ├── script_generator.py  # Gemini: Voice scripts
│   │   ├── voice_generator.py   # Sarvam: TTS
│   │   ├── image_fetcher.py     # Unsplash integration
│   │   └── manim_generator.py   # Animation code gen
│   │
│   ├── utils/                   # Video processing utilities
│   │   ├── slide_renderer.py    # PIL: Text slides → PNG
│   │   ├── video_renderer.py    # Manim → MP4
│   │   └── video_composer.py    # MoviePy: Final assembly
│   │
│   ├── outputs/                 # Generated files
│   │   ├── audio/              # Voice narration (MP3)
│   │   ├── images/             # Downloaded images
│   │   ├── manim_code/         # Generated Python code
│   │   ├── manim_output/       # Rendered animations
│   │   ├── scripts/            # JSON scripts
│   │   ├── slides/             # Slide images (PNG)
│   │   └── final/              # Final videos (MP4)
│   │
│   ├── app.py                  # FastAPI application
│   ├── config.py               # Configuration
│   └── requirements.txt        # Python dependencies

├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── Home.jsx        # Input form
│   │   │   ├── StepProgress.jsx # Progress tracker
│   │   │   ├── VideoPlayer.jsx # Video playback
│   │   │   ├── SlideEditor.jsx # Slide preview
│   │   │   └── SlidePreview.jsx
│   │   │
│   │   ├── hooks/
│   │   │   └── useSSEProgress.jsx # SSE handler
│   │   │
│   │   ├── utils/
│   │   │   ├── api.js          # API client
│   │   │   └── pptExport.js   # PPT export
│   │   │
│   │   ├── App.jsx             # Main app component
│   │   └── main.jsx            # Entry point
│   │
│   ├── package.json
│   └── vite.config.js

└── README.md

Data Flow: Topic → Video

1. User Input: the user enters a topic, number of slides, language, and tone in the React frontend.
2. Content Generation: the backend calls the Gemini API to generate structured presentation content with slide metadata.
3. Script Generation: Gemini creates a voice narration script with timing for each slide.
4. Audio Generation: Sarvam TTS generates voice audio per slide; actual audio durations replace the estimated timings.
5. Visual Generation: each slide takes exactly one of three mutually exclusive paths:
   • Text-only: rendered with Pillow (PNG)
   • With image: Unsplash fetch + composite with text
   • With animation: Manim generates code → renders MP4 → composites with the base slide
6. Audio Combining: all slide audio files are concatenated into a single MP3 track.
7. Video Composition: MoviePy combines the slide visuals with synchronized audio into the final MP4.
8. Delivery: the video is streamed to the frontend with range request support for smooth playback.
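The eight steps above can be sketched as a single orchestration function. The step functions below are stubs with hypothetical names and placeholder return values, not the project's actual generator APIs; they only illustrate the order of the stages and the data handed between them.

```python
# Sketch of the topic-to-video pipeline's control flow.
# All function names and return shapes are illustrative assumptions.

def generate_content(topic, num_slides):          # Gemini: slide structure
    return [{"title": f"Slide {i + 1}"} for i in range(num_slides)]

def generate_scripts(slides):                     # Gemini: narration + estimated timing
    return [{"text": "...", "est_duration": 10.0} for _ in slides]

def generate_audio(scripts):                      # Sarvam TTS: real durations replace estimates
    return [{"path": f"audio/{i}.mp3", "duration": 9.5} for i, _ in enumerate(scripts)]

def render_visuals(slides):                       # Pillow / Unsplash / Manim, one path per slide
    return [f"slides/{i}.png" for i, _ in enumerate(slides)]

def compose_video(visuals, audio):                # MoviePy: sync visuals to concatenated audio
    return {"path": "final/video.mp4", "duration": sum(a["duration"] for a in audio)}

def generate_video(topic, num_slides=5):
    slides = generate_content(topic, num_slides)
    scripts = generate_scripts(slides)
    audio = generate_audio(scripts)
    visuals = render_visuals(slides)
    return compose_video(visuals, audio)
```

Note that audio generation precedes visual composition: the final video's timing is driven by the measured audio durations, not the estimates.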

Communication Patterns

REST API Endpoints

| Endpoint             | Method    | Purpose                    |
|----------------------|-----------|----------------------------|
| /api/generate        | POST      | Start video generation     |
| /api/progress/{id}   | GET (SSE) | Real-time progress updates |
| /api/video/(unknown) | GET       | Stream final video         |
| /api/status/{id}     | GET       | Check generation status    |
| /health              | GET       | Health check               |
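As a small illustration, the endpoints above can be wrapped in URL-building helpers on the client side. The helper names and the base URL are assumptions matching the local dev setup, not part of the actual api.js.

```python
# Hypothetical helpers mapping the REST endpoints to full URLs.
BASE = "http://localhost:8000"  # FastAPI dev server (assumption)

def generate_url():
    return f"{BASE}/api/generate"                  # POST: start a generation

def progress_url(generation_id):
    return f"{BASE}/api/progress/{generation_id}"  # GET (SSE): live progress

def status_url(generation_id):
    return f"{BASE}/api/status/{generation_id}"    # GET: poll current status

def health_url():
    return f"{BASE}/health"                        # GET: health check
```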

Server-Sent Events (SSE)

The system uses SSE for real-time progress tracking:
```javascript
// Frontend hook (useSSEProgress.jsx)
const eventSource = new EventSource(
  `http://localhost:8000/api/progress/${generationId}`
);

eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // Update progress UI: data.progress, data.status, data.message
};
```

```python
# Backend generator (fragment from the /api/progress/{id} SSE endpoint)
import asyncio
import json

async def event_generator():
    while not_complete:  # loop until the pipeline reports completion
        yield f"data: {json.dumps(status_data)}\n\n"
        await asyncio.sleep(0.5)
```
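The two snippets meet at the SSE wire format: each event is a `data: <json>\n\n` frame, and `EventSource` hands the frontend everything after `data: ` as `event.data`. A minimal round-trip helper (hypothetical names, not from the codebase) makes the framing explicit:

```python
import json

def sse_event(payload: dict) -> str:
    # Frame a payload exactly as the backend generator does.
    return f"data: {json.dumps(payload)}\n\n"

def parse_sse_event(frame: str) -> dict:
    # Mirror the frontend's JSON.parse(event.data).
    assert frame.startswith("data: ") and frame.endswith("\n\n")
    return json.loads(frame[len("data: "):].strip())
```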

Design Principles

  • Single responsibility: each content type (scripts, audio, images, animations) has a dedicated generator class
  • Mutual exclusivity: a slide can have either an animation or an image, never both, enforced by validation
  • Real-time feedback: SSE provides continuous progress updates without polling, improving UX
  • Persistent artifacts: all intermediate outputs are saved to disk for debugging, caching, and recovery
  • Background processing: long-running generation happens in the background while the UI remains responsive
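The animation-or-image rule can be expressed as a small validator. This is a sketch: the field names (`animation`, `image`) are assumptions about the slide metadata, not the project's actual schema.

```python
def validate_slide(slide: dict) -> dict:
    """Reject slides that request both an animation and an image."""
    if slide.get("animation") and slide.get("image"):
        raise ValueError("A slide may have an animation or an image, not both")
    return slide
```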

Performance Characteristics

  • Generation Time: 2-5 minutes for 5-slide video
  • Video Resolution: 1920x1080 (Full HD)
  • Frame Rate: 30 FPS
  • Audio Quality: 192 kbps AAC
  • Video Bitrate: 5000 kbps
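For reference, these targets map onto keyword arguments accepted by MoviePy's `write_videofile`. The dict below is a sketch of plausible export settings, not the project's verified configuration; resolution is determined by the rendered clip itself.

```python
# Export settings matching the documented targets (sketch, not verified config).
EXPORT_SETTINGS = {
    "fps": 30,                 # frame rate
    "codec": "libx264",        # H.264 video
    "audio_codec": "aac",      # AAC audio
    "bitrate": "5000k",        # video bitrate
    "audio_bitrate": "192k",   # audio bitrate
}

# Usage (requires moviepy):
# final_clip.write_videofile("outputs/final/video.mp4", **EXPORT_SETTINGS)
```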

Security Considerations

All API keys (Gemini, Sarvam, Unsplash) are stored in .env file and loaded via config.py. Never commit .env to version control.
CORS is configured for localhost:5173 (Vite dev server). Update for production deployment.
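A minimal sketch of that CORS setup using FastAPI's standard `CORSMiddleware` (the allowed origin matches the Vite dev server; replace or extend it for production):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],  # Vite dev server; update for production
    allow_methods=["*"],
    allow_headers=["*"],
)
```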

Next Steps

  • Frontend Architecture: dive into the React components and state management
  • Backend Architecture: explore the FastAPI structure and generator pattern
  • Pipeline Flow: understand the complete generation pipeline
  • Installation: set up your development environment