Overview

The AI Video Presentation Generator is a full-stack application that transforms text prompts into complete video presentations with voice narration, visual slides, and animations. The system uses a modern web architecture with React frontend and FastAPI backend, integrated with multiple AI services.

High-Level Architecture

Technology Stack

Frontend

  • Framework: React 18 with Vite
  • Styling: TailwindCSS for modern UI components
  • State Management: React Hooks (useState, useEffect)
  • Real-time Updates: Server-Sent Events (SSE)
  • HTTP Client: Axios

Backend

  • Framework: FastAPI (Python)
  • API Type: REST with SSE support
  • Video Processing: MoviePy, FFmpeg
  • Animation: Manim Community Edition
  • Image Processing: Pillow (PIL)
  • AI Integration: Google Gemini, Sarvam AI

External Services

Google Gemini

Generates structured presentation content and voice narration scripts

Sarvam AI

Multi-language text-to-speech for voice narration (English, Hindi, Kannada, Telugu)

Unsplash API

Fetches relevant images based on content keywords

Manim

Generates mathematical and scientific animations for complex concepts

Directory Structure

AI-VIDEO-GEN/
├── backend/
│   ├── generators/              # Content generation modules
│   │   ├── content_generator.py # Gemini: PPT structure
│   │   ├── script_generator.py  # Gemini: Voice scripts
│   │   ├── voice_generator.py   # Sarvam: TTS
│   │   ├── image_fetcher.py     # Unsplash integration
│   │   └── manim_generator.py   # Animation code gen
│   │
│   ├── utils/                   # Video processing utilities
│   │   ├── slide_renderer.py    # PIL: Text slides → PNG
│   │   ├── video_renderer.py    # Manim → MP4
│   │   └── video_composer.py    # MoviePy: Final assembly
│   │
│   ├── outputs/                 # Generated files
│   │   ├── audio/              # Voice narration (MP3)
│   │   ├── images/             # Downloaded images
│   │   ├── manim_code/         # Generated Python code
│   │   ├── manim_output/       # Rendered animations
│   │   ├── scripts/            # JSON scripts
│   │   ├── slides/             # Slide images (PNG)
│   │   └── final/              # Final videos (MP4)
│   │
│   ├── app.py                  # FastAPI application
│   ├── config.py               # Configuration
│   └── requirements.txt        # Python dependencies

├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── Home.jsx        # Input form
│   │   │   ├── StepProgress.jsx # Progress tracker
│   │   │   ├── VideoPlayer.jsx # Video playback
│   │   │   ├── SlideEditor.jsx # Slide preview
│   │   │   └── SlidePreview.jsx
│   │   │
│   │   ├── hooks/
│   │   │   └── useSSEProgress.jsx # SSE handler
│   │   │
│   │   ├── utils/
│   │   │   ├── api.js          # API client
│   │   │   └── pptExport.js   # PPT export
│   │   │
│   │   ├── App.jsx             # Main app component
│   │   └── main.jsx            # Entry point
│   │
│   ├── package.json
│   └── vite.config.js

└── README.md

Data Flow: Topic → Video

1. User Input: the user enters a topic, number of slides, language, and tone in the React frontend.
2. Content Generation: the backend calls the Gemini API to generate structured presentation content with slide metadata.
3. Script Generation: Gemini creates a voice narration script with timing for each slide.
4. Audio Generation: Sarvam TTS generates voice audio per slide; actual audio durations replace the estimated timings.
5. Visual Generation: each slide takes exactly one of three mutually exclusive paths:
   • Text-only: rendered with Pillow (PNG)
   • With image: Unsplash fetch + composite with text
   • With animation: Manim generates code → renders MP4 → composites with the base slide
6. Audio Combining: all slide audio files are concatenated into a single MP3 track.
7. Video Composition: MoviePy combines the slide visuals with synchronized audio into the final MP4.
8. Delivery: the video is streamed to the frontend with range request support for smooth playback.
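The eight steps above can be sketched as a single orchestration function. The step functions below are stubs with hypothetical names and placeholder return values, not the project's actual generator APIs; they only illustrate the order of the stages and the data handed between them.

```python
# Sketch of the topic-to-video pipeline's control flow.
# All function names and return shapes are illustrative assumptions.

def generate_content(topic, num_slides):          # Gemini: slide structure
    return [{"title": f"Slide {i + 1}"} for i in range(num_slides)]

def generate_scripts(slides):                     # Gemini: narration + estimated timing
    return [{"text": "...", "est_duration": 10.0} for _ in slides]

def generate_audio(scripts):                      # Sarvam TTS: real durations replace estimates
    return [{"path": f"audio/{i}.mp3", "duration": 9.5} for i, _ in enumerate(scripts)]

def render_visuals(slides):                       # Pillow / Unsplash / Manim, one path per slide
    return [f"slides/{i}.png" for i, _ in enumerate(slides)]

def compose_video(visuals, audio):                # MoviePy: sync visuals to concatenated audio
    return {"path": "final/video.mp4", "duration": sum(a["duration"] for a in audio)}

def generate_video(topic, num_slides=5):
    slides = generate_content(topic, num_slides)
    scripts = generate_scripts(slides)
    audio = generate_audio(scripts)
    visuals = render_visuals(slides)
    return compose_video(visuals, audio)
```

Note that audio generation precedes visual composition: the final video's timing is driven by the measured audio durations, not the estimates.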

Communication Patterns

REST API Endpoints

| Endpoint             | Method    | Purpose                    |
|----------------------|-----------|----------------------------|
| /api/generate        | POST      | Start video generation     |
| /api/progress/{id}   | GET (SSE) | Real-time progress updates |
| /api/video/(unknown) | GET       | Stream final video         |
| /api/status/{id}     | GET       | Check generation status    |
| /health              | GET       | Health check               |
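As a small illustration, the endpoints above can be wrapped in URL-building helpers on the client side. The helper names and the base URL are assumptions matching the local dev setup, not part of the actual api.js.

```python
# Hypothetical helpers mapping the REST endpoints to full URLs.
BASE = "http://localhost:8000"  # FastAPI dev server (assumption)

def generate_url():
    return f"{BASE}/api/generate"                  # POST: start a generation

def progress_url(generation_id):
    return f"{BASE}/api/progress/{generation_id}"  # GET (SSE): live progress

def status_url(generation_id):
    return f"{BASE}/api/status/{generation_id}"    # GET: poll current status

def health_url():
    return f"{BASE}/health"                        # GET: health check
```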

Server-Sent Events (SSE)

The system uses SSE for real-time progress tracking:
```javascript
// Frontend hook (useSSEProgress.jsx)
const eventSource = new EventSource(
  `http://localhost:8000/api/progress/${generationId}`
);

eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // Update progress UI: data.progress, data.status, data.message
};
```

```python
# Backend generator (fragment from the /api/progress/{id} SSE endpoint)
import asyncio
import json

async def event_generator():
    while not_complete:  # loop until the pipeline reports completion
        yield f"data: {json.dumps(status_data)}\n\n"
        await asyncio.sleep(0.5)
```
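The two snippets meet at the SSE wire format: each event is a `data: <json>\n\n` frame, and `EventSource` hands the frontend everything after `data: ` as `event.data`. A minimal round-trip helper (hypothetical names, not from the codebase) makes the framing explicit:

```python
import json

def sse_event(payload: dict) -> str:
    # Frame a payload exactly as the backend generator does.
    return f"data: {json.dumps(payload)}\n\n"

def parse_sse_event(frame: str) -> dict:
    # Mirror the frontend's JSON.parse(event.data).
    assert frame.startswith("data: ") and frame.endswith("\n\n")
    return json.loads(frame[len("data: "):].strip())
```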

Design Principles

  • Single responsibility: each content type (scripts, audio, images, animations) has a dedicated generator class
  • Mutual exclusivity: a slide can have either an animation or an image, never both, enforced by validation
  • Real-time feedback: SSE provides continuous progress updates without polling, improving UX
  • Persistent artifacts: all intermediate outputs are saved to disk for debugging, caching, and recovery
  • Background processing: long-running generation happens in the background while the UI remains responsive
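The animation-or-image rule can be expressed as a small validator. This is a sketch: the field names (`animation`, `image`) are assumptions about the slide metadata, not the project's actual schema.

```python
def validate_slide(slide: dict) -> dict:
    """Reject slides that request both an animation and an image."""
    if slide.get("animation") and slide.get("image"):
        raise ValueError("A slide may have an animation or an image, not both")
    return slide
```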

Performance Characteristics

  • Generation Time: 2-5 minutes for 5-slide video
  • Video Resolution: 1920x1080 (Full HD)
  • Frame Rate: 30 FPS
  • Audio Quality: 192 kbps AAC
  • Video Bitrate: 5000 kbps
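For reference, these targets map onto keyword arguments accepted by MoviePy's `write_videofile`. The dict below is a sketch of plausible export settings, not the project's verified configuration; resolution is determined by the rendered clip itself.

```python
# Export settings matching the documented targets (sketch, not verified config).
EXPORT_SETTINGS = {
    "fps": 30,                 # frame rate
    "codec": "libx264",        # H.264 video
    "audio_codec": "aac",      # AAC audio
    "bitrate": "5000k",        # video bitrate
    "audio_bitrate": "192k",   # audio bitrate
}

# Usage (requires moviepy):
# final_clip.write_videofile("outputs/final/video.mp4", **EXPORT_SETTINGS)
```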

Security Considerations

All API keys (Gemini, Sarvam, Unsplash) are stored in .env file and loaded via config.py. Never commit .env to version control.
CORS is configured for localhost:5173 (Vite dev server). Update for production deployment.
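A minimal sketch of that CORS setup using FastAPI's standard `CORSMiddleware` (the allowed origin matches the Vite dev server; replace or extend it for production):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],  # Vite dev server; update for production
    allow_methods=["*"],
    allow_headers=["*"],
)
```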

Next Steps

  • Frontend Architecture: dive into the React components and state management
  • Backend Architecture: explore the FastAPI structure and generator pattern
  • Pipeline Flow: understand the complete generation pipeline
  • Installation: set up your development environment