Prerequisites

Before you begin, ensure you have the following installed:
  • Python 3.8+ with pip
  • Node.js 16+ with npm
  • FFmpeg (for video processing)
  • Git (to clone the repository)
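A quick way to confirm the tools listed above are available is a small Python check. This is only a sketch: the command names (pip, node, npm, ffmpeg, git) are assumed to be the usual ones on your PATH, and it does not verify the Node.js version.

```python
import shutil
import sys

# Assumed command names; adjust if your system uses different ones (e.g. pip3).
REQUIRED_COMMANDS = ["pip", "node", "npm", "ffmpeg", "git"]


def missing_commands(commands):
    """Return the commands from `commands` that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", "ok" if python_is_supported() else "too old")
    missing = missing_commands(REQUIRED_COMMANDS)
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

Run it with the same Python interpreter you plan to use for the backend.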
You’ll also need API keys from:
  • Google Gemini (content generation)
  • Sarvam AI (text-to-speech)
  • Unsplash (images)

Get Started in 5 Minutes

1. Clone the repository

git clone https://github.com/Kamal-Nayan-Kumar/AI-Video-Gen
cd AI-Video-Gen
2. Set up the backend

Navigate to the backend directory and create a virtual environment:
cd backend
python -m venv venv
Activate the virtual environment. On Windows:
venv\Scripts\activate
On macOS or Linux:
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Install Manim (animation library):
pip install manim
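To confirm the setup above took effect, you can check from Python whether a virtual environment is active and whether Manim is importable. This is a minimal sketch using only the standard library; the helper names are illustrative and not part of the project.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # points at the base Python installation.
    return sys.prefix != sys.base_prefix


def is_installed(module_name):
    """Return True if the top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("manim installed:", is_installed("manim"))
```

If the first line prints False, activate the environment again before installing anything else.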
3. Configure environment variables

Create a .env file in the backend/ directory by copying the example file (in Windows Command Prompt, use copy instead of cp):
cp .env.example .env
Edit .env and add your API keys:
GEMINI_API_KEY=your_gemini_api_key_here
SARVAM_API_KEY=your_sarvam_api_key_here
UNSPLASH_ACCESS_KEY=your_unsplash_access_key_here
The backend uses Sarvam AI for text-to-speech. Make sure to get your API key from Sarvam AI.
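Before starting the server, it can help to check that none of the three keys is missing or still set to its placeholder. The sketch below parses simple KEY=value lines by hand; it is an illustration, not how the backend itself loads its configuration, and it assumes placeholders start with "your_" as in the example above.

```python
from pathlib import Path

REQUIRED_KEYS = ["GEMINI_API_KEY", "SARVAM_API_KEY", "UNSPLASH_ACCESS_KEY"]


def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are missing, empty, or still placeholders."""
    return [
        key for key in required
        if not values.get(key) or values[key].startswith("your_")
    ]


if __name__ == "__main__":
    env_path = Path(".env")  # assumes you run this from the backend/ directory
    if env_path.exists():
        problems = unset_keys(parse_env(env_path.read_text()))
        print("Keys still to fill in:", ", ".join(problems) if problems else "none")
    else:
        print(".env not found in the current directory")
```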
4. Start the backend server

With your virtual environment still activated, run:
python app.py
The FastAPI backend will start on http://localhost:8000. You should see:
INFO:     Started server process
INFO:     Uvicorn running on http://0.0.0.0:8000
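If you would rather confirm the server is up without reading the log, a plain TCP connection test is enough. This sketch only checks that something is listening on the port; it makes no assumption about the backend's routes.

```python
import socket


def port_is_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("Backend reachable on port 8000:", port_is_open())
```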
5. Set up the frontend

Open a new terminal and navigate to the frontend directory:
cd frontend
Install dependencies:
npm install
Start the development server:
npm run dev
The React frontend will start on http://localhost:5173.
6. Generate your first presentation

  1. Open your browser and go to http://localhost:5173
  2. Enter a topic (e.g., “Introduction to Machine Learning”)
  3. Set the number of slides (e.g., 5)
  4. Choose a language (English, Hindi, Kannada, or Telugu)
  5. Select a tone (Formal, Casual, or Educational)
  6. Click Generate
The system will:
  • Generate presentation content using Gemini AI
  • Create narration scripts for each slide
  • Generate voice audio using Sarvam AI
  • Fetch relevant images from Unsplash or create animations with Manim
  • Compose everything into a final MP4 video
The generation process takes 2-5 minutes depending on the number of slides and complexity.
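The form fields from the steps above can be modelled as a small validated record, which is handy if you want to script or test inputs. Everything here is hypothetical: the class name, field names, and validation rules are assumptions for illustration and are not taken from the project's actual request schema.

```python
from dataclasses import dataclass

# Options as listed in the steps above.
LANGUAGES = {"English", "Hindi", "Kannada", "Telugu"}
TONES = {"Formal", "Casual", "Educational"}


@dataclass(frozen=True)
class PresentationRequest:
    """Hypothetical container for the form fields; not the project's schema."""
    topic: str
    num_slides: int
    language: str
    tone: str

    def __post_init__(self):
        if not self.topic.strip():
            raise ValueError("topic must not be empty")
        if self.num_slides < 1:
            raise ValueError("num_slides must be at least 1")
        if self.language not in LANGUAGES:
            raise ValueError(f"unsupported language: {self.language}")
        if self.tone not in TONES:
            raise ValueError(f"unsupported tone: {self.tone}")


if __name__ == "__main__":
    request = PresentationRequest(
        "Introduction to Machine Learning", 5, "Hindi", "Formal"
    )
    print(request)
```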

What’s Next?

  • Installation Guide: Detailed installation instructions for all platforms
  • Configuration: Learn about advanced configuration options
  • API Reference: Explore the FastAPI endpoints
  • Creating Presentations: Learn how to create and customize presentations

Troubleshooting

If the backend will not start, make sure:
  • Your virtual environment is activated
  • All dependencies are installed: pip install -r requirements.txt
  • FFmpeg is installed and accessible in your PATH
  • Port 8000 is not already in use
If the frontend cannot reach the backend:
  • Verify the backend is running on http://localhost:8000
  • Check the browser console for CORS errors
  • Ensure your .env file has the correct API keys
If video generation fails, common causes are:
  • Invalid API keys in .env
  • FFmpeg not installed or not in PATH
  • Insufficient disk space in backend/outputs/
  • API rate limits exceeded
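Two of the checks above, a busy port and low disk space, are easy to script. The sketch below is a rough diagnostic using only the standard library; the helper names are illustrative, and the bind test is a heuristic whose behaviour can differ between operating systems.

```python
import shutil
import socket


def port_is_free(port=8000, host="0.0.0.0"):
    """Return True if a bind to host:port succeeds, i.e. nothing holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False


def free_space_gb(path="."):
    """Return free disk space in gigabytes for the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    print("Port 8000 free:", port_is_free())
    # Pass the outputs directory (e.g. "backend/outputs") if it already exists.
    print(f"Free disk space: {free_space_gb():.1f} GB")
```

Run the port check while the backend is stopped; once the server is running, the port is expected to be in use.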