Overview
The Content Generation feature uses Google’s Gemini AI to transform your topic into a structured presentation. It analyzes your input and creates a complete slide-by-slide breakdown, deciding what content appears on each slide and whether visuals (animations or images) are needed.
How It Works
When you provide a topic, the Content Generator:
- Structures the Presentation: Creates a logical flow of slides (typically 5-10 slides) that introduces, explains, and summarizes your topic
- Determines Visual Needs: For each slide, decides whether it needs:
  - Text only (most common, 70-80% of slides)
  - An image from Unsplash
  - A Manim animation video
- Estimates Timing: Calculates appropriate duration for each slide based on content complexity (typically 4-10 seconds)
- Saves the Structure: Outputs a JSON file containing all slide metadata for downstream processing
Visuals are mutually exclusive: each slide can have either an animation or an image, never both. Most slides are text-only to keep the focus on the narrative.
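The mutual-exclusivity rule can be sketched as a small helper. This is a minimal illustration rather than the project's actual code: the function name visual_type is hypothetical, and it assumes a slide is a plain dict carrying the needs_animation and needs_image flags described in this document.

```python
def visual_type(slide: dict) -> str:
    """Return "animation", "image", or "text" for one slide dict.

    Hypothetical helper: assumes the slide uses the needs_animation and
    needs_image boolean flags, and treats a missing flag as False.
    """
    wants_animation = bool(slide.get("needs_animation", False))
    wants_image = bool(slide.get("needs_image", False))
    if wants_animation and wants_image:
        # A slide may carry an animation or an image, never both.
        raise ValueError("slide requests both an animation and an image")
    if wants_animation:
        return "animation"
    if wants_image:
        return "image"
    return "text"
```

A slide with neither flag set falls through to "text", which matches the expectation that most slides carry no visual.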
Slide Structure
Each generated slide contains:
Key Fields
- slide_number: Sequential slide identifier (starts at 1)
- title: Concise slide heading
- content_text: Main narration text (2-4 sentences, optimized for TTS)
- needs_image: Boolean flag for Unsplash image requirement
- image_keyword: Search term for image fetching (only if needs_image=true)
- needs_animation: Boolean flag for Manim animation requirement
- animation_description: Detailed instructions for animation generation (only if needs_animation=true)
- duration: Slide display time in seconds
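As an illustration of these fields, one slide could be represented as follows. The values are invented for the example, and modeling the record as a TypedDict is an assumption, not a description of the project's code. The example also assumes that image_keyword is simply omitted when needs_image is false; the generator may emit an empty string instead.

```python
import json
from typing import TypedDict


class Slide(TypedDict, total=False):
    slide_number: int
    title: str
    content_text: str
    needs_image: bool
    image_keyword: str          # only meaningful when needs_image is True
    needs_animation: bool
    animation_description: str  # only meaningful when needs_animation is True
    duration: float             # seconds; the exact numeric type is an assumption


# Example values are made up for illustration.
example_slide: Slide = {
    "slide_number": 1,
    "title": "The Pythagorean Theorem",
    "content_text": (
        "The Pythagorean theorem relates the three sides of a right triangle. "
        "The square of the hypotenuse equals the sum of the squares of the other two sides."
    ),
    "needs_image": False,
    "needs_animation": True,
    "animation_description": "Draw a right triangle and build a square on each side.",
    "duration": 8,
}

print(json.dumps(example_slide, indent=2))
```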
Animation vs Image Logic
When Images Are Used
Images are selected for:
- Historical figures or famous people
- Real-world objects, places, or phenomena
- Static diagrams that support understanding
- Background context or examples
When Animations Are Used
Animations are only created when:
- The concept is impossible to understand without motion
- Mathematical proofs require visual demonstration (e.g., Pythagorean theorem)
- Vector operations need directional representation
- Physical motion must be illustrated (e.g., circular motion with velocity vectors)
Text-Only Slides (Most Common)
The majority of slides (70-80%) use text-only format for:
- Definitions and explanations
- Lists of concepts or steps
- Summary slides
- Theoretical concepts
- General information
Configuration
API Settings
Content generation is powered by the Gemini API, configured in config.py:
Response Format
The generator uses structured JSON output to ensure consistent, parseable responses:
Customizing Slide Count
You can adjust the number of slides when generating content:
Validation & Error Handling
The system includes robust validation:
- Missing fields are automatically added with defaults
- Conflicting flags (both needs_animation and needs_image set to true) are resolved by prioritizing animation
- Malformed JSON is cleaned and parsed (removes markdown code blocks)
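The three validation rules can be sketched roughly as below. This is a hedged illustration under stated assumptions, not the project's implementation: the names DEFAULTS, strip_code_fences, normalize_slide, and parse_slides are hypothetical, the default values are placeholders, and the response is assumed to be a JSON list of slide objects.

```python
import json
import re

FENCE = "`" * 3  # markdown code-fence marker, built here to avoid a literal fence

# Hypothetical defaults; the real default values are not documented here.
DEFAULTS = {
    "title": "",
    "content_text": "",
    "needs_image": False,
    "image_keyword": "",
    "needs_animation": False,
    "animation_description": "",
    "duration": 5,
}


def strip_code_fences(raw: str) -> str:
    """Remove a surrounding markdown code fence (with optional language tag)."""
    text = raw.strip()
    text = re.sub(r"^" + FENCE + r"[a-zA-Z]*\s*", "", text)
    text = re.sub(r"\s*" + FENCE + r"$", "", text)
    return text.strip()


def normalize_slide(slide: dict, index: int) -> dict:
    """Fill missing fields with defaults and resolve conflicting flags."""
    fixed = {"slide_number": index, **DEFAULTS, **slide}
    if fixed["needs_animation"] and fixed["needs_image"]:
        # Animation takes priority, so the image request is dropped.
        fixed["needs_image"] = False
        fixed["image_keyword"] = ""
    return fixed


def parse_slides(raw: str) -> list:
    """Parse a raw model response into a list of normalized slide dicts."""
    data = json.loads(strip_code_fences(raw))
    return [normalize_slide(s, i) for i, s in enumerate(data, start=1)]
```

Stripping the fence before parsing keeps json.loads strict, so output that is still malformed after cleaning raises an error instead of passing through silently.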
Output Structure
Generated content is saved to:
Best Practices
- Specific Topics Work Best: “The Pythagorean Theorem Proof” generates better content than “Math”
- Educational Focus: The prompt is optimized for learning content - tutorials, concepts, explanations
- Trust the AI: Gemini’s decisions on visual needs are based on educational best practices
- Review the JSON: Check the generated _content.json file to verify structure before video generation
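Reviewing the JSON can be done by eye, but a short script makes it quicker. The sketch below is only an illustration: load_slides and summarize are hypothetical names, and it assumes the file is either a JSON list of slides or an object with a "slides" list, which may not match the actual layout.

```python
import json
from pathlib import Path


def load_slides(path: str) -> list:
    """Load slides from a generated content file.

    Assumption: the file is either a JSON list of slides or an object
    with a "slides" list; the actual layout may differ.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["slides"] if isinstance(data, dict) else data


def summarize(slides: list) -> list:
    """Return one short line per slide for a quick manual review."""
    lines = []
    for slide in slides:
        if slide.get("needs_animation"):
            visual = "animation"
        elif slide.get("needs_image"):
            visual = "image"
        else:
            visual = "text"
        lines.append(
            f"{slide.get('slide_number', '?')}. {slide.get('title', '')} "
            f"[{visual}, {slide.get('duration', '?')}s]"
        )
    return lines
```

Printing each line of summarize(load_slides(path)) for your own _content.json file gives a one-screen view of slide order, visual type, and duration.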
Related Features
- Voice Narration - Converts content_text into audio
- Visual Media - Generates animations and fetches images based on slide flags
- Video Composition - Uses duration values for timeline synchronization