Video Composition - AI Video Presentation Generator

Overview

Video Composition is the final stage, where all generated assets (slides, animations, images, audio) are assembled into a single, synchronized video file. The system uses MoviePy, a Python video-editing library that reads and writes media through FFmpeg, to handle timeline management, video concatenation, and audio synchronization.

How It Works

High-Level Process

1. Load Slide Clips: Each slide becomes a video clip (either image or animation)
2. Apply Durations: Clips are timed according to narration length
3. Composite Animations: Manim videos are overlaid onto slide templates
4. Concatenate Clips: All slides are joined sequentially
5. Sync Audio: Complete narration audio is attached to video
6. Export Final Video: Render as MP4 with H.264 video and AAC audio

The composition process keeps audio and video aligned by taking each clip's duration from the narration timing in the script data.

Timeline Synchronization

Slide Timing Calculation

Each slide’s duration is determined by its narration timing:

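for slide in content_data['slides']:
    slide_num = slide['slide_number']

    # Find the script entry whose slide_number matches this slide
    slide_script = next(
        (s for s in script_data['slide_scripts']
         if s['slide_number'] == slide_num),
        None
    )

    # next() returns None when no entry matches; skipping is an assumed
    # handling here, since the subtraction below would raise a TypeError
    if slide_script is None:
        print(f"⚠️ No script entry for slide {slide_num}, skipping")
        continue

    # Duration is the slide's span on the narration timeline
    duration = slide_script['end_time'] - slide_script['start_time']
    print(f"Processing slide {slide_num}: {duration:.1f}s")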
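The loop above scans slide_scripts once for every slide. A minimal sketch of the same calculation with a one-time index, assuming the script_data layout shown in this section. The helper name build_duration_index is hypothetical and not part of the project:

```python
def build_duration_index(script_data: dict) -> dict:
    """Map slide_number -> narration duration in seconds."""
    index = {}
    for entry in script_data.get("slide_scripts", []):
        # Duration is derived from the timeline, as in the loop above
        index[entry["slide_number"]] = entry["end_time"] - entry["start_time"]
    return index


script_data = {
    "slide_scripts": [
        {"slide_number": 1, "start_time": 0.0, "end_time": 5.2},
        {"slide_number": 2, "start_time": 5.2, "end_time": 11.7},
    ]
}
durations = build_duration_index(script_data)

# dict.get returns None for a slide that has no script entry
missing = durations.get(3)
```

Looking up a slide with durations.get(slide_num) makes the missing-entry case explicit instead of relying on next() returning None.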

Timeline Structure

The script_data contains precise timing information:

{
  "slide_scripts": [
    {
      "slide_number": 1,
      "narration_text": "Welcome to our presentation...",
      "start_time": 0.0,
      "end_time": 5.2,
      "duration": 5.2
    },
    {
      "slide_number": 2,
      "narration_text": "In this section we explore...",
      "start_time": 5.2,
      "end_time": 11.7,
      "duration": 6.5
    }
  ],
  "total_duration": 35.8
}

The system automatically warns if video duration and audio duration differ by more than 0.5 seconds, helping catch synchronization issues early.
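Timing problems can also be caught before any clip is built. The following is a rough sketch of a consistency check over the structure above. The function name validate_timeline, the 0.05 s tolerance, and the assumption that total_duration equals the last end_time are all illustrative, not taken from the project:

```python
def validate_timeline(script_data: dict, tolerance: float = 0.05) -> list:
    """Return human-readable problems; an empty list means the timeline is consistent."""
    problems = []
    previous_end = 0.0
    for entry in script_data["slide_scripts"]:
        num = entry["slide_number"]
        span = entry["end_time"] - entry["start_time"]
        # The stored duration should agree with end_time - start_time
        if abs(span - entry["duration"]) > tolerance:
            problems.append(f"slide {num}: duration field disagrees with start/end")
        # Each slide should start where the previous one ended
        if abs(entry["start_time"] - previous_end) > tolerance:
            problems.append(f"slide {num}: gap or overlap before start_time")
        previous_end = entry["end_time"]
    # Assumed: total_duration is the end of the last slide
    if abs(previous_end - script_data["total_duration"]) > tolerance:
        problems.append("total_duration does not match the last end_time")
    return problems


timeline = {
    "slide_scripts": [
        {"slide_number": 1, "start_time": 0.0, "end_time": 5.2, "duration": 5.2},
        {"slide_number": 2, "start_time": 5.2, "end_time": 11.7, "duration": 6.5},
    ],
    "total_duration": 11.7,
}
problems = validate_timeline(timeline)
```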

Slide Processing

Creating Slide Clips

Each slide is converted to a video clip:

from pathlib import Path
from moviepy import ColorClip, ImageClip, VideoClip, VideoFileClip

def create_slide_video(self, slide_path: str, duration: float) -> VideoClip:
    # Handle missing slides gracefully with a blank 1920x1080 clip
    if not slide_path or not Path(slide_path).exists():
        print("⚠️ Slide path not found, creating blank slide")
        return ColorClip(size=(1920, 1080), color=(20, 20, 40), duration=duration)

    # Video files (animations); compare the extension case-insensitively
    if slide_path.lower().endswith(('.mp4', '.mov', '.avi')):
        video_clip = VideoFileClip(slide_path)

        # Adjust duration to match narration
        if video_clip.duration < duration:
            # Sets the clip's nominal duration beyond the end of the source file
            video_clip = video_clip.with_duration(duration)
        elif video_clip.duration > duration:
            # Keep only the first `duration` seconds
            video_clip = video_clip.subclipped(0, duration)

        return video_clip

    # Image files: a still frame held for the whole narration
    return ImageClip(slide_path, duration=duration)

Handling Different Media Types

Media Type	Processing
Static Image	Creates ImageClip with specified duration
Animation Video	Loads VideoFileClip, adjusts duration
Animation + Slide	Composites animation onto slide template
Missing Asset	Generates blank colored clip

Animation Compositing

Overlay Process

When a slide has both a base slide image and an animation, they are composited:

def composite_animation_on_slide(self, slide_image_path: str, 
                                 animation_video_path: str, 
                                 duration: float) -> CompositeVideoClip:
    # Load base slide (static image)
    slide_clip = ImageClip(slide_image_path, duration=duration)
    
    # Load animation video
    animation_clip = VideoFileClip(animation_video_path)
    
    # STEP 1: Adjust animation duration
    if animation_clip.duration < duration:
        # Loop animation to fill slide duration
        num_loops = int(duration / animation_clip.duration) + 1
        looped_clips = [animation_clip] * num_loops
        animation_adjusted = concatenate_videoclips(looped_clips, method="compose")
        animation_adjusted = animation_adjusted.subclipped(0, duration)
    else:
        # Trim animation to slide duration
        animation_adjusted = animation_clip.subclipped(0, duration)
    
    # STEP 2: Resize and position animation
    animation_final = animation_adjusted.resized(new_size=(850, 700))
    animation_final = animation_final.with_position((1010, 250))
    
    # STEP 3: Composite layers
    composite = CompositeVideoClip(
        [slide_clip, animation_final],
        size=(1920, 1080)
    )
    
    return composite

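Animation position (1010, 250) and size (850, 700) are hardcoded to match the slide template placeholder. If you customize slide templates, update these values in composite_animation_on_slide() (video_composer.py:362-372). Because resized(new_size=...) is given a fixed width and height, an animation whose aspect ratio differs from 850:700 is stretched to fit.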

Animation Duration Handling

Scenario 1: Animation shorter than narration

Action: Loop the animation back-to-back, then trim to the narration length
Example: 3s animation, 9s narration → animation plays 3 times

Scenario 2: Animation longer than narration

Action: Trim animation to match narration
Example: 8s animation, 5s narration → first 5s used

Scenario 3: Exact match

Action: Use animation as-is
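The three scenarios reduce to a small decision on two durations. Here is a sketch in plain Python, with no MoviePy calls. The function plan_animation_fit and its return shape are hypothetical. Unlike the code above, which builds int(duration / animation_duration) + 1 copies before trimming, this version uses math.ceil to count only the copies that are actually needed:

```python
import math


def plan_animation_fit(animation_duration: float, slide_duration: float) -> dict:
    """Decide how an animation should be fitted to a slide's narration length."""
    if animation_duration <= 0 or slide_duration <= 0:
        raise ValueError("durations must be positive")
    if animation_duration < slide_duration:
        # Scenario 1: repeat until the slide is covered, then trim the excess
        copies = math.ceil(slide_duration / animation_duration)
        return {"action": "loop", "copies": copies, "trim_to": slide_duration}
    if animation_duration > slide_duration:
        # Scenario 2: keep only the first slide_duration seconds
        return {"action": "trim", "copies": 1, "trim_to": slide_duration}
    # Scenario 3: durations already match
    return {"action": "as-is", "copies": 1, "trim_to": slide_duration}


short_plan = plan_animation_fit(3.0, 9.0)
long_plan = plan_animation_fit(8.0, 5.0)
exact_plan = plan_animation_fit(5.0, 5.0)
```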

Concatenation

Joining Slide Clips

All processed slides are concatenated sequentially:

print(f"🔗 Concatenating {len(slide_clips)} slide clips...")
final_video = concatenate_videoclips(slide_clips, method="compose")
print(f"Total video duration: {final_video.duration:.1f}s")

Concatenation method: compose

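Handles clips of different types (image, video, composite)
Sizes the output to the widest and tallest clip in the list
Centers smaller clips on that canvas instead of resizing them
Leaves the output frame rate to the fps argument passed at export

With method="compose", MoviePy does not rescale clips. The output is 1920x1080 as long as the largest slide clip has that size; a smaller source appears centered on the canvas rather than stretched to fill it.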
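MoviePy documents method="compose" as sizing the output to the widest and tallest input and centering any smaller clip. That geometry can be sketched in plain Python. The helpers compose_canvas and centered_offset are illustrative only and are not MoviePy functions:

```python
def compose_canvas(sizes: list) -> tuple:
    """Canvas size: the widest width and the tallest height among the clips."""
    return max(w for w, _ in sizes), max(h for _, h in sizes)


def centered_offset(size: tuple, canvas: tuple) -> tuple:
    """Top-left corner at which a clip of `size` sits when centered on `canvas`."""
    return (canvas[0] - size[0]) // 2, (canvas[1] - size[1]) // 2


# One full-size slide and one 720p clip that was not resized beforehand
sizes = [(1920, 1080), (1280, 720)]
canvas = compose_canvas(sizes)
offsets = [centered_offset(size, canvas) for size in sizes]
```

The 720p clip ends up with a 320-pixel margin on each side and 180 pixels above and below, which is why resizing sources before concatenation is usually the safer choice.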

Audio Integration

Attaching Narration

The complete audio track is synced to the video:

if audio_path and Path(audio_path).exists():
    print(f"🎵 Adding audio track...")
    audio = AudioFileClip(audio_path)
    print(f"Audio duration: {audio.duration:.1f}s")
    
    # Warn about synchronization issues
    if abs(final_video.duration - audio.duration) > 0.5:
        print(f"⚠️ Warning: Video duration ({final_video.duration:.1f}s) "
              f"doesn't match audio ({audio.duration:.1f}s)")
    
    final_video = final_video.with_audio(audio)

Audio Sync Validation

Tolerance: ±0.5 seconds
Warning Trigger: Duration mismatch exceeds tolerance
Common Causes:
- Slide durations don’t match audio chunks
- Audio generation had truncation or errors
- Manual edits to slide timing
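The tolerance check can be pulled into a small function that also reports which track is longer, which helps narrow down the cause. This is a sketch only. sync_report and its return keys are hypothetical names, and the 0.5 s default mirrors the tolerance stated above:

```python
SYNC_TOLERANCE_SECONDS = 0.5


def sync_report(video_duration: float, audio_duration: float,
                tolerance: float = SYNC_TOLERANCE_SECONDS) -> dict:
    """Compare track lengths and describe any drift between them."""
    drift = video_duration - audio_duration
    if drift > 0:
        longer = "video"
    elif drift < 0:
        longer = "audio"
    else:
        longer = "equal"
    return {
        "drift": drift,                       # positive when the video runs long
        "in_sync": abs(drift) <= tolerance,   # False triggers the warning
        "longer": longer,
    }


ok = sync_report(35.8, 35.5)
late = sync_report(36.5, 35.8)
```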

FFmpeg Export

Final Rendering

The composed video is exported using FFmpeg via MoviePy:

topic_name = self.sanitize_filename(content_data['topic'], max_length=30)
output_path = Config.FINAL_DIR / f"{topic_name}_final.mp4"

print(f"📹 Writing final video to: {output_path}")
print(f"Resolution: 1920x1080")
print(f"FPS: {Config.MANIM_FPS}")
print(f"Codec: libx264 + aac")

final_video.write_videofile(
    str(output_path),
    fps=Config.MANIM_FPS,        # 30 FPS default
    codec='libx264',             # H.264 video codec
    audio_codec='aac',           # AAC audio codec
    preset='medium',             # Encoding speed/quality balance
    bitrate='5000k',             # 5 Mbps video bitrate
    audio_bitrate='192k'         # 192 kbps audio bitrate
)

Export Configuration

Parameter	Value	Purpose
fps	30	Frames per second (matches Manim renders)
codec	libx264	H.264 video compression (widely compatible)
audio_codec	aac	Advanced Audio Coding (industry standard)
preset	medium	Balanced encoding speed vs quality
bitrate	5000k	5 Mbps target video bitrate (ample for 1080p slide content)
audio_bitrate	192k	High-quality AAC audio

FFmpeg Presets

You can adjust the preset parameter for different use cases:

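ultrafast: Fastest encoding, least efficient compression
fast: Quick encoding, less efficient compression
medium: Balanced (default)
slow: More efficient compression, slower encoding
veryslow: Most efficient compression, slowest encoding

For faster iteration during development, use preset='fast' and bitrate='2000k'. Switch to preset='medium' or 'slow' for final production videos. Because the export sets a fixed bitrate, the preset mainly changes encoding time and the quality achieved at that bitrate; file size is driven by the bitrate itself.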
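One way to switch between development and production settings without editing the export call is a small profile table. A sketch, assuming the parameter values quoted above. The function export_settings and the profile names "draft" and "production" are hypothetical:

```python
def export_settings(profile: str = "production") -> dict:
    """Keyword arguments for write_videofile(), chosen by profile name."""
    shared = {"codec": "libx264", "audio_codec": "aac", "audio_bitrate": "192k"}
    profiles = {
        "draft": {"preset": "fast", "bitrate": "2000k"},         # quick test renders
        "production": {"preset": "medium", "bitrate": "5000k"},  # documented defaults
    }
    if profile not in profiles:
        raise ValueError(f"unknown export profile: {profile!r}")
    return {**shared, **profiles[profile]}


draft = export_settings("draft")
production = export_settings()
```

The result can be unpacked into the existing call, for example final_video.write_videofile(str(output_path), fps=Config.MANIM_FPS, **export_settings("draft")).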

Output Structure

Final Video Path

workspace/source/data/final/<topic_name>_final.mp4

Filename sanitization removes special characters:

Spaces → Underscores
Colons, slashes → Removed
Quotes, question marks → Removed
Max 30 characters
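The rules above can be sketched as follows. This is not the project's sanitize_filename implementation, which is not shown here. The exact character set, the order of operations, and the truncation behavior are assumptions based only on the list above:

```python
# Characters the list above says are removed: colon, slashes, quotes, question mark
REMOVED_CHARACTERS = ':/\\"\'?'


def sanitize_filename(name: str, max_length: int = 30) -> str:
    """Apply the documented rules: drop special characters, underscore spaces, truncate."""
    # The three-argument form of str.maketrans deletes every listed character
    cleaned = name.strip().translate(str.maketrans("", "", REMOVED_CHARACTERS))
    cleaned = cleaned.replace(" ", "_")
    return cleaned[:max_length]


example = sanitize_filename("What is AI: A Primer?")
truncated = sanitize_filename("An Extremely Long Presentation Topic Title")
```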

Video Specifications

Container: MP4 (MPEG-4 Part 14)
Video Codec: H.264 (AVC)
Audio Codec: AAC
Resolution: 1920x1080 (Full HD)
Aspect Ratio: 16:9
Frame Rate: 30 FPS
Typical File Size: roughly 40-200 MB for a 1-5 minute video at the bitrates above
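The file size follows from the export bitrates. Here is a back-of-the-envelope estimate, assuming the encoder hits its 5000k video and 192k audio targets and ignoring container overhead. The function name estimate_size_mb is illustrative:

```python
def estimate_size_mb(duration_seconds: float,
                     video_kbps: float = 5000,
                     audio_kbps: float = 192) -> float:
    """Approximate output size in megabytes for a constant-bitrate export."""
    total_kilobits = (video_kbps + audio_kbps) * duration_seconds
    # kilobits -> kilobytes (divide by 8) -> megabytes (divide by 1000)
    return total_kilobits / 8 / 1000


one_minute = estimate_size_mb(60)
five_minutes = estimate_size_mb(300)
```

This gives about 39 MB for one minute and about 195 MB for five. Mostly static slides may encode below the target bitrate, so real files can be smaller.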

Cleanup

Resource Management

After export, all video clips are properly closed to free memory:

print(f"🧹 Cleaning up video clips...")
for clip in slide_clips:
    clip.close()
final_video.close()
if audio_path and Path(audio_path).exists():
    audio.close()

print(f"✅ Final video saved: {output_path}")

Failure to close clips can cause memory leaks, especially when generating multiple videos in succession. Always close clips after rendering.
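If write_videofile() raises, the cleanup code above never runs. A minimal sketch of a more defensive pattern, using try/finally and a helper that keeps closing even when one close() fails. close_all is a hypothetical helper, and DemoClip is a stand-in used only to make the example runnable without MoviePy:

```python
def close_all(clips) -> list:
    """Close every clip, collecting errors instead of stopping at the first one."""
    errors = []
    for clip in clips:
        try:
            clip.close()
        except Exception as exc:  # keep going so later clips still get closed
            errors.append(exc)
    return errors


class DemoClip:
    """Stand-in for a MoviePy clip; only close() matters for this example."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


demo_clips = [DemoClip(), DemoClip()]
try:
    pass  # final_video.write_videofile(...) would run here
finally:
    # Runs whether or not the export succeeded
    errors = close_all(demo_clips)
```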

Error Handling

Common Issues

No slide clips created:

if not slide_clips:
    raise ValueError("No slide clips were created")

Cause: All slides missing visual assets
Solution: Check that images/animations were generated

Audio-video duration mismatch:

if abs(final_video.duration - audio.duration) > 0.5:
    print(f"⚠️ Warning: Duration mismatch detected")

Cause: Slide timing doesn’t match audio
Solution: Regenerate audio or adjust slide durations

FFmpeg encoding failure:

Cause: Missing FFmpeg installation or codec issues
Solution: Verify FFmpeg with ffmpeg -version
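The same check can be run from Python before composition starts. The sketch below looks only for an ffmpeg executable on PATH. MoviePy installations may use a bundled FFmpeg binary instead, so a missing system binary does not necessarily mean export will fail. The function name ffmpeg_version is hypothetical:

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if it cannot be run."""
    executable = shutil.which("ffmpeg")
    if executable is None:
        return None
    try:
        result = subprocess.run(
            [executable, "-version"],
            capture_output=True, text=True, check=False, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


version = ffmpeg_version()
if version is None:
    print("ffmpeg not found on PATH")
else:
    print(version)
```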

Performance Optimization

Rendering Speed Tips

Use Lower Quality During Testing

bitrate='2000k',  # Instead of 5000k
preset='fast'     # Instead of 'medium'

Process Slides in Parallel (Future Enhancement)
- Current: Sequential processing
- Potential: Parallel clip creation with threading

Cache Intermediate Renders
- Reuse slide clips if content hasn’t changed
- Skip regeneration of unchanged animations
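Caching needs a way to tell whether a slide has changed. One possible approach, sketched here as an assumption rather than an existing feature, is to hash the slide's content together with its duration and use the digest as a cache key. The function slide_cache_key is hypothetical and expects the slide dictionary to be JSON-serializable:

```python
import hashlib
import json


def slide_cache_key(slide: dict, duration: float) -> str:
    """Stable short key that changes whenever the slide content or timing changes."""
    payload = json.dumps(
        {"slide": slide, "duration": round(duration, 3)},
        sort_keys=True,  # key order must not affect the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


key_a = slide_cache_key({"slide_number": 1, "title": "Intro"}, 5.2)
key_b = slide_cache_key({"title": "Intro", "slide_number": 1}, 5.2)
key_c = slide_cache_key({"slide_number": 1, "title": "Intro"}, 5.3)
```

A rendered clip stored under its key can be reused when the key is unchanged; key_a and key_b match because only the key order differs, while key_c differs because the duration changed.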

Memory Management

Large Presentations: Close clips as soon as the export finishes, not before; the concatenated clip reads frames from its source clips while rendering
Multiple Videos: Run garbage collection between generations
Animation Loops: Use concatenate_videoclips instead of manual looping

For presentations with 10+ slides, monitor RAM usage. Every slide clip stays open until the export finishes: image clips hold their decoded frame in memory, and each video clip keeps its own FFmpeg reader open.

Customization Options

Output Resolution

Change resolution in config.py:

VIDEO_RESOLUTION = (1920, 1080)  # Full HD (default)
# VIDEO_RESOLUTION = (1280, 720)   # HD
# VIDEO_RESOLUTION = (3840, 2160)  # 4K

Update all references in code:

ColorClip(size=Config.VIDEO_RESOLUTION, ...)
CompositeVideoClip([...], size=Config.VIDEO_RESOLUTION)

Animation Placement

Modify animation position/size in video_composer.py:361-362:

# Default: Right side of slide
animation_final = animation_adjusted.resized(new_size=(850, 700))
animation_final = animation_final.with_position((1010, 250))

# Alternative: Centered
animation_final = animation_adjusted.resized(new_size=(1200, 800))
animation_final = animation_final.with_position(('center', 'center'))
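If the output resolution changes, the hardcoded placement has to change with it. A sketch that scales the default position and size proportionally from the 1920x1080 layout, assuming the template placeholder scales the same way. scale_animation_box is a hypothetical helper:

```python
BASE_RESOLUTION = (1920, 1080)
BASE_POSITION = (1010, 250)   # default top-left corner of the animation
BASE_SIZE = (850, 700)        # default animation width and height


def scale_animation_box(resolution: tuple) -> tuple:
    """Return (position, size) scaled from the 1080p defaults to `resolution`."""
    scale_x = resolution[0] / BASE_RESOLUTION[0]
    scale_y = resolution[1] / BASE_RESOLUTION[1]
    position = (round(BASE_POSITION[0] * scale_x), round(BASE_POSITION[1] * scale_y))
    size = (round(BASE_SIZE[0] * scale_x), round(BASE_SIZE[1] * scale_y))
    return position, size


position_4k, size_4k = scale_animation_box((3840, 2160))
position_720p, size_720p = scale_animation_box((1280, 720))
```

The returned values can be passed to with_position() and resized(new_size=...) in place of the literals.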

Troubleshooting

Video Export Fails

Symptom: Error during write_videofile()

Solutions:

Verify FFmpeg is installed: ffmpeg -version
Check disk space (exports need 2-3x final size temporarily)
Ensure output directory exists and is writable
Try different preset: preset='ultrafast'
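The disk-space and directory checks can be automated before the export starts. A sketch using only the standard library. preflight_export is a hypothetical helper, and the 3x headroom factor comes from the upper end of the 2-3x figure above:

```python
import os
import shutil
import tempfile
from pathlib import Path


def preflight_export(output_dir, expected_mb: float, headroom: float = 3.0) -> list:
    """Return a list of problems that would likely make the export fail."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return [f"output directory does not exist: {directory}"]
    problems = []
    if not os.access(directory, os.W_OK):
        problems.append(f"output directory is not writable: {directory}")
    free_mb = shutil.disk_usage(directory).free / 1_000_000
    if free_mb < expected_mb * headroom:
        problems.append(
            f"only {free_mb:.0f} MB free, need about {expected_mb * headroom:.0f} MB"
        )
    return problems


# Demonstration against the system temp directory and a path that does not exist
report = preflight_export(tempfile.gettempdir(), expected_mb=0.0)
missing = preflight_export(Path(tempfile.gettempdir()) / "no_such_dir_for_demo", 1.0)
```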

Audio Not Playing

Symptom: Video renders but no audio in final MP4

Solutions:

Verify audio file exists and is valid WAV
Check audio codec support: use 'aac' not 'mp3'
Test audio separately: ffplay <audio_path>

Animations Mispositioned

Symptom: Animation appears in wrong location or cut off

Solutions:

Verify slide template dimensions (should have placeholder)
Adjust position coordinates in composite_animation_on_slide()
Check that the animation's aspect ratio is close to the expected size (850x700); it is resized to exactly that size, so other ratios are stretched

Related Features

Content Generation - Provides slide structure and timing requirements
Voice Narration - Generates audio track that drives timeline
Visual Media - Creates animation and image assets for composition
