
Overview

The VideoComposer class is the final step in the video generation pipeline. It combines all slide visuals (text, images, animations), synchronizes them with audio narration, and produces the final MP4 video file.

Class Definition

from utils.video_composer import VideoComposer

composer = VideoComposer()

Constructor

def __init__(self)
Initializes the video composer (no configuration required).

Methods

compose_final_video

Composes the complete presentation video with synchronized audio.
def compose_final_video(content_data: Dict, script_data: Dict,
                        slide_paths: Dict[int, str], audio_path: str) -> str
content_data
Dict
required
Presentation content structure from ContentGenerator
script_data
Dict
required
Script data with timestamps from ScriptGenerator
slide_paths
Dict[int, str]
required
Dictionary mapping slide numbers to their visuals. Despite the Dict[int, str] annotation, each value can be either:
  • String: Path to image file (.png) or video file (.mp4)
  • Dict: Composite structure for animation overlays:
    {
      'type': 'animation_composite',
      'base_slide': '/path/to/base_slide.png',
      'animation': '/path/to/animation.mp4'
    }
    
audio_path
string
required
Path to the complete audio narration file (WAV format)
return
string
Absolute path to the generated final video file (MP4)
Example return value:
"/path/to/final/Newtons_Laws_of_Motion_final.mp4"

create_slide_video

Creates a video clip from a single slide.
def create_slide_video(slide_path: str, duration: float) -> VideoFileClip
slide_path
string
required
Path to slide visual (image or video file)
duration
float
required
Duration in seconds for this slide
return
VideoFileClip
MoviePy video clip object with specified duration
Behavior:
  • Images (.png, .jpg): Converted to static video clip
  • Videos (.mp4, .mov, .avi): Trimmed or extended to match duration
  • Missing files: Creates blank dark blue clip as fallback
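The dispatch described above can be sketched without MoviePy. The helper name `classify_slide_path` and the two extension sets are assumptions for illustration, taken only from the extensions listed here; the real method may inspect files differently.

```python
from pathlib import Path

# Extension sets copied from the behavior list above; the real method may accept more.
IMAGE_EXTENSIONS = {".png", ".jpg"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def classify_slide_path(slide_path: str) -> str:
    """Return 'image', 'video', or 'fallback' for a slide path (hypothetical helper)."""
    if not slide_path or not Path(slide_path).exists():
        return "fallback"  # this is where a blank dark blue clip would be created
    suffix = Path(slide_path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    # Assumption: unknown extensions are treated like missing files.
    return "fallback"
```

Classifying first and building the clip second keeps the fallback decision in one place, which makes it easier to log why a slide ended up blank.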

composite_animation_on_slide

Overlays a Manim animation video onto a slide image.
def composite_animation_on_slide(slide_image_path: str, animation_video_path: str, 
                                 duration: float) -> VideoFileClip
slide_image_path
string
required
Path to base slide image (PNG)
animation_video_path
string
required
Path to Manim animation video (MP4)
duration
float
required
Target duration for the composite clip
return
VideoFileClip
Composite video with animation overlaid on slide
Compositing process:
  1. Load base slide as static image clip
  2. Load animation video
  3. Adjust animation duration:
    • If too short: Loop the animation
    • If too long: Trim to exact duration
  4. Resize animation to 850×700 pixels
  5. Position the animation's top-left corner at (1010, 250), the slide's placeholder area
  6. Composite animation on top of slide
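The timing and placement arithmetic in the steps above can be checked in isolation. This is a minimal sketch: `plan_animation_timing` and `overlay_fits` are hypothetical helpers, not part of VideoComposer, and the 1920×1080 frame size is the output resolution stated elsewhere on this page.

```python
def plan_animation_timing(animation_duration: float, slide_duration: float) -> dict:
    """Decide whether to loop, trim, or keep an animation (hypothetical helper)."""
    if animation_duration <= 0 or slide_duration <= 0:
        raise ValueError("durations must be positive")
    if animation_duration < slide_duration:
        # One extra copy guarantees the concatenation is long enough to trim.
        copies = int(slide_duration / animation_duration) + 1
        return {"action": "loop", "copies": copies}
    if animation_duration > slide_duration:
        return {"action": "trim", "copies": 1}
    return {"action": "keep", "copies": 1}


def overlay_fits(frame=(1920, 1080), size=(850, 700), position=(1010, 250)) -> bool:
    """Check that the resized animation lies fully inside the frame."""
    right = position[0] + size[0]
    bottom = position[1] + size[1]
    return right <= frame[0] and bottom <= frame[1]


print(plan_animation_timing(4.0, 10.0))  # {'action': 'loop', 'copies': 3}
print(overlay_fits())                    # True (right edge 1860, bottom edge 950)
```

A 4-second animation on a 10-second slide needs three copies (12 seconds) before trimming back to 10 seconds.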

sanitize_filename

Static method to create safe filenames from topic text.
@staticmethod
def sanitize_filename(text: str, max_length: int = 30) -> str
text
string
required
Text to sanitize
max_length
int
default: 30
Maximum filename length
return
string
Sanitized filename-safe string
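The body of sanitize_filename is not shown on this page, so the following is only one plausible approach, written as a standalone function with a different name (`sanitize_filename_sketch`). The rules it applies (drop punctuation, turn whitespace into underscores, truncate) are assumptions chosen to reproduce the documented example output.

```python
import re


def sanitize_filename_sketch(text: str, max_length: int = 30) -> str:
    """Approximate a filename sanitizer (assumed rules, not the project's code)."""
    # Drop everything except word characters, whitespace, and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", text)
    # Collapse each run of whitespace into a single underscore.
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    # Truncate, then avoid ending on a dangling underscore.
    return cleaned[:max_length].rstrip("_")


print(sanitize_filename_sketch("Newton's Laws of Motion"))  # Newtons_Laws_of_Motion
```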

Video Composition Pipeline

From backend/utils/video_composer.py:234-320:
def compose_final_video(self, content_data: Dict, script_data: Dict,
                       slide_paths: Dict[int, str], audio_path: str) -> str:
    slide_clips = []
    
    print(f"\n🎬 Starting video composition...")
    print(f"   Total slides: {len(content_data['slides'])}")
    
    # Process each slide
    for i, slide in enumerate(content_data['slides']):
        slide_num = slide['slide_number']
        
        # Get script timing for this slide
        slide_script = next(
            (s for s in script_data['slide_scripts'] if s['slide_number'] == slide_num),
            None
        )
        
        if not slide_script:
            print(f"⚠️ Warning: No script found for slide {slide_num}")
            continue
        
        # Calculate slide duration from script timestamps
        duration = slide_script['end_time'] - slide_script['start_time']
        print(f"   Processing slide {slide_num}: {duration:.1f}s")
        
        slide_data = slide_paths.get(slide_num)
        
        if not slide_data:
            print(f"⚠️ Warning: No slide visual found for slide {slide_num}")
            continue
        
        # Handle animation composites vs. regular slides
        if isinstance(slide_data, dict) and slide_data.get('type') == 'animation_composite':
            print(f"   🎬 Compositing animation for slide {slide_num}...")
            slide_clip = self.composite_animation_on_slide(
                slide_data['base_slide'],
                slide_data['animation'],
                duration
            )
        else:
            slide_clip = self.create_slide_video(slide_data, duration)
        
        slide_clips.append(slide_clip)
    
    if not slide_clips:
        raise ValueError("No slide clips were created")
    
    # Concatenate all slides
    print(f"\n🔗 Concatenating {len(slide_clips)} slide clips...")
    final_video = concatenate_videoclips(slide_clips, method="compose")
    print(f"   Total video duration: {final_video.duration:.1f}s")
    
    # Add audio track
    if audio_path and Path(audio_path).exists():
        print(f"🎵 Adding audio track...")
        audio = AudioFileClip(audio_path)
        print(f"   Audio duration: {audio.duration:.1f}s")
        
        # Warn if durations don't match
        if abs(final_video.duration - audio.duration) > 0.5:
            print(f"⚠️ Warning: Video duration ({final_video.duration:.1f}s) "
                  f"doesn't match audio ({audio.duration:.1f}s)")
        
        final_video = final_video.with_audio(audio)
    
    # Prepare output path
    topic_name = self.sanitize_filename(content_data['topic'], max_length=30)
    output_path = Config.FINAL_DIR / f"{topic_name}_final.mp4"
    
    # Write final video
    print(f"\n📹 Writing final video to: {output_path}")
    print(f"   Resolution: 1920x1080")
    print(f"   FPS: {Config.MANIM_FPS}")
    print(f"   Codec: libx264 + aac")
    
    final_video.write_videofile(
        str(output_path),
        fps=Config.MANIM_FPS,
        codec='libx264',
        audio_codec='aac',
        preset='medium',
        bitrate='5000k',
        audio_bitrate='192k'
    )
    
    # Cleanup
    print(f"🧹 Cleaning up video clips...")
    for clip in slide_clips:
        clip.close()
    final_video.close()
    if audio_path and Path(audio_path).exists():
        audio.close()
    
    print(f"✅ Final video saved: {output_path}")
    return str(output_path)

Animation Compositing Details

From backend/utils/video_composer.py:322-373:
def composite_animation_on_slide(self, slide_image_path: str, animation_video_path: str, 
                                 duration: float) -> VideoFileClip:
    print(f"      🎬 Compositing animation onto slide...")
    print(f"         Slide: {Path(slide_image_path).name}")
    print(f"         Animation: {Path(animation_video_path).name}")
    print(f"         Duration: {duration:.1f}s")
    
    # Load base slide image
    slide_clip = ImageClip(slide_image_path, duration=duration)
    
    # Load animation video
    animation_clip = VideoFileClip(animation_video_path)
    original_duration = animation_clip.duration
    print(f"         Original animation duration: {original_duration:.1f}s")
    
    # STEP 1: Handle duration adjustment first (WITHOUT position or resize)
    if original_duration < duration:
        print(f"         ⟳ Looping animation to match slide duration")
        # Calculate how many loops needed
        num_loops = int(duration / original_duration) + 1
        
        # Create list of clips to loop
        looped_clips = [animation_clip] * num_loops
        
        # Concatenate and trim to exact duration
        animation_adjusted = concatenate_videoclips(looped_clips, method="compose")
        animation_adjusted = animation_adjusted.subclipped(0, duration)
        
    elif original_duration > duration:
        print(f"         ✂️ Trimming animation to match slide duration")
        animation_adjusted = animation_clip.subclipped(0, duration)
        
    else:
        print(f"         ✅ Animation duration matches slide duration")
        animation_adjusted = animation_clip.with_duration(duration)
    
    # STEP 2: Now apply resize and position as the FINAL operations
    animation_final = animation_adjusted.resized(new_size=(850, 700))
    animation_final = animation_final.with_position((1010, 250))
    
    print(f"         ✅ Animation positioned at (1010, 250) with size 850x700")
    print(f"         Final animation duration: {animation_final.duration:.1f}s")
    
    # STEP 3: Composite animation on top of slide
    composite = CompositeVideoClip(
        [slide_clip, animation_final],
        size=(1920, 1080)
    )
    
    return composite
Critical ordering: Duration adjustments must be applied BEFORE position/resize operations to prevent position loss during video processing.

Usage Example

From backend/app.py:440-451:
# Step 5: Compose final video
update_progress(generation_id, 85, "composing_video", 
                "🎞️ Composing final video with audio...")

composer = VideoComposer()
final_video_path = composer.compose_final_video(
    content_data,
    script_data,
    slide_paths,
    audio_path
)

update_progress(generation_id, 95, "composing_video", "✅ Video composition complete")

Video Output Specifications

Resolution
string
1920×1080 (Full HD)
Frame Rate
int
Config.MANIM_FPS (typically 30 fps)
Video Codec
string
H.264 (libx264)
Video Bitrate
string
5000k (5 Mbps)
Audio Codec
string
AAC
Audio Bitrate
string
192k
Encoding Preset
string
medium (balances speed and quality)
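The two bitrates give a rough upper-level estimate of output size. The sketch below assumes the encoder hits its target bitrates exactly and ignores container overhead, so treat the result as an approximation; `estimate_file_size_mb` is a hypothetical helper.

```python
def estimate_file_size_mb(duration_s: float, video_kbps: int = 5000,
                          audio_kbps: int = 192) -> float:
    """Estimate file size in megabytes (10^6 bytes) from target bitrates."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / 1_000_000


print(round(estimate_file_size_mb(60), 2))  # 38.94
```

At these settings a one-minute presentation comes to roughly 39 MB, most of it video.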

Slide Types Handling

Text-Only Slides

slide_paths[slide_num] = "/path/to/text_slide.png"
Static image loaded as video clip with specified duration.

Image Slides

slide_paths[slide_num] = "/path/to/slide_with_image.png"
Composite image (text + fetched image) loaded as video clip.

Animation Slides

slide_paths[slide_num] = {
    'type': 'animation_composite',
    'base_slide': '/path/to/base_slide.png',
    'animation': '/path/to/manim_animation.mp4'
}
Base slide + animation video composited together with precise positioning.
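Which method handles a given slide_paths value depends only on its shape. A minimal sketch of that routing follows; `describe_slide_entry` and the `SlideEntry` alias are hypothetical names, and the early key check is an addition that the documented code does not perform.

```python
from typing import Dict, Union

SlideEntry = Union[str, Dict[str, str]]


def describe_slide_entry(entry: SlideEntry) -> str:
    """Name the method a slide_paths value would be routed to (hypothetical helper)."""
    if isinstance(entry, dict) and entry.get("type") == "animation_composite":
        # Both keys are read during composition, so fail early if one is missing.
        missing = [key for key in ("base_slide", "animation") if key not in entry]
        if missing:
            raise KeyError(f"animation composite is missing keys: {missing}")
        return "composite_animation_on_slide"
    # Plain paths (text-only and image slides) share one code path.
    return "create_slide_video"


print(describe_slide_entry("/path/to/text_slide.png"))
print(describe_slide_entry({
    "type": "animation_composite",
    "base_slide": "/path/to/base_slide.png",
    "animation": "/path/to/manim_animation.mp4",
}))
```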

Timestamp Synchronization

Slide durations are determined by script timestamps:
slide_script = next(
    (s for s in script_data['slide_scripts'] if s['slide_number'] == slide_num),
    None
)

duration = slide_script['end_time'] - slide_script['start_time']
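This keeps each slide on screen for the length of its narration segment. Synchronization with the audio file holds only when the script timestamps are contiguous and match the narration audio; a total mismatch larger than 0.5 seconds is logged as a warning (see Duration Mismatch).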
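Because slide clips are concatenated back to back, any gap between one slide's end_time and the next slide's start_time is dropped from the video, and later slides would then appear ahead of their narration. The sketch below computes per-slide durations and reports such gaps. Both helpers and the sample timestamps are made up for illustration.

```python
from typing import Dict, List, Tuple


def slide_durations(slide_scripts: List[Dict]) -> Dict[int, float]:
    """Map each slide number to end_time - start_time."""
    return {s["slide_number"]: s["end_time"] - s["start_time"] for s in slide_scripts}


def timeline_gaps(slide_scripts: List[Dict],
                  tolerance: float = 1e-6) -> List[Tuple[int, int, float]]:
    """Return (previous_slide, next_slide, gap_seconds) for non-contiguous timestamps."""
    ordered = sorted(slide_scripts, key=lambda s: s["start_time"])
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt["start_time"] - prev["end_time"]
        if abs(gap) > tolerance:
            gaps.append((prev["slide_number"], nxt["slide_number"], gap))
    return gaps


# Made-up timestamps for illustration only.
scripts = [
    {"slide_number": 1, "start_time": 0.0, "end_time": 5.0},
    {"slide_number": 2, "start_time": 5.0, "end_time": 12.5},
    {"slide_number": 3, "start_time": 13.0, "end_time": 20.0},
]
print(slide_durations(scripts))  # {1: 5.0, 2: 7.5, 3: 7.0}
print(timeline_gaps(scripts))    # [(2, 3, 0.5)]
```

Here the clips total 19.5 seconds while the narration runs to 20.0 seconds, so slide 3 would start half a second early.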

Fallback Behavior

Missing Slide Path

if not slide_path or not Path(slide_path).exists():
    print(f"⚠️ Slide path not found, creating blank slide")
    return ColorClip(size=(1920, 1080), color=(20, 20, 40), duration=duration)
Creates a dark blue blank slide as placeholder.

Missing Audio

If audio_path is empty or the file does not exist, the audio step is skipped and the video is written without an audio track (silent video).

Duration Mismatch

if abs(final_video.duration - audio.duration) > 0.5:
    print(f"⚠️ Warning: Video duration ({final_video.duration:.1f}s) "
          f"doesn't match audio ({audio.duration:.1f}s)")
Warning logged but composition continues.

File Output

Final video is saved to:
Config.FINAL_DIR / "{topic_sanitized}_final.mp4"
Example: Newtons_Laws_of_Motion_final.mp4

Memory Management

After the final video is written, every slide clip, the concatenated video, and the audio clip (if one was loaded) are closed:
for clip in slide_clips:
    clip.close()
final_video.close()
if audio_path and Path(audio_path).exists():
    audio.close()
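Closing the clips releases their open file readers, which prevents resource leaks during batch processing. This cleanup runs only if write_videofile completes without raising.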
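One way to guarantee cleanup even when encoding fails is to wrap the write step in try/finally. This is a sketch of that pattern, not how VideoComposer is implemented; `run_then_close` is a hypothetical helper that works with any objects exposing a close() method.

```python
from typing import Callable, Iterable


def run_then_close(clips: Iterable, work: Callable[[], None]) -> None:
    """Run work() and close every clip afterwards, even if work() raises."""
    try:
        work()
    finally:
        for clip in clips:
            try:
                clip.close()
            except Exception as exc:
                # Keep closing the remaining clips if one close() fails.
                print(f"Could not close clip: {exc}")
```

In the composer this could wrap the write_videofile call, with the slide clips, the final video, and the audio clip passed as `clips`.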
