MoneyPrinter generates accurate subtitles synchronized to the TTS audio. Subtitles can be generated using AssemblyAI's transcription API or locally from the TTS audio clip durations.
Core Function
The main subtitle generation function is generate_subtitles() in Backend/video.py:
```python
def generate_subtitles(
    audio_path: str,
    sentences: List[str],
    audio_clips: List[AudioFileClip],
    voice: str
) -> str:
    """
    Generates subtitles from a given audio file and returns the path to the subtitles.

    Args:
        audio_path (str): The path to the audio file to generate subtitles from.
        sentences (List[str]): All the sentences said out loud in the audio clips.
        audio_clips (List[AudioFileClip]): All the individual audio clips which
            will make up the final audio track.
        voice (str): The voice used for TTS (used for language detection).

    Returns:
        str: The path to the generated subtitles.
    """
```
Location: Backend/video.py:118-159
How It Works
1. Choose Method: If ASSEMBLY_AI_API_KEY is set, use AssemblyAI transcription; otherwise, generate subtitles locally.
2. Generate SRT Content: Create an SRT-formatted subtitle file with timestamps and text.
3. Equalize Subtitles: Break long lines into shorter chunks (max 10 characters per line) for better readability.
4. Save to File: Write the SRT file to the subtitles directory with a unique UUID filename.
Generation Methods
Method 1: AssemblyAI (Recommended)
Uses AI-powered transcription for accurate timing and multi-language support:
```python
def __generate_subtitles_assemblyai(audio_path: str, voice: str) -> str:
    """
    Generates subtitles from a given audio file and returns them in SRT format.

    Args:
        audio_path (str): The path to the audio file to generate subtitles from.
        voice (str): The voice used (for language code mapping).

    Returns:
        str: The generated subtitles in SRT format.
    """
    language_mapping = {
        "br": "pt",
        "id": "en",  # AssemblyAI doesn't have Indonesian
        "jp": "ja",
        "kr": "ko",
    }

    if voice in language_mapping:
        lang_code = language_mapping[voice]
    else:
        lang_code = voice

    aai.settings.api_key = ASSEMBLY_AI_API_KEY
    config = aai.TranscriptionConfig(language_code=lang_code)
    transcriber = aai.Transcriber(config=config)
    transcript = transcriber.transcribe(audio_path)
    subtitles = transcript.export_subtitles_srt()

    return subtitles
```
Location: Backend/video.py:49-78
AssemblyAI provides word-level timing accuracy and supports 30+ languages. Get an API key at assemblyai.com.
Method 2: Local Generation
Generates subtitles based on audio clip durations:
```python
def __generate_subtitles_locally(
    sentences: List[str], audio_clips: List[AudioFileClip]
) -> str:
    """
    Generates subtitles from sentence list and audio clip durations.

    Args:
        sentences (List[str]): All the sentences said out loud in the audio clips.
        audio_clips (List[AudioFileClip]): All the individual audio clips.

    Returns:
        str: The generated subtitles in SRT format.
    """
    def convert_to_srt_time_format(total_seconds: float) -> str:
        # Convert total seconds to the SRT time format: HH:MM:SS,mmm
        milliseconds_total = int(round(total_seconds * 1000))
        hours, remainder = divmod(milliseconds_total, 3_600_000)
        minutes, remainder = divmod(remainder, 60_000)
        seconds, milliseconds = divmod(remainder, 1000)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"

    start_time = 0
    subtitles = []

    for i, (sentence, audio_clip) in enumerate(zip(sentences, audio_clips), start=1):
        duration = audio_clip.duration
        end_time = start_time + duration

        # Format: subtitle index, start time --> end time, sentence
        subtitle_entry = (
            f"{i}\n{convert_to_srt_time_format(start_time)} --> "
            f"{convert_to_srt_time_format(end_time)}\n{sentence}\n"
        )
        subtitles.append(subtitle_entry)

        start_time += duration  # Update start time for the next subtitle

    return "\n".join(subtitles)
```
Location: Backend/video.py:81-115
Local generation is less accurate than AssemblyAI because it assumes each sentence takes exactly as long as its TTS audio clip, without accounting for pauses or pacing variations.
Subtitles are saved in the standard SRT (SubRip) format:
```
1
00:00:00,000 --> 00:00:03,500
Welcome to this video about space exploration.

2
00:00:03,500 --> 00:00:07,200
Today we'll learn about the Mars rover mission.

3
00:00:07,200 --> 00:00:11,000
The rover was designed to search for signs of ancient life.
```
SRT Structure
Each subtitle entry has four parts:
- Index: Sequential number (1, 2, 3…)
- Timestamp: Start time --> end time, in the format HH:MM:SS,mmm
- Text: The subtitle text
- Blank Line: Separates entries
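To make the structure concrete, a minimal parser (a sketch for illustration, not part of MoneyPrinter) can split an SRT string into (index, start, end, text) tuples:

```python
import re

def parse_srt(srt_text):
    """Split SRT text into (index, start, end, text) tuples."""
    entries = []
    # Entries are separated by blank lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        index = int(lines[0])
        start, end = lines[1].split(" --> ")
        text = "\n".join(lines[2:])  # multi-line subtitle text is preserved
        entries.append((index, start, end, text))
    return entries

sample = """1
00:00:00,000 --> 00:00:03,500
Welcome to this video about space exploration.

2
00:00:03,500 --> 00:00:07,200
Today we'll learn about the Mars rover mission.
"""
entries = parse_srt(sample)
```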
Subtitle Equalization
Long lines are broken into shorter chunks for better readability:
```python
def equalize_subtitles(srt_path: str, max_chars: int = 10) -> None:
    # Equalize subtitles
    srt_equalizer.equalize_srt_file(srt_path, srt_path, max_chars)
```
Location: Backend/video.py:133-135
This uses the srt_equalizer library to break long lines at word boundaries.
Before Equalization
```
1
00:00:00,000 --> 00:00:03,500
Welcome to this video about space exploration and Mars missions.
```
After Equalization (10 chars max)
```
1
00:00:00,000 --> 00:00:03,500
Welcome to
this video
about space
exploration
and Mars
missions.
```
The default max_chars=10 works well for vertical mobile videos. For wider videos, increase to 15-20 characters per line.
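The word-boundary behavior can be approximated with the standard library's textwrap module (a rough stand-in for illustration only; srt_equalizer balances lines somewhat differently, as the example above shows):

```python
import textwrap

def wrap_subtitle_text(text, max_chars=10):
    """Break a subtitle line into chunks of at most max_chars, splitting
    only at word boundaries. Words longer than max_chars get their own
    line rather than being split mid-word."""
    return textwrap.wrap(
        text,
        width=max_chars,
        break_long_words=False,
        break_on_hyphens=False,
    )

lines = wrap_subtitle_text(
    "Welcome to this video about space exploration and Mars missions."
)
```

Because splitting happens only at whitespace, joining the wrapped lines with spaces reproduces the original sentence exactly.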
Language Support
AssemblyAI Language Mapping
Some voice codes are mapped to language codes:
```python
language_mapping = {
    "br": "pt",  # Brazilian Portuguese → Portuguese
    "id": "en",  # Indonesian → English (not supported)
    "jp": "ja",  # Japanese
    "kr": "ko",  # Korean
}
```
Supported Languages:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Portuguese (pt)
- Japanese (ja)
- Korean (ko)
- And 20+ more languages
AssemblyAI doesn’t support Indonesian transcription, so MoneyPrinter falls back to English for id voices.
Subtitle Positioning
Subtitles are positioned during video generation using subtitles_position:
```python
# Common positions
subtitles_position = "center,bottom"  # Bottom center (default)
subtitles_position = "center,center"  # Screen center
subtitles_position = "center,top"     # Top center (80px from top)
```
See Video Composition for implementation details.
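One plausible way such a string maps onto moviepy-style coordinates is shown below (a sketch only; the function name and the exact mapping are assumptions, with the real logic living in Backend/video.py):

```python
def parse_subtitles_position(position, top_offset=80):
    """Translate 'horizontal,vertical' into a moviepy-style position tuple.

    moviepy accepts keyword positions like "center" and "bottom" directly;
    the "top" case is pinned a fixed number of pixels from the top edge.
    """
    horizontal, vertical = position.split(",")
    if vertical == "top":
        return (horizontal, top_offset)
    return (horizontal, vertical)
```

For example, "center,bottom" becomes ("center", "bottom"), while "center,top" becomes ("center", 80).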
Subtitle Rendering
Subtitles are rendered with custom styling:
```python
font_path = str((FONTS_DIR / "bold_font.ttf").resolve())

generator = lambda txt: TextClip(
    font=font_path,
    text=txt,
    font_size=100,
    color=text_color,  # Usually "white"
    stroke_color="black",
    stroke_width=5,
)

subtitles = SubtitlesClip(subtitles_path, make_textclip=generator)
```
Location: Backend/video.py:290-306
Styling Options
- Font: Bold font from fonts/bold_font.ttf
- Size: 100px (scaled for 1080x1920 resolution)
- Color: Configurable via the text_color parameter
- Stroke: 5px black outline for readability
The black stroke ensures subtitles remain readable over any background video content.
File Paths
Subtitle files are saved with UUID filenames:
```python
SUBTITLES_DIR.mkdir(parents=True, exist_ok=True)
subtitles_path = SUBTITLES_DIR / f"{uuid.uuid4()}.srt"
```
Default location: subtitles/ directory in project root
Usage Example
```python
from Backend.video import generate_subtitles
from moviepy import AudioFileClip

# Prepare data
audio_path = "voiceover.mp3"
sentences = [
    "Welcome to this video about space exploration.",
    "Today we'll learn about the Mars rover mission.",
    "The rover was designed to search for signs of ancient life.",
]

# Load audio clips (one per sentence)
audio_clips = [
    AudioFileClip("segment1.mp3"),
    AudioFileClip("segment2.mp3"),
    AudioFileClip("segment3.mp3"),
]

# Generate subtitles
subtitles_path = generate_subtitles(
    audio_path=audio_path,
    sentences=sentences,
    audio_clips=audio_clips,
    voice="en_us_001",
)

print(f"Subtitles saved to: {subtitles_path}")

# Read the generated SRT
with open(subtitles_path, "r", encoding="utf-8") as f:
    print(f.read())
```
Error Handling
Missing API Key
If ASSEMBLY_AI_API_KEY is not set, the function falls back to local generation:
```python
if ASSEMBLY_AI_API_KEY is not None and ASSEMBLY_AI_API_KEY != "":
    log("[+] Creating subtitles using AssemblyAI", "info")
    subtitles = __generate_subtitles_assemblyai(audio_path, voice)
else:
    log("[+] Creating subtitles locally", "info")
    subtitles = __generate_subtitles_locally(sentences, audio_clips)
```
Common Errors
- API Key Invalid: AssemblyAI raises an authentication error
- Audio File Not Found: Raises FileNotFoundError
- Empty Sentences: Creates an empty subtitle file (no entries)
- Mismatched Lengths: If len(sentences) != len(audio_clips), local generation silently drops the extra items
Always ensure sentences and audio_clips have the same length when using local generation. Mismatched lengths will cause zip() to truncate to the shorter list.
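One way to guard against silent truncation is to validate lengths up front (a sketch; the helper name is an assumption, and `durations` stands in for `[clip.duration for clip in audio_clips]` so the check stays independent of moviepy):

```python
def check_subtitle_inputs(sentences, durations):
    """Fail fast instead of letting zip() silently truncate to the
    shorter of the two lists during local subtitle generation."""
    if len(sentences) != len(durations):
        raise ValueError(
            f"Got {len(sentences)} sentences but {len(durations)} audio clips; "
            "local subtitle generation would silently drop the extras."
        )
```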
The local generator converts seconds to SRT timestamp format:
```python
def convert_to_srt_time_format(total_seconds: float) -> str:
    # Convert total seconds to the SRT time format: HH:MM:SS,mmm
    milliseconds_total = int(round(total_seconds * 1000))
    hours, remainder = divmod(milliseconds_total, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, milliseconds = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"
```
Example Conversions
```
0.0    → 00:00:00,000
3.5    → 00:00:03,500
65.123 → 00:01:05,123
3661.5 → 01:01:01,500
```
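These conversions can be verified as a self-contained snippet (the helper is restated so the block runs on its own):

```python
def convert_to_srt_time_format(total_seconds: float) -> str:
    # Same logic as the helper in Backend/video.py
    milliseconds_total = int(round(total_seconds * 1000))
    hours, remainder = divmod(milliseconds_total, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, milliseconds = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"

# Each example conversion from the table holds:
assert convert_to_srt_time_format(0.0) == "00:00:00,000"
assert convert_to_srt_time_format(3.5) == "00:00:03,500"
assert convert_to_srt_time_format(65.123) == "00:01:05,123"
assert convert_to_srt_time_format(3661.5) == "01:01:01,500"
```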
AssemblyAI Configuration
AssemblyAI is configured with the API key from environment variables:
```python
import assemblyai as aai
from dotenv import load_dotenv

load_dotenv(ENV_FILE)
ASSEMBLY_AI_API_KEY = os.getenv("ASSEMBLY_AI_API_KEY")

# In the function:
aai.settings.api_key = ASSEMBLY_AI_API_KEY
config = aai.TranscriptionConfig(language_code=lang_code)
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(audio_path)
```
Environment Variable:
```
ASSEMBLY_AI_API_KEY=your_api_key_here
```
AssemblyAI transcription typically takes 15-30% of the audio duration. A 60-second audio file takes ~10-20 seconds to transcribe.
Integration with Pipeline
Subtitles are generated before final video composition:
1. Script Generation → AI creates the script
2. Voice Synthesis → TTS generates audio from the script
3. Subtitle Generation → Audio transcribed to SRT ✓
4. Video Composition → Subtitles burned into the video
See Video Composition for how subtitles are rendered onto the video.
Performance
- AssemblyAI: Network latency plus transcription time (~15-30% of audio duration)
- Local Generation: Effectively instant (no API calls), but less accurate
- Equalization: Adds roughly 1-2 seconds of processing
- File I/O: Minimal overhead (SRT files are typically under 10KB)
For production use, AssemblyAI is recommended despite the API cost. The improved accuracy significantly enhances viewer experience.