Voice Synthesis

MoneyPrinter uses TikTok’s text-to-speech API to generate natural-sounding voiceovers in many languages and voices. Long texts are supported by splitting them into chunks and requesting the audio for each chunk in a separate thread.

Core Function

The main TTS function is tts() in Backend/tiktokvoice.py:

def tts(
    text: str,
    voice: str = "none",
    filename: str = "output.mp3",
    play_sound: bool = False,
) -> None:
    """
    Creates a text-to-speech audio file using TikTok's voice API.
    
    Args:
        text (str): The text to convert to speech.
        voice (str): The voice ID to use.
        filename (str): Output audio file path.
        play_sound (bool): Whether to play the audio after generation.
    """

Location: Backend/tiktokvoice.py:121-207

How It Works

Service Availability Check

Verifies TikTok TTS service is reachable before processing.

Text Splitting

Splits text that is too long for a single request (the API limit is 300 characters) into chunks of at most 299 characters, breaking only at word boundaries.

Threaded Generation

Starts one thread per chunk so that the API requests for all chunks run concurrently.

Audio Assembly

Joins the base64-encoded audio of every chunk in its original order, then decodes the result and writes it to a single MP3 file.
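The assembly step can be sketched as follows. This is an illustration, not MoneyPrinter's code: `assemble_audio` is a hypothetical helper, and it decodes each chunk on its own before joining the raw bytes. That variant is slightly safer than joining base64 strings first, because a chunk whose base64 ends in `=` padding would otherwise leave padding in the middle of the combined string, which is not valid base64.

```python
import base64
from typing import List


def assemble_audio(base64_chunks: List[str], filename: str) -> int:
    """Decode each base64 chunk separately, join the bytes, and write one file.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    # Decoding chunk by chunk keeps one chunk's trailing "=" padding from
    # landing in the middle of a single concatenated base64 string.
    audio_bytes = b"".join(base64.b64decode(chunk) for chunk in base64_chunks)
    with open(filename, "wb") as file:
        file.write(audio_bytes)
    return len(audio_bytes)
```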

Available Voices

MoneyPrinter supports 40+ voices across multiple languages:

Disney Voices

"en_us_ghostface"    # Ghost Face
"en_us_chewbacca"    # Chewbacca
"en_us_c3po"         # C3PO
"en_us_stitch"       # Stitch
"en_us_stormtrooper" # Stormtrooper
"en_us_rocket"       # Rocket

English Voices

"en_au_001"  # English AU - Female
"en_au_002"  # English AU - Male
"en_uk_001"  # English UK - Male 1
"en_uk_003"  # English UK - Male 2
"en_us_001"  # English US - Female (Int. 1)
"en_us_002"  # English US - Female (Int. 2)
"en_us_006"  # English US - Male 1
"en_us_007"  # English US - Male 2
"en_us_009"  # English US - Male 3
"en_us_010"  # English US - Male 4

European Voices

"fr_001"  # French - Male 1
"fr_002"  # French - Male 2
"de_001"  # German - Female
"de_002"  # German - Male
"es_002"  # Spanish - Male

Latin American Voices

"es_mx_002"  # Spanish MX - Male
"br_001"     # Portuguese BR - Female 1
"br_003"     # Portuguese BR - Female 2
"br_004"     # Portuguese BR - Female 3
"br_005"     # Portuguese BR - Male

Asian Voices

"id_001"  # Indonesian - Female
"jp_001"  # Japanese - Female 1
"jp_003"  # Japanese - Female 2
"jp_005"  # Japanese - Female 3
"jp_006"  # Japanese - Male
"kr_002"  # Korean - Male 1
"kr_003"  # Korean - Female
"kr_004"  # Korean - Male 2

Singing Voices

"en_female_f08_salut_damour"  # Alto
"en_male_m03_lobby"            # Tenor
"en_female_f08_warmy_breeze"   # Warmy Breeze
"en_male_m03_sunshine_soon"    # Sunshine Soon

Special Voices

"en_male_narration"      # Narrator
"en_male_funny"          # Wacky
"en_female_emotional"    # Peaceful

Complete list: Backend/tiktokvoice.py:18-67

For YouTube Shorts, en_us_001 (Female) and en_us_006 (Male 1) are the most popular and recognizable voices.

API Endpoints

MoneyPrinter defines two endpoints, a primary and a fallback:

ENDPOINTS = [
    "https://tiktok-tts.weilnet.workers.dev/api/generation",
    "https://tiktoktts.com/api/tiktok-tts",
]

If the availability check against the current endpoint fails, tts() switches to the other endpoint before sending any audio requests.
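The selection logic generalizes to any number of endpoints. Below is a minimal sketch, not MoneyPrinter's code: `pick_endpoint` is a hypothetical helper, and `is_available` stands in for a health check such as `get_api_response()`. The check is passed in as a function so the selection can be tested without network access.

```python
from typing import Callable, List, Optional


def pick_endpoint(
    endpoints: List[str],
    start: int,
    is_available: Callable[[str], bool],
) -> Optional[int]:
    """Return the index of the first reachable endpoint, trying `start` first.

    Hypothetical helper: `is_available` is any health check that returns
    True when the endpoint can be used.
    """
    for offset in range(len(endpoints)):
        # Wrap around so every endpoint is tried exactly once.
        index = (start + offset) % len(endpoints)
        if is_available(endpoints[index]):
            return index
    return None  # every endpoint failed the check
```

With two endpoints and `start=current_endpoint`, this reproduces the `(current_endpoint + 1) % 2` toggle in the Error Handling code, and it works unchanged for a longer list.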

Text Length Limits

TikTok’s API has a 300-character limit per request:

TEXT_BYTE_LIMIT = 300
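Despite the constant's name, the splitting code measures chunks with `len()`, which counts Unicode code points, not bytes. For non-ASCII scripts the two measures diverge considerably, which matters for the Japanese and Korean voices. Whether the service enforces a character or a byte limit is not stated here, so treat the following as a diagnostic under that uncertainty; `fits_limit` is a hypothetical helper that reports both counts.

```python
TEXT_BYTE_LIMIT = 300


def fits_limit(text: str, limit: int = TEXT_BYTE_LIMIT) -> dict:
    """Compare a chunk's character count and UTF-8 byte count against the limit."""
    chars = len(text)                       # what split_string() measures
    utf8_bytes = len(text.encode("utf-8"))  # what a byte-based limit would measure
    return {
        "chars": chars,
        "bytes": utf8_bytes,
        "fits_chars": chars <= limit,
        "fits_bytes": utf8_bytes <= limit,
    }


# 150 Japanese characters: 150 code points, but 3 UTF-8 bytes each
print(fits_limit("あ" * 150))
# {'chars': 150, 'bytes': 450, 'fits_chars': True, 'fits_bytes': False}
```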

Automatic Text Splitting

For longer texts, MoneyPrinter splits by word boundaries:

from typing import List

def split_string(string: str, chunk_size: int) -> List[str]:
    """Split a string into chunks of maximum chunk_size, 
    breaking at word boundaries."""
    words = string.split()
    result = []
    current_chunk = ""
    for word in words:
        if len(current_chunk) + len(word) + 1 <= chunk_size:
            current_chunk += f" {word}"
        else:
            if current_chunk:
                result.append(current_chunk.strip())
            current_chunk = word
    if current_chunk:
        result.append(current_chunk.strip())
    return result

Location: Backend/tiktokvoice.py:79-94

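Text is split only at whitespace, so words are never cut off mid-way. One side effect: a single word longer than chunk_size is not split at all and becomes its own chunk that exceeds the limit. Because str.split() is used, newlines and runs of spaces also collapse into single spaces.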
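As written, split_string() passes a word longer than chunk_size through as an oversized chunk. A hypothetical variant, `split_string_strict`, that uses the same greedy packing but cuts such words into chunk_size-character pieces so that no chunk ever exceeds the limit:

```python
from typing import List


def split_string_strict(string: str, chunk_size: int) -> List[str]:
    """Greedy word packing that never returns a chunk longer than chunk_size.

    Hypothetical variant of split_string(): words longer than chunk_size are
    cut into chunk_size-character pieces instead of being passed through whole.
    """
    result: List[str] = []
    current_chunk = ""
    for word in string.split():
        # Break an oversized word into pieces that each fit on their own.
        pieces = [word[i:i + chunk_size] for i in range(0, len(word), chunk_size)]
        for piece in pieces:
            candidate = f"{current_chunk} {piece}" if current_chunk else piece
            if len(candidate) <= chunk_size:
                current_chunk = candidate
            else:
                # Only reachable when current_chunk is non-empty.
                result.append(current_chunk)
                current_chunk = piece
    if current_chunk:
        result.append(current_chunk)
    return result
```

For example, `split_string_strict("aaaa bb", 3)` returns `["aaa", "a", "bb"]`, whereas split_string() would return `"aaaa"` as one chunk.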

Threading for Long Texts

For texts over the limit, tts() sends one API request per chunk, each in its own thread:

# Split longer text into smaller parts
text_parts = split_string(text, 299)
audio_base64_data = [None] * len(text_parts)

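# Define a thread function to generate audio for each text part
def generate_audio_thread(text_part, index):
    audio = generate_audio(text_part, voice)
    # Parse base64 from the response (the layout differs per endpoint)
    if current_endpoint == 0:
        base64_data = str(audio).split('"')[5]
    else:
        base64_data = str(audio).split('"')[3].split(",")[1]
    # Store by index so the chunks can be joined in their original order
    audio_base64_data[index] = base64_data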

threads = []
for index, text_part in enumerate(text_parts):
    thread = threading.Thread(
        target=generate_audio_thread, args=(text_part, index)
    )
    thread.start()
    threads.append(thread)

# Wait for all threads to complete
for thread in threads:
    thread.join()

# Concatenate the base64 data in the correct order
audio_base64_data = "".join(audio_base64_data)

Location: Backend/tiktokvoice.py:167-199

Threading speeds up generation of long texts, but every chunk gets its own thread and all threads start at once, so a long script sends that many simultaneous requests and can trigger rate limiting. The current implementation does not cap or throttle them.
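One way to cap concurrency, sketched here as an alternative rather than MoneyPrinter's code, is a thread pool. `ThreadPoolExecutor` keeps at most `max_workers` requests in flight, and `pool.map` returns results in input order, which replaces the manual index bookkeeping. It also surfaces worker errors: in the current code, a thread that raises leaves `None` in `audio_base64_data`, and the final `"".join()` then fails with a `TypeError`. `generate_all` and `generate_chunk` are hypothetical names, and the default of 3 workers is an arbitrary illustrative value, not a known limit of the service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def generate_all(
    text_parts: List[str],
    generate_chunk: Callable[[str], str],
    max_workers: int = 3,
) -> List[str]:
    """Run generate_chunk over every text part with at most max_workers in flight.

    generate_chunk is a stand-in for "call generate_audio() and extract the
    base64 string" for one chunk.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order; an exception raised in a
        # worker is re-raised here when its result is reached.
        return list(pool.map(generate_chunk, text_parts))
```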

Audio Generation

The generate_audio() function sends the actual API request:

def generate_audio(text: str, voice: str) -> bytes:
    """Send POST request to get the audio data."""
    url = f"{ENDPOINTS[current_endpoint]}"
    headers = {"Content-Type": "application/json"}
    data = {"text": text, "voice": voice}
    response = requests.post(url, headers=headers, json=data)
    return response.content

Location: Backend/tiktokvoice.py:112-117
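generate_audio() calls requests.post() without a timeout, and requests applies none by default, so a stalled endpoint can block a worker thread indefinitely; the HTTP status is also never checked. The sketch below sends the same JSON POST with an explicit timeout, using the standard library's urllib.request so it has no third-party dependency. `build_tts_request` and `post_tts` are hypothetical names, and the 15-second timeout is an arbitrary choice.

```python
import json
import urllib.request

ENDPOINTS = [
    "https://tiktok-tts.weilnet.workers.dev/api/generation",
    "https://tiktoktts.com/api/tiktok-tts",
]


def build_tts_request(text: str, voice: str, endpoint: str) -> urllib.request.Request:
    """Build the same JSON POST body and header that generate_audio() sends."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_tts(request: urllib.request.Request, timeout: float = 15.0) -> bytes:
    """Send the request and return the raw response body.

    urlopen raises HTTPError on 4xx/5xx responses, and a stalled connection
    fails after `timeout` seconds instead of blocking indefinitely.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes the payload easy to inspect and test offline.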

Response Format

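Both endpoints return JSON containing base64-encoded MP3 audio, but they wrap it differently. The fallback endpoint returns the audio as a data URI: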

{
  "data": "data:audio/mpeg;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2Z..."
}

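MoneyPrinter extracts the payload by converting the raw response bytes to a string and splitting it on double quotes:

if current_endpoint == 0:
    # Primary endpoint: the payload is the third quoted string in the body
    audio_base64_data = str(audio).split('"')[5]
else:
    # Fallback endpoint: take the data URI, then drop its "data:audio/mpeg;base64," prefix
    audio_base64_data = str(audio).split('"')[3].split(",")[1]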
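Splitting on quotes depends on the exact field order of each response, so an extra or reordered field breaks it. A sketch that parses the JSON instead, under an assumption: that both endpoints put the audio in a top-level field named `data` (the fallback's example above does; the primary's key name is inferred, not confirmed by the code shown). `extract_audio_base64` is a hypothetical helper.

```python
import base64
import json


def extract_audio_base64(raw: bytes) -> str:
    """Return the base64 audio payload from a TTS response body.

    Assumes a top-level "data" field holding either bare base64 or a data URI
    such as "data:audio/mpeg;base64,<payload>".
    """
    payload = json.loads(raw)["data"]
    if not payload:
        raise ValueError("TTS response contained no audio")
    if payload.startswith("data:"):
        # Drop the "data:audio/mpeg;base64," header of a data URI.
        payload = payload.split(",", 1)[1]
    base64.b64decode(payload, validate=True)  # fail early on a malformed payload
    return payload
```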

Saving Audio Files

def save_audio_file(base64_data: str, filename: str = "output.mp3") -> None:
    """Save base64-encoded audio to an MP3 file."""
    audio_bytes = base64.b64decode(base64_data)
    with open(filename, "wb") as file:
        file.write(audio_bytes)

Location: Backend/tiktokvoice.py:105-108

Usage Example

import os

from Backend.tiktokvoice import tts, VOICES

# Generate TTS audio
tts(
    text="Welcome to this video about space exploration. Today we'll learn about the Mars rover mission.",
    voice="en_us_001",
    filename="voiceover.mp3",
    play_sound=False
)

# tts() logs failures and returns None instead of raising, so check for the file
if os.path.exists("voiceover.mp3"):
    print("Audio generated: voiceover.mp3")

# List all available voices
print(f"Available voices: {len(VOICES)}")
for voice in VOICES:
    print(f"  - {voice}")

Error Handling

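tts() checks that the service is reachable and validates its inputs before sending any audio requests. Each failed check logs an error and returns early; no exception is raised: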

global current_endpoint  # required because the fallback below reassigns it

# Check service availability
if get_api_response().status_code == 200:
    log("[+] TikTok TTS Service available!", "success")
else:
    # Try fallback endpoint
    current_endpoint = (current_endpoint + 1) % 2
    if get_api_response().status_code == 200:
        log("[+] TTS Service available!", "success")
    else:
        log("[-] TTS Service not available and probably temporarily rate limited", "error")
        return

# Validate voice
if voice not in VOICES:
    log("[-] Voice not available", "error")
    return

# Validate text
if not text:
    log("[-] Please specify a text", "error")
    return

Common Errors

Voice Not Available: Invalid voice ID provided
Service Unavailable: TikTok API is down or rate-limited
Empty Text: No text provided for synthesis
Rate Limiting: Too many requests in a short time period

Both endpoints are unofficial third-party services, so they may rate-limit requests, change their response format, or go offline without notice. For production use, consider adding retry logic or switching to a different TTS provider.
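A minimal retry helper with exponential backoff, sketched as one possible approach rather than anything MoneyPrinter ships. `with_retries` is a hypothetical name; the attempt count, base delay, and jitter range are arbitrary defaults, and `retry_on` should be narrowed to network errors in real use so that genuine bugs are not retried.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `call`, retrying with exponential backoff plus jitter on failure.

    Waits up to base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last error once all attempts have failed.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Random jitter keeps concurrent threads from retrying in lockstep.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise ValueError("attempts must be at least 1")
```

In the threaded code, the call that fetches and parses one chunk could be wrapped, for example `with_retries(lambda: fetch_chunk(text_part))`, where `fetch_chunk` is a hypothetical per-chunk function.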

Service Availability Check

def get_api_response() -> requests.Response:
    """Check if the TTS service is available."""
    # Keep only scheme and host: "/a" first occurs at the start of "/api"
    url = f'{ENDPOINTS[current_endpoint].split("/a")[0]}'
    response = requests.get(url)
    return response

Location: Backend/tiktokvoice.py:98-101
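Two caveats apply to this check. requests.get() raises `requests.exceptions.ConnectionError` when a host cannot be reached rather than returning a response, and no timeout is set, so the status-code fallback in tts() only covers endpoints that answer with a non-200 status. Also, `split("/a")` would truncate a host that itself starts with "a", such as `https://api.example.com`. A sketch that addresses both, written with the standard library's urllib so it has no third-party dependency; `base_url` and `endpoint_available` are hypothetical names, and the 5-second timeout is arbitrary.

```python
import urllib.request
from urllib.parse import urlsplit


def base_url(endpoint: str) -> str:
    """Reduce an endpoint URL to scheme://host, whatever the host name is."""
    parts = urlsplit(endpoint)
    return f"{parts.scheme}://{parts.netloc}"


def endpoint_available(endpoint: str, timeout: float = 5.0) -> bool:
    """True if the endpoint's host answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(base_url(endpoint), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections, and timeouts
        # are all OSError subclasses, so each counts as "unavailable".
        return False
```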

Integration with Pipeline

The TTS audio is integrated into the final video:

Generate TTS: tts() creates MP3 from script
Generate Subtitles: generate_subtitles() syncs text to audio
Add to Video: generate_video() combines audio with visuals

See Video Composition for how the audio is added to the final video.

Performance Considerations

Text Length: Texts >300 chars use threading (faster but more API calls)
Network Latency: API response time typically 1-3 seconds per chunk
Rate Limits: Unknown official limits, but failures occur with rapid requests
File Size: MP3 files are ~1MB per minute of audio
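The ~1MB-per-minute figure is consistent with a constant bitrate of 128 kbps. That bitrate is an assumption, since this section does not state what the API returns; at 64 kbps the same minute would be about half the size.

```latex
\text{size} = \frac{128\,000\ \text{bit/s} \times 60\ \text{s}}{8\ \text{bit/byte}} = 960\,000\ \text{bytes} \approx 0.96\ \text{MB per minute}
```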

For very long scripts, consider pre-splitting the text at sentence boundaries so that each chunk ends on a natural pause instead of mid-sentence, which avoids audible breaks where the chunks are joined.
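A minimal sketch of that pre-splitting, assuming a sentence ends with ".", "!" or "?" followed by whitespace (abbreviations such as "Dr." will be split incorrectly). `split_sentences` is a hypothetical helper that packs whole sentences greedily into chunks of at most chunk_size characters.

```python
import re
from typing import List


def split_sentences(text: str, chunk_size: int = 299) -> List[str]:
    """Pack whole sentences into chunks of at most chunk_size characters.

    A single sentence longer than chunk_size is returned as its own chunk and
    still needs word-level splitting (for example with split_string()).
    """
    # Split after ".", "!" or "?" when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```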
