
Overview

The AI YouTube Shorts Generator supports batch processing multiple videos through command-line automation and concurrent execution with unique session IDs.
Each video run generates a unique 8-character session ID for isolated temporary files and output tracking.

Sequential Processing with xargs

Process multiple URLs one after another using xargs.

Create URL List

Create a urls.txt file with one YouTube URL per line:
urls.txt
https://youtu.be/VIDEO_ID_1
https://youtu.be/VIDEO_ID_2
https://youtu.be/VIDEO_ID_3
https://youtu.be/VIDEO_ID_4
https://youtu.be/VIDEO_ID_5

Process with Auto-Approve

Recommended for unattended batch processing:
xargs -a urls.txt -I{} ./run.sh --auto-approve {}
1. xargs reads URLs: -a urls.txt reads input from the file instead of stdin.
2. Iterates over lines: -I{} replaces {} with each line from the file.
3. Executes run.sh: runs ./run.sh --auto-approve {URL} for each URL sequentially.
4. Auto-approves selections: the --auto-approve flag skips the 15-second interactive approval prompt.
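If you prefer plain shell over xargs, the same sequential loop can be sketched as below. The process function is a stub standing in for ./run.sh --auto-approve, and the here-document stands in for urls.txt, so the sketch is self-contained:

```shell
# Plain-shell equivalent of the xargs command above, handy for per-URL logging.
process() { echo "processing $1"; }   # stub for: ./run.sh --auto-approve "$1"
count=0
while IFS= read -r url; do
  [ -n "$url" ] || continue          # skip blank lines
  process "$url"
  count=$((count + 1))
done <<'EOF'
https://youtu.be/VIDEO_ID_1
https://youtu.be/VIDEO_ID_2
https://youtu.be/VIDEO_ID_3
EOF
echo "processed $count URLs"
```

In a real run, replace the process stub with the actual ./run.sh invocation and redirect from urls.txt instead of the here-document.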

Process with Manual Approval

For reviewing each selection before processing:
xargs -a urls.txt -I{} ./run.sh {}
You’ll see the approval prompt for each video:
============================================================
SELECTED SEGMENT DETAILS:
Time: 68s - 187s (119s duration)
============================================================

Options:
  [Enter/y] Approve and continue
  [r] Regenerate selection
  [n] Cancel

Auto-approving in 15 seconds if no input...
Manual approval requires user input for each video. Consider using auto-approve or the 15-second timeout for hands-off processing.

Auto-Approve Flag

The --auto-approve flag enables fully automated processing.

Implementation

From main.py:16-19:
auto_approve = "--auto-approve" in sys.argv
if auto_approve:
    sys.argv.remove("--auto-approve")
When set, the approval loop is skipped in main.py:103-146:
approved = auto_approve  # Auto-approve if flag is set

if not auto_approve:
    while not approved:
        # Show interactive approval prompt
        # ...
else:
    print(f"\n{'='*60}")
    print(f"SELECTED SEGMENT: {start}s - {stop}s ({stop-start}s duration)")
    print(f"{'='*60}")
    print("Auto-approved (batch mode)\n")

Usage Examples

./run.sh --auto-approve "https://youtu.be/VIDEO_ID"
The --auto-approve flag must appear before the video source argument.

Concurrent Execution

Run multiple videos simultaneously using background processes and unique session IDs.

Session ID Isolation

Each run generates a unique session ID for file isolation:
main.py:12-14
session_id = str(uuid.uuid4())[:8]
print(f"Session ID: {session_id}")
Example session IDs:
  • 3f8a9b12
  • 7c2d4e56
  • 9a1b3c5d
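For shell scripting around the tool, an analogous 8-character hex ID can be generated directly in the shell. This is only an illustration of the ID's shape; main.py itself uses str(uuid.uuid4())[:8] as shown above:

```shell
# Generate 8 hex characters from /dev/urandom, mirroring the 8-char session ID.
session_id=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
echo "Session ID: $session_id (${#session_id} chars)"
```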

Temporary File Naming

All temporary files include the session ID to prevent conflicts:
main.py:62-66
audio_file = f"audio_{session_id}.wav"
temp_clip = f"temp_clip_{session_id}.mp4"
temp_cropped = f"temp_cropped_{session_id}.mp4"
temp_subtitled = f"temp_subtitled_{session_id}.mp4"
audio_3f8a9b12.wav
temp_clip_3f8a9b12.mp4
temp_cropped_3f8a9b12.mp4
temp_subtitled_3f8a9b12.mp4

Running Videos in Parallel

Launch multiple instances as background jobs:
./run.sh --auto-approve "https://youtu.be/VIDEO1" &
./run.sh --auto-approve "https://youtu.be/VIDEO2" &
./run.sh --auto-approve "https://youtu.be/VIDEO3" &
wait
1. Launch background processes: the & operator runs each command in the background.
2. Unique session IDs assigned: each process gets a unique 8-character identifier.
3. Isolated temp files: no file naming conflicts occur between concurrent runs.
4. Wait for completion: the wait command blocks until all background jobs finish.
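A bare wait discards each job's exit status. If you want to know how many runs failed, one sketch is to record each PID and wait on them individually. Here `sleep 0 &` stands in for the actual ./run.sh invocation so the pattern is runnable as-is:

```shell
# Track each background job's PID so failures are counted, not silently lost.
pids=""
for url in VIDEO1 VIDEO2 VIDEO3; do
  sleep 0 &                        # stand-in for: ./run.sh --auto-approve "$url" &
  pids="$pids $!"
done
failed=0
for pid in $pids; do
  wait "$pid" || failed=$((failed + 1))
done
echo "jobs failed: $failed"
```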

Parallel Processing with xargs

Process multiple videos concurrently using xargs -P:
xargs -a urls.txt -P 3 -I{} ./run.sh --auto-approve {}
Parameters:
  • -P 3: Run 3 processes in parallel
  • -a urls.txt: Read URLs from file
  • -I{}: Placeholder for each URL
  • --auto-approve: Skip interactive prompts
Factors to consider:

1. GPU availability: if using CUDA-accelerated Whisper, multiple processes may compete for GPU memory (check usage with nvidia-smi). Recommendation: 2-3 parallel processes for an 8GB GPU.
2. CPU cores: for CPU-only setups, check the core count with nproc. Recommendation: use nproc minus 1-2 cores.
3. Memory usage: each process uses ~2-4GB RAM during processing (check available memory with free -h). Recommendation: ensure 4GB+ free per concurrent process.
4. API rate limits: the OpenAI API enforces rate limits that may throttle concurrent requests. Consider:
  • Free tier: 3 RPM (requests per minute)
  • Pay-as-you-go: 3,500 RPM
  • Higher tiers: 10,000+ RPM
Example run with 10 URLs and 3 parallel jobs:
# urls.txt contains 10 URLs
xargs -a urls.txt -P 3 -I{} ./run.sh --auto-approve {}
Execution flow:
  1. Videos 1-3 start immediately
  2. As Video 1 completes, Video 4 starts
  3. As Video 2 completes, Video 5 starts
  4. Process continues until all 10 are done
Time savings:
  • Sequential: ~50 minutes (10 videos × 5 min each)
  • Parallel (3 jobs): ~20 minutes (ceil(10 / 3) = 4 waves × 5 min each)
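The estimates above follow from a simple rule of thumb: with N videos, P parallel jobs, and T minutes per video, wall time is roughly ceil(N / P) × T. A quick shell check:

```shell
# Back-of-envelope check: ceil(N / P) * T minutes of wall time.
N=10; P=3; T=5
seq_minutes=$((N * T))
par_minutes=$(( (N + P - 1) / P * T ))   # integer ceiling of N/P, times T
echo "sequential: ${seq_minutes}m, parallel: ${par_minutes}m"
```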

Output File Tracking

Output filenames include session IDs for traceability:
main.py:164-165
clean_title = clean_filename(video_title) if video_title else "output"
final_output = f"{clean_title}_{session_id}_short.mp4"
Example outputs from concurrent runs:
how-to-code-python_3f8a9b12_short.mp4
how-to-code-python_7c2d4e56_short.mp4
how-to-code-python_9a1b3c5d_short.mp4
Even when processing the same video multiple times, each output has a unique filename due to the session ID.
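The example filenames above come from main.py's clean_filename, whose implementation is not shown here. As a rough illustration of the idea, a hypothetical shell analog that lowercases the title, turns spaces into dashes, and strips other punctuation would behave like this:

```shell
# Hypothetical analog of clean_filename: lowercase, spaces to dashes,
# drop everything except letters, digits, and dashes.
title="How to Code: Python!"
clean_title=$(printf '%s' "$title" | tr 'A-Z ' 'a-z-' | tr -cd 'a-z0-9-')
echo "${clean_title}_3f8a9b12_short.mp4"
```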

Handling Conflicts

Potential conflict scenarios and solutions:

Shared Downloads Directory

All instances share the videos/ directory for YouTube downloads. If processing the same URL concurrently, downloads may conflict.
Solution: Pre-download videos
# Download all videos first
while IFS= read -r url; do
  youtube-dl -f best "$url" -o "videos/%(title)s.%(ext)s"
done < urls.txt

# Then process local files concurrently
find videos/ -name "*.mp4" -print0 | xargs -0 -P 3 -I{} ./run.sh --auto-approve {}

API Rate Limiting

If you hit OpenAI API rate limits:
ERROR IN GetHighlight FUNCTION:
Exception type: RateLimitError
Exception message: Rate limit exceeded
Solutions:

1. Reduce parallelism:

# Instead of -P 5
xargs -a urls.txt -P 2 -I{} ./run.sh --auto-approve {}

2. Add delays between starts:

while IFS= read -r url; do
  ./run.sh --auto-approve "$url" &
  sleep 10  # 10-second delay
done < urls.txt
wait

3. Upgrade API tier: contact OpenAI to increase rate limits for your account.
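A fourth option is to retry a failed run with exponential backoff. The sketch below shows the shape of such a wrapper; run_once is a stub standing in for ./run.sh (here it fails three times before succeeding, so the retry path is exercised), and the real sleep is commented out so the sketch runs instantly:

```shell
# Minimal exponential-backoff retry wrapper (run_once stubs ./run.sh).
attempt=0
max_attempts=4
run_once() { [ "$attempt" -ge 3 ]; }   # stub: fails until the 3rd retry
until run_once; do
  attempt=$((attempt + 1))
  if [ "$attempt" -ge "$max_attempts" ]; then
    echo "giving up after $max_attempts attempts"
    break
  fi
  delay=$((2 ** attempt))              # 2s, 4s, 8s, ...
  echo "retry $attempt in ${delay}s"
  sleep 0                              # replace with: sleep "$delay"
done
echo "finished after $attempt retries"
```

In practice, replace run_once with the actual ./run.sh invocation and restore the real sleep.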

Disk Space

Each video generates temporary files:
audio_{session}.wav           (~50-100MB)
temp_clip_{session}.mp4       (~20-80MB)
temp_cropped_{session}.mp4    (~15-60MB)
temp_subtitled_{session}.mp4  (~15-60MB)
{title}_{session}_short.mp4   (~15-60MB)
Total per video: ~115-360MB during processing, ~15-60MB after cleanup.

Monitoring:
# Check disk usage
df -h .

# Watch disk space during batch processing
watch -n 5 'df -h . && ls -lh *.mp4 *.wav 2>/dev/null | wc -l'

Cleanup

Temporary files are automatically removed after each successful run (main.py:173-180):
try:
    for temp_file in [audio_file, temp_clip, temp_cropped, temp_subtitled]:
        if os.path.exists(temp_file):
            os.remove(temp_file)
    print(f"Cleaned up temporary files for session {session_id}")
except Exception as e:
    print(f"Warning: Could not clean up some temporary files: {e}")
If a run is interrupted, manually clean up:
# Remove all temporary files
rm -f audio_*.wav temp_*.mp4

# Keep only final outputs
find . -name "*_short.mp4" -type f

Advanced Batch Patterns

Process Specific Date Range

# Filter URLs by date in filename
grep '2024-01' urls.txt | xargs -I{} ./run.sh --auto-approve {}

Retry Failed Videos

Create a script to track failures:
retry.sh
#!/bin/bash

while IFS= read -r url; do
  echo "Processing: $url"
  ./run.sh --auto-approve "$url"
  
  if [ $? -ne 0 ]; then
    echo "$url" >> failed_urls.txt
    echo "FAILED: $url"
  else
    echo "SUCCESS: $url"
  fi
done < urls.txt

echo "Failed URLs saved to failed_urls.txt"
Reprocess failures:
chmod +x retry.sh
./retry.sh

# Retry failed ones
if [ -f failed_urls.txt ]; then
  xargs -a failed_urls.txt -I{} ./run.sh --auto-approve {}
fi

Process with Custom Priority

# High priority (sequential, immediate attention)
head -3 urls.txt | xargs -I{} ./run.sh {}

# Medium priority (parallel, auto-approve)
tail -n +4 urls.txt | head -10 | xargs -P 3 -I{} ./run.sh --auto-approve {}

# Low priority (background, throttled)
tail -n +14 urls.txt | while read url; do
  ./run.sh --auto-approve "$url" &
  sleep 30
done

Organize Outputs by Session

# Create a session-specific directory (capture the timestamp once,
# so mkdir and mv use the same path)
outdir="output/session_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$outdir"

# Process all URLs, then move the outputs into the session directory
xargs -a urls.txt -P 2 -I{} ./run.sh --auto-approve {}
mv ./*_short.mp4 "$outdir/"
For large-scale batch processing (100+ videos), consider using a job queue system like Celery or RQ with Redis for better management and monitoring.
