This guide covers solutions to common issues you may encounter when using the AI YouTube Shorts Generator.

CUDA and GPU Setup

Issue: CUDA Libraries Not Found

If you see errors like libcudnn.so.8: cannot open shared object file, the NVIDIA CUDA libraries aren’t in your system’s library path.
1. Verify CUDA Installation

Check if CUDA libraries are installed in your virtual environment:
find venv/lib/python3.10/site-packages/nvidia -name "lib" -type d
This should return multiple paths containing CUDA libraries.
2. Set Library Path

Export the library path before running the application:
export LD_LIBRARY_PATH=$(find $(pwd)/venv/lib/python3.10/site-packages/nvidia -name "lib" -type d | paste -sd ":" -)
3. Use the run.sh Script

The provided run.sh script automatically handles this. Always run:
./run.sh "https://youtu.be/VIDEO_ID"
instead of calling python main.py directly.
The run.sh script is the recommended way to run the application as it properly configures the CUDA environment.
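For reference, the path assembly that run.sh performs can be sketched in Python (a minimal sketch; the actual script contents may differ):

```python
import os

def collect_cuda_lib_dirs(nvidia_pkg_dir: str) -> str:
    """Find every directory named 'lib' under the nvidia package tree
    and join them with ':' for use as LD_LIBRARY_PATH."""
    lib_dirs = []
    for root, dirs, _files in os.walk(nvidia_pkg_dir):
        for d in dirs:
            if d == "lib":
                lib_dirs.append(os.path.join(root, d))
    return ":".join(sorted(lib_dirs))

# Usage sketch (assumes the venv layout shown above):
# os.environ["LD_LIBRARY_PATH"] = collect_cuda_lib_dirs(
#     "venv/lib/python3.10/site-packages/nvidia")
```

This mirrors the find | paste pipeline above: one colon-separated list of every CUDA lib directory inside the virtual environment.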

Issue: No GPU Acceleration

If transcription is unexpectedly slow, verify that the GPU is being used:
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
  • With GPU: CUDA available: True
  • CPU-only: CUDA available: False
If you have an NVIDIA GPU but see False, reinstall PyTorch with CUDA support:
pip uninstall torch
pip install torch --index-url https://download.pytorch.org/whl/cu118

Issue: Wrong CUDA Version

If you get compatibility errors, check your CUDA version:
nvcc --version  # System CUDA version
python -c "import torch; print(torch.version.cuda)"  # PyTorch CUDA version
The system CUDA version and PyTorch CUDA version must be compatible. If they don’t match, reinstall PyTorch with the correct CUDA version from pytorch.org.

ImageMagick Policy Issues

Issue: No Subtitles Appearing on Video

If the video generates successfully but subtitles are missing, this is usually an ImageMagick security policy issue.
1. Check Policy File

Verify the current policy settings:
grep 'pattern="@\*"' /etc/ImageMagick-6/policy.xml
Look for a line containing pattern="@*".
2. Check Current Rights

If the output shows rights="none", ImageMagick is blocking file operations needed for subtitles.
3. Fix the Policy

Run this command to allow read/write operations:
sudo sed -i 's/rights="none" pattern="@\*"/rights="read|write" pattern="@*"/' /etc/ImageMagick-6/policy.xml
4. Verify the Fix

Confirm the change:
grep 'pattern="@\*"' /etc/ImageMagick-6/policy.xml
Should now show: rights="read|write"
This issue is specific to Linux systems. macOS and Windows ImageMagick installations typically don’t have this restriction.

Alternative: Manual Policy Edit

If the command above doesn’t work:
Edit /etc/ImageMagick-6/policy.xml (or /etc/ImageMagick-7/policy.xml). Find:
<policy domain="path" rights="none" pattern="@*"/>
Change to:
<policy domain="path" rights="read|write" pattern="@*"/>

Face Detection Failures

Issue: Face-Centered Crop Not Working

The application prints ✗ No face detected. Using half-width with motion tracking for screen recording even though there are faces in the video. Common causes:
  1. Faces too small: Default minSize=(30, 30) may miss distant or small faces
  2. Faces not visible in first 30 frames: Detection only samples the beginning
  3. Poor lighting or unusual angles: Affects detection accuracy
  4. Low video resolution: 480p and below have less reliable face detection

Solution: Adjust Detection Parameters

Edit Components/FaceCrop.py around line 40:
# More sensitive detection
faces = face_cascade.detectMultiScale(
    gray, 
    scaleFactor=1.05,  # Was 1.1 - slower but more thorough
    minNeighbors=5,     # Was 8 - lower threshold
    minSize=(20, 20)    # Was (30, 30) - detect smaller faces
)
Lower minNeighbors values may cause false positives (detecting non-face objects as faces). Test with your typical video content.

Issue: Wrong Face Selected

If multiple people are in frame and the wrong person is centered: the code selects the largest face (line 43):
best_face = max(faces, key=lambda f: f[2] * f[3])  # Largest face by area
To prioritize the face closest to center instead of the largest face, modify the selection logic to use horizontal position rather than size.
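A minimal sketch of that change, assuming faces is the usual OpenCV list of (x, y, w, h) rectangles and frame_width is the frame width in pixels (both names are illustrative, not taken from the project's code):

```python
def pick_center_face(faces, frame_width):
    """Select the face whose horizontal center is closest to the
    middle of the frame, rather than the largest face."""
    frame_cx = frame_width / 2
    # f = (x, y, w, h); horizontal center of the face is x + w/2
    return min(faces, key=lambda f: abs((f[0] + f[2] / 2) - frame_cx))
```

Swapping this in for the max(...) selection above keeps the speaker in the middle of the frame centered, even when someone at the edge is closer to the camera and therefore larger.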

Transcription Issues

Issue: No Transcriptions Found

If you see No transcriptions found after the audio extraction step:
1. Verify Audio Extraction

Check if the audio file was created:
ls -lh audio_*.wav
If the file exists but is very small (less than 100KB), the video may not contain audio.
2. Check Audio in Source

Verify the source video has an audio track:
ffmpeg -i your_video.mp4
Look for an Audio: line in the output. If missing, the video has no audio track.
3. Check for Errors

Look for Whisper-related errors in the console output. Memory errors may indicate insufficient RAM or VRAM.
Whisper requires approximately 1-2GB of VRAM (GPU mode) or 4-8GB RAM (CPU mode) depending on the model size.

Issue: Transcription is Very Slow

Expected speeds:
  • GPU (CUDA): ~5-10 seconds per minute of audio
  • CPU: ~30-60 seconds per minute of audio
If transcription is much slower:
  1. Verify GPU is being used (see CUDA and GPU Setup)
  2. Check GPU memory: nvidia-smi
  3. Close other GPU-intensive applications
  4. Consider using a smaller Whisper model if VRAM is limited
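If you want to automate the last step, model choice can be driven by available VRAM with a small helper (a sketch; the thresholds here are rough rules of thumb, not official Whisper requirements):

```python
def pick_whisper_model(vram_gb: float) -> str:
    """Pick a Whisper model size that should fit in the given VRAM.

    Thresholds are approximate; adjust for your hardware.
    """
    if vram_gb >= 10:
        return "large"
    if vram_gb >= 5:
        return "medium"
    if vram_gb >= 2:
        return "small"
    if vram_gb >= 1:
        return "base"
    return "tiny"

# Usage sketch: whisper.load_model(pick_whisper_model(4.0))
```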

OpenAI API Errors

Issue: Failed to Get Highlight from LLM

If you see ERROR: Failed to get highlight from LLM, the highlight-selection step failed. Common causes:
  1. Invalid API Key: Check your .env file
  2. Rate Limiting: Too many requests to OpenAI API
  3. Network Issues: Connectivity problems
  4. Insufficient Credits: OpenAI account has no remaining credits
  5. Malformed Transcription: Very short videos or corrupted transcription data

Solution: Verify API Configuration

1. Check API Key

Verify your .env file contains a valid key:
cat .env
Should show: OPENAI_API=sk-...
2. Test API Key

Test the key directly:
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
Should return a list of available models.
3. Check Account Status

Log into platform.openai.com and verify:
  • Account has remaining credits
  • No rate limit warnings
  • API key is active
The .env file must be in the same directory as main.py. If running from a different directory, use an absolute path or copy the .env file.
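If you suspect the .env file isn't being picked up, you can check it directly with a few lines of Python (a standalone sketch that reads the file itself, independent of whatever loader main.py uses):

```python
import os

def read_env_key(env_path: str, key: str):
    """Return the value of `key` from a .env file, or None if absent."""
    if not os.path.exists(env_path):
        return None
    with open(env_path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(key + "="):
                return line.split("=", 1)[1]
    return None

# Usage sketch, run from the directory containing main.py:
# print(read_env_key(".env", "OPENAI_API"))
```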

Issue: Rate Limiting Errors

If processing multiple videos, you may hit OpenAI rate limits:
OpenAI API error: Rate limit exceeded
Solutions:
  1. Add delays between videos in batch processing:
    xargs -a urls.txt -I{} sh -c './run.sh {} && sleep 10'
    
  2. Upgrade to a higher tier OpenAI account
  3. Use a different model (e.g., gpt-3.5-turbo has higher limits)
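For batch jobs, a generic retry-with-exponential-backoff wrapper can also smooth over transient rate-limit errors (a sketch; the official OpenAI Python client has its own retry configuration you may prefer):

```python
import time

def with_backoff(fn, retries=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on failure.

    Delays are base_delay, 2*base_delay, 4*base_delay, ...
    The last failure is re-raised.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch:
# result = with_backoff(lambda: client.chat.completions.create(...))
```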

Concurrent Execution Conflicts

Issue: File Conflicts When Running Multiple Instances

Older versions created fixed-name files such as audio.wav, which conflicted when multiple instances ran at once. Current versions assign each run a unique session ID (the first 8 characters of a UUID), and temporary files are named:
  • audio_{session_id}.wav
  • temp_clip_{session_id}.mp4
  • temp_cropped_{session_id}.mp4
  • temp_subtitled_{session_id}.mp4
This allows multiple instances to run simultaneously without conflicts.
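The naming scheme can be reproduced with Python's uuid module (a sketch of the idea; the project's actual code may differ in detail):

```python
import uuid

# 8-character hex session ID, e.g. "a1b2c3d4"
session_id = uuid.uuid4().hex[:8]

temp_files = [
    f"audio_{session_id}.wav",
    f"temp_clip_{session_id}.mp4",
    f"temp_cropped_{session_id}.mp4",
    f"temp_subtitled_{session_id}.mp4",
]
```

Because each process draws its own UUID, two concurrent runs will (with overwhelming probability) never write to the same temporary file.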

Verification

To verify session ID support:
./run.sh "https://youtu.be/VIDEO_ID"
You should see:
Session ID: a1b2c3d4
at the start of execution. Each concurrent run will have a different ID.

Video Quality Issues

Issue: Blurry or Low-Quality Output

If the final video quality is poor:
  1. Check source resolution: The output quality cannot exceed the input
    ffmpeg -i input_video.mp4
    
  2. Increase bitrate in Components/Subtitles.py and Components/FaceCrop.py:
    bitrate='5000k'  # Was '3000k'
    
  3. Use slower preset for better compression:
    preset='slow'  # Was 'medium'
    
For 1080p source videos, use bitrate='5000k' or higher. For 720p, 3000k is usually sufficient.

Issue: Large Output File Sizes

If output files are too large:
  1. Lower bitrate:
    bitrate='2000k'  # Smaller files
    
  2. Use faster preset (less efficient compression):
    preset='fast'  # Faster encoding, larger files
    
A 2-minute 1080p vertical video typically ranges from 20-50MB depending on bitrate and content complexity.
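That range follows directly from bitrate × duration. A quick back-of-the-envelope estimator (ignoring audio and container overhead):

```python
def estimate_size_mb(video_bitrate_kbps: int, duration_s: int) -> float:
    """Rough output size: kilobits/s * seconds / 8 bits per byte."""
    return video_bitrate_kbps * 1000 * duration_s / 8 / 1_000_000

# A 2-minute clip at the default 3000k video bitrate works out to
# roughly 45 MB, within the 20-50MB range quoted above.
```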

Getting Additional Help

Collect Debugging Information

When reporting issues, include:
1. Environment Info

python --version
pip list | grep -E "torch|whisper|opencv|moviepy|langchain"
nvidia-smi  # If using GPU
2. Error Output

Copy the full console output, especially error messages and stack traces.
3. Video Details

ffmpeg -i your_video.mp4
Include resolution, duration, and codec information.

Alternative Solutions

Consider using the AI Clipping API, which offers:
  • No installation or dependency management
  • Faster processing with optimized infrastructure
  • Better clip selection algorithms
  • Professional support
