Prerequisites
Before installing, ensure you have:

- Python 3.10 or higher
- OpenAI API key (from platform.openai.com/api-keys)
- FFmpeg with development headers
- ImageMagick for subtitle rendering
- NVIDIA GPU with CUDA support (optional but recommended for 5-10x faster transcription)
If your system lacks a compatible NVIDIA GPU, see the CPU-Only Installation section.
Ubuntu/Debian Installation
Install System Dependencies
- ffmpeg: Video/audio processing
- libavdevice-dev, libavfilter-dev: FFmpeg development libraries
- libopus-dev, libvpx-dev: Audio/video codec support
- pkg-config: Build configuration tool
- libsrtp2-dev: Secure RTP protocol support
- imagemagick: Subtitle rendering
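The packages listed above can be installed in one step; a sketch for Ubuntu/Debian using the package names as listed:

```shell
sudo apt update
sudo apt install -y ffmpeg libavdevice-dev libavfilter-dev libopus-dev \
  libvpx-dev pkg-config libsrtp2-dev imagemagick
```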
Fix ImageMagick Security Policy
ImageMagick has a restrictive security policy by default that prevents subtitle rendering:
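One way to relax the policy on Ubuntu/Debian. The policy file path varies by ImageMagick version; /etc/ImageMagick-6/policy.xml is an assumption, so check your system first:

```shell
# Grant read|write rights to the @* path pattern used for subtitle text files
sudo sed -i 's#rights="none" pattern="@\*"#rights="read|write" pattern="@*"#' \
  /etc/ImageMagick-6/policy.xml
```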
Install Python Dependencies
- faster-whisper (1.0.1): GPU-accelerated speech transcription
- torch (2.7.1): PyTorch with CUDA support
- langchain-openai (0.3.0): GPT-4o-mini integration
- moviepy (1.0.3): Video editing and manipulation
- opencv-python (4.8.1.78): Face detection and cropping
- pytubefix (9.1.1): YouTube video downloading
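If the repository does not pin these in a requirements file, they can be installed directly using the versions listed above:

```shell
pip install faster-whisper==1.0.1 torch==2.7.1 langchain-openai==0.3.0 \
  moviepy==1.0.3 opencv-python==4.8.1.78 pytubefix==9.1.1
```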
Configure Environment Variables
Create a .env file in the project root. Replace your_openai_api_key_here with your actual OpenAI API key from platform.openai.com/api-keys.

Verify Installation
Test that GPU acceleration is working by checking torch.cuda.is_available().
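A minimal check, assuming the project's virtual environment is active:

```shell
# Print True if PyTorch can reach a CUDA GPU; fall back to a notice if torch is missing
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())" \
  || echo "CUDA available: unknown (torch not installed)"
```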
You should see CUDA available: True. If it shows False, you may need to install CUDA drivers or use CPU-only mode.

macOS Installation
Install Python Dependencies
macOS does not support CUDA, so transcription will run on CPU. For faster processing, consider using a cloud GPU instance or the AI Clipping API.
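A sketch of the macOS setup, assuming Homebrew is installed; whether the repository ships a separate CPU requirements file is an assumption:

```shell
brew install ffmpeg imagemagick
pip install -r requirements.txt   # or requirements-cpu.txt if provided
```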
Windows Installation
Install ImageMagick
- Open C:\Program Files\ImageMagick-7.x.x-Q16-HDRI\config\policy.xml
- Find: <policy domain="path" rights="none" pattern="@*"/>
- Change to: <policy domain="path" rights="read|write" pattern="@*"/>
- Save the file
CPU-Only Installation
If you don’t have an NVIDIA GPU, you can run the tool in CPU-only mode. Transcription will be significantly slower (5-10x), but all features remain functional.

Ubuntu/Debian (CPU)
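To force a CPU-only PyTorch build, the official CPU wheel index can be used; verify the exact command against the PyTorch install selector for your platform:

```shell
pip install torch --index-url https://download.pytorch.org/whl/cpu
```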
Install Other Dependencies
If requirements-cpu.txt doesn’t exist, use requirements.txt but skip CUDA-related packages.

Windows (CPU)
Install System Dependencies
macOS (CPU)
macOS installation is CPU-only by default. Follow the standard macOS installation instructions.

Docker Installation
Docker provides a containerized environment with all dependencies pre-configured, including GPU support.

Prerequisites
- Docker 20.10+ installed (get Docker)
- Docker Compose 1.29+ (get Docker Compose)
- For GPU support: NVIDIA Docker runtime (installation guide)
Using Docker Compose (Recommended)
Build and Run Container
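Assuming the repository ships a docker-compose.yml (an assumption), the usual invocation with Compose 1.29 is:

```shell
# Build the image and start the container attached, so you can enter a URL interactively
docker-compose up --build
```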
- Base image: nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
- GPU support: Enabled via NVIDIA runtime
- Mounts: .env file, ./videos (input), ./output (output)
- Interactive mode: Enabled for URL input
Manual Docker Build
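A manual equivalent of the compose setup; the image tag and the in-container paths are assumptions:

```shell
docker build -t ai-shorts .
docker run --rm -it --gpus all \
  --env-file .env \
  -v "$PWD/videos:/app/videos" \
  -v "$PWD/output:/app/output" \
  ai-shorts
```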
CPU-Only Docker
Remove the GPU-specific configuration (the NVIDIA runtime and GPU device settings) from your Docker setup.

Environment Variables
The tool requires the following environment variable:

| Variable | Required | Description | Example |
|---|---|---|---|
| OPENAI_API | Yes | OpenAI API key for GPT-4o-mini | sk-proj-... |

Set it in a .env file in the project root.
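A minimal .env sketch using the variable above; substitute your real key for the placeholder:

```shell
cat > .env <<'EOF'
OPENAI_API=your_openai_api_key_here
EOF
```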
Verifying Your Installation
Test your installation with a short video. The tool will:

- Download the video
- Extract and transcribe audio
- Analyze transcript with GPT-4o-mini
- Present highlight selection for approval
- Process and output vertical short
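For example, assuming run.sh (referenced in the Troubleshooting section) accepts a YouTube URL; the exact invocation is an assumption, so check the project's usage docs:

```shell
./run.sh "https://www.youtube.com/watch?v=<VIDEO_ID>"
```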
Troubleshooting
CUDA/GPU Issues
Problem: torch.cuda.is_available() returns False
Solutions:
- Verify NVIDIA drivers are installed
- Check CUDA library paths (the run.sh script handles this automatically)
- Reinstall PyTorch with CUDA
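The three steps above might look like this; the cu121 index URL matches the CUDA 12.1 Docker base image used elsewhere in this guide, but is otherwise an assumption:

```shell
# 1. Verify NVIDIA drivers are installed
nvidia-smi

# 2. Check CUDA library paths (run.sh handles this automatically)
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"

# 3. Reinstall PyTorch with CUDA support
pip install --force-reinstall torch==2.7.1 --index-url https://download.pytorch.org/whl/cu121
```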
ImageMagick Subtitle Issues
Problem: No subtitles appear in output video

Solution: Check that the ImageMagick policy grants rights="read|write" for the @* path pattern; if not, apply the policy fix from the installation steps.
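A quick check; Ubuntu's ImageMagick 6 policy path is assumed, so adjust for your version:

```shell
# Show the @* path policy line, if the file exists at the assumed path
grep 'pattern="@\*"' /etc/ImageMagick-6/policy.xml 2>/dev/null \
  || echo "policy file not found at the assumed path"
```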
Face Detection Issues
Problem: Cropping doesn’t center on faces

Causes:

- Video needs visible faces in first 30 frames
- Low-resolution videos have less reliable detection
- For screen recordings, motion tracking applies automatically
Detection parameters can be tuned via detectMultiScale in Components/FaceCrop.py.
OpenAI API Issues
Problem: ERROR: Failed to get highlight from LLM
Causes:
- Invalid or missing API key
- Rate limiting
- Network connectivity issues
- Insufficient API credits
Solutions:

- Verify the API key in your .env file
- Check API usage at platform.openai.com/usage
- Test the API key
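A quick way to test the key; /v1/models is a standard OpenAI API route, and OPENAI_API is the variable name used in this guide:

```shell
# Expect a JSON model list on success, an invalid_api_key error otherwise
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer ${OPENAI_API}"
```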
FFmpeg Not Found
Problem: ffmpeg: command not found
Solutions:
- Ubuntu/Debian: sudo apt install ffmpeg
- macOS: brew install ffmpeg
- Windows: Add FFmpeg to system PATH
Python Version Issues
Problem: Module compatibility errors

Solution: Ensure Python 3.10+ is installed:

- Ubuntu: sudo apt install python3.10 python3.10-venv
- macOS: brew install python@3.10
- Windows: Download from python.org
Next Steps
- Quickstart Guide: Generate your first short in under 5 minutes
- Usage Examples: Learn CLI commands and automation techniques
- Configuration: Customize subtitle styling, AI prompts, and video settings
- API Reference: Explore the codebase and component architecture
