
Overview

Clip Hand is an AI-powered video editing agent that transforms long-form content into engaging short clips optimized for YouTube Shorts, TikTok, Instagram Reels, and other vertical video platforms.

What It Does

1. Download & Analyze — Downloads videos from URLs (YouTube, Vimeo, Twitter, 1000+ sites) or processes local files
2. Transcribe Audio — Converts speech to text using Whisper (local or API), or grabs existing subtitles
3. Identify Viral Moments — Reads the transcript to find segments with hooks, emotional peaks, and self-contained stories
4. Extract & Process — Creates clips, crops to vertical format (9:16), burns captions, and generates thumbnails
5. Publish (Optional) — Automatically posts clips to Telegram channels or WhatsApp

Requirements

Clip Hand requires several external tools to function. Install them before activating the Hand.

FFmpeg

Core video processing engine for extraction, cropping, and caption burning.
# macOS
brew install ffmpeg

# Windows
winget install Gyan.FFmpeg

# Linux (Ubuntu/Debian)
sudo apt install ffmpeg

# Linux (Fedora)
sudo dnf install ffmpeg-free

# Linux (Arch)
sudo pacman -S ffmpeg

yt-dlp

Downloads videos from YouTube, Vimeo, Twitter, and 1000+ other sites.
# macOS
brew install yt-dlp

# Windows
winget install yt-dlp.yt-dlp

# Linux (Ubuntu/Debian)
sudo apt install yt-dlp

# Or via pip (all platforms)
pip install yt-dlp

Speech-to-Text (Choose One)

Option 1: Groq Whisper API (recommended — fast, with a free tier)
export GROQ_API_KEY=your_key_here
Option 2: OpenAI Whisper API
export OPENAI_API_KEY=your_key_here
Option 3: Local Whisper
pip install openai-whisper
# Or for 4x faster performance:
pip install whisper-ctranslate2

Configuration

Speech-to-Text Settings

Setting: STT Provider
Options: auto, groq_whisper, openai_whisper, deepgram, whisper_local
Description: How audio is transcribed to text
Recommendation: Use groq_whisper for the best balance of speed, quality, and cost (free tier available).

Text-to-Speech (Optional)

Add voice-overs to clips:
Setting: TTS Provider
Options: none, edge_tts, openai_tts, elevenlabs
Description: Optional narration generation
  • None — Captions only (default)
  • Edge TTS — Free, no API key required
  • OpenAI TTS — High quality, requires API key
  • ElevenLabs — Premium quality, requires API key

Publishing Settings

Setting: Publish Target
Options: local_only, telegram, whatsapp, both
Description: Where to send finished clips

Telegram Publishing

Requires:
  • Bot Token from @BotFather
  • Chat ID of your channel or group
# Get your chat ID from @userinfobot
# Channel format: -100XXXXXXXXXX or @channelname
# Make your bot an admin in the channel

WhatsApp Publishing

Requires Meta Business account with WhatsApp Cloud API:
  • Access Token (permanent, not temporary)
  • Phone Number ID from Meta Developer Portal
  • Recipient phone number (international format, no +)
Publishing is optional. By default, Clip Hand saves clips locally without uploading.

Activation

Basic Usage

openfang hand activate clip
Then send the Hand a video URL or file path:
Create clips from this video: https://youtube.com/watch?v=example

Example Workflow

# 1. Activate Clip Hand
openfang hand activate clip

# 2. Configure settings (optional)
openfang hand config clip --set stt_provider=groq_whisper

# 3. Send a task
> Create 3 clips from https://youtube.com/watch?v=dQw4w9WgXcQ

# Clip Hand will:
# - Download the video
# - Transcribe with Groq Whisper
# - Analyze the transcript for viral moments
# - Extract 3 clips
# - Crop to vertical (1080x1920)
# - Burn captions
# - Generate thumbnails
# - Save as clip_1_final.mp4, clip_2_final.mp4, clip_3_final.mp4

How It Works

1. Intake & Download

For URLs:
# Fetches video metadata
yt-dlp --dump-json "URL"

# Downloads best quality up to 1080p
yt-dlp -f "bv[height<=1080]+ba/b[height<=1080]" -o "source.%(ext)s" "URL"

# Attempts to grab existing subtitles (skips transcription if found)
yt-dlp --write-auto-subs --sub-lang en --sub-format json3 --skip-download -o "source" "URL"
For local files:
# Analyzes video metadata
ffprobe -v quiet -print_format json -show_format -show_streams "file.mp4"
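The JSON that ffprobe emits can be reduced to the few fields the pipeline actually needs. A minimal sketch, assuming the standard `-print_format json -show_format -show_streams` layout; the helper name is illustrative, not part of Clip Hand:

```python
import json

def probe_summary(ffprobe_json: str) -> dict:
    """Pull duration and video resolution out of ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    # find the first video stream (streams may also include audio, subtitles)
    video = next(s for s in data["streams"] if s["codec_type"] == "video")
    return {
        "duration_s": float(data["format"]["duration"]),
        "width": video["width"],
        "height": video["height"],
    }

# Example with a trimmed ffprobe-style payload:
sample = json.dumps({
    "format": {"duration": "930.5"},
    "streams": [{"codec_type": "audio"},
                {"codec_type": "video", "width": 1920, "height": 1080}],
})
print(probe_summary(sample))
# → {'duration_s': 930.5, 'width': 1920, 'height': 1080}
```

The duration and resolution feed later steps: duration bounds segment selection, and resolution determines the crop math in step 4.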

2. Transcription

Path A: YouTube Auto-Subs (fastest — no transcription needed)
Parses the json3 subtitle file for word-level timing.
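Flattening json3 into word/time pairs looks roughly like the sketch below. It assumes YouTube's json3 layout (`events` containing `segs`, with `tStartMs` and per-segment `tOffsetMs` in milliseconds); the function name is illustrative:

```python
import json

def words_from_json3(json3_text: str) -> list:
    """Flatten YouTube json3 timed text into (word, start_seconds) pairs."""
    data = json.loads(json3_text)
    words = []
    for event in data.get("events", []):
        base = event.get("tStartMs", 0)
        for seg in event.get("segs") or []:
            text = seg.get("utf8", "").strip()
            if text:
                start_ms = base + seg.get("tOffsetMs", 0)
                words.append((text, start_ms / 1000.0))
    return words

sample = json.dumps({"events": [
    {"tStartMs": 1000, "segs": [{"utf8": "The"},
                                {"utf8": "moment", "tOffsetMs": 400}]},
]})
print(words_from_json3(sample))
# → [('The', 1.0), ('moment', 1.4)]
```

Word-level starts in seconds are the same shape the Whisper paths produce, so the rest of the pipeline doesn't care which transcription path ran.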
Path B: Groq Whisper API (recommended)
# Extract audio
ffmpeg -i source.mp4 -vn -ar 16000 -ac 1 audio.wav

# Transcribe via Groq API
curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "[email protected]" -F "model=whisper-large-v3" \
  -F "response_format=verbose_json" -F "timestamp_granularities[]=word"
Path C: Local Whisper (slowest, but free and private)
whisper audio.wav --model small --output_format json --word_timestamps true

3. Viral Segment Selection

The Hand analyzes the full transcript looking for:
  • Hook in first 3 seconds — Surprising claim, question, emotional statement
  • Self-contained story — Makes sense without the full video
  • Emotional peaks — Laughter, surprise, anger, vulnerability
  • Contrarian takes — Things people want to share or argue about
  • Clean ending — Punchline, conclusion, or dramatic pause
Typical output:
Selected 3 segments:
1. 00:01:23 - 00:02:15 — "The moment I realized I was wrong"
2. 00:05:40 - 00:06:30 — "Why everyone gets this wrong"
3. 00:12:10 - 00:13:00 — "The surprising truth about X"

4. Clip Processing

For each segment:

Step 1: Extract
ffmpeg -ss 00:01:23 -to 00:02:15 -i source.mp4 \
  -c:v libx264 -c:a aac -preset fast -crf 23 \
  -movflags +faststart -y clip_1.mp4
Step 2: Crop to Vertical (9:16)
ffmpeg -i clip_1.mp4 \
  -vf "crop=ih*9/16:ih:(iw-ih*9/16)/2:0,scale=1080:1920" \
  -c:a copy -y clip_1_vert.mp4
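The crop expression takes a full-height, 9:16-wide window centered horizontally, then scales it to 1080x1920. ffmpeg evaluates `ih*9/16` and `(iw-ih*9/16)/2` itself; this sketch only reproduces the arithmetic for a 1920x1080 source:

```python
def vertical_crop(iw: int, ih: int) -> tuple:
    """Centered 9:16 crop window (width, height, x, y), mirroring
    crop=ih*9/16:ih:(iw-ih*9/16)/2:0."""
    w = ih * 9 // 16          # keep full height, take a 9:16-wide slice
    x = (iw - w) // 2         # center the slice horizontally
    return (w, ih, x, 0)

print(vertical_crop(1920, 1080))
# → (607, 1080, 656, 0)
```

The odd crop width doesn't matter in practice because the trailing `scale=1080:1920` normalizes the output to encoder-friendly even dimensions.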
Step 3: Generate SRT Captions
1
00:00:00,000 --> 00:00:02,500
The moment I realized

2
00:00:02,500 --> 00:00:05,100
I was completely wrong about this
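Caption cues like the ones above are built from the word-level timings of step 2. A minimal sketch of the SRT timestamp format (HH:MM:SS,mmm) and cue layout; grouping words into lines is simplified away here:

```python
def srt_time(ms: int) -> str:
    """Milliseconds -> SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, millis = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{millis:03d}"

def srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """One numbered SRT cue: index, time range, caption text."""
    return f"{index}\n{srt_time(start_ms)} --> {srt_time(end_ms)}\n{text}\n"

print(srt_cue(1, 0, 2500, "The moment I realized"))
```

Note that SRT uses a comma before the milliseconds, not a period — a common source of subtitle files that ffmpeg silently renders without captions.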
Step 4: Burn Captions
ffmpeg -i clip_1_vert.mp4 \
  -vf "subtitles=clip_1.srt:force_style='FontSize=22,FontName=Arial,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2,Alignment=2,MarginV=40'" \
  -c:a copy -y clip_1_final.mp4
Step 5: Generate Thumbnail
ffmpeg -i clip_1.mp4 -ss 2 -frames:v 1 -q:v 2 -y thumb_1.jpg

5. Publishing (Optional)

If configured:

Telegram:
curl -X POST "https://api.telegram.org/bot$TOKEN/sendVideo" \
  -F "chat_id=$CHAT_ID" \
  -F "video=@clip_1_final.mp4" \
  -F "caption=The moment I realized I was wrong" \
  -F "supports_streaming=true"
WhatsApp:
# Step 1: Upload video
curl -X POST "https://graph.facebook.com/v21.0/$PHONE_ID/media" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@clip_1_final.mp4" \
  -F "messaging_product=whatsapp"

# Step 2: Send message
curl -X POST "https://graph.facebook.com/v21.0/$PHONE_ID/messages" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"messaging_product":"whatsapp","to":"14155551234","type":"video","video":{"id":"MEDIA_ID","caption":"Clip title"}}'

Output

Clip Hand generates:
clip_N_final.mp4 — Processed clip with captions, ready to post
clip_N.srt — Subtitle file (if you want to edit captions)
thumb_N.jpg — Thumbnail image
transcript.json — Full transcript with word-level timing

Dashboard Metrics

Track performance in the OpenFang dashboard:
  • Jobs Completed — Total video processing jobs
  • Clips Generated — Number of clips created
  • Total Duration — Cumulative clip length
  • Published to Telegram — Clips sent to Telegram
  • Published to WhatsApp — Clips sent to WhatsApp

Tips & Best Practices

For best results:
  • Use videos with clear speech (podcasts, interviews, lectures work great)
  • Longer source videos (15+ minutes) give more clip options
  • Let the Hand pick segments — its transcript analysis finds better moments than random timestamps
  • Review clips before publishing to channels with large audiences

Common Issues

“yt-dlp not found”
Install yt-dlp: pip install yt-dlp
“FFmpeg not found”
Install FFmpeg for your platform (see Requirements above)
“Transcription failed”
Check that your API key is set: echo $GROQ_API_KEY
Or install local Whisper: pip install openai-whisper
“Telegram upload failed: file too large”
Telegram's bot upload limit is 50 MB. The Hand will automatically re-encode larger files.
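The re-encode target follows from the size cap and the clip's duration: total bitrate = size × 8 / duration, minus the audio share. A rough sketch of that arithmetic (the numbers are illustrative; the Hand's actual re-encode settings may differ):

```python
def target_video_kbps(size_mb: int, duration_s: float, audio_kbps: int = 128) -> int:
    """Video bitrate (kbit/s) that keeps the file under size_mb."""
    total_kbits = size_mb * 8 * 1024          # MB -> kbit
    return int(total_kbits / duration_s) - audio_kbps

# A 60-second clip under Telegram's 50 MB cap:
print(target_video_kbps(50, 60))
# → 6698
```

In practice you would pass this to ffmpeg as `-b:v 6698k` (or leave some headroom below it, since container overhead also counts toward the limit).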
“WhatsApp upload failed: template required”
Recipient must message your business number first (24-hour window).

Advanced Usage

Custom Timestamps

Override automatic segment selection:
Create clips from video.mp4 at these timestamps:
- 00:01:30 to 00:02:15
- 00:05:00 to 00:05:45

Custom Clip Count

Generate 10 clips from this video: https://youtube.com/watch?v=example

Local Files

Process /path/to/my_video.mp4 and create 5 clips

Next Steps

  • Twitter Hand — Post your clips to Twitter automatically
  • Researcher Hand — Research trending topics for clip content