What It Does
Clip Hand is an AI-powered shorts factory that takes any video URL or file and transforms it into 3-5 viral short clips (30-90 seconds each) with burned-in captions, vertical formatting (9:16 for TikTok/Reels/Shorts), thumbnails, and optional AI voice-over. This is an 8-phase pipeline: Download → Transcribe → Analyze Content → Pick Viral Segments → Extract & Crop → Add Captions → Generate Thumbnails → Optionally Publish to Telegram/WhatsApp.
Key Features
- Content-based clipping: Reads transcript to pick segments based on hooks, emotional peaks, and insight density—not just visual scene changes
- 5 STT backends: YouTube auto-subs, Groq Whisper (fast/free), OpenAI Whisper, Deepgram Nova-2, local Whisper
- Vertical formatting: Auto-crops to 1080x1920 (9:16) for mobile
- Styled captions: Burned-in SRT subtitles with customizable fonts and positioning
- Optional TTS: AI voice-over with Edge TTS (free), OpenAI TTS, or ElevenLabs
- Auto-publish: Send finished clips to Telegram channels or WhatsApp contacts
Activation
Requirements
Install FFmpeg
Install FFmpeg with your platform's package manager on macOS, Windows, or Linux (Debian/Ubuntu), or download it from ffmpeg.org/download.html.
Estimated time: 2-5 minutes
Note: ffprobe ships bundled with FFmpeg.
(Optional) Install Local Whisper
Only needed if you want local transcription (no API keys).
Note: Fast transcription requires a GPU. CPU-only transcription is very slow.
Configuration Settings
How audio is transcribed to text for captions and clip selection:
- auto: Auto-detect (tries YouTube subs first, then Groq/OpenAI/local Whisper)
- whisper_local: Local Whisper (requires GPU for speed)
- groq_whisper: Groq Whisper API (fast, free tier) - requires GROQ_API_KEY
- openai_whisper: OpenAI Whisper API - requires OPENAI_API_KEY
- deepgram: Deepgram Nova-2 - requires DEEPGRAM_API_KEY
Optional voice-over or narration generation for clips:
- none: Disabled (captions only) - default
- edge_tts: Edge TTS (free, no API key)
- openai_tts: OpenAI TTS - requires OPENAI_API_KEY
- elevenlabs: ElevenLabs - requires ELEVENLABS_API_KEY
API key from elevenlabs.io for high-quality text-to-speech. Required when ElevenLabs TTS is selected.
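The key requirements listed for the transcription and voice-over options can be checked before a job starts. The following is a minimal sketch, assuming the keys are read from environment variables with the names given above; the function name `missing_keys` and the lookup tables are hypothetical and are not part of Clip Hand itself.

```python
import os

# API key required by each option, as listed in the settings above.
# None means the option needs no key.
STT_KEYS = {
    "auto": None,
    "whisper_local": None,
    "groq_whisper": "GROQ_API_KEY",
    "openai_whisper": "OPENAI_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
}
TTS_KEYS = {
    "none": None,
    "edge_tts": None,
    "openai_tts": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}


def missing_keys(stt, tts, env=None):
    """Return the sorted key names the chosen options need but env lacks.

    Hypothetical helper: assumes keys live in environment variables.
    """
    env = os.environ if env is None else env
    if stt not in STT_KEYS:
        raise ValueError(f"unknown STT option: {stt}")
    if tts not in TTS_KEYS:
        raise ValueError(f"unknown TTS option: {tts}")
    needed = {STT_KEYS[stt], TTS_KEYS[tts]} - {None}
    return sorted(name for name in needed if not env.get(name))
```

Passing a plain dictionary as `env` keeps the check easy to test without touching the real environment.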
Where to send finished clips after processing:
- local_only: Local files only (no publishing) - default
- telegram: Telegram channel
- whatsapp: WhatsApp contact/group
- both: Telegram + WhatsApp
Telegram bot token from @BotFather (e.g., 123456:ABC-DEF...). The bot must be an admin in the target channel.
Telegram chat ID. Channel: -100XXXXXXXXXX or @channelname. Group: numeric ID. Get it via @userinfobot.
WhatsApp access token: permanent token from Meta Business Settings > System Users. Temporary tokens expire in 24h.
WhatsApp Phone Number ID from Meta Developer Portal > WhatsApp > API Setup (e.g., 1234567890).
WhatsApp recipient: phone number in international format, no + or spaces (e.g., 14155551234).
Required Tools
Clip Hand requires access to these tools (all built-in):
- shell_exec: Platform detection and FFmpeg/yt-dlp commands
- file_read, file_write, file_list: Transcript and clip files
- web_fetch: Metadata extraction
- memory_store, memory_recall: State persistence
System Prompt Overview
Clip Hand operates in 8 phases:
Platform Detection
Detects OS (Windows/macOS/Linux) to adapt command syntax. Verifies FFmpeg, ffprobe, and yt-dlp are installed.
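A check like this can be approximated with the Python standard library. This is only a sketch of the idea, not the commands Clip Hand actually runs: `detect_platform` and `missing_tools` are hypothetical names, and the tool list is taken from the sentence above.

```python
import platform
import shutil

# Executables the pipeline depends on, per the description above.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "yt-dlp")


def detect_platform():
    """Map platform.system() onto the three OS names used in this guide."""
    names = {"Darwin": "macOS", "Windows": "Windows", "Linux": "Linux"}
    return names.get(platform.system(), "unknown")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be tested without real binaries.
    """
    return [tool for tool in tools if which(tool) is None]
```

An empty list from `missing_tools()` means all three executables were found on the current PATH.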
Intake
Detects input type (URL or local file). For URLs, extracts metadata with yt-dlp --dump-json. For files, analyzes with ffprobe. Warns if the video is longer than 2 hours.
Download
For URLs: downloads the video with yt-dlp (up to 1080p) and attempts to grab existing YouTube auto-subs, which saves transcription time. For local files: verifies playability.
Transcribe
Tries the 5 STT backends in order: (A) YouTube auto-subs if available, (B) Groq Whisper API, (C) OpenAI Whisper API, (D) Deepgram Nova-2, (E) local Whisper. If all of them fail, it falls back to (F) scene/silence detection. Produces a transcript with word-level timing.
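The ordered fallback can be pictured as a loop over candidate transcribers that stops at the first one returning a transcript. The sketch below is an illustration under assumptions: `transcribe_with_fallback` is a hypothetical name, and each transcriber is assumed to be a callable that returns a list of timed words or raises on failure.

```python
from typing import Callable, Optional, Sequence, Tuple

# A transcriber takes a media path and returns timed words, or None/[] if it
# has nothing to offer. This signature is an assumption for illustration.
Transcriber = Callable[[str], Optional[list]]


def transcribe_with_fallback(
    media_path: str,
    paths: Sequence[Tuple[str, Transcriber]],
):
    """Try each (name, transcriber) in order; return the first success."""
    errors = []
    for name, transcriber in paths:
        try:
            words = transcriber(media_path)
        except Exception as exc:  # any backend failure moves on to the next
            errors.append(f"{name}: {exc}")
            continue
        if words:
            return name, words
        errors.append(f"{name}: no transcript")
    raise RuntimeError("all transcription paths failed: " + "; ".join(errors))
```

Collecting the per-backend errors makes it easier to see why an earlier, cheaper path was skipped.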
Analyze & Pick Segments
This is the core value. Reads the full transcript and identifies 3-5 segments worth clipping based on: a hook in the first 3 seconds, a self-contained story, emotional peaks, controversial takes, insight density, and a clean ending. Each segment runs 30-90 seconds.
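The selection itself is done by reading the transcript, so it cannot be reduced to a formula. The bookkeeping around it can be sketched, though: assuming each candidate segment has already been given a numeric score (a hypothetical stand-in for the criteria above), the following picks up to five non-overlapping candidates that fit the 30-90 second window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    start: float  # seconds into the source video
    end: float
    score: float  # hypothetical overall rating of the segment

    @property
    def duration(self) -> float:
        return self.end - self.start


def pick_segments(candidates, max_clips=5, min_len=30.0, max_len=90.0):
    """Greedily keep the best-scoring, non-overlapping, 30-90 s candidates."""
    eligible = [c for c in candidates if min_len <= c.duration <= max_len]
    chosen = []
    for cand in sorted(eligible, key=lambda c: c.score, reverse=True):
        if len(chosen) == max_clips:
            break
        # Keep the candidate only if it does not overlap anything chosen.
        if all(cand.end <= c.start or cand.start >= c.end for c in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda c: c.start)
```

Greedy selection by score is a simplification chosen for brevity; it favors the strongest segment whenever two candidates overlap.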
Extract & Process
For each segment: (1) Extract clip with FFmpeg, (2) Crop to vertical 9:16, (3) Generate SRT captions from transcript, (4) Burn captions onto video with styled text, (5) Optionally add TTS voice-over, (6) Generate thumbnail.
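Steps (1), (2), (4), and (6) map onto standard FFmpeg options and filters. The sketch below only builds the argument lists and does not run them; it is not Clip Hand's actual command line. It assumes a landscape source, an FFmpeg build with libass (needed by the `subtitles` filter), and a simple relative SRT path with no characters that need filter escaping. The function names are hypothetical.

```python
def build_clip_command(source, start, end, srt_path, output):
    """Return an ffmpeg argv that cuts, crops to 9:16, scales, and burns captions."""
    if end <= start:
        raise ValueError("end must be greater than start")
    filters = ",".join([
        "crop=ih*9/16:ih",        # centered 9:16 crop using the full height
        "scale=1080:1920",        # final vertical resolution
        f"subtitles={srt_path}",  # burn the SRT captions into the frames
    ])
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek before reading the input
        "-t", f"{end - start:.3f}",   # clip duration in seconds
        "-i", source,
        "-vf", filters,
        "-c:v", "libx264",
        "-c:a", "aac",
        output,
    ]


def build_thumbnail_command(clip, output, at_seconds=2.0):
    """Return an ffmpeg argv that saves a single frame as the thumbnail."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{at_seconds:.3f}",
        "-i", clip,
        "-frames:v", "1",
        output,
    ]
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)` once the inputs exist.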
Publish (Optional)
If publishing is configured: uploads clips to Telegram (max 50MB) and/or WhatsApp (max 16MB). Re-encodes if needed. Respects rate limits.
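The size limits translate directly into a bitrate budget for re-encoding. The arithmetic below is a sketch: it treats 1 MB as 1024 × 1024 bytes, reserves 128 kbit/s for audio, and keeps a 5% safety margin. All three are assumptions, as are the helper names.

```python
# Upload limits quoted above, in megabytes.
LIMITS_MB = {"telegram": 50, "whatsapp": 16}


def needs_reencode(size_bytes, target):
    """True if a file of size_bytes exceeds the target's upload limit."""
    return size_bytes > LIMITS_MB[target] * 1024 * 1024


def target_video_kbps(duration_s, target, audio_kbps=128, safety=0.95):
    """Video bitrate (kbit/s) expected to keep a clip under the limit."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    budget_bits = LIMITS_MB[target] * 1024 * 1024 * 8 * safety
    total_kbps = budget_bits / duration_s / 1000
    return max(int(total_kbps - audio_kbps), 0)
```

For a 60-second clip bound for WhatsApp this gives roughly 2 Mbit/s of video, which shows why the 16MB limit is the tighter constraint for longer clips.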
Usage Examples
Basic Clipping (YouTube URL)
- Download the video
- Grab YouTube auto-subs (if available) or transcribe with Groq Whisper
- Analyze transcript and pick 3-5 viral segments
- Extract clips, crop to vertical, add captions, generate thumbnails
- Save to clip_1_final.mp4, clip_2_final.mp4, etc.
With Voice-Over
Local File
Custom Clip Count
Custom Timestamps
Publish to Telegram
Viral Segment Selection
Clip Hand identifies viral segments using these criteria:
Hook in the First 3 Seconds
A surprising claim, question, or emotional statement that makes people watch.
Good hooks:
- “I almost quit 3 years ago. Then I discovered…”
- “90% of startups fail because of this one mistake”
- “This changed everything:”
Bad hooks:
- “Hey guys, welcome back to my channel”
- “So, um, today I want to talk about…”
Self-Contained Story
Makes sense without the full video context. Doesn’t require “you had to be there” knowledge.
Emotional Peaks
Moments of laughter, surprise, anger, vulnerability, or triumph. Emotion drives shares.
Controversial or Contrarian Takes
Things people want to share or argue about. “Unpopular opinion: …” format.
Insight Density
High ratio of interesting ideas per second. No filler, no rambling.
Clean Ending
Ends on a punchline, conclusion, or dramatic pause. Doesn’t trail off mid-sentence.
Dashboard Metrics
Clip Hand tracks five key metrics:
Jobs Completed
Total video processing jobs finished.
Clips Generated
Total short clips produced.
Total Duration
Cumulative duration of all clips (in seconds).
Published to Telegram
Clips successfully sent to Telegram.
Published to WhatsApp
Clips successfully sent to WhatsApp.
View these metrics at http://localhost:4200/hands/clip.
STT Provider Comparison
| Provider | Speed | Cost | Quality | API Key Required |
|---|---|---|---|---|
| YouTube auto-subs | Instant | Free | Good | No |
| Groq Whisper | Very fast | Free tier | Excellent | Yes (GROQ_API_KEY) |
| OpenAI Whisper | Fast | $0.006/min | Excellent | Yes (OPENAI_API_KEY) |
| Deepgram Nova-2 | Fastest | Paid | Excellent | Yes (DEEPGRAM_API_KEY) |
| Local Whisper | Slow (CPU) / Fast (GPU) | Free | Excellent | No |
The default is auto. It tries YouTube subs first (instant), then Groq (fast + free), then falls back to the others.
TTS Provider Comparison
| Provider | Quality | Cost | API Key Required |
|---|---|---|---|
| Edge TTS | Good | Free | No |
| OpenAI TTS | Excellent | $15/1M chars | Yes (OPENAI_API_KEY) |
| ElevenLabs | Outstanding | Paid | Yes (ELEVENLABS_API_KEY) |
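The two per-unit prices quoted in the comparison tables make cost estimates a matter of simple arithmetic. The sketch below uses the rates exactly as listed here ($0.006 per minute for OpenAI Whisper, $15 per 1M characters for OpenAI TTS); actual billing granularity and current pricing are not covered by this guide, so treat the results as rough estimates. The function names are hypothetical.

```python
# Rates as quoted in the comparison tables above.
OPENAI_WHISPER_USD_PER_MIN = 0.006
OPENAI_TTS_USD_PER_MILLION_CHARS = 15.0


def whisper_cost_usd(duration_seconds):
    """Estimated OpenAI Whisper cost for transcribing duration_seconds of audio."""
    return duration_seconds / 60 * OPENAI_WHISPER_USD_PER_MIN


def tts_cost_usd(text):
    """Estimated OpenAI TTS cost for narrating the given text."""
    return len(text) / 1_000_000 * OPENAI_TTS_USD_PER_MILLION_CHARS
```

At these rates, transcribing a one-hour video costs about $0.36, and a 1,000-character voice-over script costs about $0.015.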
Output Files
For each clip, Clip Hand produces:
- clip_N_final.mp4: The finished clip (1080x1920, captions burned in, optional TTS)
- clip_N.srt: SRT subtitle file (word-level timing)
- thumb_N.jpg: Thumbnail (frame at 2 seconds)
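To make the subtitle output concrete, here is a sketch of turning word-level timings into SRT text. It assumes each word is a dictionary with `word`, `start`, and `end` keys (times in seconds within the source video), groups a fixed number of words per cue, and wraps long lines. The grouping size, wrap width, and function names are illustrative assumptions, not Clip Hand's documented behavior.

```python
import textwrap


def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, clip_start=0.0, max_words=6, wrap_chars=24):
    """Build SRT text from timed words, with times relative to the clip start."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        text = " ".join(w["word"] for w in group)
        # Shift times so the clip starts at zero; clamp small negatives.
        start = max(group[0]["start"] - clip_start, 0.0)
        end = max(group[-1]["end"] - clip_start, 0.0)
        lines = "\n".join(textwrap.wrap(text, width=wrap_chars))
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{lines}\n"
        )
    return "\n".join(cues)
```

Wrapping cue text to a fixed width is one way to keep captions inside a narrow vertical frame.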
Publishing
Telegram
Requires:
- Bot Token: Create a bot via @BotFather
- Chat ID: Your channel ID (e.g., -100123456789 or @channelname)
- Bot must be admin in the channel
WhatsApp
Requires:
- Access Token: Permanent token from Meta Business Settings > System Users
- Phone Number ID: From Meta Developer Portal > WhatsApp > API Setup
- Recipient: Phone number in international format (e.g., 14155551234)
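Both identifier formats are easy to get wrong when pasted from other apps, so a small sanity check before publishing can help. This is a loose format check only, written as a sketch: the helper names are hypothetical, and neither function confirms that the chat or phone number actually exists.

```python
import re


def normalize_whatsapp_recipient(raw):
    """Strip '+', spaces, dashes, and parentheses; require digits only."""
    digits = re.sub(r"[+\s\-()]", "", raw)
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError(f"not a phone number: {raw!r}")
    return digits


def looks_like_telegram_chat_id(chat_id):
    """Accept @channelname or a (possibly negative) numeric ID."""
    return re.fullmatch(r"@\w+|-?\d+", chat_id) is not None
```

For example, `normalize_whatsapp_recipient("+1 (415) 555-1234")` yields `14155551234`, the form shown above.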
Best Practices
Advanced Configuration
Custom Caption Styling
Edit ~/.openfang/hands/clip.toml to customize caption appearance.
Batch Processing
Process multiple videos in one command.
Integration with Content Calendar
Schedule daily clipping.
Example Output
Troubleshooting
yt-dlp fails to download
Error: “Unable to extract video data”
Fix: Update yt-dlp to the latest version.
Whisper transcription is very slow
Issue: Local Whisper on CPU is 10-50x slower than real-time.
Fix: Use the Groq Whisper API (fast, free tier) instead by selecting groq_whisper and providing GROQ_API_KEY.
Captions are cut off on the sides
Issue: Text too long for 1080px width.
Fix: Reduce font_size or enable word wrapping in SRT generation.
Telegram upload fails with 'file too large'
Fix: Clip Hand automatically re-encodes to under 50MB. If the upload still fails, manually raise the CRF (a higher CRF compresses harder and produces a smaller file).
Next Steps
Twitter Hand
Share your clips on Twitter/X
Researcher Hand
Research trending topics to clip
