
What It Does

Clip Hand is an AI-powered shorts factory that takes any video URL or file and transforms it into 3-5 viral short clips (30-90 seconds each) with burned-in captions, vertical formatting (9:16 for TikTok/Reels/Shorts), thumbnails, and optional AI voice-over. This is an 8-phase pipeline: Download → Transcribe → Analyze Content → Pick Viral Segments → Extract & Crop → Add Captions → Generate Thumbnails → Optionally Publish to Telegram/WhatsApp.

Key Features

  • Content-based clipping: Reads transcript to pick segments based on hooks, emotional peaks, and insight density—not just visual scene changes
  • 5 STT backends: YouTube auto-subs, Groq Whisper (fast/free), OpenAI Whisper, Deepgram Nova-2, local Whisper
  • Vertical formatting: Auto-crops to 1080x1920 (9:16) for mobile
  • Styled captions: Burned-in SRT subtitles with customizable fonts and positioning
  • Optional TTS: AI voice-over with Edge TTS (free), OpenAI TTS, or ElevenLabs
  • Auto-publish: Send finished clips to Telegram channels or WhatsApp contacts

Activation

Clip Hand requires FFmpeg, ffprobe, and yt-dlp. See the Requirements section below.
# Activate Clip Hand
openfang hand activate clip

# Activate with specific STT provider
openfang hand activate clip --settings "stt_provider=groq_whisper"

# Activate with TTS voice-over
openfang hand activate clip --settings "tts_provider=edge_tts"

Requirements

1. Install FFmpeg

macOS:
brew install ffmpeg
Windows:
winget install Gyan.FFmpeg
Linux (Debian/Ubuntu):
sudo apt install ffmpeg
Or download from ffmpeg.org/download.html.
Estimated time: 2-5 minutes.
Note: ffprobe ships bundled with FFmpeg.
2. Install yt-dlp

macOS:
brew install yt-dlp
Windows:
winget install yt-dlp.yt-dlp
Linux (Debian/Ubuntu):
sudo apt install yt-dlp
Or via pip:
pip install yt-dlp
Estimated time: 1-2 minutes
3. (Optional) Install Local Whisper

Only needed if you want local transcription (no API keys):
pip install openai-whisper
Note: Requires GPU for fast transcription. CPU-only is very slow.
4. Verify Installation

ffmpeg -version
ffprobe -version
yt-dlp --version
All should return version numbers.
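The verification above can be wrapped in a quick preflight check. This is a sketch, not part of Clip Hand — `missing_tools` is a hypothetical helper name, and any POSIX shell works:

```shell
# Preflight sketch (hypothetical helper, not part of Clip Hand): prints the
# subset of its arguments that are not found on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "$missing"
}

# Empty output means everything Clip Hand needs is installed.
missing_tools ffmpeg ffprobe yt-dlp
```

Empty output means all three tools are installed; any name printed is missing from PATH.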

Configuration Settings

stt_provider (select, default: "auto")
How audio is transcribed to text for captions and clip selection:
  • auto: Auto-detect (tries YouTube subs first, then Groq/OpenAI/local Whisper)
  • whisper_local: Local Whisper (requires GPU for speed)
  • groq_whisper: Groq Whisper API (fast, free tier) - requires GROQ_API_KEY
  • openai_whisper: OpenAI Whisper API - requires OPENAI_API_KEY
  • deepgram: Deepgram Nova-2 - requires DEEPGRAM_API_KEY

tts_provider (select, default: "none")
Optional voice-over or narration generation for clips:
  • none: Disabled (captions only) - default
  • edge_tts: Edge TTS (free, no API key)
  • openai_tts: OpenAI TTS - requires OPENAI_API_KEY
  • elevenlabs: ElevenLabs - requires ELEVENLABS_API_KEY

elevenlabs_api_key (text)
API key from elevenlabs.io for high-quality text-to-speech. Required when ElevenLabs TTS is selected.

publish_target (select, default: "local_only")
Where to send finished clips after processing:
  • local_only: Local files only (no publishing) - default
  • telegram: Telegram channel
  • whatsapp: WhatsApp contact/group
  • both: Telegram + WhatsApp

telegram_bot_token (text)
Bot token from @BotFather on Telegram (e.g., 123456:ABC-DEF...). Bot must be admin in the target channel.

telegram_chat_id (text)
Channel: -100XXXXXXXXXX or @channelname. Group: numeric ID. Get it via @userinfobot.

whatsapp_token (text)
Permanent access token from Meta Business Settings > System Users. Temporary tokens expire in 24h.

whatsapp_phone_id (text)
Phone Number ID from Meta Developer Portal > WhatsApp > API Setup (e.g., 1234567890).

whatsapp_recipient (text)
Phone number in international format, no + or spaces (e.g., 14155551234).

Required Tools

Clip Hand requires access to these tools (all built-in):
  • shell_exec — Platform detection and FFmpeg/yt-dlp commands
  • file_read, file_write, file_list — Transcript and clip files
  • web_fetch — Metadata extraction
  • memory_store, memory_recall — State persistence

System Prompt Overview

Clip Hand operates in 8 phases:
Phase 1: Platform Detection

Detects OS (Windows/macOS/Linux) to adapt command syntax. Verifies FFmpeg, ffprobe, and yt-dlp are installed.
Phase 2: Intake

Detects input type (URL or local file). For URLs, extracts metadata with yt-dlp --dump-json. For files, analyzes with ffprobe. Warns if video >2 hours.
Phase 3: Download

For URLs: downloads video with yt-dlp (up to 1080p). Attempts to grab existing YouTube auto-subs (saves transcription time). For local files: verifies playability.
Phase 4: Transcribe

Tries transcription paths in order: (A) YouTube auto-subs if available, (B) Groq Whisper API, (C) OpenAI Whisper API, (D) Deepgram Nova-2, (E) Local Whisper. If all five fail, (F) falls back to scene/silence detection (no transcript text). Produces word-level timing.
Phase 5: Analyze & Pick Segments

This is the core value. Reads full transcript, identifies 3-5 segments worth clipping based on: hook in first 3 seconds, self-contained story, emotional peaks, controversial takes, insight density, clean ending. Each 30-90 seconds.
Phase 6: Extract & Process

For each segment: (1) Extract clip with FFmpeg, (2) Crop to vertical 9:16, (3) Generate SRT captions from transcript, (4) Burn captions onto video with styled text, (5) Optionally add TTS voice-over, (6) Generate thumbnail.
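The 9:16 crop in this phase can be sketched numerically. This is illustrative only — the hand's actual FFmpeg filter may differ. For a 1920x1080 source, a centered 9:16 window is 607px wide, rounded down to an even width for H.264:

```shell
# 9:16 crop geometry for a 1920x1080 source (a sketch, not Clip Hand's code):
src_w=1920; src_h=1080
crop_w=$(( src_h * 9 / 16 ))        # 607
crop_w=$(( crop_w - crop_w % 2 ))   # 606: H.264 needs even dimensions
crop_x=$(( (src_w - crop_w) / 2 ))  # 657: center the window horizontally
echo "crop=${crop_w}:${src_h}:${crop_x}:0,scale=1080:1920"
```

This prints `crop=606:1080:657:0,scale=1080:1920`, a filter value FFmpeg's `-vf` option accepts for cropping to a centered vertical window and scaling up to 1080x1920.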
Phase 7: Publish (Optional)

If publishing is configured: uploads clips to Telegram (max 50MB) and/or WhatsApp (max 16MB). Re-encodes if needed. Respects rate limits.
Phase 8: Report

Generates summary table: clip #, title, file path, duration, file size, thumbnail path. Updates dashboard statistics.

Usage Examples

Basic Clipping (YouTube URL)

openfang chat clip
> "Turn this video into shorts: https://youtube.com/watch?v=dQw4w9WgXcQ"
Clip Hand will:
  1. Download the video
  2. Grab YouTube auto-subs (if available) or transcribe with Groq Whisper
  3. Analyze transcript and pick 3-5 viral segments
  4. Extract clips, crop to vertical, add captions, generate thumbnails
  5. Save to clip_1_final.mp4, clip_2_final.mp4, etc.

With Voice-Over

openfang hand configure clip --set tts_provider="edge_tts"
openfang chat clip
> "Create shorts with AI voice-over from: https://youtube.com/watch?v=..."
Each clip will have the original audio (reduced to 30% volume) mixed with AI narration reading the captions.
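That mix corresponds to a standard FFmpeg filtergraph. The following is an illustrative sketch, not the hand's exact command — file names are examples:

```shell
# Duck the original audio to 30% and mix in the TTS narration track
# (illustrative filtergraph; Clip Hand's actual command may differ):
filter='[0:a]volume=0.3[bg];[bg][1:a]amix=inputs=2:duration=first[mixed]'
echo "ffmpeg -i clip.mp4 -i narration.mp3 -filter_complex \"$filter\" -map 0:v -map \"[mixed]\" -c:v copy -c:a aac clip_vo.mp4"
```

The `volume` filter reduces the original track, and `amix` blends it with the narration, keeping the first input's duration.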

Local File

openfang chat clip
> "Clip this file into shorts: /path/to/recording.mp4"
Works the same, but skips the download step.

Custom Clip Count

openfang chat clip
> "Create exactly 5 clips from this video: https://..."

Custom Timestamps

openfang chat clip
> "Extract clips at these timestamps: 1:23-2:15, 5:40-6:30, 10:00-11:00"
Skips the analysis phase, uses your exact timestamps.
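Timestamps like `1:23` map to seconds for FFmpeg's `-ss`/`-to` flags. A sketch of the conversion — `to_sec` is a hypothetical helper; the hand does this parsing internally:

```shell
# Convert "M:SS" (as in "1:23-2:15") to seconds (hypothetical helper):
to_sec() {
  m=${1%%:*}                     # minutes before the colon
  s=${1##*:}                     # seconds after the colon
  echo $(( m * 60 + ${s#0} ))    # strip a leading zero so "08" isn't read as octal
}
to_sec 1:23   # 83
to_sec 2:15   # 135
```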

Publish to Telegram

openfang hand configure clip \
  --set publish_target="telegram" \
  --set telegram_bot_token="123456:ABC-DEF..." \
  --set telegram_chat_id="@mychannel"

openfang chat clip
> "Clip this and send to Telegram: https://..."
After processing, clips are auto-uploaded to your Telegram channel.

Viral Segment Selection

Clip Hand identifies viral segments using these criteria:

Hook in the first 3 seconds
A surprising claim, question, or emotional statement that makes people watch.
Good hooks:
  • “I almost quit 3 years ago. Then I discovered…”
  • “90% of startups fail because of this one mistake”
  • “This changed everything:”
Bad hooks:
  • “Hey guys, welcome back to my channel”
  • “So, um, today I want to talk about…”

Self-contained story
Makes sense without the full video context. Doesn’t require “you had to be there” knowledge.

Emotional peaks
Moments of laughter, surprise, anger, vulnerability, or triumph. Emotion drives shares.

Controversial takes
Things people want to share or argue about. “Unpopular opinion: …” format.

Insight density
High ratio of interesting ideas per second. No filler, no rambling.

Clean ending
Ends on a punchline, conclusion, or dramatic pause. Doesn’t trail off mid-sentence.

Dashboard Metrics

Clip Hand tracks five key metrics:

Jobs Completed

Total video processing jobs finished.

Clips Generated

Total short clips produced.

Total Duration

Cumulative duration of all clips (in seconds).

Published to Telegram

Clips successfully sent to Telegram.

Published to WhatsApp

Clips successfully sent to WhatsApp.
View in the dashboard at http://localhost:4200/hands/clip.

STT Provider Comparison

| Provider | Speed | Cost | Quality | API Key Required |
|---|---|---|---|---|
| YouTube auto-subs | Instant | Free | Good | No |
| Groq Whisper | Very fast | Free tier | Excellent | Yes (GROQ_API_KEY) |
| OpenAI Whisper | Fast | $0.006/min | Excellent | Yes (OPENAI_API_KEY) |
| Deepgram Nova-2 | Fastest | Paid | Excellent | Yes (DEEPGRAM_API_KEY) |
| Local Whisper | Slow (CPU) / Fast (GPU) | Free | Excellent | No |
Recommendation: Use auto (default). It tries YouTube subs first (instant), then Groq (fast + free), then falls back to others.

TTS Provider Comparison

| Provider | Quality | Cost | API Key Required |
|---|---|---|---|
| Edge TTS | Good | Free | No |
| OpenAI TTS | Excellent | $15/1M chars | Yes (OPENAI_API_KEY) |
| ElevenLabs | Outstanding | Paid | Yes (ELEVENLABS_API_KEY) |
Recommendation: Start with Edge TTS (free, no setup). Upgrade to ElevenLabs for premium voice quality.

Output Files

For each clip, Clip Hand produces:
  • clip_N_final.mp4: The finished clip (1080x1920, captions burned in, optional TTS)
  • clip_N.srt: SRT subtitle file (word-level timing)
  • thumb_N.jpg: Thumbnail (frame at 2 seconds)
All files are saved in the same directory as the source video (or the current directory for URLs).
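The `.srt` files use the standard `HH:MM:SS,mmm` timestamp format. A sketch of how a word's millisecond offset maps onto it — `srt_ts` is a hypothetical helper, not part of Clip Hand:

```shell
# Format a millisecond offset as an SRT timestamp (hypothetical helper):
srt_ts() {
  ms=$(( $1 % 1000 ))
  s=$(( $1 / 1000 ))
  printf '%02d:%02d:%02d,%03d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 )) "$ms"
}
srt_ts 83500   # 00:01:23,500
```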

Publishing

Telegram

Requires:
  1. Bot Token: Create a bot via @BotFather
  2. Chat ID: Your channel ID (e.g., -100123456789 or @channelname)
  3. Bot must be admin in the channel
File size limit: 50MB. Clips larger than 50MB are automatically re-encoded.
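The size-driven re-encode implies a simple bitrate budget. A sketch of the arithmetic — the duration and audio bitrate below are example values, not the hand's actual logic:

```shell
# Bitrate budget for a Telegram-safe re-encode (illustrative numbers):
duration=60                            # clip length in seconds
limit_bytes=$(( 49 * 1024 * 1024 ))    # aim just under the 50MB cap
audio_kbps=128
total_kbps=$(( limit_bytes * 8 / duration / 1000 ))
video_kbps=$(( total_kbps - audio_kbps ))
echo "target video bitrate: ${video_kbps}k"
```

A 60-second clip can therefore afford roughly a 6.7 Mbps video stream before hitting the limit; longer clips get proportionally less.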

WhatsApp

Requires:
  1. Access Token: Permanent token from Meta Business Settings > System Users
  2. Phone Number ID: From Meta Developer Portal > WhatsApp > API Setup
  3. Recipient: Phone number in international format (e.g., 14155551234)
File size limit: 16MB. Clips larger than 16MB are automatically re-encoded. 24-hour window: WhatsApp requires the recipient to have messaged you within the last 24 hours (for non-template messages).

Best Practices

  • Real output only: Clip Hand will never fabricate command output. All FFmpeg/yt-dlp operations are run with actual commands. If a command fails, it reports the real error.
  • Long videos: For videos over 1 hour, specify which segment to focus on: “Clip the first 30 minutes” or “Focus on the Q&A section starting at 45:00”.
  • Free transcription: YouTube auto-subs (when available) are instant and free. Keep stt_provider=auto to try them first.
  • Repeat creators: If you’re clipping the same creator repeatedly, save their YouTube channel URL and let Clip Hand pull the latest video:
openfang chat clip
> "Clip the latest video from @MrBeast"

Advanced Configuration

Custom Caption Styling

Edit ~/.openfang/hands/clip.toml to customize caption appearance:
[agent.captions]
font_size = 24
font_name = "Arial"
primary_color = "&H00FFFFFF"  # White
outline_color = "&H00000000"  # Black
outline_thickness = 2
alignment = 2  # Bottom center
margin_v = 40  # Pixels from bottom
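These values correspond to standard ASS subtitle style fields, which FFmpeg's subtitles filter accepts via force_style when burning captions. A sketch of the mapping — illustrative only; the hand's exact filter invocation is not documented here:

```shell
# Map the TOML caption settings onto FFmpeg's subtitles force_style option
# (illustrative; field names are standard ASS style fields):
style="FontName=Arial,FontSize=24,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2,Alignment=2,MarginV=40"
echo "-vf \"subtitles=clip_1.srt:force_style='${style}'\""
```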

Batch Processing

Process multiple videos in one command:
openfang chat clip
> "Clip these videos: https://youtube.com/watch?v=1, https://youtube.com/watch?v=2, https://youtube.com/watch?v=3"
Clip Hand will process them sequentially.

Integration with Content Calendar

Schedule daily clipping:
openfang hand configure clip --set publish_target="telegram"

# Schedule daily at 6 AM
echo "Clip the latest video from @channel and publish" | \
  openfang schedule create --hand clip --cron "0 6 * * *"

Example Output

# Clip Job: "How I Built a $1M SaaS in 6 Months"
**Source**: https://youtube.com/watch?v=dQw4w9WgXcQ
**Duration**: 45:23 | **STT**: YouTube auto-subs | **TTS**: None
**Clips Generated**: 4

| # | Title | File | Duration | Size |
|---|-------|------|----------|------|
| 1 | "The $1M Idea" | clip_1_final.mp4 | 42s | 8.2MB |
| 2 | "Biggest Mistake" | clip_2_final.mp4 | 51s | 9.8MB |
| 3 | "First $100K" | clip_3_final.mp4 | 38s | 7.1MB |
| 4 | "Advice for Founders" | clip_4_final.mp4 | 46s | 8.9MB |

## Publishing
- Telegram: 4/4 sent successfully
- WhatsApp: Not configured

All clips saved to: /Users/you/clips/

Troubleshooting

Error: “Unable to extract video data”
Fix: Update yt-dlp:
pip install --upgrade yt-dlp
Or:
brew upgrade yt-dlp

Issue: Local Whisper on CPU is 10-50x slower than real-time.
Fix: Use Groq Whisper API (fast + free) instead:
export GROQ_API_KEY="your_key_here"
openfang hand configure clip --set stt_provider="groq_whisper"

Issue: Caption text is too long for the 1080px frame width.
Fix: Reduce font_size or enable word wrapping in SRT generation.

Issue: Telegram upload fails because the clip exceeds the 50MB limit.
Fix: Clip Hand will automatically re-encode to <50MB. If it still fails, manually raise the CRF (lower quality, smaller file):
ffmpeg -i clip_N_final.mp4 -fs 49M -c:v libx264 -crf 30 -preset fast -c:a aac -y clip_N_tg.mp4

Next Steps

Twitter Hand

Share your clips on Twitter/X

Researcher Hand

Research trending topics to clip
