Overview

The Podcast Agent provides a comprehensive suite of tools for processing podcast videos, including automatic transcription with speaker identification, AI-assisted video editing, and semantic knowledge base creation from podcast content.

Features

  • Video Transcription: Convert podcast videos to structured transcripts with speaker identification
  • Speaker Recognition: Automatically identify speakers based on visual characteristics
  • AI Video Editing: Intelligent editing suggestions based on content analysis
  • Knowledge Base: Create searchable embeddings from podcast transcripts
  • Multi-format Support: Process MP4, MOV, AVI, MKV, and WebM formats

Components

The Podcast Agent consists of three main modules:

1. Video Transcription (geminivideo.py)

2. AI Video Editor (aiagenteditor.py)

3. Knowledge Base (podcast_knowledge_base.py)


Video Transcription

Automatic transcription with speaker identification using Google's multimodal Gemini 1.5 Pro model.

Setup

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
Requires a Google Cloud project with Vertex AI API enabled and a service account with appropriate permissions.

Configuration

podcast_agent/geminivideo.py
PROJECT_ID = "your-project-id"
LOCATION = "us-central1"
MODEL_ID = "gemini-1.5-pro"

vertexai.init(project=PROJECT_ID, location=LOCATION)
model = GenerativeModel(MODEL_ID)

Basic Usage

from podcast_agent.geminivideo import process_video

# Process a single video
output_path = process_video("path/to/podcast.mp4")
print(f"Transcript saved to: {output_path}")

Speaker Identification

The transcription tool uses visual analysis to identify speakers:
podcast_agent/geminivideo.py
prompt = f"""
The name of this podcast is The Rollup. There are two hosts:
- Andy: Light blonde, curly hair, longer on top with wave, light complexion.
- Rob: Short dark hair, slightly receding, light to medium skin, short beard.

Any other speakers are guests.

Transcribe this interview, identify speakers, and return JSON format:
[
    {{
        "speaker": "Speaker Name",
        "content": "What they said"
    }}
]
"""
Customize the speaker descriptions in the prompt to match your podcast’s hosts and guests for accurate identification.

Output Format

Transcripts are saved as JSON files in the jsonoutputs/ directory:
[
  {
    "speaker": "Andy",
    "content": "Welcome to The Rollup! Today we're discussing..."
  },
  {
    "speaker": "Rob",
    "content": "Thanks for having me. I'm excited to talk about..."
  },
  {
    "speaker": "Guest",
    "content": "It's great to be here. Let me share some insights on..."
  }
]
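Downstream tools can consume these transcripts with the standard json module. A minimal sketch of parsing a transcript in this format; the grouping below is illustrative, not part of the package:

```python
import json
from collections import defaultdict

# Sample transcript in the format shown above
transcript_json = '''
[
  {"speaker": "Andy", "content": "Welcome to The Rollup!"},
  {"speaker": "Rob", "content": "Thanks for having me."},
  {"speaker": "Andy", "content": "Let's dive in."}
]
'''

segments = json.loads(transcript_json)

# Group everything each speaker said
by_speaker = defaultdict(list)
for entry in segments:
    by_speaker[entry["speaker"]].append(entry["content"])

for speaker, lines in by_speaker.items():
    print(f"{speaker}: {len(lines)} segment(s)")
```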

Batch Processing

from podcast_agent.geminivideo import main
import os

# Process all videos in a directory
video_dir = "split_videos"
video_files = [
    os.path.join(video_dir, f)
    for f in os.listdir(video_dir)
    if f.lower().endswith(('.mov', '.mp4', '.avi', '.mkv', '.webm'))
]

for video_path in video_files:
    try:
        process_video(video_path)
    except Exception as e:
        print(f"Failed to process {video_path}: {str(e)}")

Retry Logic

Automatic retry with exponential backoff for API rate limits:
podcast_agent/geminivideo.py
@retry_with_exponential_backoff(max_retries=3, initial_delay=1)
def process_video(video_path):
    # Processing logic with automatic retry on failure
    video_bytes = pathlib.Path(video_path).read_bytes()
    mime_type = get_mime_type(video_path)
    video_file = Part.from_data(video_bytes, mime_type=mime_type)
    
    contents = [video_file, prompt]
    responses = model.generate_content(contents)
    return responses
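The decorator itself is not shown in the excerpt above. A minimal sketch of how a retry_with_exponential_backoff decorator could be written (the actual implementation may differ; delays are shortened here for illustration):

```python
import functools
import time

def retry_with_exponential_backoff(max_retries=3, initial_delay=1):
    """Retry a function with exponentially growing delays between attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise  # out of retries, propagate the error
                    time.sleep(delay)
                    delay *= 2  # exponential backoff: 1s, 2s, 4s, ...
        return wrapper
    return decorator

# Demo: a flaky function that succeeds on the third call
calls = {"n": 0}

@retry_with_exponential_backoff(max_retries=3, initial_delay=0.01)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(flaky())  # succeeds after two retries
```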

AI Video Editor

Intelligent video editing with automated clip selection and assembly using Gemini’s video analysis.

Features

  • Content Analysis: AI analyzes video content and suggests edits
  • Timestamp Validation: Ensures all edits are within video duration
  • Parallel Processing: Concurrent clip trimming for faster processing
  • FFmpeg Integration: Professional-grade video editing

Basic Usage

from podcast_agent.aiagenteditor import process_video

# Process video with custom instructions
output_path = process_video(
    videopath="podcast_episode.mp4",
    custom_instructions="Focus on technical discussions, remove pauses longer than 2 seconds"
)

if output_path:
    print(f"Edited video saved to: {output_path}")

Edit Analysis Workflow

1. Video Analysis: Gemini analyzes the entire video and identifies segments to keep or remove based on content quality and pacing.

2. Timestamp Generation: The AI generates precise timestamps for each suggested edit in HH:MM:SS format.

3. Validation: All timestamps are validated against the video duration and checked for chronological order.

4. Clip Extraction: Valid segments are extracted using FFmpeg with parallel processing.

5. Concatenation: Approved clips are concatenated into the final edited video.
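Steps 2 and 3 can be sketched as follows. Both helpers below are illustrative stand-ins, not the package's actual functions:

```python
def hms_to_seconds(ts: str) -> int:
    """Convert an HH:MM:SS timestamp to seconds."""
    h, m, s = (int(part) for part in ts.split(":"))
    return h * 3600 + m * 60 + s

def validate_edits(edits, duration_seconds):
    """Keep only edits that fit the video and run in chronological order."""
    valid = []
    last_end = 0
    for edit in edits:
        start = hms_to_seconds(edit["start_time"])
        end = hms_to_seconds(edit["end_time"])
        if start < end and end <= duration_seconds and start >= last_end:
            valid.append(edit)
            last_end = end
    return valid

edits = [
    {"start_time": "00:00:00", "end_time": "00:05:30", "keep": True},
    {"start_time": "00:05:30", "end_time": "00:06:15", "keep": False},
    {"start_time": "00:06:15", "end_time": "99:00:00", "keep": True},  # past the end
]

valid = validate_edits(edits, duration_seconds=3600)
print(len(valid))  # the out-of-range edit is dropped
```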

Response Schema

The AI returns structured editing suggestions:
podcast_agent/aiagenteditor.py
response_schema = {
    "type": "object",
    "properties": {
        "edits": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "start_time": {"type": "string", "pattern": "^[0-9]{2}:[0-9]{2}:[0-9]{2}$"},
                    "end_time": {"type": "string", "pattern": "^[0-9]{2}:[0-9]{2}:[0-9]{2}$"},
                    "keep": {"type": "boolean"},
                    "reason": {"type": "string"}
                },
                "required": ["start_time", "end_time", "keep", "reason"]
            }
        }
    }
}

Example Edit Output

{
  "edits": [
    {
      "start_time": "00:00:00",
      "end_time": "00:05:30",
      "keep": true,
      "reason": "Strong introduction with key points"
    },
    {
      "start_time": "00:05:30",
      "end_time": "00:06:15",
      "keep": false,
      "reason": "Long pause and technical difficulties"
    },
    {
      "start_time": "00:06:15",
      "end_time": "00:15:00",
      "keep": true,
      "reason": "Main discussion with valuable insights"
    }
  ]
}
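The clips to extract are the entries with keep set to true. A small illustrative pass over the example output above, summing the kept footage (to_seconds is a hypothetical helper):

```python
edit_response = {
    "edits": [
        {"start_time": "00:00:00", "end_time": "00:05:30", "keep": True,
         "reason": "Strong introduction with key points"},
        {"start_time": "00:05:30", "end_time": "00:06:15", "keep": False,
         "reason": "Long pause and technical difficulties"},
        {"start_time": "00:06:15", "end_time": "00:15:00", "keep": True,
         "reason": "Main discussion with valuable insights"},
    ]
}

def to_seconds(ts):
    """HH:MM:SS -> total seconds."""
    h, m, s = map(int, ts.split(":"))
    return h * 3600 + m * 60 + s

kept = [e for e in edit_response["edits"] if e["keep"]]
total = sum(to_seconds(e["end_time"]) - to_seconds(e["start_time"]) for e in kept)
print(f"{len(kept)} clips kept, {total} seconds of footage")
```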

Video Processing Functions

podcast_agent/aiagenteditor.py
def get_video_duration(video_path):
    """Get video duration using ffprobe."""
    command = [
        'ffprobe',
        '-v', 'error',
        '-select_streams', 'v:0',
        '-show_entries', 'format=duration',
        '-of', 'default=noprint_wrappers=1:nokey=1',
        video_path
    ]
    result = subprocess.run(command, capture_output=True, text=True)
    duration = float(result.stdout.strip())
    return duration

def trim_video(input_path, output_path, start_time, end_time):
    """Trim video using ffmpeg."""
    command = [
        'ffmpeg',
        '-i', input_path,
        '-ss', start_time,
        '-to', end_time,
        '-c', 'copy',
        '-y',
        output_path
    ]
    subprocess.run(command, check=True)

def concatenate_videos(clip_paths, output_path):
    """Concatenate multiple clips into final video."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".txt") as tmp_file:
        for clip_path in clip_paths:
            tmp_file.write(f"file '{clip_path}'\n")
        tmp_file.flush()
        
        command = [
            'ffmpeg',
            '-f', 'concat',
            '-safe', '0',
            '-i', tmp_file.name,
            '-c', 'copy',
            '-y',
            output_path
        ]
        subprocess.run(command, check=True)

Custom Editing Instructions

# Example 1: Technical content focus
process_video(
    "tech_podcast.mp4",
    custom_instructions="Keep all technical discussions, remove casual banter"
)

# Example 2: Pacing improvement
process_video(
    "interview.mp4",
    custom_instructions="Remove pauses longer than 3 seconds, keep all Q&A segments"
)

# Example 3: Highlight reel
process_video(
    "long_episode.mp4",
    custom_instructions="Extract only the most insightful moments and key takeaways"
)

Podcast Knowledge Base

Create a semantic search engine from podcast transcripts using ChromaDB and sentence transformers.

Setup

pip install chromadb sentence-transformers

Initialization

from podcast_agent.podcast_knowledge_base import PodcastKnowledgeBase

# Initialize with persistent storage
kb = PodcastKnowledgeBase(collection_name="podcast_knowledge")

# Process all JSON transcripts
kb.process_all_json_files(directory="jsonoutputs/")

Architecture

podcast_agent/podcast_knowledge_base.py
class PodcastKnowledgeBase:
    def __init__(self, collection_name: str = "podcast_knowledge"):
        # Persistent ChromaDB client
        self.client = chromadb.PersistentClient(path="./chroma_db")
        
        # Advanced embedding model
        self.embedding_model = SentenceTransformer('all-mpnet-base-v2')
        
        # Embedding function for ChromaDB, backed by the same model
        # (from chromadb.utils import embedding_functions)
        embedding_func = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name='all-mpnet-base-v2'
        )
        
        # Create collection with custom embedding function
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            embedding_function=embedding_func
        )

Adding Segments

from podcast_agent.podcast_knowledge_base import PodcastSegment

# Create segments manually
segments = [
    PodcastSegment(
        id="ep1_0",
        speaker="Andy",
        content="Today we're discussing the future of AI...",
        source_file="episode_1.json"
    ),
    PodcastSegment(
        id="ep1_1",
        speaker="Rob",
        content="Machine learning has evolved significantly...",
        source_file="episode_1.json"
    )
]

kb.add_segments(segments)

Querying

# Query the knowledge base
results = kb.query_knowledge_base(
    query="What did they say about machine learning?",
    n_results=5
)

# Format and display results
formatted = kb.format_query_results(results)
print(formatted)

Query Results Format

[
    {
        "content": "Machine learning has evolved significantly...",
        "metadata": {
            "speaker": "Rob",
            "source_file": "episode_1.json",
            "timestamp": "2024-01-15T10:30:00"
        },
        "relevance_score": 0.89
    },
    {
        "content": "The applications of ML in healthcare are tremendous...",
        "metadata": {
            "speaker": "Guest",
            "source_file": "episode_3.json",
            "timestamp": "2024-01-20T14:15:00"
        },
        "relevance_score": 0.76
    }
]
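When relevance matters more than result count, results in this format can be post-filtered in plain Python; the cutoff below is an arbitrary example, not a package setting:

```python
results = [
    {"content": "Machine learning has evolved significantly...",
     "metadata": {"speaker": "Rob", "source_file": "episode_1.json"},
     "relevance_score": 0.89},
    {"content": "The applications of ML in healthcare are tremendous...",
     "metadata": {"speaker": "Guest", "source_file": "episode_3.json"},
     "relevance_score": 0.76},
]

MIN_SCORE = 0.8  # arbitrary cutoff for this example

strong = [r for r in results if r["relevance_score"] >= MIN_SCORE]
for r in strong:
    print(f"[{r['metadata']['speaker']}] {r['content'][:40]}")
```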

Processing JSON Files

podcast_agent/podcast_knowledge_base.py
def process_json_file(self, file_path: str):
    """Process a podcast transcript JSON and add to knowledge base."""
    with open(file_path, 'r', encoding='utf-8') as f:
        transcript_data = json.load(f)
    
    segments = []
    for idx, entry in enumerate(transcript_data):
        segment = PodcastSegment(
            id=f"{os.path.basename(file_path)}_{idx}",
            speaker=entry['speaker'],
            content=entry['content'],
            source_file=file_path
        )
        segments.append(segment)
    
    self.add_segments(segments)
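add_segments is not shown in the excerpt. ChromaDB collections take parallel lists of ids, documents, and metadatas; a sketch of that conversion, with a minimal stand-in PodcastSegment dataclass (the real class may differ):

```python
from dataclasses import dataclass

@dataclass
class PodcastSegment:
    id: str
    speaker: str
    content: str
    source_file: str

segments = [
    PodcastSegment("ep1_0", "Andy", "Today we're discussing AI...", "episode_1.json"),
    PodcastSegment("ep1_1", "Rob", "ML has evolved...", "episode_1.json"),
]

# Parallel lists in the shape ChromaDB's collection.add() expects
ids = [s.id for s in segments]
documents = [s.content for s in segments]
metadatas = [{"speaker": s.speaker, "source_file": s.source_file} for s in segments]

# With a real collection this would be:
# self.collection.add(ids=ids, documents=documents, metadatas=metadatas)
print(ids)
```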

Knowledge Base Statistics

# Get collection stats
stats = kb.get_collection_stats()
print(f"Total segments: {stats['count']}")
print(f"Last updated: {stats['last_update']}")

# Check processed files
processed = kb.get_processed_files()
print(f"Processed files: {processed}")

Advanced Queries

# Complex semantic search
queries = [
    "What are the challenges in blockchain scalability?",
    "How does NFT technology work?",
    "What did guests say about DeFi adoption?"
]

for query in queries:
    print(f"\n=== Query: {query} ===")
    results = kb.query_knowledge_base(query, n_results=3)
    
    for i, result in enumerate(results, 1):
        print(f"\n{i}. [{result['metadata']['speaker']}] "
              f"(Score: {result['relevance_score']:.2f})")
        print(f"   {result['content'][:200]}...")

Collection Management

# Clear the entire collection
kb.clear_collection()

# Re-process all files
kb.process_all_json_files()

# Get processed file list
processed_files = kb.get_processed_files()
print(f"Knowledge base contains {len(processed_files)} transcripts")

Integration Example

Complete workflow from video to searchable knowledge base:
from podcast_agent.geminivideo import process_video as transcribe
from podcast_agent.aiagenteditor import process_video as edit
from podcast_agent.podcast_knowledge_base import PodcastKnowledgeBase

def process_podcast_workflow(video_path: str):
    # Step 1: Transcribe the video
    print("Step 1: Transcribing video...")
    transcript_path = transcribe(video_path)
    
    # Step 2: Edit the video (optional)
    print("Step 2: Editing video...")
    edited_path = edit(
        video_path,
        custom_instructions="Keep main discussion, remove technical issues"
    )
    
    # Step 3: Add to knowledge base
    print("Step 3: Adding to knowledge base...")
    kb = PodcastKnowledgeBase()
    kb.process_json_file(transcript_path)
    
    # Step 4: Query the content
    print("Step 4: Testing semantic search...")
    results = kb.query_knowledge_base("main topics discussed", n_results=5)
    print(kb.format_query_results(results))
    
    return {
        "transcript": transcript_path,
        "edited_video": edited_path,
        "query_results": len(results)
    }

# Run the workflow
result = process_podcast_workflow("my_podcast.mp4")
print(f"Workflow complete: {result}")

Configuration

  • PROJECT_ID (string, required): Google Cloud project ID for Vertex AI
  • LOCATION (string, default "us-central1"): Google Cloud region for the Vertex AI API
  • MODEL_ID (string, default "gemini-1.5-pro"): Gemini model version for video analysis
  • MAX_RETRIES (int, default 3): Number of retry attempts for API calls

Best Practices

Speaker Customization

Customize speaker descriptions in the transcription prompt for your specific podcast hosts.

Batch Processing

Process multiple videos in batches with delays between API calls to avoid rate limits.

Edit Instructions

Provide clear, specific editing instructions for best AI-assisted editing results.

Knowledge Base Updates

Regularly update the knowledge base with new episodes for comprehensive search coverage.

The video editor uses FFmpeg. Ensure it is installed: brew install ffmpeg (macOS) or apt-get install ffmpeg (Linux).
