Content generation pipeline - YBH Pulse Content

Content Pipeline Architecture

The content generation pipeline transforms raw transcripts into production-ready assets through a series of AI-powered stages, each building on the outputs of previous stages.

Stage 1: PRF Generation

PRF (Podcast Repurposing Framework) is the foundation of all downstream content. It’s a structured analysis of the episode that identifies key themes, insights, and quotable moments.

Inputs

Episode transcript - Full conversation text
Episode metadata - Number, guest name, title
Brand guidelines - YBH voice and positioning
Agent configuration - Model selection and system prompt

AI Orchestration

PRF generation uses an agentic workflow with:

RAG (Retrieval Augmented Generation)
Status Updates
Model Selection

RAG (Retrieval Augmented Generation)

Before generating, the AI retrieves relevant context:

Previous PRF examples (for style consistency)
YBH brand voice guidelines
IT leadership content patterns
Episode-specific terminology

This ensures:

Consistent formatting across episodes
Adherence to brand voice
Industry-appropriate language
Contextual understanding
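The retrieval step can be illustrated with a minimal sketch. It ranks stored context snippets by keyword overlap with the transcript, which is only a stand-in for whatever search the pipeline actually uses (for example, embedding similarity). The `ContextDoc` shape and the `retrieveContext` function are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape for a stored context snippet; not taken from the YBH codebase.
interface ContextDoc {
  kind: "prfExample" | "brandVoice" | "contentPattern" | "terminology";
  text: string;
}

// Split text into a set of lowercase alphanumeric terms.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Return the topK snippets that share the most terms with the query text.
function retrieveContext(query: string, docs: ContextDoc[], topK: number): ContextDoc[] {
  const queryTerms = tokenize(query);
  return docs
    .map((doc) => {
      let overlap = 0;
      tokenize(doc.text).forEach((term) => {
        if (queryTerms.has(term)) overlap += 1;
      });
      return { doc, overlap };
    })
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, topK)
    .map((scored) => scored.doc);
}
```

The selected snippets would then be prepended to the generation prompt alongside the transcript.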

Status Updates

Real-time progress tracking via Server-Sent Events (SSE):

onProgress: (step, detail, progress) => {
  // step: 'analyzing' | 'extracting' | 'formatting'
  // detail: Human-readable description
  // progress: 0-100
}

UI displays:

Current step (“Analyzing conversation…”)
Progress bar
Estimated time remaining
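As a rough sketch of the client side, the function below parses one SSE frame into the `(step, detail, progress)` values that `onProgress` receives. The JSON payload layout and the names `PrfProgress` and `parseProgressFrame` are assumptions for illustration; the actual event format is not specified here.

```typescript
type Step = "analyzing" | "extracting" | "formatting";

// Assumed payload shape, mirroring the onProgress arguments.
interface PrfProgress {
  step: Step;
  detail: string;
  progress: number; // 0-100
}

const STEPS: Step[] = ["analyzing", "extracting", "formatting"];

// Parse one SSE frame (lines such as "data: {...}") into a progress update.
// Returns null for comments, keep-alives, or malformed payloads.
function parseProgressFrame(frame: string): PrfProgress | null {
  const dataLines = frame
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice(5).trim());
  if (dataLines.length === 0) return null;
  try {
    const payload = JSON.parse(dataLines.join("\n"));
    if (
      STEPS.indexOf(payload.step) === -1 ||
      typeof payload.detail !== "string" ||
      typeof payload.progress !== "number"
    ) {
      return null;
    }
    // Clamp so the progress bar never leaves the 0-100 range.
    const progress = Math.min(100, Math.max(0, payload.progress));
    return { step: payload.step, detail: payload.detail, progress };
  } catch {
    return null;
  }
}
```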

Output Structure

PRF typically includes:

## Executive Summary

[Guest] shares insights on [main topic], drawing from 
[years] of experience in [industry]. Key discussion points 
include [theme 1], [theme 2], and [theme 3].

PRF is stored as HTML in Sanity, allowing rich formatting. The TipTap editor preserves headings, lists, bold, italics, and other styling.

Approval & Editing

Before approving PRF:

Review for accuracy - Verify quotes and facts against transcript
Check brand voice - Ensure “anti-spin” positioning
Edit for clarity - Simplify jargon, add context
Format for readability - Use headings, lists, bold

Once approved:

prfApproved: true flag set
Timestamp recorded (prfApprovedAt)
Hooks and social posts generation enabled
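The approval gate might look like the following sketch. Only the `prfApproved` and `prfApprovedAt` field names come from the description above; the `EpisodeApprovalState` type and both functions are hypothetical.

```typescript
// Minimal view of the approval fields on an episode document.
interface EpisodeApprovalState {
  prfApproved: boolean;
  prfApprovedAt?: string; // ISO 8601 timestamp
}

// Mark the PRF as approved and record when it happened.
// Re-approving an already approved PRF keeps the original timestamp.
function approvePrf(
  state: EpisodeApprovalState,
  now: Date = new Date()
): EpisodeApprovalState {
  if (state.prfApproved) return state;
  return { ...state, prfApproved: true, prfApprovedAt: now.toISOString() };
}

// Hooks and social posts generation is only enabled after approval.
function canGenerateDownstream(state: EpisodeApprovalState): boolean {
  return state.prfApproved === true;
}
```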

Stage 2: Hooks Generation

Viral Hooks are short, attention-grabbing statements designed for social media engagement. They extract the most quotable, shareable moments from the episode.

Inputs

PRF document - Themes and quotes
Episode transcript - For fact verification
Episode metadata - Guest name, episode number
Previous hooks - Avoid repetition across episodes

Generation Strategy

Hooks are generated with specific engagement patterns:

Contrarian Statement

Challenge conventional wisdom: “Most CIOs think uptime is success. The best ones know it’s just the baseline.”

Unexpected Insight

Reveal surprising truth: “After 380 interviews, the pattern is clear: IT leaders who demand respect before crisis get better results.”

Direct Quote

Quotable soundbite: “The challenge isn’t finding a vendor. It’s finding the one who sucks the least.”

Specific Number

Data-driven hook: “73% of IT leaders say vendor relationships are transactional. Here’s why that’s a problem.”

Fact Verification

Every hook is verified against the transcript:

// AI receives transcript with verification instructions
const input = `
SOURCE TRANSCRIPT (for fact verification):
${transcript}

═══════════════════════════════════════════════
PRF ANALYSIS:
${prf}

IMPORTANT: Verify all quotes, statistics, and claims 
against the transcript before including in hooks.
`

Why this matters: AI can “hallucinate” compelling statements that sound plausible but were never said. Transcript verification prevents this.
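The verification described above is prompt-based. As a complementary illustration, a deterministic check could flag quoted text in a hook that never appears in the transcript. This is a minimal sketch, not the pipeline's own method; `normalize`, `extractQuotes`, and `findUnverifiedQuotes` are hypothetical helpers, and exact substring matching will miss legitimate paraphrases.

```typescript
// Lowercase, straighten curly quotes, and collapse whitespace so that
// formatting differences do not cause false mismatches.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/\s+/g, " ")
    .trim();
}

// Collect every double-quoted span in a hook (normalized).
function extractQuotes(hook: string): string[] {
  const quotes: string[] = [];
  const pattern = /"([^"]+)"/g;
  const text = normalize(hook);
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    quotes.push(match[1]);
  }
  return quotes;
}

// Return the quoted spans that cannot be found verbatim in the transcript.
function findUnverifiedQuotes(hook: string, transcript: string): string[] {
  const haystack = normalize(transcript);
  return extractQuotes(hook).filter((quote) => haystack.indexOf(quote) === -1);
}
```

An empty result means every quoted span was found verbatim; anything returned would need human review or regeneration.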

Output Format

Hooks are stored as HTML with formatting:

<p><strong>Hook 1:</strong> After interviewing 380 IT professionals, one pattern stands out: respect shouldn't only show up when the system goes down.</p>

<p><strong>Hook 2:</strong> "The challenge? Finding the vendor who sucks the least." - Mark Baker on procurement reality.</p>

<p><strong>Hook 3:</strong> Most teams optimize for uptime. Elite teams optimize for <em>why</em> things go down.</p>

Stage 3: Social Posts Generation

Platform-specific posts are tailored for LinkedIn and Instagram, each with its own formatting, tone, and CTAs.

LinkedIn Posts

Generate two posts per episode:

Release Day Post
Follow-Up Post

Release Day Post

Published when the episode goes live. Structure:

[Hook opening]

[2-3 paragraphs expanding on theme]

[Specific insight or quote]

[Call-to-action: "Listen now"]

#Leadership #ITStrategy #Podcast

Characteristics:

Announcement tone
Episode link in comments
1-3 hashtags
~500-800 characters

Follow-Up Post

Published 3-5 days after release. Structure:

[Specific insight from episode]

[Expand with context and implications]

[Actionable takeaway]

[CTA: "Catch the full conversation"]

#CIOInsights #VendorManagement

Characteristics:

Deep-dive on single topic
No direct episode announcement
More hashtags (2-5)
~600-1000 characters
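The length and hashtag guidelines for both post types can be checked mechanically. The sketch below is one possible way to do it; the `PostLimits` type, `LINKEDIN_LIMITS` table, and `checkPost` function are hypothetical, and because the character counts above are approximate, the results are best treated as warnings rather than hard failures.

```typescript
interface PostLimits {
  minChars: number;
  maxChars: number;
  minHashtags: number;
  maxHashtags: number;
}

// Limits copied from the characteristics listed for each post type.
const LINKEDIN_LIMITS: Record<"releaseDay" | "followUp", PostLimits> = {
  releaseDay: { minChars: 500, maxChars: 800, minHashtags: 1, maxHashtags: 3 },
  followUp: { minChars: 600, maxChars: 1000, minHashtags: 2, maxHashtags: 5 },
};

// Count tokens that look like hashtags (# followed by word characters).
function countHashtags(post: string): number {
  return (post.match(/#\w+/g) ?? []).length;
}

// Return a list of guideline violations; an empty list means the post fits.
function checkPost(post: string, limits: PostLimits): string[] {
  const problems: string[] = [];
  const hashtags = countHashtags(post);
  if (post.length < limits.minChars || post.length > limits.maxChars) {
    problems.push(
      `length ${post.length} outside ${limits.minChars}-${limits.maxChars} characters`
    );
  }
  if (hashtags < limits.minHashtags || hashtags > limits.maxHashtags) {
    problems.push(
      `${hashtags} hashtags outside ${limits.minHashtags}-${limits.maxHashtags}`
    );
  }
  return problems;
}
```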

Verified Facts Bank

LinkedIn posts include a structured facts bank:

{
  "verifiedFactsBank": {
    "directQuotes": [
      "We don't sell. We unsell.",
      "Respect shouldn't only show up when systems fail."
    ],
    "specificNumbers": [
      "380 IT professionals interviewed",
      "73% report transactional vendor relationships"
    ],
    "events": [
      "2019 vendor consolidation initiative",
      "Q4 2023 procurement process redesign"
    ],
    "frameworks": [
      "Three-tier vendor evaluation model",
      "Continuous improvement feedback loop"
    ],
    "insights": [
      "Proactive respect yields better crisis response",
      "Vendor relationships as strategic partnerships"
    ]
  }
}

The facts bank is not visible in the UI but is stored for AI reference. It ensures social posts only use verified content from the transcript.
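One way the stored facts bank could be handed back to the model is as a plain-text prompt section. The sketch below assumes that approach; the `VerifiedFactsBank` interface mirrors the JSON keys shown above, while `renderFactsBank` and its section labels are hypothetical.

```typescript
// Mirrors the keys inside "verifiedFactsBank" in the JSON example.
interface VerifiedFactsBank {
  directQuotes: string[];
  specificNumbers: string[];
  events: string[];
  frameworks: string[];
  insights: string[];
}

// Render the bank as labeled bullet lists, skipping empty categories.
function renderFactsBank(bank: VerifiedFactsBank): string {
  const sections: Array<[string, string[]]> = [
    ["Direct quotes", bank.directQuotes],
    ["Specific numbers", bank.specificNumbers],
    ["Events", bank.events],
    ["Frameworks", bank.frameworks],
    ["Insights", bank.insights],
  ];
  return sections
    .filter(([, items]) => items.length > 0)
    .map(([label, items]) => `${label}:\n${items.map((item) => `- ${item}`).join("\n")}`)
    .join("\n\n");
}
```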

Instagram Captions

Generate 2-3 captions with visual-first formatting:

Caption 1 (Story Style):
🎙️ NEW EPISODE ALERT

[Guest] breaks down why most IT leaders get 
vendor relationships wrong (and how to fix it)

💡 Key insight: [Quote or statistic]

👆 Swipe for the full story

#YBHPodcast #ITLeadership #VendorManagement

---

Caption 2 (Insight Focus):
"[Compelling quote from episode]"

[2-3 sentences expanding on quote]

🔗 Link in bio to listen

#CIO #TechLeadership #PodcastRecommendation

Characteristics:

Emojis for visual breaks
Shorter paragraphs (mobile reading)
3-5 hashtags per caption
Clear CTA
150-300 characters

Stage 4: Visual Suggestions

Visual asset generation creates infographics, quote cards, and data visualizations tailored to episode content.

Parallel Generation

Instead of sequential generation, visual suggestions are created in parallel:

// Three AI calls run simultaneously
Promise.all([
  generateDataViz(prf, hooks, transcript),     // 4 suggestions
  generateCinematic(prf, hooks, transcript),   // 4 suggestions
  generateQuoteCards(hooks, transcript)        // 2 suggestions
])

Benefits:

Faster overall generation (30-60 seconds vs 2-3 minutes)
Independent failure handling (one failure doesn’t block others)
Progress tracking per stream
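Note that `Promise.all` rejects as soon as any one of its inputs rejects. One way to get the independent failure handling listed above is `Promise.allSettled`, which waits for every stream and reports each outcome separately. The sketch below assumes that approach; the `Suggestion` type, the `Generator` alias, and `generateAll` are hypothetical names, and each generator is assumed to be an async function.

```typescript
interface Suggestion {
  title: string;
}

type Generator = () => Promise<Suggestion[]>;

// Run every stream concurrently. Collect suggestions from the streams
// that succeed and the names of the streams that fail.
async function generateAll(
  streams: Record<string, Generator>
): Promise<{ suggestions: Suggestion[]; failed: string[] }> {
  const names = Object.keys(streams);
  const settled = await Promise.allSettled(names.map((name) => streams[name]()));

  const suggestions: Suggestion[] = [];
  const failed: string[] = [];
  settled.forEach((result, index) => {
    if (result.status === "fulfilled") {
      suggestions.push(...result.value);
    } else {
      failed.push(names[index]);
    }
  });
  return { suggestions, failed };
}
```

With this shape, a failed stream can be retried on its own while the successful suggestions are shown immediately.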

Suggestion Types

Data Visualizations
Cinematic Infographics
Quote Cards

Data Visualizations

Specs for charts, frameworks, and statistics. Example layouts:

Doom Loop (cyclical problem)
Quadrant Matrix (2×2 comparison)
Pyramid (hierarchy)
Pipeline/Funnel (process flow)
Card Grid (modular concepts)

Generated spec:

{
  "layout": "Doom Loop",
  "template": "Problem Cycle",
  "title": "The Vendor Selection Doom Loop",
  "colorSystem": "Problem/Solution (red→green)",
  "iconStyle": "isometric",
  "aspectRatio": "16:9",
  "contentBreakdown": {
    "mainMessage": "Why IT teams keep picking the wrong vendors",
    "sections": [
      "Urgent need drives hasty selection",
      "Hasty selection misses red flags",
      "Red flags surface post-contract",
      "Post-contract issues create urgent need"
    ],
    "dataPoints": [
      "68% of IT leaders regret vendor choice within 6 months",
      "Average contract lock-in: 3 years"
    ]
  },
  "prompt": "[Full Nano Banana Pro prompt]"
}

Cinematic Infographics

Editorial designs with bold visuals. Example templates:

Split Screen (Before/After)
Hero Quote (large typography)
Center-Converge (multiple inputs → outcome)
Exploded View (breakdown)

Focus on:

Bold typography
High contrast colors
Minimal text
Emotional impact

Generated spec:

{
  "layout": "Split Screen",
  "template": "Before/After",
  "title": "Vendor Promises vs. Reality",
  "colorSystem": "High Contrast (cyan/magenta/yellow)",
  "iconStyle": "flat2d",
  "aspectRatio": "1:1",
  "prompt": "[Cinematic prompt with dramatic lighting]"
}

Quote Cards

Guest quotes with YBH branding. Simpler specs:

Quote text (max 2 sentences)
Attribution (guest name + title)
Background style (dark cinematic)
Typography emphasis

Generated spec:

{
  "layout": "hero_quote",
  "template": "Quote Card",
  "title": "Quote from Mark Baker",
  "colorSystem": "dark_cinematic",
  "iconStyle": "flat2d",
  "aspectRatio": "1:1",
  "pullQuote": {
    "text": "The challenge isn't finding a vendor. It's finding the one who sucks the least.",
    "attribution": "Mark Baker, CIO"
  },
  "prompt": "[Quote card prompt]"
}

Variety Tracking

AI avoids repetition by checking generation history:

const history = episode.generatedAssets
  .map(a => a.spec.layout)
  .filter(Boolean)

// Recent layouts across all episodes
const recentLayouts = [
  "Doom Loop", "Doom Loop", "Quadrant Matrix", 
  "Pyramid", "Split Screen"
]

// AI prompt includes:
const varietyHint =
  "Avoid recently used layouts: Doom Loop (used 2x recently), " +
  "Quadrant Matrix, Pyramid. Prefer: Pipeline, Card Grid, " +
  "Hub & Spoke."
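The hint itself can be derived from the layout history. Here is a minimal sketch, assuming the full list of available layouts is known; `buildVarietyHint` is a hypothetical helper, and the real prompt may apply additional rules (for example, only considering the most recent episodes).

```typescript
// Build the "avoid / prefer" sentence from recently used layouts.
function buildVarietyHint(recentLayouts: string[], allLayouts: string[]): string {
  // Count how often each layout appears, preserving first-seen order.
  const counts = new Map<string, number>();
  for (const layout of recentLayouts) {
    counts.set(layout, (counts.get(layout) ?? 0) + 1);
  }

  const avoid: string[] = [];
  counts.forEach((count, layout) => {
    avoid.push(count > 1 ? `${layout} (used ${count}x recently)` : layout);
  });

  // Anything not used recently is a preferred candidate.
  const prefer = allLayouts.filter((layout) => !counts.has(layout));

  return `Avoid recently used layouts: ${avoid.join(", ")}. Prefer: ${prefer.join(", ")}.`;
}
```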

Stage 5: Image Generation

Once specs are created, images are rendered using Kie.ai Nano Banana Pro.

Generation Flow

Submit task to Kie.ai

const { taskId } = await createTask({
  prompt: spec.prompt,
  aspectRatio: '16:9',
  resolution: '2K',
  outputFormat: 'png'
})

Poll for completion

const result = await waitForTask(taskId, {
  maxAttempts: 90,  // 4.5 minutes max
  intervalMs: 3000,  // Check every 3 seconds
  onProgress: (attempt, max, state) => {
    // Update UI progress bar
  }
})

Store image URL

await updateSuggestion({
  imageUrl: result.imageUrl,
  imageGeneratedAt: new Date().toISOString(),
  status: 'imageReady'
})
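For readers curious how a helper like `waitForTask` might work internally, the sketch below shows a generic polling loop with the same `maxAttempts`, `intervalMs`, and `onProgress` options. It is not the Kie.ai client: `pollTask`, `TaskStatus`, and the `"waiting" | "success" | "fail"` state names are assumptions, and the status lookup is injected as a function so that no real API call is implied.

```typescript
type TaskState = "waiting" | "success" | "fail"; // assumed state names

interface TaskStatus {
  state: TaskState;
  imageUrl?: string;
}

interface PollOptions {
  maxAttempts: number;
  intervalMs: number;
  onProgress?: (attempt: number, max: number, state: TaskState) => void;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call fetchStatus until the task succeeds, fails, or attempts run out.
async function pollTask(
  fetchStatus: () => Promise<TaskStatus>,
  options: PollOptions
): Promise<TaskStatus> {
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const status = await fetchStatus();
    options.onProgress?.(attempt, options.maxAttempts, status.state);
    if (status.state === "success") return status;
    if (status.state === "fail") throw new Error("Image task failed");
    // Wait between attempts, but not after the final one.
    if (attempt < options.maxAttempts) await sleep(options.intervalMs);
  }
  throw new Error(`Task did not finish after ${options.maxAttempts} attempts`);
}
```

With `maxAttempts: 90` and `intervalMs: 3000`, the loop gives up after roughly 4.5 minutes, matching the limits shown above.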

Resolution & Aspect Ratio

16:9 Landscape

Best for:

LinkedIn posts
Blog headers
YouTube thumbnails

Resolution:

1K: 1920×1080
2K: 2560×1440
4K: 3840×2160

1:1 Square

Best for:

Instagram posts
Quote cards
Profile images

Resolution:

1K: 1080×1080
2K: 2048×2048
4K: 4096×4096

9:16 Portrait

Best for:

Instagram Stories
TikTok
Reels

Resolution:

1K: 1080×1920
2K: 1440×2560
4K: 2160×3840
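The resolution tables above can be captured in a small lookup. This sketch simply encodes the listed values; the `DIMENSIONS` table and `dimensionsFor` function are hypothetical names.

```typescript
type AspectRatio = "16:9" | "1:1" | "9:16";
type Resolution = "1K" | "2K" | "4K";

// Pixel dimensions [width, height] as listed in the tables above.
const DIMENSIONS: Record<AspectRatio, Record<Resolution, [number, number]>> = {
  "16:9": { "1K": [1920, 1080], "2K": [2560, 1440], "4K": [3840, 2160] },
  "1:1": { "1K": [1080, 1080], "2K": [2048, 2048], "4K": [4096, 4096] },
  "9:16": { "1K": [1080, 1920], "2K": [1440, 2560], "4K": [2160, 3840] },
};

// Look up the pixel size for a given aspect ratio and resolution tier.
function dimensionsFor(
  ratio: AspectRatio,
  resolution: Resolution
): { width: number; height: number } {
  const [width, height] = DIMENSIONS[ratio][resolution];
  return { width, height };
}
```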

Stage 6: Video Clips

Short-form video suggestions identify the most engaging moments for TikTok, Reels, and YouTube Shorts.

Clip Structure

Each suggestion includes:

interface VideoSuggestion {
  thumbnailHook: string          // Text overlay (3-5 words)
  hookSentence: string           // First 3 seconds of video
  transcript: string             // Full clip transcript
  approximateTimestamps: {       // Estimated location in episode
    start: string                // "12:34"
    end: string                  // "13:45"
  }
  duration: string               // "60-90 seconds"
  whyItWorks: string             // Editorial rationale
  sourceSection: string          // Context from PRF
}

Example Output

{
  "thumbnailHook": "Vendor selection is broken",
  "hookSentence": "Most IT leaders get vendor selection completely wrong.",
  "transcript": "Most IT leaders get vendor selection completely wrong. They focus on features and pricing, but that's not what matters. What matters is: can this vendor make you successful? Not can they deliver the product, but will they invest in your success? That's the difference between a transaction and a partnership.",
  "approximateTimestamps": { "start": "23:15", "end": "24:30" },
  "duration": "75 seconds",
  "whyItWorks": "Contrarian opening grabs attention, then provides actionable reframe",
  "sourceSection": "Vendor Relationships (PRF Section 3)"
}

Video clips are suggestions only. Video editors use these as starting points, adjusting timestamps and length based on actual footage.
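Because timestamps are stored as strings, a small helper is useful for checking that a clip's stated duration agrees with its start and end times. The sketch below is one way to do that; `toSeconds` and `clipSeconds` are hypothetical helpers that accept "MM:SS" or "H:MM:SS".

```typescript
// Convert "MM:SS" or "H:MM:SS" into a number of seconds.
function toSeconds(timestamp: string): number {
  if (!/^\d+(:\d{1,2}){1,2}$/.test(timestamp)) {
    throw new Error(`Invalid timestamp: ${timestamp}`);
  }
  return timestamp
    .split(":")
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

// Length of a clip in seconds, based on its approximate timestamps.
function clipSeconds(timestamps: { start: string; end: string }): number {
  const seconds = toSeconds(timestamps.end) - toSeconds(timestamps.start);
  if (seconds <= 0) {
    throw new Error("Clip end must be after clip start");
  }
  return seconds;
}
```

For the example output above, "23:15" to "24:30" works out to 75 seconds, which agrees with its `duration` field.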

Fact-Checking System

All generated content is validated against the transcript to prevent hallucinations.

Fact-Check Agent

Runs automatically during visual suggestions generation:

const factCheck = await factCheckContent(
  transcript,
  prf,
  guestName,
  items,  // Extracted from all suggestions
  'Doug'  // Co-host name for attribution
)

Validation Process

Extract fact-checkable items

From all suggestions:

Statistics and numbers
Direct quotes
Claims and assertions
Lists and frameworks

Search transcript for evidence

AI searches for supporting evidence:

Exact quote matches
Paraphrased statements
Statistical sources
Attribution verification

Classify each item

type Status = 
  | 'verified'      // Found in transcript
  | 'unverified'   // Not found
  | 'misattributed' // Wrong speaker
  | 'fabricated'   // Clearly invented

Generate report

{
  "overallScore": 85,
  "passedValidation": true,
  "criticalIssues": [
    "Statistic '73% of vendors' not found in transcript"
  ],
  "results": [...]
}

Fact-check threshold: Overall score must be ≥80% to pass validation. Critical issues trigger warnings in the UI.
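How the overall score is computed is not stated here. As an illustration only, the sketch below assumes the score is the percentage of items classified as `verified`, and that `fabricated` or `misattributed` items count as critical issues. Both rules, along with the `CheckedItem` type and `summarizeFactCheck` function, are assumptions.

```typescript
type Status = "verified" | "unverified" | "misattributed" | "fabricated";

interface CheckedItem {
  claim: string;
  status: Status;
}

interface FactCheckSummary {
  overallScore: number; // 0-100
  passedValidation: boolean;
  criticalIssues: string[];
}

// Assumed scoring rule: percentage of verified items, passing at >= threshold.
function summarizeFactCheck(items: CheckedItem[], threshold = 80): FactCheckSummary {
  const verified = items.filter((item) => item.status === "verified").length;
  const overallScore =
    items.length === 0 ? 100 : Math.round((verified / items.length) * 100);
  // Assumed rule: invented or misattributed claims are always critical.
  const criticalIssues = items
    .filter((item) => item.status === "fabricated" || item.status === "misattributed")
    .map((item) => `${item.status}: ${item.claim}`);
  return {
    overallScore,
    passedValidation: overallScore >= threshold,
    criticalIssues,
  };
}
```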

Pipeline Performance

Generation Times

Stage	Average Time	Range
PRF	30-45s	20-60s
Hooks	20-30s	15-40s
LinkedIn Posts	25-35s	20-50s
Instagram Captions	15-25s	10-30s
Visual Suggestions (10)	45-60s	30-90s
Image Rendering (per image)	60-120s	30-180s
Video Clips	20-30s	15-40s
Fact-Check	15-25s	10-40s

Times vary based on transcript length, model selection, and API response times. Claude models are generally faster than GPT-4.

Cost Optimization

Model selection: Claude 3.5 Sonnet offers best balance of quality and cost
Parallel generation: Reduces wall-clock time without increasing token usage
Prompt caching: Reuses transcript analysis across multiple stages (future feature)
Selective regeneration: Only regenerate specific content, not entire pipeline
