Overview

The YouTube integration uses the official YouTube Data API v3 to search for relevant videos and collect their comment threads. Unlike X and Reddit, which require web scraping, YouTube provides a stable, well-documented API.
YouTube data collection is implemented in supabase/functions/fetch-youtube/index.ts and serves as a fallback when X/Reddit scraping fails.

How It Works

The YouTube fetcher follows a two-step process:

Step 1: Search for Videos

const searchUrl = new URL("https://www.googleapis.com/youtube/v3/search");
searchUrl.searchParams.set("part", "snippet");
searchUrl.searchParams.set("q", topic.query);
searchUrl.searchParams.set("type", "video");
searchUrl.searchParams.set("maxResults", "5");
searchUrl.searchParams.set("order", "date");
searchUrl.searchParams.set("key", YOUTUBE_API_KEY);

const searchResponse = await fetch(searchUrl.toString());
const searchData = await searchResponse.json();

const videoIds = (searchData.items || [])
  .map((item: any) => item.id?.videoId)
  .filter(Boolean);
Setting order=date ensures the most recent videos appear first, keeping the sentiment data current.
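The parts of the search response that the snippet above relies on can be typed roughly as follows (a sketch; the field names follow the Data API v3 search.list response, and `extractVideoIds` is a name introduced here):

```typescript
interface SearchItem {
  id?: { kind?: string; videoId?: string };
}

interface SearchResponse {
  items?: SearchItem[];
}

// Mirrors the mapping above: keep only items that actually carry a videoId
// (channel and playlist results would not, if type=video were omitted).
function extractVideoIds(data: SearchResponse): string[] {
  return (data.items ?? [])
    .map((item) => item.id?.videoId)
    .filter((id): id is string => Boolean(id));
}
```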

Step 2: Fetch Comments for Each Video

let totalInserted = 0;
let totalFetched = 0;

for (const videoId of videoIds) {
  const commentsUrl = new URL(
    "https://www.googleapis.com/youtube/v3/commentThreads"
  );
  commentsUrl.searchParams.set("part", "snippet");
  commentsUrl.searchParams.set("videoId", videoId);
  commentsUrl.searchParams.set("maxResults", "20");
  commentsUrl.searchParams.set("order", "relevance");
  commentsUrl.searchParams.set("key", YOUTUBE_API_KEY);

  const commentsResponse = await fetch(commentsUrl.toString());
  if (!commentsResponse.ok) {
    console.error(`Comments fetch failed for video ${videoId}`);
    await commentsResponse.text(); // consume body
    continue;
  }

  const commentsData = await commentsResponse.json();
  const comments = commentsData.items || [];
  totalFetched += comments.length;

  for (const comment of comments) {
    const snippet = comment.snippet?.topLevelComment?.snippet;
    if (!snippet) continue;

    const { error } = await supabase.from("posts").upsert(
      {
        topic_id,
        platform: "youtube",
        external_id: comment.id,
        author: snippet.authorDisplayName || "Anonymous",
        content: snippet.textDisplay?.replace(/<[^>]*>/g, "") || "",
        posted_at: snippet.publishedAt,
      },
      { onConflict: "platform,external_id" }
    );
    if (!error) totalInserted++;
  }
}
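The per-comment extraction in the loop above can be factored into a small pure helper, which makes the shaping logic easy to unit-test (a sketch; `mapComment` and `PostRow` are names introduced here):

```typescript
interface PostRow {
  platform: "youtube";
  external_id: string;
  author: string;
  content: string;
  posted_at: string;
}

// Dig out the top-level comment snippet, strip HTML tags,
// and shape a row for the posts table; null when no snippet exists.
function mapComment(comment: any): PostRow | null {
  const snippet = comment.snippet?.topLevelComment?.snippet;
  if (!snippet) return null;
  return {
    platform: "youtube",
    external_id: comment.id,
    author: snippet.authorDisplayName || "Anonymous",
    content: (snippet.textDisplay ?? "").replace(/<[^>]*>/g, ""),
    posted_at: snippet.publishedAt,
  };
}
```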

API Endpoints

Search Endpoint

URL: https://www.googleapis.com/youtube/v3/search

Parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| part | snippet | Returns title, description, thumbnails |
| q | {topic.query} | Search query |
| type | video | Only return videos (not channels/playlists) |
| maxResults | 5 | Number of videos to retrieve |
| order | date | Sort by upload date (newest first) |
| key | {API_KEY} | YouTube Data API key |

Example request:

curl "https://www.googleapis.com/youtube/v3/search?part=snippet&q=OpenAI&type=video&maxResults=5&order=date&key=YOUR_API_KEY"

Comments Endpoint

URL: https://www.googleapis.com/youtube/v3/commentThreads

Parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| part | snippet | Returns comment metadata |
| videoId | {videoId} | Video to fetch comments from |
| maxResults | 20 | Comments per video |
| order | relevance | Sort by relevance (top comments) |
| key | {API_KEY} | YouTube Data API key |

Example request:

curl "https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=dQw4w9WgXcQ&maxResults=20&order=relevance&key=YOUR_API_KEY"

Data Extraction

Comment Structure

YouTube comments are nested inside the response:
const snippet = comment.snippet?.topLevelComment?.snippet;

const author = snippet.authorDisplayName || "Anonymous";
const content = snippet.textDisplay?.replace(/<[^>]*>/g, "") || "";
const posted_at = snippet.publishedAt;
textDisplay contains HTML tags (e.g., <br>, <a>). Always strip HTML before storing:
content: snippet.textDisplay?.replace(/<[^>]*>/g, "") || ""
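Note that textDisplay also HTML-escapes entities (e.g. &amp;, &#39;). If fully plain text is wanted, a slightly fuller cleanup could decode the most common entities after stripping tags (a sketch; `cleanComment` is a name introduced here):

```typescript
// Strip tags first, then decode common HTML entities.
// (&amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".)
function cleanComment(textDisplay: string): string {
  return textDisplay
    .replace(/<[^>]*>/g, "")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}
```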

Comment Ordering

The API supports two ordering strategies:
// Returns top comments by engagement (likes, replies)
commentsUrl.searchParams.set("order", "relevance");
Relevance-sorted comments are better for sentiment analysis because:
  1. Higher engagement = more representative of viewer opinions
  2. Top comments are more likely to be substantive (not spam)
  3. Controversial opinions rise to the top (important for sentiment diversity)
Time-sorted comments may include spam, bots, and low-quality content.

Error Handling

API Error Detection

if (!searchResponse.ok) {
  const errText = await searchResponse.text().catch(() => "");
  console.error(
    `YouTube search API error [${searchResponse.status}]: ${errText.substring(0, 200)}`
  );
  return new Response(
    JSON.stringify({
      success: false,
      fetched: 0,
      inserted: 0,
      info: `YouTube API returned ${searchResponse.status}`,
    }),
    { headers: { ...corsHeaders, "Content-Type": "application/json" } }
  );
}

Common HTTP Status Codes

| Status | Meaning | Solution |
| --- | --- | --- |
| 200 | Success | Process data |
| 400 | Bad Request | Check query parameters |
| 403 | Forbidden | API key invalid, quota exceeded, or comments disabled on the video |
| 404 | Not Found | Video doesn't exist |

A 403 Forbidden usually means the quota is exceeded; the free tier allows 10,000 quota units/day.

Cost breakdown:
  • search: 100 units per request
  • commentThreads: 1 unit per request

Fetching 5 videos + 20 comments each = 100 + (5 × 1) = 105 units per query.

Rate Limits & Quota

YouTube Data API v3 Quota

Free Tier: 10,000 units/day
Paid Tier: Request quota increase via Google Cloud Console

Quota Costs

| Operation | Cost | Notes |
| --- | --- | --- |
| search | 100 units | Per request |
| commentThreads | 1 unit | Per request (one per video) |
| videos | 1 unit | Not used in SENTi-radar |

Daily Usage Calculation

// Current implementation:
// - 1 search (5 videos) = 100 units
// - 5 commentThreads = 5 units
// Total per topic = 105 units

// Max topics per day = 10,000 / 105 ≈ 95 topics
To analyze more topics per day, reduce the number of videos fetched per topic (each video costs one extra commentThreads unit); the search request itself is a fixed 100 units.
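The arithmetic above can be captured in a tiny helper, using the unit costs from the quota table (a sketch; the function names are introduced here):

```typescript
// search.list costs 100 units per request; commentThreads.list costs
// 1 unit per request, and the fetcher makes one request per video.
function quotaPerTopic(videosPerSearch: number): number {
  const SEARCH_COST = 100;
  const COMMENT_THREADS_COST = 1;
  return SEARCH_COST + videosPerSearch * COMMENT_THREADS_COST;
}

function maxTopicsPerDay(videosPerSearch: number, dailyQuota = 10_000): number {
  return Math.floor(dailyQuota / quotaPerTopic(videosPerSearch));
}
```

With the current settings, quotaPerTopic(5) is 105 units, allowing 95 topics per day; dropping to 3 videos per search yields 103 units and 97 topics.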

No Results Handling

if (videoIds.length === 0) {
  return new Response(
    JSON.stringify({
      success: true,
      fetched: 0,
      inserted: 0,
      info: "No YouTube videos found for this topic",
    }),
    { headers: { ...corsHeaders, "Content-Type": "application/json" } }
  );
}
This prevents the orchestrator from treating an empty result as a failure: finding no videos is a valid outcome (e.g., the query is too new or niche), not an error. Errors should only be returned for:
  • API authentication failures
  • Network timeouts
  • Malformed requests
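One way to keep these semantics consistent is a single response builder in which only genuine failures clear the success flag (a sketch; `buildResult` and `FetchResult` are names introduced here):

```typescript
interface FetchResult {
  success: boolean;
  fetched: number;
  inserted: number;
  info: string;
}

// Empty results stay success: true; only hard failures
// (auth errors, timeouts, malformed requests) set failed.
function buildResult(opts: {
  fetched: number;
  inserted: number;
  info: string;
  failed?: boolean;
}): FetchResult {
  return {
    success: !opts.failed,
    fetched: opts.fetched,
    inserted: opts.inserted,
    info: opts.info,
  };
}
```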

Database Persistence

for (const comment of comments) {
  const snippet = comment.snippet?.topLevelComment?.snippet;
  if (!snippet) continue;

  const { error } = await supabase.from("posts").upsert(
    {
      topic_id,
      platform: "youtube",
      external_id: comment.id,
      author: snippet.authorDisplayName || "Anonymous",
      content: snippet.textDisplay?.replace(/<[^>]*>/g, "") || "",
      posted_at: snippet.publishedAt,
    },
    { onConflict: "platform,external_id" }
  );
  if (!error) totalInserted++;
}

Response Format

{
  "success": true,
  "fetched": 87,
  "inserted": 85,
  "info": "Fetched comments from 5 videos"
}

Response Fields

  • success: true if API calls succeeded (even if no comments found)
  • fetched: Total comments retrieved from YouTube
  • inserted: Comments successfully saved (may be less than fetched due to duplicates)

Environment Setup

1. Create YouTube API Key

1. Go to Google Cloud Console
2. Create or select a project: click Select a project → New Project
3. Enable YouTube Data API v3: APIs & Services → Enable APIs and Services → search "YouTube Data API v3" → Enable
4. Create an API key: Credentials → Create Credentials → API Key
5. Restrict the API key (recommended): click Restrict Key → API restrictions → select "YouTube Data API v3"

2. Configure Environment

YOUTUBE_API_KEY=AIzaSyBxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your_service_key

Testing

supabase functions serve fetch-youtube --env-file .env

curl -X POST http://localhost:54321/functions/v1/fetch-youtube \
  -H "Authorization: Bearer ${SUPABASE_SERVICE_ROLE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"topic_id": "your-topic-uuid"}'

Common Issues

Quota exceeded (403)

Cause: Daily quota limit (10,000 units) exceeded.

Solutions:
  1. Wait until quota resets (midnight Pacific Time)
  2. Request quota increase in Google Cloud Console
  3. Reduce maxResults in search and commentThreads
  4. Implement caching to avoid redundant API calls

Comments fetch fails for a video

Cause: The video creator disabled comments.

Solution: The code already handles this gracefully:

if (!commentsResponse.ok) {
  console.error(`Comments fetch failed for video ${videoId}`);
  await commentsResponse.text(); // consume body
  continue; // Skip to next video
}

No comments collected for a topic

Causes:
  1. Query has no YouTube videos (check youtube.com/results manually)
  2. All videos have comments disabled
  3. Videos are age-restricted or private

Debug:

# Test search manually
curl "https://www.googleapis.com/youtube/v3/search?part=snippet&q=YOUR_QUERY&type=video&key=${YOUTUBE_API_KEY}" | jq '.items | length'

HTML tags in stored comments

Problem: Raw comment text includes <br>, <a> tags.

Solution: Already implemented in code:

content: snippet.textDisplay?.replace(/<[^>]*>/g, "") || ""

Optimization Tips

// Change maxResults to reduce quota consumption
searchUrl.searchParams.set("maxResults", "3");  // Was 5
commentsUrl.searchParams.set("maxResults", "10"); // Was 20

// New quota: 100 + (3 × 1) = 103 units per topic
// Max topics per day: 10,000 / 103 ≈ 97
// Fetch all video comments in parallel instead of sequentially
// (buildCommentsUrl is a helper that assembles the commentThreads URL)
const commentPromises = videoIds.map((videoId) =>
  fetch(buildCommentsUrl(videoId))
);
const commentResults = await Promise.allSettled(commentPromises);

// Can substantially reduce total latency (quota cost is unchanged)
// Store video IDs in database to avoid re-searching
await supabase.from("youtube_cache").insert({
  topic_id,
  video_ids: videoIds,
  cached_at: new Date().toISOString(),
});

// Check cache before calling search API
const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000).toISOString();
const { data: cached } = await supabase
  .from("youtube_cache")
  .select("video_ids")
  .eq("topic_id", topicId)
  .gte("cached_at", oneHourAgo)
  .single();
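The cache check above compares cached_at against a one-hour cutoff; a small pure helper makes the TTL logic explicit and testable (a sketch; `isFresh` is a name introduced here, and the `youtube_cache` table is an assumption carried over from the snippet):

```typescript
// True when cachedAt (an ISO timestamp) is younger than ttlMs.
function isFresh(cachedAt: string, ttlMs: number, now: number = Date.now()): boolean {
  return now - new Date(cachedAt).getTime() < ttlMs;
}

// e.g. the one-hour cutoff used above:
const ONE_HOUR_MS = 60 * 60 * 1000;
```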

Next Steps

Data Sources Overview

Understand the complete data pipeline

Sentiment Analysis

How YouTube comments are analyzed
