Overview

The YouTube integration uses the official YouTube Data API v3 to search for relevant videos and collect their comment threads. Unlike X and Reddit, which require web scraping, YouTube provides a stable, well-documented API.
YouTube data collection is implemented in supabase/functions/fetch-youtube/index.ts and serves as a fallback when X/Reddit scraping fails.

How It Works

The YouTube fetcher follows a two-step process:

Step 1: Search for Videos

const searchUrl = new URL("https://www.googleapis.com/youtube/v3/search");
searchUrl.searchParams.set("part", "snippet");
searchUrl.searchParams.set("q", topic.query);
searchUrl.searchParams.set("type", "video");
searchUrl.searchParams.set("maxResults", "5");
searchUrl.searchParams.set("order", "date");
searchUrl.searchParams.set("key", YOUTUBE_API_KEY);

const searchResponse = await fetch(searchUrl.toString());
const searchData = await searchResponse.json();

const videoIds = (searchData.items || [])
  .map((item: any) => item.id?.videoId)
  .filter(Boolean);
Setting order=date ensures the most recent videos appear first, keeping the sentiment data current.
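The parts of the search response that the snippet above relies on can be typed roughly as follows (a sketch; the field names follow the Data API v3 search.list response, and `extractVideoIds` is a name introduced here):

```typescript
interface SearchItem {
  id?: { kind?: string; videoId?: string };
}

interface SearchResponse {
  items?: SearchItem[];
}

// Mirrors the mapping above: keep only items that actually carry a videoId
// (channel and playlist results would not, if type=video were omitted).
function extractVideoIds(data: SearchResponse): string[] {
  return (data.items ?? [])
    .map((item) => item.id?.videoId)
    .filter((id): id is string => Boolean(id));
}
```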

Step 2: Fetch Comments for Each Video

let totalInserted = 0;
let totalFetched = 0;

for (const videoId of videoIds) {
  const commentsUrl = new URL(
    "https://www.googleapis.com/youtube/v3/commentThreads"
  );
  commentsUrl.searchParams.set("part", "snippet");
  commentsUrl.searchParams.set("videoId", videoId);
  commentsUrl.searchParams.set("maxResults", "20");
  commentsUrl.searchParams.set("order", "relevance");
  commentsUrl.searchParams.set("key", YOUTUBE_API_KEY);

  const commentsResponse = await fetch(commentsUrl.toString());
  if (!commentsResponse.ok) {
    console.error(`Comments fetch failed for video ${videoId}`);
    await commentsResponse.text(); // consume body
    continue;
  }

  const commentsData = await commentsResponse.json();
  const comments = commentsData.items || [];
  totalFetched += comments.length;

  for (const comment of comments) {
    const snippet = comment.snippet?.topLevelComment?.snippet;
    if (!snippet) continue;

    const { error } = await supabase.from("posts").upsert(
      {
        topic_id,
        platform: "youtube",
        external_id: comment.id,
        author: snippet.authorDisplayName || "Anonymous",
        content: snippet.textDisplay?.replace(/<[^>]*>/g, "") || "",
        posted_at: snippet.publishedAt,
      },
      { onConflict: "platform,external_id" }
    );
    if (!error) totalInserted++;
  }
}
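The per-comment extraction in the loop above can be factored into a small pure helper, which makes the shaping logic easy to unit-test (a sketch; `mapComment` and `PostRow` are names introduced here):

```typescript
interface PostRow {
  platform: "youtube";
  external_id: string;
  author: string;
  content: string;
  posted_at: string;
}

// Dig out the top-level comment snippet, strip HTML tags,
// and shape a row for the posts table; null when no snippet exists.
function mapComment(comment: any): PostRow | null {
  const snippet = comment.snippet?.topLevelComment?.snippet;
  if (!snippet) return null;
  return {
    platform: "youtube",
    external_id: comment.id,
    author: snippet.authorDisplayName || "Anonymous",
    content: (snippet.textDisplay ?? "").replace(/<[^>]*>/g, ""),
    posted_at: snippet.publishedAt,
  };
}
```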

API Endpoints

Search Endpoint

URL: https://www.googleapis.com/youtube/v3/search

Parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| part | snippet | Returns title, description, thumbnails |
| q | {topic.query} | Search query |
| type | video | Only return videos (not channels/playlists) |
| maxResults | 5 | Number of videos to retrieve |
| order | date | Sort by upload date (newest first) |
| key | {API_KEY} | YouTube Data API key |

Example request:

curl "https://www.googleapis.com/youtube/v3/search?part=snippet&q=OpenAI&type=video&maxResults=5&order=date&key=YOUR_API_KEY"

Comments Endpoint

URL: https://www.googleapis.com/youtube/v3/commentThreads

Parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| part | snippet | Returns comment metadata |
| videoId | {videoId} | Video to fetch comments from |
| maxResults | 20 | Comments per video |
| order | relevance | Sort by relevance (top comments) |
| key | {API_KEY} | YouTube Data API key |

Example request:

curl "https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=dQw4w9WgXcQ&maxResults=20&order=relevance&key=YOUR_API_KEY"

Data Extraction

Comment Structure

YouTube comments are nested inside the response:
const snippet = comment.snippet?.topLevelComment?.snippet;

const author = snippet.authorDisplayName || "Anonymous";
const content = snippet.textDisplay?.replace(/<[^>]*>/g, "") || "";
const posted_at = snippet.publishedAt;
textDisplay contains HTML tags (e.g., <br>, <a>). Always strip HTML before storing:
content: snippet.textDisplay?.replace(/<[^>]*>/g, "") || ""
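Note that textDisplay also HTML-escapes entities (e.g. &amp;, &#39;). If fully plain text is wanted, a slightly fuller cleanup could decode the most common entities after stripping tags (a sketch; `cleanComment` is a name introduced here):

```typescript
// Strip tags first, then decode common HTML entities.
// (&amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".)
function cleanComment(textDisplay: string): string {
  return textDisplay
    .replace(/<[^>]*>/g, "")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}
```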

Comment Ordering

The API supports two ordering strategies:
// Returns top comments by engagement (likes, replies)
commentsUrl.searchParams.set("order", "relevance");
Relevance-sorted comments are better for sentiment analysis because:
  1. Higher engagement = more representative of viewer opinions
  2. Top comments are more likely to be substantive (not spam)
  3. Controversial opinions rise to the top (important for sentiment diversity)
Time-sorted comments may include spam, bots, and low-quality content.

Error Handling

API Error Detection

if (!searchResponse.ok) {
  const errText = await searchResponse.text().catch(() => "");
  console.error(
    `YouTube search API error [${searchResponse.status}]: ${errText.substring(0, 200)}`
  );
  return new Response(
    JSON.stringify({
      success: false,
      fetched: 0,
      inserted: 0,
      info: `YouTube API returned ${searchResponse.status}`,
    }),
    { headers: { ...corsHeaders, "Content-Type": "application/json" } }
  );
}

Common HTTP Status Codes

| Status | Meaning | Solution |
| --- | --- | --- |
| 200 | Success | Process data |
| 400 | Bad Request | Check query parameters |
| 403 | Forbidden | API key invalid, quota exceeded, or comments disabled on the video |
| 404 | Not Found | Video doesn't exist |

A 403 Forbidden usually means the quota is exceeded; the free tier allows 10,000 quota units/day.

Cost breakdown:
  • search: 100 units per request
  • commentThreads: 1 unit per request

Fetching 5 videos + 20 comments each = 100 + (5 × 1) = 105 units per query.

Rate Limits & Quota

YouTube Data API v3 Quota

Free Tier: 10,000 units/day
Paid Tier: Request quota increase via Google Cloud Console

Quota Costs

| Operation | Cost | Notes |
| --- | --- | --- |
| search | 100 units | Per request |
| commentThreads | 1 unit | Per request (one per video) |
| videos | 1 unit | Not used in SENTi-radar |

Daily Usage Calculation

// Current implementation:
// - 1 search (5 videos) = 100 units
// - 5 commentThreads = 5 units
// Total per topic = 105 units

// Max topics per day = 10,000 / 105 ≈ 95 topics
To analyze more topics per day, reduce the number of videos fetched per topic (each video costs one extra commentThreads unit); the search request itself is a fixed 100 units.
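The arithmetic above can be captured in a tiny helper, using the unit costs from the quota table (a sketch; the function names are introduced here):

```typescript
// search.list costs 100 units per request; commentThreads.list costs
// 1 unit per request, and the fetcher makes one request per video.
function quotaPerTopic(videosPerSearch: number): number {
  const SEARCH_COST = 100;
  const COMMENT_THREADS_COST = 1;
  return SEARCH_COST + videosPerSearch * COMMENT_THREADS_COST;
}

function maxTopicsPerDay(videosPerSearch: number, dailyQuota = 10_000): number {
  return Math.floor(dailyQuota / quotaPerTopic(videosPerSearch));
}
```

With the current settings, quotaPerTopic(5) is 105 units, allowing 95 topics per day; dropping to 3 videos per search yields 103 units and 97 topics.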

No Results Handling

if (videoIds.length === 0) {
  return new Response(
    JSON.stringify({
      success: true,
      fetched: 0,
      inserted: 0,
      info: "No YouTube videos found for this topic",
    }),
    { headers: { ...corsHeaders, "Content-Type": "application/json" } }
  );
}
This prevents the orchestrator from treating an empty result as a failure: finding no videos is a valid outcome (e.g., the query is too new or niche), not an error. Errors should only be returned for:
  • API authentication failures
  • Network timeouts
  • Malformed requests
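One way to keep these semantics consistent is a single response builder in which only genuine failures clear the success flag (a sketch; `buildResult` and `FetchResult` are names introduced here):

```typescript
interface FetchResult {
  success: boolean;
  fetched: number;
  inserted: number;
  info: string;
}

// Empty results stay success: true; only hard failures
// (auth errors, timeouts, malformed requests) set failed.
function buildResult(opts: {
  fetched: number;
  inserted: number;
  info: string;
  failed?: boolean;
}): FetchResult {
  return {
    success: !opts.failed,
    fetched: opts.fetched,
    inserted: opts.inserted,
    info: opts.info,
  };
}
```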

Database Persistence

for (const comment of comments) {
  const snippet = comment.snippet?.topLevelComment?.snippet;
  if (!snippet) continue;

  const { error } = await supabase.from("posts").upsert(
    {
      topic_id,
      platform: "youtube",
      external_id: comment.id,
      author: snippet.authorDisplayName || "Anonymous",
      content: snippet.textDisplay?.replace(/<[^>]*>/g, "") || "",
      posted_at: snippet.publishedAt,
    },
    { onConflict: "platform,external_id" }
  );
  if (!error) totalInserted++;
}

Response Format

{
  "success": true,
  "fetched": 87,
  "inserted": 85,
  "info": "Fetched comments from 5 videos"
}

Response Fields

  • success: true if API calls succeeded (even if no comments found)
  • fetched: Total comments retrieved from YouTube
  • inserted: Comments successfully saved (may be less than fetched due to duplicates)

Environment Setup

1. Create YouTube API Key

1. Go to Google Cloud Console
2. Create or select a project: click Select a project → New Project
3. Enable YouTube Data API v3: APIs & Services → Enable APIs and Services → search "YouTube Data API v3" → Enable
4. Create an API key: Credentials → Create Credentials → API Key
5. Restrict the API key (recommended): click Restrict Key → API restrictions → select "YouTube Data API v3"

2. Configure Environment

YOUTUBE_API_KEY=AIzaSyBxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your_service_key

Testing

supabase functions serve fetch-youtube --env-file .env

curl -X POST http://localhost:54321/functions/v1/fetch-youtube \
  -H "Authorization: Bearer ${SUPABASE_SERVICE_ROLE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"topic_id": "your-topic-uuid"}'

Common Issues

Quota exceeded (403)

Cause: Daily quota limit (10,000 units) exceeded.

Solutions:
  1. Wait until quota resets (midnight Pacific Time)
  2. Request quota increase in Google Cloud Console
  3. Reduce maxResults in search and commentThreads
  4. Implement caching to avoid redundant API calls

Comments fetch fails for a video

Cause: The video creator disabled comments.

Solution: The code already handles this gracefully:

if (!commentsResponse.ok) {
  console.error(`Comments fetch failed for video ${videoId}`);
  await commentsResponse.text(); // consume body
  continue; // Skip to next video
}

No comments collected for a topic

Causes:
  1. Query has no YouTube videos (check youtube.com/results manually)
  2. All videos have comments disabled
  3. Videos are age-restricted or private

Debug:

# Test search manually
curl "https://www.googleapis.com/youtube/v3/search?part=snippet&q=YOUR_QUERY&type=video&key=${YOUTUBE_API_KEY}" | jq '.items | length'

HTML tags in stored comments

Problem: Raw comment text includes <br>, <a> tags.

Solution: Already implemented in code:

content: snippet.textDisplay?.replace(/<[^>]*>/g, "") || ""

Optimization Tips

// Change maxResults to reduce quota consumption
searchUrl.searchParams.set("maxResults", "3");  // Was 5
commentsUrl.searchParams.set("maxResults", "10"); // Was 20

// New quota: 100 + (3 × 1) = 103 units per topic
// Max topics per day: 10,000 / 103 ≈ 97
// Fetch all video comments in parallel instead of sequentially
// (buildCommentsUrl is a helper that assembles the commentThreads URL)
const commentPromises = videoIds.map((videoId) =>
  fetch(buildCommentsUrl(videoId))
);
const commentResults = await Promise.allSettled(commentPromises);

// Can substantially reduce total latency (quota cost is unchanged)
// Store video IDs in database to avoid re-searching
await supabase.from("youtube_cache").insert({
  topic_id,
  video_ids: videoIds,
  cached_at: new Date().toISOString(),
});

// Check cache before calling search API
const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000).toISOString();
const { data: cached } = await supabase
  .from("youtube_cache")
  .select("video_ids")
  .eq("topic_id", topicId)
  .gte("cached_at", oneHourAgo)
  .single();
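The cache check above compares cached_at against a one-hour cutoff; a small pure helper makes the TTL logic explicit and testable (a sketch; `isFresh` is a name introduced here, and the `youtube_cache` table is an assumption carried over from the snippet):

```typescript
// True when cachedAt (an ISO timestamp) is younger than ttlMs.
function isFresh(cachedAt: string, ttlMs: number, now: number = Date.now()): boolean {
  return now - new Date(cachedAt).getTime() < ttlMs;
}

// e.g. the one-hour cutoff used above:
const ONE_HOUR_MS = 60 * 60 * 1000;
```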

Next Steps

Data Sources Overview

Understand the complete data pipeline

Sentiment Analysis

How YouTube comments are analyzed
