Architecture

SENTi-radar aggregates sentiment data from multiple social media and content platforms in real time. The data collection pipeline combines official APIs, web scraping via Scrape.do, and tiered fallback strategies to ensure reliable data delivery.

Supported Platforms

The platform currently supports three primary data sources:

- Twitter/X: real-time tweets and discussions via Scrape.do rendering
- Reddit: posts and comments from subreddits via the public JSON API
- YouTube: video comments via the YouTube Data API v3

Data Flow

When a new topic is analyzed, SENTi-radar orchestrates data collection through the analyze-topic edge function:
// Orchestrator flow (supabase/functions/analyze-topic/index.ts)

// 1. Create or retrieve topic
const { data: newTopic } = await supabase
  .from("topics")
  .insert({
    title: topicTitle,
    hashtag: topicHashtag,
    query,
    is_trending: false,
  })
  .select()
  .single();

// Downstream fetchers receive the new topic's id
const topicId = newTopic.id;

// 2. Fetch from Twitter/X (with YouTube fallback)
await fetch(`${supabaseUrl}/functions/v1/fetch-twitter`, {
  method: "POST",
  body: JSON.stringify({ topic_id: topicId }),
});

// 3. Fetch from Reddit
await fetch(`${supabaseUrl}/functions/v1/fetch-reddit`, {
  method: "POST",
  body: JSON.stringify({ topic_id: topicId }),
});

// 4. Fetch from YouTube
await fetch(`${supabaseUrl}/functions/v1/fetch-youtube`, {
  method: "POST",
  body: JSON.stringify({ topic_id: topicId }),
});

// 5. Analyze sentiment
await fetch(`${supabaseUrl}/functions/v1/analyze-sentiment`, {
  method: "POST",
  body: JSON.stringify({ topic_id: topicId }),
});

Scrape.do Integration

Scrape.do is a premium web scraping API that handles JavaScript rendering, residential proxies, and CAPTCHA bypass for platforms that block traditional scraping.
The scrapeDoProvider.ts service provides a unified interface for scraping:
// Build Scrape.do API URL
export function buildApiUrl(
  token: string,
  targetUrl: string,
  options: ScrapeDoOptions = {}
): string {
  const params = new URLSearchParams();
  params.set("token", token);
  params.set("url", targetUrl);
  if (options.render !== false) params.set("render", "true");
  if (options.super) params.set("super", "true");
  if (options.waitUntil) params.set("waitUntil", options.waitUntil);
  if (options.geoCode) params.set("geoCode", options.geoCode);
  return `https://api.scrape.do?${params.toString()}`;
}

Scrape.do Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| render | boolean | true | Enable JavaScript rendering for SPAs |
| super | boolean | false | Use residential/mobile proxies |
| waitUntil | string | networkidle0 | Wait strategy: networkidle0, networkidle2, load, domcontentloaded |
| geoCode | string | - | ISO country code for geo-targeting (e.g., us, gb, in) |
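As a sketch of how these options combine, the following call produces a fully parameterized Scrape.do URL. The helper is copied from the service above so the example is self-contained; the token and target URL are placeholders.

```typescript
interface ScrapeDoOptions {
  render?: boolean;
  super?: boolean;
  waitUntil?: string;
  geoCode?: string;
}

// Mirrors buildApiUrl from scrapeDoProvider.ts
function buildApiUrl(
  token: string,
  targetUrl: string,
  options: ScrapeDoOptions = {}
): string {
  const params = new URLSearchParams();
  params.set("token", token);
  params.set("url", targetUrl);
  if (options.render !== false) params.set("render", "true");
  if (options.super) params.set("super", "true");
  if (options.waitUntil) params.set("waitUntil", options.waitUntil);
  if (options.geoCode) params.set("geoCode", options.geoCode);
  return `https://api.scrape.do?${params.toString()}`;
}

// Scrape an X search page with residential proxies, geo-targeted to the US
const url = buildApiUrl("my-token", "https://x.com/search?q=%23ai", {
  super: true,
  waitUntil: "networkidle2",
  geoCode: "us",
});
console.log(url);
```

Note that render defaults to on: it is only omitted when explicitly set to `false`.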

Fallback Strategy

The Twitter fetcher implements a multi-tier fallback:

1. Primary (Scrape.do for Twitter/X): scrapes live tweets with JavaScript rendering and proxy rotation.
2. Secondary (Scrape.do for Reddit): fetches Reddit posts in parallel if X scraping succeeds.
3. Tertiary (Parallel.ai Social Search): uses the AI-powered social search API when Scrape.do is unavailable.
4. Final (YouTube API): falls back to YouTube comments when all social platforms fail.
5. Algorithmic simulation: generates synthetic data only when all sources are unavailable.

The algorithmic fallback is a last resort that produces template-based synthetic posts; it is triggered only when every real data source fails.
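The tiered strategy above can be sketched as a generic chain that tries each source in priority order and falls through on failure. This is an illustrative sketch, not the actual fetcher code; the source names and stub fetchers are hypothetical.

```typescript
type SourceResult = { success: boolean; posts: string[] };
type Source = { name: string; fetch: () => Promise<SourceResult> };

// Try each source in priority order; return the first successful result.
// A throwing source is treated like a failed one, so the chain keeps going.
async function fetchWithFallback(
  sources: Source[]
): Promise<{ source: string; posts: string[] }> {
  for (const source of sources) {
    try {
      const result = await source.fetch();
      if (result.success) return { source: source.name, posts: result.posts };
    } catch {
      // Fall through to the next tier
    }
  }
  return { source: "none", posts: [] };
}

// Illustrative usage with stubs standing in for the real tiers
const result = await fetchWithFallback([
  { name: "scrape.do/x", fetch: async () => ({ success: false, posts: [] }) },
  { name: "parallel.ai", fetch: async () => ({ success: true, posts: ["post-1"] }) },
  { name: "youtube", fetch: async () => ({ success: true, posts: ["comment-1"] }) },
]);
console.log(result.source); // "parallel.ai"
```

Because the first tier reports failure, the chain stops at the first tier that succeeds and never calls the ones after it.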

Data Normalization

All scraped posts are normalized into a consistent format before database insertion:
interface ScrapedPost {
  id: string;               // Unique identifier
  text: string;             // Post content
  author: string;           // Username with @ prefix
  platform: string;         // Source platform: 'x', 'reddit', 'youtube'
  url: string;              // Source URL
  postedAt: string;         // ISO 8601 timestamp
}

// Database insertion
await supabase.from("posts").upsert(
  {
    topic_id,
    platform: post.platform,
    external_id: post.id,
    author: post.author,
    content: post.text,
    posted_at: post.postedAt,
  },
  { onConflict: "platform,external_id" }
);
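As an illustrative sketch of the normalization step, a platform-specific fetcher might map its raw payload into ScrapedPost like this. The raw Reddit field names below follow the public JSON API's common shape, but the actual fetcher's field handling may differ.

```typescript
interface ScrapedPost {
  id: string;
  text: string;
  author: string;   // Username with @ prefix
  platform: string; // 'x', 'reddit', or 'youtube'
  url: string;
  postedAt: string; // ISO 8601 timestamp
}

// Hypothetical shape of one Reddit listing entry
interface RawRedditPost {
  id: string;
  title: string;
  selftext: string;
  author: string;
  permalink: string;
  created_utc: number; // Unix seconds
}

function normalizeRedditPost(raw: RawRedditPost): ScrapedPost {
  return {
    id: raw.id,
    // Combine title and body so sentiment analysis sees the full text
    text: raw.selftext ? `${raw.title}\n\n${raw.selftext}` : raw.title,
    author: `@${raw.author}`, // Normalize to @-prefixed usernames
    platform: "reddit",
    url: `https://www.reddit.com${raw.permalink}`,
    postedAt: new Date(raw.created_utc * 1000).toISOString(),
  };
}

const post = normalizeRedditPost({
  id: "abc123",
  title: "Thoughts on the new release",
  selftext: "It works well.",
  author: "sample_user",
  permalink: "/r/sample/comments/abc123/",
  created_utc: 1700000000,
});
console.log(post.postedAt); // "2023-11-14T22:13:20.000Z"
```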

Environment Variables

Required configuration for data sources:
# Scrape.do API token (required for X and Reddit)
SCRAPE_DO_TOKEN=your_scrape_do_token

# YouTube Data API v3 key (required for YouTube)
YOUTUBE_API_KEY=your_youtube_api_key

# Parallel.ai API key (optional fallback)
PARALLEL_API_KEY=your_parallel_api_key

# Supabase credentials
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
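A small guard in each edge function can fail fast when required configuration is missing. This sketch takes the lookup function as a parameter so it runs anywhere; in a Supabase edge function you would pass `(k) => Deno.env.get(k)`. The helper name is illustrative, not part of the codebase.

```typescript
// Returns the names of missing variables so callers can report them all at once
function missingEnvVars(
  required: string[],
  lookup: (key: string) => string | undefined
): string[] {
  return required.filter((key) => {
    const value = lookup(key);
    return value === undefined || value.trim() === "";
  });
}

// Simulated environment with only some variables set
const env: Record<string, string> = {
  SCRAPE_DO_TOKEN: "token",
  SUPABASE_URL: "https://example.supabase.co",
};

const missing = missingEnvVars(
  ["SCRAPE_DO_TOKEN", "YOUTUBE_API_KEY", "SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY"],
  (k) => env[k]
);
console.log(missing); // ["YOUTUBE_API_KEY", "SUPABASE_SERVICE_ROLE_KEY"]
```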

Error Handling

Each data source implements graceful degradation:
let scrapeStatus: "ok" | "blocked" | "quota" | "no_token" | "error" = "error";

if (res.status === 402 || res.status === 429) {
  scrapeStatus = "quota";   // API quota exceeded
} else if (res.status === 403 || res.status === 407) {
  scrapeStatus = "blocked"; // Proxy/IP blocked
} else if (res.ok) {
  scrapeStatus = "ok";      // Success
}

return {
  success: scrapeStatus === "ok",
  fetched: posts.length,
  inserted: insertedCount,
  scrape_status: scrapeStatus,
};
The system never returns HTTP 500 errors to the orchestrator. Instead, it returns success: false with diagnostic info, allowing the pipeline to continue with other sources.
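Because fetchers report failure in-band rather than throwing, the orchestrator can aggregate results across sources and proceed whenever at least one inserted posts. This is a minimal sketch under that assumption; whether the orchestrator gates sentiment analysis exactly this way is not specified above.

```typescript
// Mirrors the per-fetcher return value shown above
interface FetchResult {
  success: boolean;
  fetched: number;
  inserted: number;
  scrape_status: "ok" | "blocked" | "quota" | "no_token" | "error";
}

// Sum inserted posts across sources; analysis is worthwhile if any landed
function summarize(results: FetchResult[]): {
  totalInserted: number;
  shouldAnalyze: boolean;
} {
  const totalInserted = results.reduce((sum, r) => sum + r.inserted, 0);
  return { totalInserted, shouldAnalyze: totalInserted > 0 };
}

// One source hit its quota, another succeeded: the pipeline still continues
const summary = summarize([
  { success: false, fetched: 0, inserted: 0, scrape_status: "quota" },
  { success: true, fetched: 25, inserted: 20, scrape_status: "ok" },
]);
console.log(summary); // { totalInserted: 20, shouldAnalyze: true }
```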

Next Steps

- Twitter/X Integration: learn how X scraping works
- Reddit Integration: understand Reddit data collection
- YouTube Integration: configure YouTube API access
