
Overview

Every news item passes through a three-stage classification pipeline that provides instant results while progressively refining threat assessments using ML and LLM:
  1. Keyword classifier (instant, source: 'keyword') — ~120 threat keywords across 5 severity tiers
  2. Browser-side ML (async, source: 'ml') — Transformers.js NER + sentiment analysis
  3. LLM classifier (batched async, source: 'llm') — Groq Llama 3.1 8B or Ollama local
The UI is never blocked waiting for AI. Users see keyword results instantly, with ML/LLM refinements arriving within seconds and persisting for all subsequent visitors.

Stage 1: Keyword Classifier

Pattern-matches against ~120 threat keywords organized by severity tier and event category.

Severity Tiers

Critical: existential threats and major escalation.
Military/Conflict:
  • nuclear strike, nuclear attack, nuclear war
  • invasion, declaration of war, declares war
  • all-out war, full-scale war
  • martial law, coup, coup attempt
  • genocide, ethnic cleansing
  • massive strikes, military strikes, retaliatory strikes
Iran-specific (high geopolitical priority):
  • attack iran, attacks iran, strikes iran
  • war with iran, war on iran
  • iran retaliates, iran strikes, iran attacks
WMD:
  • chemical attack, biological attack, dirty bomb
Health:
  • pandemic declared, health emergency
Military alliance:
  • nato article 5
Disaster:
  • nuclear meltdown, evacuation order
Examples:
  • “Russia invades Baltic states” → critical: conflict
  • “Iran launches retaliatory strikes” → critical: military
  • “NATO invokes Article 5” → critical: military
High: active conflict and severe threats.
Conflict:
  • war, armed conflict
  • airstrike, drone strike, bombing, shelling
  • casualties, killed in
  • strike on, attack on, launches attack
Military:
  • missile, missile launch, missiles fired
  • troops deployed, military escalation
  • ground offensive, military operation
  • ballistic missile, cruise missile
Terrorism:
  • hostage, terrorist, terror attack, assassination
Cyber:
  • cyber attack, ransomware, data breach
Economic:
  • sanctions, embargo
Disaster:
  • earthquake, tsunami, hurricane, typhoon
Compound escalation: HIGH military/conflict + critical geopolitical target → escalated to CRITICAL
Example: “US and Israel strikes on Iran” → critical: military (escalation logic)
Source: src/services/threat-classifier.ts:329-337
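The escalation rule can be sketched as a small pure function. This is an illustration under assumptions, not the code at the cited lines: `CRITICAL_TARGETS` and `escalate` are hypothetical names, and the target list holds only Iran because that is the only target the text shows.

```typescript
type ThreatLevel = 'critical' | 'high' | 'medium' | 'low' | 'info';

// Hypothetical target list; the documentation only gives Iran as an example.
const CRITICAL_TARGETS = ['iran'];

// Promote a HIGH military/conflict result to CRITICAL when the headline
// also names a critical geopolitical target. Everything else passes through.
function escalate(level: ThreatLevel, category: string, headline: string): ThreatLevel {
  const isMilitary = category === 'military' || category === 'conflict';
  const hitsTarget = CRITICAL_TARGETS.some(target =>
    new RegExp(`\\b${target}\\b`, 'i').test(headline)
  );
  return level === 'high' && isMilitary && hitsTarget ? 'critical' : level;
}
```

Only the combination of all three conditions escalates, so a HIGH economic headline about the same target keeps its level.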
Medium: political instability and infrastructure disruption:
  • protest, riot, unrest, demonstration
  • military exercise, naval exercise
  • arms deal, weapons sale
  • diplomatic crisis, ambassador recalled, expel diplomats
  • trade war, tariff, recession, inflation
  • market crash
  • flood, wildfire, volcano, eruption
  • outbreak, epidemic
  • oil spill, pipeline explosion
  • blackout, power outage, internet outage
  • derailment
Low: diplomatic activity and low-intensity events:
  • election, vote, referendum
  • summit, treaty, agreement, negotiation
  • talks, peacekeeping, humanitarian aid
  • ceasefire, peace treaty
  • climate change, emissions, pollution
  • vaccine, vaccination, disease, virus
  • interest rate, gdp, unemployment, regulation
Info: general news with no specific threat classification.
Exclusions: Headlines containing lifestyle/entertainment keywords are auto-classified as INFO to prevent false positives:
  • protein, couples, relationship, dating
  • diet, fitness, recipe, cooking
  • shopping, fashion, celebrity, movie
  • tv show, sports, game, concert
  • strikes deal, strikes agreement (not military strikes)
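The tier lookup and exclusion check can be sketched as follows. This is a minimal illustration, not the code in src/services/threat-classifier.ts: `TIERS`, `EXCLUSIONS`, and `classifyByKeyword` are hypothetical names, only a few keywords per tier are copied from the lists above, and plain substring matching stands in for the boundary-aware regexes. Checking exclusions first and scanning tiers from most to least severe is also an assumption.

```typescript
type ThreatLevel = 'critical' | 'high' | 'medium' | 'low' | 'info';

// Hypothetical, heavily abbreviated keyword table (the real one has ~120 entries).
const TIERS: Array<{ level: ThreatLevel; keywords: string[] }> = [
  { level: 'critical', keywords: ['nuclear strike', 'declaration of war', 'nato article 5'] },
  { level: 'high', keywords: ['airstrike', 'ransomware', 'sanctions', 'earthquake'] },
  { level: 'medium', keywords: ['protest', 'trade war', 'power outage'] },
  { level: 'low', keywords: ['election', 'ceasefire', 'interest rate'] },
];

// A few of the lifestyle/entertainment exclusions listed above.
const EXCLUSIONS = ['recipe', 'celebrity', 'strikes deal', 'concert'];

function classifyByKeyword(headline: string): { level: ThreatLevel; matchedKeyword?: string } {
  const text = headline.toLowerCase();
  // Assumed order: exclusions run first so lifestyle headlines never reach the tiers.
  if (EXCLUSIONS.some(k => text.includes(k))) return { level: 'info' };
  // Assumed order: most severe tier first, first hit wins.
  for (const tier of TIERS) {
    const hit = tier.keywords.find(k => text.includes(k));
    if (hit) return { level: tier.level, matchedKeyword: hit };
  }
  return { level: 'info' };
}
```

With this ordering, a headline that contains both a critical and a high keyword reports the critical one, and "Union strikes deal with automaker" falls through to info.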

Event Categories

  • conflict: Wars, battles, armed clashes
  • protest: Civil unrest, demonstrations
  • military: Troop movements, exercises
  • terrorism: Attacks, hostage situations
  • cyber: Hacking, data breaches
  • disaster: Natural disasters, accidents
  • diplomatic: Treaties, summits, negotiations
  • economic: Sanctions, market events
  • health: Pandemics, outbreaks
  • environmental: Climate, pollution, spills
  • infrastructure: Outages, pipeline explosions
  • crime: Assassinations, organized crime
  • tech: Tech-specific events (variant)
  • general: Uncategorized news

Keyword Matching Logic

wordBoundary
boolean
default:"true"
Short keywords (mostly ≤5 chars; see the list below) use \b word boundaries to prevent false positives:
  • war matches “war in Ukraine” but not “award ceremony”
  • riot matches “riot police” but not “patriot”
  • hack matches “data hack” but not “hackathon”
Short keyword list: war, coup, ban, vote, riot, hack, talks, ipo, gdp, virus, disease, flood, strikes
trailingBoundary
boolean
Iran-specific keywords use trailing boundary only (allow prefix matches):
  • attack iran uses (?![\w-]) instead of \b..\b
  • Prevents hyphen breaks: “US-Iran tensions” still matches
Trailing boundary keywords: All Iran-specific phrases from CRITICAL tier
regexCache
Map<string, RegExp>
Compiled regexes are cached in a Map to avoid recompiling on every headline (10-15x performance improvement).
Source: src/services/threat-classifier.ts:286-315
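The two boundary modes and the regex cache can be sketched like this. It is a minimal sketch under assumptions: `getKeywordRegex`, `escapeRegex`, and `BoundaryMode` are hypothetical names, and the cache key format is invented for the example. Only the `\b` wrapping and the `(?![\w-])` trailing lookahead come from the text above.

```typescript
type BoundaryMode = 'none' | 'word' | 'trailing';

// Compiled patterns are stored once per (mode, keyword) pair.
const regexCache = new Map<string, RegExp>();

// Escape regex metacharacters so a keyword is always matched literally.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function getKeywordRegex(keyword: string, mode: BoundaryMode): RegExp {
  const key = `${mode}:${keyword}`;
  let re = regexCache.get(key);
  if (!re) {
    const body = escapeRegex(keyword);
    const source =
      mode === 'word' ? `\\b${body}\\b` :          // both sides must be word boundaries
      mode === 'trailing' ? `${body}(?![\\w-])` :  // only the right side is constrained
      body;
    // No 'g' flag, so test() carries no lastIndex state between headlines.
    re = new RegExp(source, 'i');
    regexCache.set(key, re);
  }
  return re;
}
```

In this sketch the trailing mode accepts "counterattack Iran" (prefix allowed) but rejects "attack Iranian" and "attack Iran-backed", because the character after the keyword is a word character or a hyphen.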

Variant-Specific Keywords

The Tech Monitor variant includes additional keywords for tech industry threats:
High:
  • major outage, global outage, service down
  • zero-day, critical vulnerability, supply chain attack
  • mass layoff
Medium:
  • outage, breach, hack, vulnerability
  • layoff, layoffs, antitrust, monopoly
  • ban, shutdown
Low:
  • ipo, funding, acquisition, merger
  • launch, release, update, partnership
  • startup, ai model, open source
Source: src/services/threat-classifier.ts:241-276

Stage 2: Browser-Side ML

Transformers.js runs Named Entity Recognition (NER), sentiment analysis, and topic classification entirely in the browser:
models
array
  • Xenova/bert-base-NER — entity extraction
  • Xenova/distilbert-base-uncased-finetuned-sst-2-english — sentiment
  • Topic classification model (custom fine-tuned)
Loading: ONNX models are downloaded on first use and cached in browser IndexedDB.
optIn
boolean
default:"false"
User control: “Browser Local Model” toggle in AI Flow settings. When disabled:
  • ML worker is never initialized
  • No ONNX model downloads
  • No WebGL memory allocation
  • Keyword classifier remains active
Toggle propagates dynamically — enabling it mid-session initializes the worker immediately.
confidence
number
default:"0.7-0.85"
ML confidence is typically lower than LLM but higher than keyword-only classification.
Source: src/services/ml-worker.ts
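The opt-in behavior can be sketched as a lazy gate around worker creation. This is an illustration only: `createMlGate` and `MlWorkerLike` are hypothetical names, the `spawn` factory stands in for whatever src/services/ml-worker.ts actually does, and terminating the worker when the toggle is switched off is an assumption the text does not confirm.

```typescript
// Minimal shape of a worker for this sketch; a real Web Worker also has terminate().
interface MlWorkerLike {
  terminate(): void;
}

function createMlGate(spawn: () => MlWorkerLike) {
  let worker: MlWorkerLike | null = null;
  return {
    // Called whenever the "Browser Local Model" toggle changes.
    setEnabled(on: boolean): void {
      if (on && !worker) {
        worker = spawn(); // first enable: the worker (and its model downloads) start here
      } else if (!on && worker) {
        worker.terminate(); // assumption: disabling mid-session frees the worker
        worker = null;
      }
    },
    isActive(): boolean {
      return worker !== null;
    },
  };
}
```

Because `spawn` is only called inside `setEnabled(true)`, a session that never enables the toggle never creates a worker, which matches the "never initialized" guarantee above.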

Stage 3: LLM Classifier

Headlines are collected into a batch queue and fired as parallel classifyEvent RPCs.

Batching Configuration

BATCH_SIZE
number
default:"20"
Max headlines per batch.
BATCH_DELAY_MS
number
default:"500"
Wait time before flushing a partial batch (if fewer than 20 items).
STAGGER_BASE_MS
number
default:"2100"
Base delay between API requests to prevent rate limiting.
STAGGER_JITTER_MS
number
default:"200"
Random jitter (±200ms) added to stagger timing.
MIN_GAP_MS
number
default:"2000"
Minimum enforced gap between requests.
MAX_RETRIES
number
default:"2"
Failed jobs are retried up to 2 times before dropping.
MAX_QUEUE_LENGTH
number
default:"100"
Queue is capped at 100 items. Excess classifications are dropped with a console warning.
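How these constants might interact can be sketched in a few lines. The constant values come from the list above; the function names (`nextRequestDelay`, `enqueue`, `takeBatch`), the way jitter and the minimum gap are combined, and the warning text are assumptions made for illustration.

```typescript
const BATCH_SIZE = 20;
const STAGGER_BASE_MS = 2100;
const STAGGER_JITTER_MS = 200;
const MIN_GAP_MS = 2000;
const MAX_QUEUE_LENGTH = 100;

// Delay before the next API request: base ± jitter, never below the minimum gap.
// rand is injectable so the arithmetic can be checked deterministically.
function nextRequestDelay(rand: () => number = Math.random): number {
  const jitter = (rand() * 2 - 1) * STAGGER_JITTER_MS; // in [-200, +200)
  return Math.max(MIN_GAP_MS, STAGGER_BASE_MS + jitter);
}

// Add a job unless the queue is full; returns false when the job is dropped.
function enqueue<T>(queue: T[], job: T): boolean {
  if (queue.length >= MAX_QUEUE_LENGTH) {
    console.warn('[Classify] queue full, dropping job'); // hypothetical wording
    return false;
  }
  queue.push(job);
  return true;
}

// Remove and return up to BATCH_SIZE jobs from the front of the queue.
function takeBatch<T>(queue: T[]): T[] {
  return queue.splice(0, BATCH_SIZE);
}
```

Under this reading the lowest jittered value (2100 − 200 = 1900 ms) is clamped up to 2000 ms by MIN_GAP_MS, so the effective delay range is 2000 to 2300 ms.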

Error Handling

HTTP 429 (rate limited):
  • Batch queue pauses for 60 seconds
  • Failed job increments attempt counter and is requeued (if attempts < MAX_RETRIES)
  • Remaining jobs in batch are requeued WITHOUT burning attempts
  • Console warning: [Classify] 429 — pausing AI classification for 60s
HTTP 500 (server error):
  • Batch queue pauses for 30 seconds
  • Same retry logic as 429
  • Prevents wasting API quota on transient failures
  • Console warning: [Classify] 500 — pausing AI classification for 30s
Other failures:
  • Individual job fails (no queue pause)
  • Job is retried up to MAX_RETRIES
  • After max retries, returns null (keyword classification remains)
Source: src/services/threat-classifier.ts:412-495
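The failure rules above can be condensed into one decision function. This is a sketch under stated assumptions rather than the code at the cited lines: `planAfterFailure`, `Job`, and `FailurePlan` are hypothetical names, and `attempts` is read as "retries already used", checked before it is incremented so that a job gets at most MAX_RETRIES retries.

```typescript
const MAX_RETRIES = 2;

interface Job {
  title: string;
  attempts: number; // retries already used (assumed meaning)
}

interface FailurePlan {
  pauseMs: number; // how long the whole queue pauses
  requeue: Job[];  // jobs that go back on the queue
  dropped: Job[];  // jobs that exhausted their retries
}

function planAfterFailure(status: number, failed: Job, restOfBatch: Job[]): FailurePlan {
  // 429 pauses for 60s, 500 for 30s; any other failure does not pause the queue.
  const pauseMs = status === 429 ? 60_000 : status === 500 ? 30_000 : 0;
  const canRetry = failed.attempts < MAX_RETRIES;
  const retried: Job = { ...failed, attempts: failed.attempts + 1 };
  return {
    pauseMs,
    requeue: [
      ...(canRetry ? [retried] : []),
      // On a pause, the untouched jobs return to the queue without burning attempts.
      ...(pauseMs > 0 ? restOfBatch : []),
    ],
    dropped: canRetry ? [] : [failed],
  };
}
```

A dropped job corresponds to the "returns null" case: the headline simply keeps its keyword classification.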

LLM Provider Configuration

const GROQ_CONFIG = {
  model: 'llama-3.1-8b-instant',
  temperature: 0,
  maxTokens: 50,
  timeout: 5000
};

Redis Caching

LLM results are cached with a 24-hour TTL to prevent redundant API calls:
const cacheKey = `classify:${hashHeadline(title)}`;
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);

const result = await classifyClient.classifyEvent({ title, ... });
await redis.setex(cacheKey, 86400, JSON.stringify(result));
return result;
Deduplication: Same headline viewed by 1,000 concurrent users triggers exactly one LLM call.

Classification Override Logic

When multiple sources provide results, the highest confidence wins:
function selectBestClassification(
  keyword: ThreatClassification,
  ml: ThreatClassification | null,
  llm: ThreatClassification | null
): ThreatClassification {
  const candidates = [keyword, ml, llm].filter(Boolean) as ThreatClassification[];
  return candidates.reduce((best, current) =>
    current.confidence > best.confidence ? current : best
  );
}
Result tagging: Each classification carries its source tag (keyword, ml, llm) so downstream consumers can weight confidence accordingly.
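A short usage sketch shows how the selection behaves, including ties. The function is repeated so the example runs on its own; the `ThreatClassification` shape is trimmed to the fields used here and the confidence values are made up for illustration.

```typescript
interface ThreatClassification {
  level: string;
  category: string;
  confidence: number;
  source: 'keyword' | 'ml' | 'llm';
}

function selectBestClassification(
  keyword: ThreatClassification,
  ml: ThreatClassification | null,
  llm: ThreatClassification | null
): ThreatClassification {
  const candidates = [keyword, ml, llm].filter(
    (c): c is ThreatClassification => c !== null
  );
  // Strict '>' means an earlier candidate survives a tie.
  return candidates.reduce((best, current) =>
    current.confidence > best.confidence ? current : best
  );
}

// Made-up results for one headline.
const kw: ThreatClassification = { level: 'high', category: 'conflict', confidence: 0.8, source: 'keyword' };
const mlResult: ThreatClassification = { level: 'high', category: 'military', confidence: 0.8, source: 'ml' };
const llmResult: ThreatClassification = { level: 'critical', category: 'military', confidence: 0.9, source: 'llm' };

const beforeAi = selectBestClassification(kw, null, null);       // keyword only
const tie = selectBestClassification(kw, mlResult, null);        // equal confidence
const afterLlm = selectBestClassification(kw, mlResult, llmResult);
```

Because the comparison is strict, an ML or LLM result replaces the keyword result only when its confidence is strictly higher.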

Aggregate Threat for Clusters

News clusters (multiple sources reporting the same story) aggregate threat levels:
export function aggregateThreats(
  items: Array<{ threat?: ThreatClassification; tier?: number }>
): ThreatClassification {
  // Only items that carry a classification take part in the aggregate
  const withThreat = items.filter(i => i.threat);
  if (withThreat.length === 0) {
    // Assumed fallback for an empty cluster; this excerpt does not show the original's
    return { level: 'info', category: 'general', confidence: 0.5, source: 'keyword' };
  }

  // Level = max across items, compared by THREAT_PRIORITY rank
  let maxLevel = withThreat[0].threat!.level;
  for (const item of withThreat) {
    const level = item.threat!.level;
    if (THREAT_PRIORITY[level] > THREAT_PRIORITY[maxLevel]) maxLevel = level;
  }

  // Category = most frequent
  const catCounts = new Map<EventCategory, number>();
  for (const item of withThreat) {
    const cat = item.threat!.category;
    catCounts.set(cat, (catCounts.get(cat) ?? 0) + 1);
  }
  const topCat = [...catCounts.entries()].sort((a, b) => b[1] - a[1])[0][0];

  // Confidence = weighted avg by source tier (lower tier = higher weight)
  let weightedSum = 0;
  let weightTotal = 0;
  for (const item of withThreat) {
    const weight = item.tier ? (6 - Math.min(item.tier, 5)) : 1;
    weightedSum += item.threat!.confidence * weight;
    weightTotal += weight;
  }

  return {
    level: maxLevel,
    category: topCat,
    confidence: weightTotal > 0 ? weightedSum / weightTotal : 0.5,
    source: 'keyword',
  };
}
Source: src/services/threat-classifier.ts:521-570
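The tier weighting is easier to follow with numbers. The sketch below isolates the weight formula from the function above; `tierWeight` and `weightedConfidence` are hypothetical helper names and the sample cluster is made up.

```typescript
// Same formula as in aggregateThreats: tier 1 weighs 5, tier 5 and above weigh 1,
// and an item without a tier (or with tier 0) also weighs 1.
function tierWeight(tier?: number): number {
  return tier ? 6 - Math.min(tier, 5) : 1;
}

function weightedConfidence(items: Array<{ confidence: number; tier?: number }>): number {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const item of items) {
    const w = tierWeight(item.tier);
    weightedSum += item.confidence * w;
    weightTotal += w;
  }
  return weightTotal > 0 ? weightedSum / weightTotal : 0.5;
}

// Made-up cluster: a tier-1 source, a tier-3 source, and an untiered source.
const clusterConfidence = weightedConfidence([
  { confidence: 0.9, tier: 1 }, // weight 5
  { confidence: 0.6, tier: 3 }, // weight 3
  { confidence: 0.5 },          // weight 1
]);
// (0.9*5 + 0.6*3 + 0.5*1) / (5 + 3 + 1) = 6.8 / 9 ≈ 0.756
```

Note that a tier-5 source and an untiered source carry the same weight under this formula.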

Threat Color Mapping

Threat levels are color-coded with CSS variables for theme support:

  • critical: Red (--threat-critical)
  • high: Orange (--threat-high)
  • medium: Yellow (--threat-medium)
  • low: Green (--threat-low)
  • info: Blue (--threat-info)
export function getThreatColor(level: ThreatLevel): string {
  return getCSSColor(THREAT_VAR_MAP[level] || '--text-dim');
}
Runtime reads: Use getThreatColor() instead of the static THREAT_COLORS object to support light/dark theme switching.
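Why a runtime lookup follows the theme can be sketched with an injectable resolver. The variable names in `THREAT_VAR_MAP` come from the list above; `getThreatColorWith`, `makeResolver`, and the hex values are hypothetical. In a browser, the resolver could wrap `getComputedStyle(document.documentElement).getPropertyValue(name)`, which is presumably close to what `getCSSColor` does, though the source does not show it.

```typescript
type ThreatLevel = 'critical' | 'high' | 'medium' | 'low' | 'info';

const THREAT_VAR_MAP: Record<ThreatLevel, string> = {
  critical: '--threat-critical',
  high: '--threat-high',
  medium: '--threat-medium',
  low: '--threat-low',
  info: '--threat-info',
};

// resolveVar stands in for getCSSColor so the sketch runs without a DOM.
function getThreatColorWith(level: ThreatLevel, resolveVar: (name: string) => string): string {
  return resolveVar(THREAT_VAR_MAP[level] || '--text-dim');
}

// Build a resolver from a plain theme object (made-up colors).
function makeResolver(theme: Record<string, string>): (name: string) => string {
  return name => theme[name] ?? '';
}

const darkTheme = makeResolver({ '--threat-critical': '#ff4d4f' });
const lightTheme = makeResolver({ '--threat-critical': '#cf1322' });
```

The same level resolves to a different color depending on which theme is active at call time, which a static color table cannot do.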

Example Classifications

{
  "level": "critical",
  "category": "military",
  "confidence": 0.9,
  "source": "keyword",
  "matchedKeyword": "nuclear strike"
}

Key Files

  • src/services/threat-classifier.ts — Main classification engine
  • src/services/ml-worker.ts — Browser-side Transformers.js ML
  • api/intelligence/classify-event.ts — LLM classification handler
  • src/components/ThreatBadge.tsx — UI threat level indicators
