Threat Classification Pipeline

Overview

Every news item passes through a three-stage classification pipeline that returns a result instantly, then progressively refines the threat assessment with browser-side ML and an LLM:

Keyword classifier (instant, source: 'keyword') — ~120 threat keywords across 5 severity tiers
Browser-side ML (async, source: 'ml') — Transformers.js NER + sentiment analysis
LLM classifier (batched async, source: 'llm') — Groq (Llama 3.1 8B) or a local Ollama instance

The UI is never blocked waiting for AI. Users see keyword results instantly, and ML/LLM refinements arrive within seconds. LLM results are cached in Redis for 24 hours, so subsequent visitors receive them without a new LLM call.

Stage 1: Keyword Classifier

Pattern-matches against ~120 threat keywords organized by severity tier and event category.

Severity Tiers

CRITICAL (confidence 0.9)

Existential threats and major escalation:

Military/Conflict:

nuclear strike, nuclear attack, nuclear war
invasion, declaration of war, declares war
all-out war, full-scale war
martial law, coup, coup attempt
genocide, ethnic cleansing
massive strikes, military strikes, retaliatory strikes

Iran-specific (high geopolitical priority):

attack iran, attacks iran, strikes iran
war with iran, war on iran
iran retaliates, iran strikes, iran attacks

WMD:

chemical attack, biological attack, dirty bomb

Health:

pandemic declared, health emergency

Military alliance:

nato article 5

Disaster:

nuclear meltdown, evacuation order

Examples:

“Russia invades Baltic states” → critical: conflict
“Iran launches retaliatory strikes” → critical: military
“NATO invokes Article 5” → critical: military

HIGH (confidence 0.8)

Active conflict and severe threats:

Conflict:

war, armed conflict
airstrike, drone strike, bombing, shelling
casualties, killed in
strike on, attack on, launches attack

Military:

missile, missile launch, missiles fired
troops deployed, military escalation
ground offensive, military operation
ballistic missile, cruise missile

Terrorism:

hostage, terrorist, terror attack, assassination

Cyber:

cyber attack, ransomware, data breach

Economic:

sanctions, embargo

Disaster:

earthquake, tsunami, hurricane, typhoon

Compound escalation: a HIGH military/conflict match that also names a critical geopolitical target is escalated to CRITICAL.

Example: “US and Israel strikes on Iran” → critical: military (escalation logic)

Source: src/services/threat-classifier.ts:329-337
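A minimal sketch of the escalation rule, written as a post-processing step on a keyword result. It is not the code at threat-classifier.ts:329-337: the CRITICAL_TARGETS list (reduced here to iran), the escalate helper, and the choice to adopt the CRITICAL confidence of 0.9 are all assumptions.

```typescript
type ThreatLevel = 'critical' | 'high' | 'medium' | 'low' | 'info';
type EventCategory = 'conflict' | 'military' | 'terrorism' | 'general'; // subset for this sketch

interface KeywordResult {
  level: ThreatLevel;
  category: EventCategory;
  confidence: number;
}

// Hypothetical list of critical geopolitical targets
const CRITICAL_TARGETS = ['iran'];

// Escalate a HIGH military/conflict match to CRITICAL when the headline also
// names a critical target. Plain substring test: a real matcher would need
// boundaries so that e.g. "Tirana" does not count as "iran".
function escalate(title: string, result: KeywordResult): KeywordResult {
  const lower = title.toLowerCase();
  const isHighConflict =
    result.level === 'high' &&
    (result.category === 'military' || result.category === 'conflict');
  const namesTarget = CRITICAL_TARGETS.some((t) => lower.includes(t));
  if (isHighConflict && namesTarget) {
    // Assumed: adopt the CRITICAL tier confidence (0.9)
    return { ...result, level: 'critical', confidence: 0.9 };
  }
  return result;
}

const strike: KeywordResult = { level: 'high', category: 'military', confidence: 0.8 };
console.log(escalate('US and Israel strikes on Iran', strike).level); // critical
console.log(escalate('Strikes on port city', strike).level); // high
```

Only the category and the target together trigger escalation; a HIGH terrorism match that mentions Iran stays HIGH under this rule.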

MEDIUM (confidence 0.7)

Political instability and infrastructure disruption:

protest, riot, unrest, demonstration
military exercise, naval exercise
arms deal, weapons sale
diplomatic crisis, ambassador recalled, expel diplomats
trade war, tariff, recession, inflation
market crash
flood, wildfire, volcano, eruption
outbreak, epidemic
oil spill, pipeline explosion
blackout, power outage, internet outage
derailment

LOW (confidence 0.6)

Diplomatic activity and low-intensity events:

election, vote, referendum
summit, treaty, agreement, negotiation
talks, peacekeeping, humanitarian aid
ceasefire, peace treaty
climate change, emissions, pollution
vaccine, vaccination, disease, virus
interest rate, gdp, unemployment, regulation

INFO (confidence 0.3)

General news with no specific threat classification.

Exclusions: Headlines containing lifestyle/entertainment keywords are auto-classified as INFO to prevent false positives:

protein, couples, relationship, dating
diet, fitness, recipe, cooking
shopping, fashion, celebrity, movie
tv show, sports, game, concert
strikes deal, strikes agreement (not military strikes)
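The tier scan and exclusion check can be sketched as below, assuming exclusions are checked before any tier and tiers are scanned from most to least severe, with the first hit winning. The keyword lists are small subsets of the ones above, classifyHeadline is a hypothetical name, and matching is a plain case-insensitive substring test (word-boundary handling is omitted here).

```typescript
type ThreatLevel = 'critical' | 'high' | 'medium' | 'low' | 'info';

interface Classification {
  level: ThreatLevel;
  confidence: number;
  matchedKeyword?: string;
}

// Tier confidences as documented above
const TIER_CONFIDENCE: Record<ThreatLevel, number> = {
  critical: 0.9, high: 0.8, medium: 0.7, low: 0.6, info: 0.3,
};

// Abbreviated keyword lists, checked most severe first
const TIERS: Array<[ThreatLevel, string[]]> = [
  ['critical', ['nuclear strike', 'declaration of war', 'martial law']],
  ['high', ['airstrike', 'ransomware', 'earthquake']],
  ['medium', ['protest', 'tariff', 'blackout']],
  ['low', ['election', 'summit', 'ceasefire']],
];

// Lifestyle/entertainment exclusions short-circuit to INFO (abbreviated)
const EXCLUSIONS = ['recipe', 'celebrity', 'strikes deal', 'strikes agreement'];

function classifyHeadline(title: string): Classification {
  const lower = title.toLowerCase();
  if (EXCLUSIONS.some((x) => lower.includes(x))) {
    return { level: 'info', confidence: TIER_CONFIDENCE.info };
  }
  for (const [level, keywords] of TIERS) {
    const hit = keywords.find((k) => lower.includes(k));
    if (hit) return { level, confidence: TIER_CONFIDENCE[level], matchedKeyword: hit };
  }
  // No keyword matched: general news
  return { level: 'info', confidence: TIER_CONFIDENCE.info };
}

console.log(classifyHeadline('Ransomware hits hospital network').level);   // high
console.log(classifyHeadline('Celebrity chef caught in earthquake').level); // info
```

The second call shows the precedence this sketch assumes: an exclusion keyword wins even when a tier keyword is also present.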

Event Categories

conflict: Wars, battles, armed clashes
protest: Civil unrest, demonstrations
military: Troop movements, exercises
terrorism: Attacks, hostage situations
cyber: Hacking, data breaches
disaster: Natural disasters, accidents
diplomatic: Treaties, summits, negotiations
economic: Sanctions, market events
health: Pandemics, outbreaks
environmental: Climate, pollution, spills
infrastructure: Outages, pipeline explosions
crime: Assassinations, organized crime
tech: Tech-specific events (Tech Monitor variant)
general: Uncategorized news

Keyword Matching Logic

wordBoundary (boolean, default: true)

Keywords on the short-keyword list below (most are five characters or fewer) are matched with \b word boundaries to prevent false positives:

war matches “war in Ukraine” but not “award ceremony”
riot matches “riot police” but not “patriot”
hack matches “data hack” but not “hackathon”

Short keyword list: war, coup, ban, vote, riot, hack, talks, ipo, gdp, virus, disease, flood, strikes
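A sketch of the word-boundary rule, assuming keywords outside the short list are matched as plain case-insensitive substrings. SHORT_KEYWORDS is copied from the list above; escapeRegex and keywordRegex are hypothetical helpers.

```typescript
// Keywords matched with \b...\b (copied from the short-keyword list)
const SHORT_KEYWORDS = new Set([
  'war', 'coup', 'ban', 'vote', 'riot', 'hack', 'talks', 'ipo', 'gdp',
  'virus', 'disease', 'flood', 'strikes',
]);

// Escape regex metacharacters so keywords are matched literally
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Whole-word, case-insensitive matcher for short keywords;
// plain substring matcher for everything else (assumed)
function keywordRegex(keyword: string): RegExp {
  const body = escapeRegex(keyword);
  return SHORT_KEYWORDS.has(keyword)
    ? new RegExp(`\\b${body}\\b`, 'i')
    : new RegExp(body, 'i');
}

const war = keywordRegex('war');
console.log(war.test('War in Ukraine escalates')); // true
console.log(war.test('Award ceremony tonight'));   // false
```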

trailingBoundary (boolean)

Iran-specific keywords use a trailing boundary only, which allows prefix matches:

attack iran uses (?![\w-]) instead of \b..\b
Prevents hyphen breaks: “US-Iran tensions” still matches

Trailing boundary keywords: All Iran-specific phrases from CRITICAL tier
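The trailing-boundary form can be checked directly. This sketch (trailingBoundaryRegex is a hypothetical helper) builds the keyword followed by the (?![\w-]) lookahead, with no leading anchor, and shows which headlines it accepts.

```typescript
// Escape regex metacharacters so keywords are matched literally
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// No anchor before the keyword; after it, reject a word character or hyphen
function trailingBoundaryRegex(keyword: string): RegExp {
  return new RegExp(`${escapeRegex(keyword)}(?![\\w-])`, 'i');
}

const re = trailingBoundaryRegex('attack iran');
console.log(re.test('US forces attack Iran'));               // true
console.log(re.test('Drones attack Iran, state media says')); // true
console.log(re.test('Counterattack Iran'));                  // true: no leading boundary
console.log(re.test('Militias attack Iranian tanker'));      // false: followed by a letter
console.log(re.test('Jets attack Iran-backed group'));       // false: followed by a hyphen
```

Because nothing anchors the start, the pattern also fires inside longer words such as counterattack; the lookahead only constrains what follows the keyword.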

regexCache (Map<string, RegExp>)

Compiled regexes are cached in a Map, so each pattern is compiled once instead of on every headline (a 10-15x performance improvement).

Source: src/services/threat-classifier.ts:286-315
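A minimal version of the regex cache. The key format (flags plus pattern source) is an assumption, since the real key isn't documented, and cachedRegex is a hypothetical name.

```typescript
// Compiled patterns keyed by flags + source
const regexCache = new Map<string, RegExp>();

// Return a cached RegExp, compiling it only on first use. Avoid the 'g' and
// 'y' flags here: they make test() stateful via lastIndex, which would leak
// between headlines when one instance is shared.
function cachedRegex(source: string, flags = 'i'): RegExp {
  const key = `${flags}/${source}`;
  let re = regexCache.get(key);
  if (re === undefined) {
    re = new RegExp(source, flags);
    regexCache.set(key, re);
  }
  return re;
}

const headlines = ['War in Ukraine', 'Award ceremony', 'Trade war looms'];
const warHits = headlines.filter((h) => cachedRegex('\\bwar\\b').test(h));
console.log(warHits);         // [ 'War in Ukraine', 'Trade war looms' ]
console.log(regexCache.size); // 1
```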

Variant-Specific Keywords

The Tech Monitor variant includes additional keywords for tech industry threats:

High:

major outage, global outage, service down
zero-day, critical vulnerability, supply chain attack
mass layoff

Medium:

outage, breach, hack, vulnerability
layoff, layoffs, antitrust, monopoly
ban, shutdown

Low:

ipo, funding, acquisition, merger
launch, release, update, partnership
startup, ai model, open source

Source: src/services/threat-classifier.ts:241-276
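One way to layer the Tech Monitor additions on top of the base tiers, sketched under assumptions: the variant ids, the KeywordTiers shape, and keywordsFor are hypothetical, and the lists are abbreviated. The merge is additive, matching the description of these as additional keywords.

```typescript
type KeywordTier = 'critical' | 'high' | 'medium' | 'low';
type Variant = 'full' | 'tech'; // hypothetical variant ids
type KeywordTiers = Record<KeywordTier, string[]>;

// Abbreviated base lists
const BASE: KeywordTiers = {
  critical: ['nuclear strike', 'coup'],
  high: ['cyber attack', 'ransomware'],
  medium: ['protest', 'blackout'],
  low: ['election', 'summit'],
};

// Abbreviated Tech Monitor additions
const TECH_EXTRA: Partial<KeywordTiers> = {
  high: ['zero-day', 'supply chain attack', 'mass layoff'],
  medium: ['outage', 'vulnerability', 'antitrust'],
  low: ['ipo', 'acquisition', 'open source'],
};

// Base tiers for every variant; the tech variant appends its extras.
// New arrays are built, so BASE itself is never mutated.
function keywordsFor(variant: Variant): KeywordTiers {
  if (variant !== 'tech') return BASE;
  const merged: KeywordTiers = { ...BASE };
  for (const tier of Object.keys(TECH_EXTRA) as KeywordTier[]) {
    merged[tier] = [...BASE[tier], ...(TECH_EXTRA[tier] ?? [])];
  }
  return merged;
}
```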

Stage 2: Browser-Side ML

Transformers.js runs Named Entity Recognition (NER), sentiment analysis, and topic classification entirely in the browser:

models (array):

Xenova/bert-base-NER — entity extraction
Xenova/distilbert-base-uncased-finetuned-sst-2-english — sentiment
Topic classification model (custom fine-tuned)

Loading: ONNX models are downloaded on first use and cached in browser IndexedDB.

optIn (boolean, default: false)

User control: “Browser Local Model” toggle in AI Flow settings. When disabled:

ML worker is never initialized
No ONNX model downloads
No WebGL memory allocation
Keyword classifier remains active

Toggle propagates dynamically — enabling it mid-session initializes the worker immediately.
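The opt-in behavior amounts to lazy construction: nothing is created until the toggle is on. In this sketch MlController and MlBackend are hypothetical; in the browser the factory would construct a Web Worker, which provides terminate(). Terminating the backend when the toggle is switched off is an assumption, since the text above only covers enabling.

```typescript
// Minimal surface of whatever runs the models (a Web Worker in the browser)
interface MlBackend {
  terminate(): void;
}

class MlController {
  private backend: MlBackend | null = null;
  private readonly createBackend: () => MlBackend;

  constructor(createBackend: () => MlBackend) {
    this.createBackend = createBackend;
  }

  setEnabled(enabled: boolean): void {
    if (enabled && this.backend === null) {
      // Enabling mid-session starts the backend immediately
      this.backend = this.createBackend();
    } else if (!enabled && this.backend !== null) {
      // Assumed: disabling releases the backend
      this.backend.terminate();
      this.backend = null;
    }
  }

  get active(): boolean {
    return this.backend !== null;
  }
}
```

While the toggle stays off, the factory is never called, so no model download or GPU allocation can happen.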

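confidence (number, typically 0.7 to 0.85)

ML confidence is typically lower than the LLM's. Against keyword results it depends on the tier: it is at or above MEDIUM (0.7) and LOW (0.6) keyword confidence but can fall below HIGH (0.8) and CRITICAL (0.9).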

Source: src/services/ml-worker.ts

Stage 3: LLM Classifier

Headlines are collected into a batch queue and fired as parallel classifyEvent RPCs:

Batching Configuration

BATCH_SIZE (number, default: 20)

Max headlines per batch.

BATCH_DELAY_MS (number, default: 500)

Wait time before flushing a partial batch (fewer than BATCH_SIZE items).

STAGGER_BASE_MS (number, default: 2100)

Base delay between API requests to prevent rate limiting.

STAGGER_JITTER_MS (number, default: 200)

Random jitter (±200ms) added to the stagger timing.

MIN_GAP_MS (number, default: 2000)

Minimum gap enforced between consecutive requests.

MAX_RETRIES (number, default: 2)

Failed jobs are retried up to 2 times before being dropped.

MAX_QUEUE_LENGTH (number, default: 100)

The queue is capped at 100 items. Excess classifications are dropped with a console warning.
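The batching parameters interact as sketched below. This assumes each stagger gap is drawn uniformly from base ± jitter and then clamped to MIN_GAP_MS, and that a full queue rejects new items; neither detail is confirmed by the source, and enqueue, takeBatch, and staggerOffsets are hypothetical names. The BATCH_DELAY_MS flush timer is omitted.

```typescript
const BATCH_SIZE = 20;
const MAX_QUEUE_LENGTH = 100;
const STAGGER_BASE_MS = 2100;
const STAGGER_JITTER_MS = 200;
const MIN_GAP_MS = 2000;

// Drop new items once the queue is full (assumed policy)
function enqueue<T>(queue: T[], item: T): boolean {
  if (queue.length >= MAX_QUEUE_LENGTH) {
    console.warn('classification queue full; dropping item');
    return false;
  }
  queue.push(item);
  return true;
}

// Remove and return up to BATCH_SIZE items from the front of the queue
function takeBatch<T>(queue: T[]): T[] {
  return queue.splice(0, BATCH_SIZE);
}

// Start offsets (ms) for `count` requests: each gap is base + jitter in
// [-200, +200), but never less than MIN_GAP_MS
function staggerOffsets(count: number, random: () => number = Math.random): number[] {
  const offsets: number[] = [];
  let t = 0;
  for (let i = 0; i < count; i++) {
    offsets.push(t);
    const jitter = (random() * 2 - 1) * STAGGER_JITTER_MS;
    t += Math.max(MIN_GAP_MS, STAGGER_BASE_MS + jitter);
  }
  return offsets;
}
```

With these defaults, gaps fall between 2000 and 2300 ms: the low end of the jitter range (1900 ms) is lifted by the 2000 ms minimum.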

Error Handling

429 Rate Limit

Batch queue pauses for 60 seconds
Failed job increments attempt counter and is requeued (if attempts < MAX_RETRIES)
Remaining jobs in batch are requeued WITHOUT burning attempts
Console warning: [Classify] 429 — pausing AI classification for 60s

500+ Server Error

Batch queue pauses for 30 seconds
Same retry logic as 429
Prevents wasting API quota on transient failures
Console warning: [Classify] 500 — pausing AI classification for 30s

Network Error

Individual job fails (no queue pause)
Job is retried up to MAX_RETRIES
After max retries, returns null (keyword classification remains)

Source: src/services/threat-classifier.ts:412-495
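The three failure paths can be condensed into one decision function. This is a sketch under assumptions: handleFailure and the Job shape are hypothetical, statuses other than 429 and 5xx are treated like network errors, and the attempt check happens before the increment, so a job is retried at most MAX_RETRIES times.

```typescript
interface Job {
  title: string;
  attempts: number;
}

interface FailureOutcome {
  pauseMs: number; // how long the whole queue pauses (0 = no pause)
  requeue: Job[];  // jobs to put back on the queue
}

const MAX_RETRIES = 2;

// status: HTTP status code, or null for a network error.
// remaining: jobs from the same batch that have not been sent yet.
function handleFailure(status: number | null, failed: Job, remaining: Job[]): FailureOutcome {
  // Retry the failed job only if it still has attempts left
  const retry = failed.attempts < MAX_RETRIES
    ? [{ ...failed, attempts: failed.attempts + 1 }]
    : [];
  if (status === 429) {
    // Rate limited: pause 60s; the rest of the batch keeps its attempt counts
    return { pauseMs: 60_000, requeue: [...retry, ...remaining] };
  }
  if (status !== null && status >= 500) {
    // Server error: pause 30s, same requeue logic as 429
    return { pauseMs: 30_000, requeue: [...retry, ...remaining] };
  }
  // Network error: no pause; only the failed job is retried
  return { pauseMs: 0, requeue: retry };
}
```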

LLM Provider Configuration

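// Groq settings used for classifyEvent LLM calls
const GROQ_CONFIG = {
  model: 'llama-3.1-8b-instant', // Llama 3.1 8B served by Groq
  temperature: 0,                // deterministic output for stable labels
  maxTokens: 50,                 // short completion: only a compact label is expected
  timeout: 5000                  // request timeout in milliseconds
};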

Redis Caching

LLM results are cached with 24h TTL to prevent redundant API calls:

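// Cache key derived from the headline text
const cacheKey = `classify:${hashHeadline(title)}`;
// Serve from Redis if this headline was classified within the last 24h
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);

// Cache miss: call the LLM, then store the result for 24h (86400 s)
const result = await classifyClient.classifyEvent({ title, ... });
await redis.setex(cacheKey, 86400, JSON.stringify(result));
return result;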

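Deduplication: once a headline's result is cached, the same headline viewed by 1,000 users triggers one LLM call instead of 1,000. The get/setex sequence alone does not merge requests that miss the cache at the same moment; each of those calls the LLM until the first result is written.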
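The get/setex pattern serves repeat views from Redis, but requests that miss at the same moment each call the LLM. The sketch below adds in-flight request coalescing (a single-flight pattern, swapped in here rather than taken from the source) so concurrent misses share one call. An in-memory Map with an expiry timestamp stands in for Redis, and the hashHeadline normalization (trim, lowercase, SHA-256) is hypothetical.

```typescript
import { createHash } from 'node:crypto';

type Classification = { level: string; category: string; confidence: number };

const TTL_MS = 24 * 60 * 60 * 1000; // 24h, matching the 86400 s setex TTL

// Hypothetical normalization + hash; the real hashHeadline may differ
function hashHeadline(title: string): string {
  return createHash('sha256').update(title.trim().toLowerCase()).digest('hex');
}

// In-memory stand-in for Redis: value plus expiry timestamp
const cache = new Map<string, { value: Classification; expiresAt: number }>();
// LLM requests currently in progress, keyed by cache key
const inFlight = new Map<string, Promise<Classification>>();

async function classifyCached(
  title: string,
  callLlm: (title: string) => Promise<Classification>,
): Promise<Classification> {
  const key = `classify:${hashHeadline(title)}`;
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value;

  // Join an identical request that is already running
  const pending = inFlight.get(key);
  if (pending) return pending;

  const request = callLlm(title).then(
    (value) => {
      inFlight.delete(key);
      cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
      return value;
    },
    (err: unknown) => {
      inFlight.delete(key); // let the next caller retry
      throw err;
    },
  );
  inFlight.set(key, request);
  return request;
}
```

This only coalesces within one process; with several server instances, a shared lock in Redis (for example SET with the NX option) would be needed for the same guarantee across them.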

Classification Override Logic

When multiple sources provide results, the highest confidence wins:

function selectBestClassification(
  keyword: ThreatClassification,
  ml: ThreatClassification | null,
  llm: ThreatClassification | null
): ThreatClassification {
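  // keyword is always present, so candidates is never empty and reduce needs no seed
  const candidates = [keyword, ml, llm].filter(Boolean) as ThreatClassification[];
  // Strict > keeps the earlier candidate on ties (keyword, then ml, then llm)
  return candidates.reduce((best, current) =>
    current.confidence > best.confidence ? current : best
  );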
}

Result tagging: Each classification carries its source tag (keyword, ml, llm) so downstream consumers can weight confidence accordingly.

Aggregate Threat for Clusters

News clusters (multiple sources reporting same story) aggregate threat levels:

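export function aggregateThreats(
  items: Array<{ threat?: ThreatClassification; tier?: number }>
): ThreatClassification {
  // Only items that carry a classification contribute
  const withThreat = items.filter(
    (i): i is { threat: ThreatClassification; tier?: number } => i.threat !== undefined
  );
  if (withThreat.length === 0) {
    // Nothing classified: assumed fallback to the INFO/general default
    return { level: 'info', category: 'general', confidence: 0.3, source: 'keyword' };
  }

  // Level = highest-priority level across items (kept as a ThreatLevel, not a number)
  const maxLevel = withThreat.reduce(
    (best, i) => (THREAT_PRIORITY[i.threat.level] > THREAT_PRIORITY[best] ? i.threat.level : best),
    withThreat[0].threat.level
  );

  // Category = most frequent
  const catCounts = new Map<EventCategory, number>();
  for (const item of withThreat) {
    const cat = item.threat.category;
    catCounts.set(cat, (catCounts.get(cat) ?? 0) + 1);
  }
  const topCat = [...catCounts.entries()].sort((a, b) => b[1] - a[1])[0][0];

  // Confidence = weighted avg by source tier (tier 1 → weight 5, tier 5+ or untiered → 1)
  let weightedSum = 0;
  let weightTotal = 0;
  for (const item of withThreat) {
    const weight = item.tier ? (6 - Math.min(item.tier, 5)) : 1;
    weightedSum += item.threat.confidence * weight;
    weightTotal += weight;
  }

  return {
    level: maxLevel,
    category: topCat,
    confidence: weightTotal > 0 ? weightedSum / weightTotal : 0.5,
    source: 'keyword',
  };
}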

Source: src/services/threat-classifier.ts:521-570
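The confidence weighting in aggregateThreats, written out. Here t_i is item i's source tier and c_i its confidence; the three-item cluster in the example is made up for illustration.

```latex
w_i = \begin{cases} 6 - \min(t_i, 5) & \text{if item } i \text{ has a tier } t_i \\ 1 & \text{otherwise} \end{cases}
\qquad
\bar{c} = \frac{\sum_i w_i c_i}{\sum_i w_i}

% Example: tiers 1, 4, and untiered, with confidences 0.9, 0.6, 0.8
\bar{c} = \frac{5(0.9) + 2(0.6) + 1(0.8)}{5 + 2 + 1} = \frac{6.5}{8} = 0.8125
```

A tier-1 source therefore counts five times as much as a tier-5 or untiered one. Because the code tests item.tier for truthiness, a tier of 0 is also treated as untiered.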

Threat Color Mapping

Threat levels are color-coded with CSS variables for theme support:

critical: Red (--threat-critical)
high: Orange (--threat-high)
medium: Yellow (--threat-medium)
low: Green (--threat-low)
info: Blue (--threat-info)

export function getThreatColor(level: ThreatLevel): string {
  return getCSSColor(THREAT_VAR_MAP[level] || '--text-dim');
}

Runtime reads: read colors through getThreatColor() rather than the static THREAT_COLORS object, so light/dark theme switches are picked up.
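A testable variant of getThreatColor that takes the CSS-variable reader as a parameter. The getCSSColor helper used above isn't shown, so its role here is an assumption. In a browser the reader could be (name) => getComputedStyle(document.documentElement).getPropertyValue(name).trim(); the theme values below are made up.

```typescript
type ThreatLevel = 'critical' | 'high' | 'medium' | 'low' | 'info';

// CSS custom property per level, as listed above
const THREAT_VAR_MAP: Record<ThreatLevel, string> = {
  critical: '--threat-critical',
  high: '--threat-high',
  medium: '--threat-medium',
  low: '--threat-low',
  info: '--threat-info',
};

// Reads the current value of a CSS custom property
type CssVarReader = (name: string) => string;

// Unknown levels fall back to --text-dim, as in the original
function getThreatColor(level: string, readVar: CssVarReader): string {
  const name = THREAT_VAR_MAP[level as ThreatLevel] ?? '--text-dim';
  return readVar(name);
}

const darkTheme: Record<string, string> = { '--threat-critical': '#ff5555', '--text-dim': '#888888' };
console.log(getThreatColor('critical', (name) => darkTheme[name] ?? '')); // #ff5555
```

Because the value is read on every call, switching themes changes the returned color without rebuilding any lookup table.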

Example Classifications

{
  "level": "critical",
  "category": "military",
  "confidence": 0.9,
  "source": "keyword",
  "matchedKeyword": "nuclear strike"
}

Key Files

src/services/threat-classifier.ts — Main classification engine
src/services/ml-worker.ts — Browser-side Transformers.js ML
api/intelligence/classify-event.ts — LLM classification handler
src/components/ThreatBadge.tsx — UI threat level indicators
