What is Claim Detection?
Claim detection is the process of identifying factual assertions within text that can be objectively verified as true or false. In the context of CheckThat AI, claim detection is the foundational step in transforming noisy social media posts into verifiable, normalized statements.Formal Definition
Claim: A statement or assertion that can be objectively verified as true or false based on empirical evidence or reality.Claims differ from opinions, questions, or statements that cannot be empirically tested.
Examples
| Text | Classification | Reasoning |
|---|---|---|
| ”The COVID-19 vaccine contains microchips” | Claim | Can be verified through scientific evidence |
| ”I think the government should lower taxes” | Opinion | Subjective preference, not verifiable |
| ”Climate change is the most important issue” | Opinion | Value judgment, not factual assertion |
| ”The Earth’s average temperature has increased by 1.1°C since 1880” | Claim | Verifiable with scientific data |
The Claim Detection Process
CheckThat AI implements a sophisticated multi-step approach to claim detection:Step 1: Sentence Splitting and Context Creation
The system begins by breaking down the input post into individual sentences and establishing contextual relationships:Step 2: Selection (Verifiability Assessment)
For each sentence, the system evaluates whether it contains verifiable information:Selection Criteria:
- Discard: Sentences with no verifiable information
- Rewrite: Sentences mixing verifiable and unverifiable content (retain only verifiable parts)
- Retain: Sentences that are fully verifiable
- Input: “I believe the government is hiding alien technology, which is terrible!”
- Verifiable Core: “The government is hiding alien technology”
- Removed: Subjective opinion (“I believe”), emotional reaction (“terrible”)
Step 3: Disambiguation
The system identifies and resolves two types of ambiguity:Referential Ambiguity
Occurs when pronouns or references are unclear:-
Ambiguous: “They will update the policy next year”
- Who is “They”?
- Which “policy”?
- Which “year”?
- Resolved: “The UK government will update immigration policy in 2026”
Structural Ambiguity
Occurs when grammar allows multiple interpretations:-
Ambiguous: “AI has advanced renewable energy and sustainable agriculture at Company A and Company B”
Two possible interpretations:
- AI advanced both areas at both companies
- AI advanced renewable energy at Company A and agriculture at Company B
- Resolution Standard: A group of readers must be able to agree on the correct interpretation based on available context
Critical Rule: If ambiguity cannot be resolved, the sentence is discarded—even if it contains some unambiguous, verifiable components.
Step 4: Decomposition
The final step extracts specific, verifiable propositions that are decontextualized: Decontextualized Requirements:- Self-contained: Can be understood in isolation
- Meaning-preserving: Interpretation matches the original when combined with question and context
- Minimal: Simplest possible discrete units of information
- Complex: “The new vaccine, which was developed by Moderna and approved last month, has been shown to reduce hospitalizations by 90% in clinical trials conducted across 12 countries”
-
Decomposed Claims:
- “Moderna developed a new vaccine”
- “The vaccine was approved last month”
- “Clinical trials showed 90% reduction in hospitalizations”
- “Trials were conducted across 12 countries”
Check-Worthiness Assessment
Not all claims are equally important to verify. CheckThat AI evaluates check-worthiness based on multiple criteria:Evaluation Dimensions
1. Verifiability
Question: Can this claim be fact-checked using reliable sources?- High: “The Eiffel Tower is 330 meters tall”
- Medium: “Most scientists agree on climate change”
- Low: “Everyone loves chocolate”
2. Likelihood of Being False
Question: How likely is this claim to be misleading or false?- High: “5G towers cause COVID-19”
- Medium: “Coffee cures cancer”
- Low: “Water is necessary for life”
3. Public Interest
Question: Is this claim relevant to public discourse?- High: “The president announced new tax policy”
- Medium: “Local school changes lunch menu”
- Low: “My neighbor painted their fence”
4. Potential Harm
Question: Could believing this false claim cause damage?- High: “Don’t evacuate during the hurricane”
- Medium: “This unregulated supplement treats diabetes”
- Low: “Bigfoot was spotted in the woods”
5. Check-Worthiness Score
Question: How urgent is it to fact-check this claim? Calculated from the above dimensions using G-Eval (see G-Eval).Implementation in CheckThat AI
System Prompt Engineering
The claim detection logic is encoded in the system prompt (sys_prompt in /home/daytona/workspace/source/api/_utils/prompts.py:3-53):
Processing Strategies
CheckThat AI supports multiple prompting approaches:Zero-Shot
Direct claim extraction without examples:Few-Shot
Learning from examples (seefew_shot_prompt in prompts.py:59-118):
Chain-of-Thought (CoT)
Step-by-step reasoning (seefew_shot_CoT_prompt in prompts.py:120-215):
Constraint Enforcement
The system enforces strict constraints during claim detection:Extraction Rules:
- Use only words from the original text (no inference)
- Maximum 25 words per claim
- Single sentence format
- Self-contained (no external context required)
- Preserve named entities exactly as they appear
- Maintain sentiment (negative claims stay negative)
- Extract, don’t summarize (no interpretation)
Relation to Claim Normalization
Claim detection is the first phase of the complete normalization pipeline:Integration Points
- Detection → Identifies which parts of the post contain claims
- Extraction → Pulls out the verifiable assertions
- Normalization → Transforms into standard form
- Evaluation → Assesses quality using G-Eval and METEOR
- Refinement → Iteratively improves using feedback (see Fact-Checking Pipeline)
Evaluation Criteria
Claim detection quality is measured using:G-Eval Criteria
FromSTATIC_EVAL_SPECS in /home/daytona/workspace/source/api/types/evals.py:25-50:
-
Verifiability and Self-Containment
- Contains verifiable factual assertions
- Self-contained without requiring additional context
-
Claim Centrality and Extraction Quality
- Captures central assertion from source
- Removes extraneous information
-
Conciseness and Clarity
- Straightforward, concise manner
- Significantly shorter than source
-
Check-Worthiness Alignment
- Meets standards for fact-verification
- Has public interest and potential impact
-
Factual Consistency
- Consistent with source material
- No hallucinations or distortions
References
Academic Literature
- CheckThat! Lab Papers: Annual proceedings from CLEF conferences
- Claim Detection: Hassan et al., “Toward Automated Fact-Checking” (2015)
- Check-Worthiness: Jaradat et al., “ClaimBuster: The First-ever End-to-end Fact-checking System” (2018)