What is Claim Normalization?
Claim normalization is the process of transforming noisy, context-dependent social media posts into clean, verifiable factual statements that can be independently fact-checked. CheckThat AI extracts the central claim from text while removing extraneous information, opinions, and context-specific references.

Core Objective: Convert raw social media content into decontextualized, stand-alone claims that professional fact-checkers can verify using reliable sources.
The Transformation Process
CheckThat AI’s ClaimNorm agent follows a systematic 4-step process to normalize claims:

Step 1: Sentence Splitting and Context Creation
- Split the post into individual sentences
- Create context for each sentence using 2 preceding and 2 following sentences
- Build contextual understanding for accurate extraction
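Step 1 can be sketched in Python as follows; the regex-based splitter and the field names are illustrative stand-ins, not the project's implementation:

```python
import re

def split_sentences(post: str) -> list[str]:
    # Naive splitter on terminal punctuation; a production system would
    # more likely use a tokenizer such as spaCy or NLTK.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", post.strip()) if s.strip()]

def with_context(sentences: list[str], window: int = 2) -> list[dict]:
    # Pair each sentence with up to `window` preceding and following
    # sentences, as described in Step 1.
    items = []
    for i, sent in enumerate(sentences):
        items.append({
            "sentence": sent,
            "before": sentences[max(0, i - window):i],
            "after": sentences[i + 1:i + 1 + window],
        })
    return items
```

Sentences near the start or end of a post simply get shorter context windows rather than padding.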
Step 2: Selection
For each sentence:
- Discard sentences with no verifiable information
- Rewrite sentences containing both verifiable and unverifiable information (retain only verifiable parts)
- Keep sentences containing only verifiable information
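The three-way selection rule can be sketched as a small dispatcher; the `verifiability` and `rewrite` callables stand in for model calls and are hypothetical, not the project's interfaces:

```python
from typing import Callable, Optional

def select(sentence: str,
           verifiability: Callable[[str], str],
           rewrite: Callable[[str], str]) -> Optional[str]:
    """Apply the Step 2 selection rules to one sentence.

    `verifiability` labels a sentence 'none', 'mixed', or 'all';
    `rewrite` strips the unverifiable parts from a mixed sentence.
    Both are placeholders for model-backed functions.
    """
    label = verifiability(sentence)
    if label == "none":
        return None               # discard: nothing checkable
    if label == "mixed":
        return rewrite(sentence)  # keep only the verifiable parts
    return sentence               # 'all': keep as-is
```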
Step 3: Disambiguation
Resolve two types of ambiguity:

Referential Ambiguity
Unclear references like “They,” “the policy,” or “next year” that require context to understand.

Example: “They will update the policy next year” → ambiguous without knowing who “They” refers to
Structural Ambiguity
Grammatical structures allowing multiple interpretations.

Example: “AI has advanced renewable energy and sustainable agriculture at Company A and Company B” could mean:
- (1) AI advanced both at both companies, or
- (2) AI advanced renewable energy at Company A, sustainable agriculture at Company B
Step 4: Decomposition
- Identify all specific, verifiable propositions
- Ensure each proposition is decontextualized (self-contained and understandable in isolation)
- Create the simplest possible discrete units of information
- If no verifiable claims exist, return an extractive summary of the central idea
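Applied to reading (2) of the structural-ambiguity example above, decomposition would yield discrete, self-contained propositions. The helper below is a crude illustrative self-containment check, not the project's method:

```python
def is_decontextualized(claim: str,
                        pronouns=("they", "it", "this", "that", "he", "she")) -> bool:
    # A crude check: a stand-alone claim should not open with an
    # unresolved pronoun. A real system would use a model judgment.
    first = claim.split()[0].lower().rstrip(",")
    return first not in pronouns

# Reading (2), decomposed into the simplest discrete units:
decomposed = [
    "AI has advanced renewable energy at Company A.",
    "AI has advanced sustainable agriculture at Company B.",
]
assert all(is_decontextualized(c) for c in decomposed)
```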
Prompting Strategies
CheckThat AI uses advanced prompting techniques to guide the normalization process:

System Prompt Structure
The full system prompt is defined in api/_utils/prompts.py:3-53 and includes detailed step-by-step instructions for the normalization process.

Chain-of-Thought Trigger
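A sketch of how the trigger and few-shot examples might be assembled into a chat prompt; the system-prompt wording and the trigger phrase below are common-convention stand-ins, not the actual text in api/_utils/prompts.py:

```python
# Illustrative prompt assembly; the real system prompt lives in
# api/_utils/prompts.py:3-53 and the strings here are placeholders.
SYSTEM_PROMPT = "You are a claim-normalization assistant. Follow steps 1-4 exactly."
COT_TRIGGER = "Let's think step by step."  # a widely used chain-of-thought cue

def build_messages(post: str, few_shot: list[tuple[str, str]]) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    # Few-shot pairs: (raw post, normalized claim), shown as prior turns.
    for raw, normalized in few_shot:
        messages.append({"role": "user", "content": raw})
        messages.append({"role": "assistant", "content": normalized})
    # Append the chain-of-thought trigger to the post being normalized.
    messages.append({"role": "user", "content": f"{post}\n\n{COT_TRIGGER}"})
    return messages
```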
Few-Shot Examples
CheckThat AI uses few-shot learning with both standard and chain-of-thought examples.

Real Normalization Examples
Example 1: Health Misinformation
Original Post:

Normalized Claim:

Transformation: Removed timeline details, focused on central verifiable claim
Example 2: Celebrity Content
Original Post:

Normalized Claim:

Transformation: Added clarifying context (“late actor”), removed subjective assessment (“priceless”, “focus on speed”)
Quality Guidelines
Normalized claims must meet these criteria:

Important Named Entity Handling
Example with Named Entities
API Integration
The normalization process is integrated throughout the CheckThat AI API:

Implementation Example
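A hypothetical client-side sketch of calling a normalization endpoint; the endpoint path and the JSON field names ("text", "normalized_claim") are assumptions for illustration, not the documented CheckThat AI API:

```python
import json
from urllib import request

# Hypothetical integration sketch: endpoint and field names are illustrative.

def build_request_body(post: str) -> bytes:
    # Serialize the raw post as a JSON payload.
    return json.dumps({"text": post}).encode("utf-8")

def normalize_claim(post: str, base_url: str) -> str:
    req = request.Request(
        f"{base_url}/api/normalize",
        data=build_request_body(post),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["normalized_claim"]
```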
Next Steps
After normalization, claims can be further improved through:

Refinement Pipeline
Iteratively improve claim quality using self-refine and cross-refine algorithms
Evaluation Metrics
Assess claim quality using G-Eval and other metrics
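G-Eval produces a final score by prompting a judge model with evaluation criteria and then taking a probability-weighted average over the score tokens (e.g. 1-5) the model could emit. A minimal sketch of that aggregation step (the probabilities here would come from the judge model's output distribution):

```python
def g_eval_score(score_probs: dict[int, float]) -> float:
    # Weighted sum of candidate scores by the judge model's probability
    # for each score token, normalized in case probabilities don't sum to 1.
    total = sum(score_probs.values())
    return sum(s * p for s, p in score_probs.items()) / total
```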