
Overview

Every analysis generated by Argument Cartographer receives a “brutally honest” Credibility Score from 1 to 10. Unlike binary true/false ratings, this nuanced score reflects the complex reality of most debates.
Philosophy: Even well-argued topics rarely score above 8/10. This intentional harshness helps users understand that most real-world debates have legitimate complexity and imperfect evidence.

Scoring Algorithm

The credibility score is calculated using a weighted multi-factor algorithm:
Credibility Score = 
  Source Quality (30%) +
  Evidence Strength (30%) +
  Fallacy Penalty (30%) +
  Logical Coherence (10%)

Factor 1: Source Quality (30%)

Evaluates the diversity and reliability of sources used in the analysis.
Positive factors:
  • Number of independent sources (8+ is ideal)
  • Domain diversity (different news outlets)
  • Trusted outlet presence (Reuters, BBC, AP, etc.)
  • Recency of sources (< 30 days)
  • Geographic diversity (multiple regions)
Negative factors:
  • Echo chamber (all sources from same outlet)
  • Outdated sources (> 1 year old)
  • Unreliable domains (tabloids, conspiracy sites)
  • Insufficient sourcing (< 3 sources)
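The checklist above could translate into a sub-score like the following sketch. The `Source` shape, the trusted-domain list, and every threshold here are assumptions for illustration; this page doesn't show the production implementation.

```typescript
// Hypothetical sketch of the Source Quality sub-score (0-3 scale).
// Thresholds mirror the factor list above but are illustrative.
interface Source {
  domain: string;
  publishedDaysAgo: number;
}

// Assumed trusted-outlet list (the doc names Reuters, BBC, AP)
const TRUSTED_DOMAINS = new Set(["reuters.com", "bbc.com", "apnews.com"]);

function calculateSourceQuality(sources: Source[]): number {
  if (sources.length < 3) return 0; // insufficient sourcing
  let score = 0;
  if (sources.length >= 8) score += 1; // enough independent sources
  const domains = new Set(sources.map((s) => s.domain));
  if (domains.size >= sources.length * 0.75) score += 1; // domain diversity, no echo chamber
  const hasTrusted = sources.some((s) => TRUSTED_DOMAINS.has(s.domain));
  const allRecent = sources.every((s) => s.publishedDaysAgo <= 30);
  if (hasTrusted && allRecent) score += 1; // trusted, recent coverage
  return score; // 0-3
}
```

Geographic diversity is omitted from this sketch because the doc doesn't specify how regions are detected.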

Factor 2: Evidence Strength (30%)

Assesses the quality and type of evidence backing claims.
Strongest to Weakest:
  1. Primary Research (3 points)
    • Peer-reviewed studies
    • Original data/statistics
    • Direct experiments
    • Government reports with data
  2. Expert Testimony (2 points)
    • Quotes from credentialed experts
    • Academic analysis
    • Professional organization statements
  3. Secondary Analysis (1 point)
    • Journalism synthesizing research
    • Think tank reports
    • Meta-analyses
  4. Opinion/Speculation (0 points)
    • Editorial opinions
    • Anecdotal evidence
    • Hypothetical scenarios
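The tier points above map naturally onto a lookup-and-average sub-score. This sketch assumes the analyzer tags each piece of evidence with one of the four tiers; the averaging step is an assumption, not documented behavior.

```typescript
// The four evidence tiers and their point values from the list above
type EvidenceTier = "primary" | "expert" | "secondary" | "opinion";

const TIER_POINTS: Record<EvidenceTier, number> = {
  primary: 3,   // peer-reviewed studies, original data, experiments
  expert: 2,    // credentialed expert testimony, academic analysis
  secondary: 1, // journalism synthesizing research, think tank reports
  opinion: 0,   // editorials, anecdotes, hypotheticals
};

// Average the tier points across all evidence items (0-3 scale)
function calculateEvidenceStrength(tiers: EvidenceTier[]): number {
  if (tiers.length === 0) return 0;
  const total = tiers.reduce((sum, t) => sum + TIER_POINTS[t], 0);
  return total / tiers.length;
}
```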

Factor 3: Fallacy Penalty (30%)

Deducts points based on detected logical fallacies.
Fallacy Severity | Points Deducted | Max Penalty
Critical         | -1.0 each       | -3.0 total
Major            | -0.5 each       | -2.0 total
Minor            | -0.2 each       | -1.0 total
Confidence weighting: each penalty is multiplied by a factor based on the fallacy's detection confidence:
  • 90%+ confidence: Full penalty
  • 70-89%: 75% of penalty
  • 50-69%: 50% of penalty
  • < 50%: 25% of penalty
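Combining the severity table and the confidence brackets, the penalty calculation might look like the sketch below. The `Fallacy` shape is an assumption; the per-fallacy values, caps, and multipliers come directly from the tables above.

```typescript
interface Fallacy {
  severity: "critical" | "major" | "minor";
  confidence: number; // detection confidence, 0-100
}

// Per-fallacy deductions and per-severity caps from the table above
const PER_FALLACY = { critical: 1.0, major: 0.5, minor: 0.2 };
const MAX_PENALTY = { critical: 3.0, major: 2.0, minor: 1.0 };

// Confidence brackets from the list above
function confidenceMultiplier(confidence: number): number {
  if (confidence >= 90) return 1.0;
  if (confidence >= 70) return 0.75;
  if (confidence >= 50) return 0.5;
  return 0.25;
}

// Returns a non-positive number, e.g. -1.5 for one high-confidence
// critical fallacy plus one high-confidence major fallacy
function calculateFallacyPenalty(fallacies: Fallacy[]): number {
  const totals = { critical: 0, major: 0, minor: 0 };
  for (const f of fallacies) {
    totals[f.severity] += PER_FALLACY[f.severity] * confidenceMultiplier(f.confidence);
  }
  let penalty = 0;
  for (const sev of ["critical", "major", "minor"] as const) {
    penalty += Math.min(totals[sev], MAX_PENALTY[sev]); // cap per severity
  }
  return -penalty;
}
```

Note the -1.5 example matches the breakdown shown in the Score Breakdown Modal section.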

Factor 4: Logical Coherence (10%)

Evaluates internal consistency and argument structure.
Positive indicators:
  • Clear thesis-claim-evidence links
  • Both sides represented fairly
  • Claims supported by evidence (not orphaned)
  • Logical progression of arguments
Negative indicators:
  • Orphaned claims (no evidence)
  • One-sided analysis (no counterarguments)
  • Circular reasoning in structure
  • Disconnected evidence
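A minimal sketch of how these indicators could fold into the 0-1 coherence sub-score; the indicator fields and deduction weights are illustrative assumptions, not documented values.

```typescript
// Assumed indicator shape, derived from the positive/negative lists above
interface CoherenceIndicators {
  orphanedClaims: number;       // claims with no supporting evidence
  hasCounterarguments: boolean; // both sides represented
  hasCircularReasoning: boolean;
  disconnectedEvidence: number; // evidence linked to no claim
}

// Start from a perfect 1.0 and deduct for each negative indicator
function calculateLogicalCoherence(ind: CoherenceIndicators): number {
  let score = 1.0;
  score -= Math.min(0.4, ind.orphanedClaims * 0.1);        // orphaned claims
  if (!ind.hasCounterarguments) score -= 0.3;              // one-sided analysis
  if (ind.hasCircularReasoning) score -= 0.2;              // circular structure
  score -= Math.min(0.1, ind.disconnectedEvidence * 0.05); // disconnected evidence
  return Math.max(0, score);
}
```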

Final Score Calculation

function calculateCredibilityScore(analysis: AnalysisResult): number {
  const sourceScore = calculateSourceQuality(analysis.sources);
  const evidenceScore = calculateEvidenceStrength(analysis.blueprint);
  const fallacyPenalty = calculateFallacyPenalty(analysis.fallacies);
  const coherenceScore = calculateLogicalCoherence(analysis.blueprint);
  
  // Weighted sum
  const rawScore = 
    (sourceScore * 0.30) +
    (evidenceScore * 0.30) +
    (fallacyPenalty * 0.30) + // Note: This is negative
    (coherenceScore * 0.10);
  
  // Normalize to the 1-10 scale: given the sub-score ranges above
  // (0-3, 0-3, -6 to 0, 0-1), rawScore spans roughly [-1.8, 1.9]
  const normalized = ((rawScore + 1.8) / 3.7) * 9 + 1;
  
  // Clamp and round
  return Math.max(1, Math.min(10, Math.round(normalized)));
}
The algorithm is intentionally harsh: a score of 7-8 represents an excellent analysis. Scores of 9-10 are exceptionally rare and require near-perfect sourcing and zero fallacies.

Score Interpretation

Characteristics of a top-scoring (9-10) analysis:
  • 8+ diverse, trusted sources
  • Strong primary evidence throughout
  • Zero or only minor fallacies
  • Perfect logical structure
  • Both sides thoroughly represented
Example: Academic meta-analysis on climate change science with peer-reviewed sources, data-backed claims, and comprehensive coverage of all viewpoints.
Rarity: < 5% of analyses

UI Presentation

Credibility scores are displayed prominently:
const CredibilityBadge = ({ score }: { score: number }) => {
  const getColor = (score: number) => {
    if (score >= 8) return 'bg-emerald-500 text-white';
    if (score >= 6) return 'bg-blue-500 text-white';
    if (score >= 4) return 'bg-yellow-500 text-black';
    if (score >= 2) return 'bg-orange-500 text-white';
    return 'bg-red-500 text-white';
  };
  
  const getLabel = (score: number) => {
    if (score >= 8) return 'Strong';
    if (score >= 6) return 'Moderate';
    if (score >= 4) return 'Weak';
    return 'Very Weak';
  };
  
  return (
    <div className={`rounded-lg p-4 ${getColor(score)}`}>
      <div className="text-4xl font-bold">{score}/10</div>
      <div className="text-sm font-medium">{getLabel(score)}</div>
      <div className="text-xs opacity-80">Credibility Score</div>
    </div>
  );
};

Score Breakdown Modal

Users can click the score to see a detailed breakdown:
<Dialog>
  <DialogContent>
    <DialogTitle>Credibility Score Breakdown</DialogTitle>
    
    <div className="space-y-4">
      <ScoreComponent
        label="Source Quality"
        score={sourceScore}
        maxScore={3}
        weight={30}
        details="8 sources from 7 unique domains, including Reuters and BBC"
      />
      
      <ScoreComponent
        label="Evidence Strength"
        score={evidenceScore}
        maxScore={3}
        weight={30}
        details="Mix of peer-reviewed studies and expert testimony"
      />
      
      <ScoreComponent
        label="Fallacy Penalty"
        score={fallacyPenalty}
        maxScore={0}
        weight={30}
        details="-1.0 for 1 critical fallacy, -0.5 for 1 major fallacy"
        isNegative
      />
      
      <ScoreComponent
        label="Logical Coherence"
        score={coherenceScore}
        maxScore={1}
        weight={10}
        details="Clear structure, both sides represented"
      />
    </div>
    
    <Separator />
    
    <div className="flex items-center justify-between">
      <span className="font-bold">Final Score</span>
      <CredibilityBadge score={finalScore} />
    </div>
  </DialogContent>
</Dialog>

Comparative Scoring

Users can compare scores across multiple analyses:
const analysisList = [
  { topic: "AI Regulation", score: 7 },
  { topic: "UBI Debate", score: 6 },
  { topic: "Climate Policy", score: 8 },
];

<table>
  <thead>
    <tr>
      <th>Topic</th>
      <th>Score</th>
      <th>Quality</th>
    </tr>
  </thead>
  <tbody>
    {analysisList.map(a => (
      <tr key={a.topic}>
        <td>{a.topic}</td>
        <td>
          <CredibilityBadge score={a.score} size="sm" />
        </td>
        <td>{getQualityLabel(a.score)}</td>
      </tr>
    ))}
  </tbody>
</table>
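`getQualityLabel` is not defined in the snippet above; a plausible implementation, assuming it mirrors the labels used by `CredibilityBadge`:

```typescript
// Assumed helper matching CredibilityBadge's label thresholds
function getQualityLabel(score: number): string {
  if (score >= 8) return "Strong";
  if (score >= 6) return "Moderate";
  if (score >= 4) return "Weak";
  return "Very Weak";
}
```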

Limitations & Transparency

Credibility scores are imperfect heuristics, not absolute truth ratings. They reflect:
  • Quality of available sources (not the truth itself)
  • AI’s ability to detect fallacies (may have false positives/negatives)
  • Current state of public discourse (may miss emerging evidence)

What Scores DON’T Mean

A score of 3/10 means the available arguments are weak, not that the thesis is false. There might be:
  • Limited public discourse on the topic
  • Poor quality sources available
  • Emerging issue without research yet
A score of 9/10 means the arguments are well-constructed, not that the thesis is proven. Scientific consensus can still evolve.
Comparing scores across topics has limits:
  • Some topics have more research than others
  • Controversial topics may have better sources (more coverage)
  • Technical topics may lack accessible sources

Future Improvements

Planned enhancements to the scoring algorithm:

Citation Quality

Weight sources based on impact factor, citations, and methodology rigor

Temporal Analysis

Track how scores change as new evidence emerges

Domain Expertise

Specialize scoring criteria for different fields (science, law, economics)

Crowdsourced Validation

Allow expert community to flag scoring issues

Next Steps

Fallacy Detection

Understand how fallacies impact scores

Source Quality

Learn about trusted source selection

Creating Analyses

Tips for generating high-credibility analyses

AI Orchestration

Technical details of score calculation
