
Overview

Every analysis generated by Argument Cartographer receives a “brutally honest” Credibility Score from 1 to 10. Unlike binary true/false ratings, this nuanced score reflects the complex reality of most debates.
Philosophy: Even well-argued topics rarely score above 8/10. This intentional harshness helps users understand that most real-world debates have legitimate complexity and imperfect evidence.

Scoring Algorithm

The credibility score is calculated using a weighted multi-factor algorithm:
Credibility Score = 
  Source Quality (30%) +
  Evidence Strength (30%) +
  Fallacy Penalty (30%) +
  Logical Coherence (10%)

Factor 1: Source Quality (30%)

Evaluates the diversity and reliability of sources used in the analysis.
Positive factors:
  • Number of independent sources (8+ is ideal)
  • Domain diversity (different news outlets)
  • Trusted outlet presence (Reuters, BBC, AP, etc.)
  • Recency of sources (< 30 days)
  • Geographic diversity (multiple regions)
Negative factors:
  • Echo chamber (all sources from same outlet)
  • Outdated sources (> 1 year old)
  • Unreliable domains (tabloids, conspiracy sites)
  • Insufficient sourcing (< 3 sources)
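The checklist above could translate into a sub-score like the following sketch. The `Source` shape, the trusted-domain list, and every threshold here are assumptions for illustration; this page doesn't show the production implementation.

```typescript
// Hypothetical sketch of the Source Quality sub-score (0-3 scale).
// Thresholds mirror the factor list above but are illustrative.
interface Source {
  domain: string;
  publishedDaysAgo: number;
}

// Assumed trusted-outlet list (the doc names Reuters, BBC, AP)
const TRUSTED_DOMAINS = new Set(["reuters.com", "bbc.com", "apnews.com"]);

function calculateSourceQuality(sources: Source[]): number {
  if (sources.length < 3) return 0; // insufficient sourcing
  let score = 0;
  if (sources.length >= 8) score += 1; // enough independent sources
  const domains = new Set(sources.map((s) => s.domain));
  if (domains.size >= sources.length * 0.75) score += 1; // domain diversity, no echo chamber
  const hasTrusted = sources.some((s) => TRUSTED_DOMAINS.has(s.domain));
  const allRecent = sources.every((s) => s.publishedDaysAgo <= 30);
  if (hasTrusted && allRecent) score += 1; // trusted, recent coverage
  return score; // 0-3
}
```

Geographic diversity is omitted from this sketch because the doc doesn't specify how regions are detected.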

Factor 2: Evidence Strength (30%)

Assesses the quality and type of evidence backing claims.
Strongest to Weakest:
  1. Primary Research (3 points)
    • Peer-reviewed studies
    • Original data/statistics
    • Direct experiments
    • Government reports with data
  2. Expert Testimony (2 points)
    • Quotes from credentialed experts
    • Academic analysis
    • Professional organization statements
  3. Secondary Analysis (1 point)
    • Journalism synthesizing research
    • Think tank reports
    • Meta-analyses
  4. Opinion/Speculation (0 points)
    • Editorial opinions
    • Anecdotal evidence
    • Hypothetical scenarios
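The tier points above map naturally onto a lookup-and-average sub-score. This sketch assumes the analyzer tags each piece of evidence with one of the four tiers; the averaging step is an assumption, not documented behavior.

```typescript
// The four evidence tiers and their point values from the list above
type EvidenceTier = "primary" | "expert" | "secondary" | "opinion";

const TIER_POINTS: Record<EvidenceTier, number> = {
  primary: 3,   // peer-reviewed studies, original data, experiments
  expert: 2,    // credentialed expert testimony, academic analysis
  secondary: 1, // journalism synthesizing research, think tank reports
  opinion: 0,   // editorials, anecdotes, hypotheticals
};

// Average the tier points across all evidence items (0-3 scale)
function calculateEvidenceStrength(tiers: EvidenceTier[]): number {
  if (tiers.length === 0) return 0;
  const total = tiers.reduce((sum, t) => sum + TIER_POINTS[t], 0);
  return total / tiers.length;
}
```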

Factor 3: Fallacy Penalty (30%)

Deducts points based on detected logical fallacies.
Fallacy Severity | Points Deducted | Max Penalty
Critical         | -1.0 each       | -3.0 total
Major            | -0.5 each       | -2.0 total
Minor            | -0.2 each       | -1.0 total
Confidence weighting: each penalty is multiplied by a factor based on the fallacy's detection confidence:
  • 90%+ confidence: Full penalty
  • 70-89%: 75% of penalty
  • 50-69%: 50% of penalty
  • < 50%: 25% of penalty
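Combining the severity table and the confidence brackets, the penalty calculation might look like the sketch below. The `Fallacy` shape is an assumption; the per-fallacy values, caps, and multipliers come directly from the tables above.

```typescript
interface Fallacy {
  severity: "critical" | "major" | "minor";
  confidence: number; // detection confidence, 0-100
}

// Per-fallacy deductions and per-severity caps from the table above
const PER_FALLACY = { critical: 1.0, major: 0.5, minor: 0.2 };
const MAX_PENALTY = { critical: 3.0, major: 2.0, minor: 1.0 };

// Confidence brackets from the list above
function confidenceMultiplier(confidence: number): number {
  if (confidence >= 90) return 1.0;
  if (confidence >= 70) return 0.75;
  if (confidence >= 50) return 0.5;
  return 0.25;
}

// Returns a non-positive number, e.g. -1.5 for one high-confidence
// critical fallacy plus one high-confidence major fallacy
function calculateFallacyPenalty(fallacies: Fallacy[]): number {
  const totals = { critical: 0, major: 0, minor: 0 };
  for (const f of fallacies) {
    totals[f.severity] += PER_FALLACY[f.severity] * confidenceMultiplier(f.confidence);
  }
  let penalty = 0;
  for (const sev of ["critical", "major", "minor"] as const) {
    penalty += Math.min(totals[sev], MAX_PENALTY[sev]); // cap per severity
  }
  return -penalty;
}
```

Note the -1.5 example matches the breakdown shown in the Score Breakdown Modal section.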

Factor 4: Logical Coherence (10%)

Evaluates internal consistency and argument structure.
Positive indicators:
  • Clear thesis-claim-evidence links
  • Both sides represented fairly
  • Claims supported by evidence (not orphaned)
  • Logical progression of arguments
Negative indicators:
  • Orphaned claims (no evidence)
  • One-sided analysis (no counterarguments)
  • Circular reasoning in structure
  • Disconnected evidence
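A minimal sketch of how these indicators could fold into the 0-1 coherence sub-score; the indicator fields and deduction weights are illustrative assumptions, not documented values.

```typescript
// Assumed indicator shape, derived from the positive/negative lists above
interface CoherenceIndicators {
  orphanedClaims: number;       // claims with no supporting evidence
  hasCounterarguments: boolean; // both sides represented
  hasCircularReasoning: boolean;
  disconnectedEvidence: number; // evidence linked to no claim
}

// Start from a perfect 1.0 and deduct for each negative indicator
function calculateLogicalCoherence(ind: CoherenceIndicators): number {
  let score = 1.0;
  score -= Math.min(0.4, ind.orphanedClaims * 0.1);        // orphaned claims
  if (!ind.hasCounterarguments) score -= 0.3;              // one-sided analysis
  if (ind.hasCircularReasoning) score -= 0.2;              // circular structure
  score -= Math.min(0.1, ind.disconnectedEvidence * 0.05); // disconnected evidence
  return Math.max(0, score);
}
```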

Final Score Calculation

function calculateCredibilityScore(analysis: AnalysisResult): number {
  const sourceScore = calculateSourceQuality(analysis.sources);
  const evidenceScore = calculateEvidenceStrength(analysis.blueprint);
  const fallacyPenalty = calculateFallacyPenalty(analysis.fallacies);
  const coherenceScore = calculateLogicalCoherence(analysis.blueprint);
  
  // Weighted sum
  const rawScore = 
    (sourceScore * 0.30) +
    (evidenceScore * 0.30) +
    (fallacyPenalty * 0.30) + // Note: This is negative
    (coherenceScore * 0.10);
  
  // Normalize to the 1-10 scale: given the sub-score ranges above
  // (0-3, 0-3, -6 to 0, 0-1), rawScore spans roughly [-1.8, 1.9]
  const normalized = ((rawScore + 1.8) / 3.7) * 9 + 1;
  
  // Clamp and round
  return Math.max(1, Math.min(10, Math.round(normalized)));
}
The algorithm is intentionally harsh: a score of 7-8 represents an excellent analysis. Scores of 9-10 are exceptionally rare and require near-perfect sourcing and zero fallacies.

Score Interpretation

Characteristics of a top-scoring (9-10) analysis:
  • 8+ diverse, trusted sources
  • Strong primary evidence throughout
  • Zero or only minor fallacies
  • Perfect logical structure
  • Both sides thoroughly represented
Example: Academic meta-analysis on climate change science with peer-reviewed sources, data-backed claims, and comprehensive coverage of all viewpoints.
Rarity: < 5% of analyses

UI Presentation

Credibility scores are displayed prominently:
const CredibilityBadge = ({ score }: { score: number }) => {
  const getColor = (score: number) => {
    if (score >= 8) return 'bg-emerald-500 text-white';
    if (score >= 6) return 'bg-blue-500 text-white';
    if (score >= 4) return 'bg-yellow-500 text-black';
    if (score >= 2) return 'bg-orange-500 text-white';
    return 'bg-red-500 text-white';
  };
  
  const getLabel = (score: number) => {
    if (score >= 8) return 'Strong';
    if (score >= 6) return 'Moderate';
    if (score >= 4) return 'Weak';
    return 'Very Weak';
  };
  
  return (
    <div className={`rounded-lg p-4 ${getColor(score)}`}>
      <div className="text-4xl font-bold">{score}/10</div>
      <div className="text-sm font-medium">{getLabel(score)}</div>
      <div className="text-xs opacity-80">Credibility Score</div>
    </div>
  );
};

Score Breakdown Modal

Users can click the score to see a detailed breakdown:
<Dialog>
  <DialogContent>
    <DialogTitle>Credibility Score Breakdown</DialogTitle>
    
    <div className="space-y-4">
      <ScoreComponent
        label="Source Quality"
        score={sourceScore}
        maxScore={3}
        weight={30}
        details="8 sources from 7 unique domains, including Reuters and BBC"
      />
      
      <ScoreComponent
        label="Evidence Strength"
        score={evidenceScore}
        maxScore={3}
        weight={30}
        details="Mix of peer-reviewed studies and expert testimony"
      />
      
      <ScoreComponent
        label="Fallacy Penalty"
        score={fallacyPenalty}
        maxScore={0}
        weight={30}
        details="-1.0 for 1 critical fallacy, -0.5 for 1 major fallacy"
        isNegative
      />
      
      <ScoreComponent
        label="Logical Coherence"
        score={coherenceScore}
        maxScore={1}
        weight={10}
        details="Clear structure, both sides represented"
      />
    </div>
    
    <Separator />
    
    <div className="flex items-center justify-between">
      <span className="font-bold">Final Score</span>
      <CredibilityBadge score={finalScore} />
    </div>
  </DialogContent>
</Dialog>

Comparative Scoring

Users can compare scores across multiple analyses:
const analysisList = [
  { topic: "AI Regulation", score: 7 },
  { topic: "UBI Debate", score: 6 },
  { topic: "Climate Policy", score: 8 },
];

<table>
  <thead>
    <tr>
      <th>Topic</th>
      <th>Score</th>
      <th>Quality</th>
    </tr>
  </thead>
  <tbody>
    {analysisList.map(a => (
      <tr key={a.topic}>
        <td>{a.topic}</td>
        <td>
          <CredibilityBadge score={a.score} size="sm" />
        </td>
        <td>{getQualityLabel(a.score)}</td>
      </tr>
    ))}
  </tbody>
</table>
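`getQualityLabel` is not defined in the snippet above; a plausible implementation, assuming it mirrors the labels used by `CredibilityBadge`:

```typescript
// Assumed helper matching CredibilityBadge's label thresholds
function getQualityLabel(score: number): string {
  if (score >= 8) return "Strong";
  if (score >= 6) return "Moderate";
  if (score >= 4) return "Weak";
  return "Very Weak";
}
```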

Limitations & Transparency

Credibility scores are imperfect heuristics, not absolute truth ratings. They reflect:
  • Quality of available sources (not the truth itself)
  • AI’s ability to detect fallacies (may have false positives/negatives)
  • Current state of public discourse (may miss emerging evidence)

What Scores DON’T Mean

A score of 3/10 means the available arguments are weak, not that the thesis is false. There might be:
  • Limited public discourse on the topic
  • Poor quality sources available
  • Emerging issue without research yet
A score of 9/10 means the arguments are well-constructed, not that the thesis is proven. Scientific consensus can still evolve.
Comparing scores across topics has limits:
  • Some topics have more research than others
  • Controversial topics may have better sources (more coverage)
  • Technical topics may lack accessible sources

Future Improvements

Planned enhancements to the scoring algorithm:

Citation Quality

Weight sources based on impact factor, citations, and methodology rigor

Temporal Analysis

Track how scores change as new evidence emerges

Domain Expertise

Specialize scoring criteria for different fields (science, law, economics)

Crowdsourced Validation

Allow expert community to flag scoring issues

Next Steps

Fallacy Detection

Understand how fallacies impact scores

Source Quality

Learn about trusted source selection

Creating Analyses

Tips for generating high-credibility analyses

AI Orchestration

Technical details of score calculation
