Content Scorer

Overview

The ContentScorer class scores content quality across five dimensions: humanity/voice, specificity, structure balance, SEO compliance, and readability.

Installation

from data_sources.modules.content_scorer import ContentScorer

Scoring Dimensions

The scorer evaluates content across five weighted dimensions:

Humanity/Voice (30%): Human tone, personality, conversational devices
Specificity (25%): Concrete examples vs vague generalizations
Structure Balance (20%): Prose-to-list ratio (target 50-75% prose)
SEO Compliance (15%): Keyword density, meta, structure
Readability (10%): Flesch score, sentence rhythm, paragraph length

Pass Threshold: 70/100 composite score
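The weighting above can be sketched as a plain weighted sum. This is a minimal illustration, not the module's actual code: the weights and threshold come from the list above, the dimension keys follow the names shown in the example report, and `composite_score` is a hypothetical helper.

```python
# Weights and threshold as listed above; the helper itself is hypothetical.
WEIGHTS = {
    "humanity": 0.30,
    "specificity": 0.25,
    "structure_balance": 0.20,
    "seo": 0.15,
    "readability": 0.10,
}
PASS_THRESHOLD = 70


def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)


scores = {
    "humanity": 80,
    "specificity": 60,
    "structure_balance": 70,
    "seo": 50,
    "readability": 90,
}
total = composite_score(scores)
# 24 + 15 + 14 + 7.5 + 9 = 69.5, just under the threshold
print(f"{total:.1f}/100", "PASSED" if total >= PASS_THRESHOLD else "FAILED")
```

Because humanity and specificity together carry 55% of the weight, a draft with strong SEO and readability can still fail on voice alone.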

Initialization

scorer = ContentScorer()

Methods

score

Score content across all dimensions.

result = scorer.score(
    content=article_content,
    metadata={
        'meta_title': 'How to Monetize Your Podcast in 2024',
        'meta_description': 'Learn the best podcast monetization strategies...',
        'primary_keyword': 'podcast monetization',
        'secondary_keywords': ['podcast revenue', 'podcast sponsors']
    }
)

Parameters:

content (str, required): Full article content (markdown)
metadata (dict, optional): Metadata dict with the following keys:
    meta_title (str): Meta title tag
    meta_description (str): Meta description tag
    primary_keyword (str): Primary target keyword
    secondary_keywords (list): Secondary keywords

Returns:

result (dict) with the following keys:

composite_score (float): Overall weighted score (0-100)
passed (bool): True if score >= 70
threshold (int): Pass threshold (70)
dimensions (dict): Scores and details for each dimension
priority_fixes (list): Top 5 issues sorted by impact

format_report

Format scoring result as readable report.

report = scorer.format_report(result)
print(report)

Dimension Details

Humanity Score (30%)

Measures human voice and personality.

Penalties:

AI phrases (-30): “in today’s digital”, “when it comes to”, “let’s dive in”, “leverage”, “utilize”, “seamless”, “unlock the power”
High passive voice (-15): Excessive use of “is/are/was/were + verb-ed”
Lack of contractions (-10): No “don’t”, “can’t”, “you’re”, “it’s”

Bonuses:

Conversational devices (+15): Parentheticals, questions, contractions, casual openers
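As a rough illustration of how two of these signals could be detected, the sketch below counts the listed AI phrases and checks for contractions. The phrase list is copied from the penalties above; the matching strategy (lowercased substring search, a small contraction regex) and the helper names are assumptions, and the source does not say whether penalties are flat or scale with the number of hits.

```python
import re

# Phrase list from the penalties above; everything else is an assumption.
AI_PHRASES = [
    "in today's digital", "when it comes to", "let's dive in",
    "leverage", "utilize", "seamless", "unlock the power",
]

# Matches n't forms, 're/'ve/'ll/'d/'m forms, and a few common 's contractions.
CONTRACTION_RE = re.compile(
    r"\b(?:\w+n't|\w+'(?:re|ve|ll|d|m)|(?:it|that|there|what|let)'s)\b"
)


def normalize(text: str) -> str:
    """Lowercase the text and turn curly apostrophes into straight ones."""
    return text.lower().replace("\u2019", "'")


def count_ai_phrases(text: str) -> dict[str, int]:
    """Return each listed phrase that occurs, with its occurrence count."""
    text = normalize(text)
    counts = {phrase: text.count(phrase) for phrase in AI_PHRASES}
    return {phrase: n for phrase, n in counts.items() if n}


def has_contractions(text: str) -> bool:
    """True if the text contains at least one contraction."""
    return bool(CONTRACTION_RE.search(normalize(text)))
```

Substring matching is deliberately loose here: "leverage" also matches "leveraged", which may or may not be what the real scorer does.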

Specificity Score (25%)

Measures concrete examples vs vague generalizations.

Penalties:

Vague words (-25): “many”, “some”, “various”, “often”, “significant”, “great”, “very”, “important”
Lack of numbers/data (-15): Few percentages, dates, or counts

Bonuses:

Specificity indicators (+30): Percentages (“25%”), dollar amounts (“$1,000”), years (“2024”), dates, counts, quotes with names
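The vague-word and indicator checks lend themselves to regular expressions. The following is a minimal sketch under stated assumptions: the word list comes from the penalties above, while the patterns, the `specificity_signals` name, and the choice to cover only percentages, dollar amounts, and years are illustrative and not taken from the module.

```python
import re

# Word list from the penalties above.
VAGUE_WORDS = ["many", "some", "various", "often",
               "significant", "great", "very", "important"]
VAGUE_RE = re.compile(r"\b(?:" + "|".join(VAGUE_WORDS) + r")\b", re.IGNORECASE)

# Hypothetical patterns for three of the indicator types listed above.
INDICATORS = {
    "percentage": re.compile(r"\b\d+(?:\.\d+)?%"),
    "dollar_amount": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def specificity_signals(text: str) -> dict[str, int]:
    """Count vague words and concrete indicators in the text."""
    signals = {"vague_words": len(VAGUE_RE.findall(text))}
    for name, pattern in INDICATORS.items():
        signals[name] = len(pattern.findall(text))
    return signals


print(specificity_signals(
    "Many podcasts earn very little. In 2024, 25% of shows made $1,000 or more."
))
```

Dates, counts, and named quotes would need their own patterns; they are left out to keep the sketch short.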

Structure Balance Score (20%)

Measures prose-to-structure ratio. Target: 50-75% prose.

Penalties:

Too structured (less than 50% prose): Needs more narrative
Too prose-heavy (more than 75% prose): Needs more lists, tables, or visual breaks
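One way to estimate the prose share is to classify each markdown line as structural or prose and compare word counts. This sketch assumes that headings, list items, and table rows count as structure and that the ratio is word-based; the source does not specify either, and `prose_ratio` and `balance_verdict` are hypothetical names.

```python
import re

# Assumption: headings, bullet/numbered list items, and table rows are "structure".
STRUCTURAL_LINE = re.compile(r"^\s*(?:#{1,6}\s|[-*+]\s|\d+[.)]\s|\|)")


def prose_ratio(markdown: str) -> float:
    """Fraction of whitespace-separated tokens that sit on prose lines."""
    prose_words = structural_words = 0
    for line in markdown.splitlines():
        words = len(line.split())
        if not words:
            continue  # skip blank lines
        if STRUCTURAL_LINE.match(line):
            structural_words += words
        else:
            prose_words += words
    total = prose_words + structural_words
    return prose_words / total if total else 0.0


def balance_verdict(ratio: float, low: float = 0.50, high: float = 0.75) -> str:
    """Map a prose ratio onto the 50-75% target band described above."""
    if ratio < low:
        return "too structured"
    if ratio > high:
        return "too prose-heavy"
    return "balanced"
```

Note that list markers and `#` characters are counted as tokens here, which slightly inflates the structural share for very short lines.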

SEO Score (15%)

Measures SEO compliance.

Penalties:

Missing meta title (-15)
Missing meta description (-15)
Keyword not in H1 (-10)
Keyword not in first 100 words (-10)
Content too short (less than 2,000 words) (-15)
Meta title/description wrong length (-5)
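Most of these checks reduce to simple string tests. The sketch below is an assumption-laden illustration rather than the module's logic: the penalty values and the 2,000-word minimum come from the list above, `seo_issues` is a hypothetical helper, the H1 is assumed to be the first line starting with a single `#`, and the meta length check is omitted because the source does not state the expected lengths.

```python
import re

MIN_WORDS = 2000  # minimum length from the penalty list above


def seo_issues(content: str, metadata: dict) -> list[tuple[str, int]]:
    """Return (issue, penalty) pairs for the checks that fail."""
    issues = []
    keyword = metadata.get("primary_keyword", "").lower()

    if not metadata.get("meta_title"):
        issues.append(("Missing meta title", -15))
    if not metadata.get("meta_description"):
        issues.append(("Missing meta description", -15))

    # Assumption: the H1 is the first markdown line of the form "# Heading".
    h1 = re.search(r"^#\s+(.+)$", content, re.MULTILINE)
    words = content.split()
    if keyword:
        if not h1 or keyword not in h1.group(1).lower():
            issues.append(("Keyword not in H1", -10))
        if keyword not in " ".join(words[:100]).lower():
            issues.append(("Keyword not in first 100 words", -10))

    if len(words) < MIN_WORDS:
        issues.append((f"Content too short ({len(words):,} words)", -15))
    return issues
```

Keyword checks are skipped when no `primary_keyword` is supplied, since `metadata` is optional.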

Readability Score (10%)

Measures readability, rhythm, and paragraph length. Target: Flesch Reading Ease 60-70 (standard, plain English), Grade 8-10.

Penalties:

Flesch < 50 (-30): Too difficult
Grade > 12 (-10): Reading level too high
Long paragraphs (-15): Paragraphs with 5+ sentences
Monotonous rhythm (-10): Sections with uniform sentence lengths

Good rhythm: Mix short punchy (5-10 words) with longer flowing (15-25 words)
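For reference, Flesch Reading Ease is computed as 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The sketch below applies that formula with a rough vowel-group syllable heuristic and also flags paragraphs longer than four sentences. The helper names, the syllable heuristic, and the sentence splitting are assumptions; the module may rely on a dedicated readability library instead.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def split_sentences(text: str) -> list[str]:
    """Naive sentence split on ., ! and ? (abbreviations are not handled)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def long_paragraphs(text: str, max_sentences: int = 4) -> list[int]:
    """Sentence counts of paragraphs that exceed max_sentences."""
    counts = []
    for paragraph in re.split(r"\n\s*\n", text):
        n = len(split_sentences(paragraph))
        if n > max_sentences:
            counts.append(n)
    return counts
```

Very short, simple text can score above 100 on this scale, so the formula's output is not clamped here.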

Priority Fixes

The scorer identifies the top 5 issues sorted by impact:

impact = dimension_weight * (100 - dimension_score)

Example:

{
    'dimension': 'humanity',
    'dimension_score': 65,
    'impact': 10.5,  # 0.30 * (100 - 65)
    'issue': 'AI phrases detected (12 instances)',
    'fix': 'Remove or rephrase: in today\'s digital, when it comes to, leverage',
    'severity': 'high'
}
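The impact formula above can be sketched as a small ranking helper. This is an illustration only: the weights and the top-5 limit come from this page, while `rank_fixes` and the shape of its input (issue dicts that already carry `dimension` and `dimension_score`) are assumptions.

```python
WEIGHTS = {
    "humanity": 0.30,
    "specificity": 0.25,
    "structure_balance": 0.20,
    "seo": 0.15,
    "readability": 0.10,
}


def rank_fixes(issues: list[dict], limit: int = 5) -> list[dict]:
    """Attach impact = weight * (100 - dimension_score) and keep the top issues."""
    ranked = []
    for issue in issues:
        impact = WEIGHTS[issue["dimension"]] * (100 - issue["dimension_score"])
        ranked.append({**issue, "impact": round(impact, 2)})
    # Highest impact first; Python's sort is stable, so ties keep input order.
    ranked.sort(key=lambda item: item["impact"], reverse=True)
    return ranked[:limit]


issues = [
    {"dimension": "readability", "dimension_score": 70, "issue": "Long paragraphs"},
    {"dimension": "seo", "dimension_score": 65, "issue": "Content too short"},
    {"dimension": "humanity", "dimension_score": 65, "issue": "AI phrases detected"},
]
for fix in rank_fixes(issues):
    print(fix["impact"], fix["dimension"], fix["issue"])
```

With equal dimension scores, the heavier dimension always ranks first, which is why a humanity issue at 65 outranks an SEO issue at 65.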

Example Usage

from data_sources.modules.content_scorer import ContentScorer

# Read draft
with open('drafts/podcast-monetization.md', 'r', encoding='utf-8') as f:
    content = f.read()

# Score content
scorer = ContentScorer()
result = scorer.score(
    content=content,
    metadata={
        'meta_title': 'How to Monetize Your Podcast in 2024',
        'meta_description': 'Learn proven podcast monetization strategies that work in 2024. From sponsorships to subscriptions, we cover everything you need to know.',
        'primary_keyword': 'podcast monetization'
    }
)

# Print formatted report
print(scorer.format_report(result))

# Check if passed
if result['passed']:
    print("✅ Content passed quality threshold")
else:
    print(f"❌ Content below threshold ({result['composite_score']}/100)")
    print("\nPriority fixes:")
    for i, fix in enumerate(result['priority_fixes'][:3], 1):
        print(f"{i}. [{fix['dimension']}] {fix['issue']}")
        print(f"   Fix: {fix['fix']}")

Example Report

==================================================
CONTENT QUALITY SCORE
==================================================

Composite Score: 75.2/100 (PASSED)
Threshold: 70

Dimensions:
  humanity              78/100 (weight: 30%) [OK]
  specificity           72/100 (weight: 25%) [OK]
  structure_balance     85/100 (weight: 20%) [OK] (68% prose)
  seo                   65/100 (weight: 15%) [NEEDS WORK]
  readability           70/100 (weight: 10%) [OK] (Flesch: 65, Rhythm: 72, Long¶: 3)

Priority Fixes:
  1. [humanity] AI phrases detected (8 instances)
     Fix: Remove or rephrase: in today's digital, when it comes to
  2. [seo] Content too short (1,850 words)
     Fix: Expand to at least 2,000 words
  3. [readability] 3 paragraphs exceed 4 sentences (longest: 6)
     Fix: Break long paragraphs into smaller chunks (2-4 sentences max)

==================================================

CLI Usage

# Score a draft file
python3 data_sources/modules/content_scorer.py drafts/podcast-monetization.md
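To score a whole folder of drafts from Python rather than one file at a time, a small wrapper along these lines could work. It is a sketch under assumptions: `failing_drafts` is a hypothetical helper, it relies only on the documented `score()` call with the optional `metadata` omitted, and it reads only the `passed` and `composite_score` keys described above. The scorer is passed in as an argument so the helper can be tried with any object that exposes a compatible `score()` method.

```python
from pathlib import Path


def failing_drafts(scorer, drafts_dir: str) -> list[tuple[str, float]]:
    """Score every .md file in drafts_dir and return those below the threshold.

    `scorer` is expected to behave like ContentScorer: score(content=...)
    returns a dict with 'passed' and 'composite_score'.
    """
    failures = []
    for path in sorted(Path(drafts_dir).glob("*.md")):
        result = scorer.score(content=path.read_text(encoding="utf-8"))
        if not result["passed"]:
            failures.append((path.name, result["composite_score"]))
    return failures
```

Calling `failing_drafts(ContentScorer(), "drafts")` would then list each draft that needs another pass, with its composite score.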

Integration with Workflow

The scorer is used in several commands:

/write - Auto-runs after article creation
/optimize - Runs as part of optimization
/analyze-existing - Scores existing content

Source Code Reference

Location: data_sources/modules/content_scorer.py:34

See also:

Readability Scorer - Flesch Reading Ease scoring
SEO Quality Rater - Comprehensive SEO scoring
