Content Length Comparator

The Content Length Comparator fetches the top SERP results for a keyword, measures the word count of each page, and recommends a target word count for your own content. It returns summary statistics and recommendations based on those competitor benchmarks.

Basic Usage

Use the convenience function for quick comparison:

from data_sources.modules.content_length_comparator import compare_content_length

# Requires SERP data from DataForSEO
serp_results = [
    {'url': 'https://example1.com/podcast', 'domain': 'example1.com', 'title': 'How to Start a Podcast'},
    {'url': 'https://example2.com/guide', 'domain': 'example2.com', 'title': 'Podcast Guide'}
]

result = compare_content_length(
    keyword="how to start a podcast",
    your_word_count=2200,
    serp_results=serp_results,
    fetch_content=True
)

print(f"Your count: {result['your_word_count']}")
print(f"Median: {result['statistics']['median']}")
print(f"Recommended: {result['recommendation']['recommended_optimal']}")
print(f"Status: {result['recommendation']['your_status']}")

Class API

ContentLengthComparator

The main comparator class:

from data_sources.modules.content_length_comparator import ContentLengthComparator

comparator = ContentLengthComparator()
result = comparator.analyze(
    keyword="your keyword",
    your_word_count=2500,
    serp_results=serp_data,  # list of SERP result dicts, shaped like serp_results in Basic Usage
    fetch_content=True
)

analyze()

Analyze content length compared to SERP competitors.

Parameters:

keyword (string, required): Search keyword to analyze
your_word_count (int): Your content’s word count for comparison
serp_results (list[dict], required): SERP results from DataForSEO. Each dict should contain:
  url (string): Page URL to fetch
  domain (string): Domain name
  title (string): Page title
fetch_content (boolean): Whether to fetch and analyze competitor content. Default: True

Returns a dict with the following fields:

keyword (string): The analyzed keyword
competitors_analyzed (int): Number of competitor pages successfully analyzed
your_word_count (int): Your content’s word count
statistics (object): Statistical measures of competitor word counts:
  min (int): Shortest competitor content
  max (int): Longest competitor content
  mean (int): Average word count
  median (int): Median word count
  mode (int): Most common word count
  std_dev (int): Standard deviation
  percentile_25 (int): 25th percentile
  percentile_75 (int): 75th percentile
competitor_lengths (list[object]): Individual competitor data:
  position (int): SERP position (1-10)
  url (string): Page URL
  domain (string): Domain name
  title (string): Page title
  word_count (int): Word count
your_position (string): Where your content falls in the competitor range
recommendation (object): Length recommendations:
  recommended_min (int): Minimum recommended word count
  recommended_optimal (int): Optimal target word count
  recommended_max (int): Maximum recommended word count
  your_status (string): Status - too_short, short, good, optimal, long
  message (string): Actionable recommendation message
  reasoning (string): Explanation of recommendation
competitive_analysis (object): Competitive comparison:
  total_competitors (int): Total competitors analyzed
  length_distribution (dict): Distribution by word count ranges
  comparison (dict): How you compare (if your_word_count provided)
  gap_to_median (dict): Gap to median (if below median)
  gap_to_75th_percentile (dict): Gap to 75th percentile

Statistics Analysis

Understand competitor content length distribution:

stats = result['statistics']

print(f"Range: {stats['min']} - {stats['max']} words")
print(f"Average: {stats['mean']} words")
print(f"Median: {stats['median']} words")
print(f"75th percentile: {stats['percentile_75']} words")
print(f"Standard deviation: {stats['std_dev']} words")

Example output:

{
    'min': 1800,
    'max': 4500,
    'mean': 2650,
    'median': 2500,
    'mode': 2400,
    'std_dev': 680,
    'percentile_25': 2100,
    'percentile_75': 3200
}
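
The fields above can be approximated with Python's standard statistics module. This is a sketch, not the comparator's actual implementation: the helper name summarize_word_counts is hypothetical, and the quartile method (statistics.quantiles with the inclusive method) is an assumption, so the percentile values may differ slightly from what the module reports.

```python
import statistics

def summarize_word_counts(word_counts):
    """Summarize competitor word counts (hypothetical helper, not the module's API)."""
    if len(word_counts) < 2:
        raise ValueError("need at least two word counts")
    # quantiles(n=4) returns the three quartile cut points: Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(word_counts, n=4, method="inclusive")
    return {
        "min": min(word_counts),
        "max": max(word_counts),
        "mean": round(statistics.mean(word_counts)),
        "median": round(statistics.median(word_counts)),
        "mode": statistics.mode(word_counts),
        "std_dev": round(statistics.stdev(word_counts)),
        "percentile_25": round(q1),
        "percentile_75": round(q3),
    }

# Illustrative word counts, not real SERP data.
summary = summarize_word_counts([1800, 2400, 2400, 2500, 3200, 4500])
print(summary)
```

With few competitors, the median and percentiles are sensitive to a single outlier, so it helps to read them together with min and max.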

Competitor Analysis

View individual competitor word counts:

for comp in result['competitor_lengths']:
    print(f"Position {comp['position']}: {comp['domain']}")
    print(f"  Title: {comp['title']}")
    print(f"  Word Count: {comp['word_count']}")

Example output:

Position 1: example.com
  Title: How to Start a Podcast: The Complete Guide
  Word Count: 3200
  
Position 2: competitor.com
  Title: Podcast Starter Guide 2024
  Word Count: 2800
  
Position 3: blog.site.com
  Title: Starting Your First Podcast
  Word Count: 2400

Recommendations

Get actionable length recommendations:

rec = result['recommendation']

print(f"Recommended Range: {rec['recommended_min']} - {rec['recommended_max']} words")
print(f"Optimal Target: {rec['recommended_optimal']} words")
print(f"Your Status: {rec['your_status']}")
print(f"Message: {rec['message']}")
print(f"Reasoning: {rec['reasoning']}")

Recommendation Logic

The recommended optimal target is calculated as:

Greater of: 75th percentile OR median + 20%
This keeps the target at or above the 75th percentile, so it is at least as long as three quarters of the analyzed competitors

Example:

{
    'recommended_min': 2500,
    'recommended_optimal': 3200,
    'recommended_max': 3840,
    'your_status': 'short',
    'message': 'Your content is shorter than most competitors. Consider adding 1000 more words.',
    'reasoning': 'Based on median (2500) and 75th percentile (3200) of top 10 results'
}
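
The rule and example above can be sketched as follows. Only the formula for the optimal target comes from the text. The helper name recommend_length is hypothetical, and the choices of recommended_min (the median) and recommended_max (20% above the optimal target) are assumptions inferred from the example values 2500, 3200 and 3840.

```python
def recommend_length(median, percentile_75):
    """Sketch of the recommendation rule (hypothetical helper)."""
    # From the text: optimal is the greater of the 75th percentile or median + 20%.
    optimal = max(percentile_75, round(median * 1.2))
    return {
        # Assumption: the minimum equals the median (matches the example above).
        "recommended_min": median,
        "recommended_optimal": optimal,
        # Assumption: the maximum is 20% above the optimal target.
        "recommended_max": round(optimal * 1.2),
    }

rec = recommend_length(median=2500, percentile_75=3200)
print(rec)
```

When competitor lengths are tightly clustered, the 75th percentile sits close to the median and the median + 20% branch sets the target instead.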

Status Values

too_short: < 80% of recommended minimum - significantly short
short: < recommended minimum - below most competitors
good: < recommended optimal - competitive but room to improve
optimal: Within recommended range - matches top performers ✅
long: > recommended maximum - longer than necessary
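
One way to read the status list is as a sequence of threshold checks applied in order. The sketch below assumes exactly that. The function name classify_status is hypothetical, and whether each boundary is inclusive or exclusive is an assumption, since the list does not say.

```python
def classify_status(word_count, rec_min, rec_optimal, rec_max):
    """Map a word count to a status label (hypothetical helper)."""
    if word_count < 0.8 * rec_min:
        return "too_short"   # below 80% of the recommended minimum
    if word_count < rec_min:
        return "short"       # below the recommended minimum
    if word_count < rec_optimal:
        return "good"        # competitive, but under the optimal target
    if word_count <= rec_max:
        return "optimal"     # assumption: the maximum itself still counts as optimal
    return "long"            # above the recommended maximum

status = classify_status(2200, rec_min=2500, rec_optimal=3200, rec_max=3840)
print(status)
```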

Length Distribution

See how competitor content is distributed across length ranges:

dist = result['competitive_analysis']['length_distribution']

print("Content Length Distribution:")
print(f"  Under 1000 words: {dist['under_1000']}")
print(f"  1000-1500 words: {dist['1000_1500']}")
print(f"  1500-2000 words: {dist['1500_2000']}")
print(f"  2000-2500 words: {dist['2000_2500']}")
print(f"  2500-3000 words: {dist['2500_3000']}")
print(f"  3000+ words: {dist['3000_plus']}")

Example output:

{
    'under_1000': 0,
    '1000_1500': 1,
    '1500_2000': 2,
    '2000_2500': 3,
    '2500_3000': 2,
    '3000_plus': 2
}
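
A distribution like the one above can be built by counting word counts per range. This is a sketch with a hypothetical helper name, length_distribution. It assumes each range includes its lower bound and excludes its upper bound (so 1500 falls in 1500_2000), which the module may handle differently.

```python
def length_distribution(word_counts):
    """Count word counts per range (hypothetical helper)."""
    # (key, inclusive lower bound, exclusive upper bound)
    buckets = [
        ("under_1000", 0, 1000),
        ("1000_1500", 1000, 1500),
        ("1500_2000", 1500, 2000),
        ("2000_2500", 2000, 2500),
        ("2500_3000", 2500, 3000),
        ("3000_plus", 3000, float("inf")),
    ]
    dist = {name: 0 for name, _, _ in buckets}
    for count in word_counts:
        for name, low, high in buckets:
            if low <= count < high:
                dist[name] += 1
                break
    return dist

# Illustrative word counts, not real SERP data.
dist = length_distribution([1200, 1600, 1900, 2100, 2300, 2400, 2600, 2800, 3200, 4500])
print(dist)
```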

Your Position

See where your content falls in the competitor range:

print(f"Your position: {result['your_position']}")

Example outputs:

"Below all competitors (shortest is 1800)"
"Between position 3 and 4 competitors"
"Above all competitors (longest is 3500)"
"Within competitive range"

Comparison Analysis

Get detailed comparison when your word count is provided:

if 'comparison' in result['competitive_analysis']:
    comp = result['competitive_analysis']['comparison']
    
    print(f"Shorter than you: {comp['shorter_than_you']} competitors")
    print(f"Longer than you: {comp['longer_than_you']} competitors")
    print(f"Your percentile: {comp['percentile']}th")

Example:

{
    'shorter_than_you': 3,
    'longer_than_you': 7,
    'percentile': 30
}
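
The comparison fields can be derived from the competitor word counts alone. The sketch below uses a hypothetical helper, compare_to_competitors, and assumes that percentile means the share of competitors shorter than you, which is consistent with the example above (3 of 10 gives 30) but is not confirmed by the module.

```python
def compare_to_competitors(your_word_count, competitor_counts):
    """Count shorter and longer competitors (hypothetical helper)."""
    shorter = sum(1 for count in competitor_counts if count < your_word_count)
    longer = sum(1 for count in competitor_counts if count > your_word_count)
    total = len(competitor_counts)
    # Assumption: percentile is the share of competitors shorter than you.
    percentile = round(shorter / total * 100) if total else 0
    return {
        "shorter_than_you": shorter,
        "longer_than_you": longer,
        "percentile": percentile,
    }

# Illustrative word counts, not real SERP data.
counts = [1800, 1900, 2100, 2400, 2500, 2600, 2800, 3000, 3200, 4500]
comparison = compare_to_competitors(2200, counts)
print(comparison)
```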

Gap Analysis

See how many words you’re behind competitors:

if 'gap_to_median' in result['competitive_analysis']:
    gap = result['competitive_analysis']['gap_to_median']
    print(f"Gap to median: {gap['words']} words ({gap['percentage']}%)")

if 'gap_to_75th_percentile' in result['competitive_analysis']:
    gap = result['competitive_analysis']['gap_to_75th_percentile']
    print(f"Gap to 75th percentile: {gap['words']} words ({gap['percentage']}%)")

Example:

{
    'gap_to_median': {
        'words': 300,
        'percentage': 14
    },
    'gap_to_75th_percentile': {
        'words': 1000,
        'percentage': 45
    }
}
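
The example values are consistent with a 2200-word article measured against a median of 2500 and a 75th percentile of 3200, with the percentage expressed relative to your current word count. The sketch below reproduces that arithmetic. The helper name gap_to and the choice of denominator are assumptions.

```python
def gap_to(target, your_word_count):
    """Return the gap to a target word count, or None if you already meet it.

    Hypothetical helper; the percentage is relative to your current count.
    """
    if your_word_count <= 0 or your_word_count >= target:
        return None
    words = target - your_word_count
    return {"words": words, "percentage": round(words / your_word_count * 100)}

gaps = {
    "gap_to_median": gap_to(2500, 2200),
    "gap_to_75th_percentile": gap_to(3200, 2200),
}
print(gaps)
```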

Integration with DataForSEO

Combine with DataForSEO for real SERP analysis:

from data_sources.modules.dataforseo import DataForSEO
from data_sources.modules.content_length_comparator import compare_content_length

# Step 1: Get SERP data
dfs = DataForSEO()
serp_data = dfs.get_serp_data("how to start a podcast")

# Step 2: Extract organic results
# Use a distinct loop variable so it is not confused with the final result below
organic_results = [
    {
        'url': item['url'],
        'domain': item['domain'],
        'title': item['title']
    }
    for item in serp_data['organic'][:10]
]

# Step 3: Compare content length
result = compare_content_length(
    keyword="how to start a podcast",
    your_word_count=2200,
    serp_results=organic_results,
    fetch_content=True
)

print(f"Recommended optimal: {result['recommendation']['recommended_optimal']} words")
print(f"Your status: {result['recommendation']['your_status']}")
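
Real SERP payloads sometimes contain entries without a URL or domain, in which case direct indexing as in Step 2 raises KeyError. The sketch below shows a more defensive extraction. The helper name to_serp_results is hypothetical, and it assumes each organic item is a dict that may carry url, domain and title keys. The domain is derived from the URL when it is missing.

```python
from urllib.parse import urlparse

def to_serp_results(organic_items, limit=10):
    """Build the serp_results list, skipping unusable items (hypothetical helper)."""
    results = []
    for item in organic_items:
        if len(results) >= limit:
            break
        url = item.get("url")
        if not url:
            continue  # nothing to fetch without a URL
        results.append({
            "url": url,
            # Fall back to the host part of the URL when no domain is given.
            "domain": item.get("domain") or urlparse(url).netloc,
            "title": item.get("title", ""),
        })
    return results

# Illustrative items, not real DataForSEO output.
items = [
    {"url": "https://example1.com/a", "title": "A"},
    {"title": "entry without a url"},
    {"url": "https://example2.com/b", "domain": "example2.com", "title": "B"},
]
serp_results = to_serp_results(items)
print(serp_results)
```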

Content Extraction

Before counting words, the comparator extracts the main content of each competitor page:

Removes noise: Strips script, style, nav, footer, header, aside elements
Finds main content: Looks for article, main, .content, .post, .entry-content
Counts real words: Only counts words with 2+ characters
Handles failures: Silently skips pages that fail to fetch

# The analyzer automatically handles content extraction
result = comparator.analyze(
    keyword="your keyword",
    serp_results=serp_results,
    fetch_content=True  # Automatically fetches and extracts content
)
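
To illustrate the noise-removal and word-counting steps, here is a simplified sketch built on Python's standard html.parser. It is not the comparator's implementation: the names NOISE_TAGS, TextExtractor and count_words are hypothetical, and the sketch omits the main-content lookup (article, main, .content and so on), counting all text outside the noise elements instead.

```python
import re
from html.parser import HTMLParser

# Elements whose text is ignored, taken from the list above.
NOISE_TAGS = {"script", "style", "nav", "footer", "header", "aside"}

class TextExtractor(HTMLParser):
    """Collect text that is not nested inside a noise element."""

    def __init__(self):
        super().__init__()
        self._noise_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1

    def handle_data(self, data):
        if self._noise_depth == 0:
            self.chunks.append(data)

def count_words(html):
    """Count words of 2+ characters outside noise elements."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return sum(1 for word in re.findall(r"\w+", text) if len(word) >= 2)

page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<nav>Home About</nav>"
    "<article><p>Starting a podcast is easy</p></article>"
    "<footer>Copyright</footer></body></html>"
)
print(count_words(page))
```

Because single-character words such as "a" are skipped, counts from this approach run slightly lower than a typical editor's word count.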

Real-World Example

Complete workflow:

from data_sources.modules.content_length_comparator import compare_content_length

# SERP data from DataForSEO
serp_results = [
    {'url': 'https://example1.com/podcast-guide', 'domain': 'example1.com', 'title': 'Ultimate Podcast Guide'},
    {'url': 'https://example2.com/start-podcast', 'domain': 'example2.com', 'title': 'How to Start a Podcast'},
    {'url': 'https://example3.com/podcast-101', 'domain': 'example3.com', 'title': 'Podcasting 101'},
    # ... more results
]

# Your article is 2200 words
result = compare_content_length(
    keyword="how to start a podcast",
    your_word_count=2200,
    serp_results=serp_results
)

# Print report
print("=== Content Length Analysis ===")
print(f"\nKeyword: {result['keyword']}")
print(f"Competitors analyzed: {result['competitors_analyzed']}")
print(f"Your word count: {result['your_word_count']}")

print("\nCompetitor Statistics:")
stats = result['statistics']
print(f"  Range: {stats['min']} - {stats['max']} words")
print(f"  Average: {stats['mean']} words")
print(f"  Median: {stats['median']} words")
print(f"  75th percentile: {stats['percentile_75']} words")

print("\nRecommendation:")
rec = result['recommendation']
print(f"  Target: {rec['recommended_optimal']} words")
print(f"  Range: {rec['recommended_min']}-{rec['recommended_max']} words")
print(f"  Your status: {rec['your_status']}")
print(f"  {rec['message']}")

print(f"\nYour Position: {result['your_position']}")

if 'gap_to_75th_percentile' in result['competitive_analysis']:
    gap = result['competitive_analysis']['gap_to_75th_percentile']
    print(f"\nGap to top performers: {gap['words']} words ({gap['percentage']}% increase needed)")

Error Handling

When the analysis cannot run, the returned dict contains an error key alongside a recommendation:

result = compare_content_length(
    keyword="test",
    serp_results=[]
)

if 'error' in result:
    print(f"Error: {result['error']}")
    print(f"Recommendation: {result['recommendation']}")

Error cases:

No SERP results provided
Could not fetch competitor content
Insufficient sections for clustering

Best Practices

Target 75th percentile - aim to match or exceed top performers
Analyze top 10 results - provides comprehensive competitor data
Use with DataForSEO - get real-time SERP data
Check regularly - competitor content lengths change over time
Consider intent - transactional queries may need less content
Quality over quantity - don’t add fluff just to hit word count
Match or exceed median - minimum competitive threshold

Related Modules

SEO Quality Rater - Includes word count in quality score
Search Intent Analyzer - Different intents need different lengths
Keyword Analyzer - Longer content affects keyword density
