The Keyword Analyzer calculates keyword density, analyzes distribution across content sections, performs semantic clustering, and detects keyword stuffing risks. It provides comprehensive keyword usage analysis for SEO optimization.

Basic Usage

Use the convenience function for quick analysis:
from data_sources.modules.keyword_analyzer import analyze_keywords

result = analyze_keywords(
    content=article_content,
    primary_keyword="start a podcast",
    secondary_keywords=["podcast hosting", "podcast equipment"],
    target_density=1.5
)

print(f"Density: {result['primary_keyword']['density']}%")
print(f"Status: {result['primary_keyword']['density_status']}")
print(f"Stuffing Risk: {result['keyword_stuffing']['risk_level']}")

Class API

KeywordAnalyzer

The main analyzer class:
from data_sources.modules.keyword_analyzer import KeywordAnalyzer

analyzer = KeywordAnalyzer()
result = analyzer.analyze(
    content=article_content,
    primary_keyword="start a podcast",
    secondary_keywords=["podcast hosting", "podcast equipment"],
    target_density=1.5
)

analyze()

Perform comprehensive keyword analysis.
Parameters

content (string, required)
Article content to analyze (full text including headers)

primary_keyword (string, required)
Main target keyword or keyphrase

secondary_keywords (list[string])
List of secondary keywords to analyze. Default: []

target_density (float)
Target keyword density percentage. Default: 1.5 (1.5%)

Returns

word_count (int)
Total word count of content

primary_keyword (object)
Primary keyword analysis:
  • keyword (string): The analyzed keyword
  • exact_matches (int): Number of exact keyword matches
  • total_occurrences (int): Total occurrences including variations
  • density (float): Keyword density percentage
  • target_density (float): Target density
  • density_status (string): Status - too_low, slightly_low, optimal, slightly_high, too_high
  • positions (list[int]): Character positions where keyword appears
  • critical_placements (dict): Keyword presence in critical locations
  • section_distribution (list[dict]): Distribution across content sections

secondary_keywords (list[object])
Array of secondary keyword analyses with the same structure as primary_keyword

keyword_stuffing (object)
Keyword stuffing detection:
  • risk_level (string): none, low, medium, high
  • warnings (list[string]): Specific stuffing warnings
  • safe (boolean): True if risk is none or low

topic_clusters (object)
Topic clustering analysis using TF-IDF and k-means:
  • clusters_found (int): Number of topic clusters identified
  • clusters (list[dict]): Cluster details with top terms

distribution_heatmap (list[object])
Heatmap of keyword distribution across sections:
  • section (string): Section header
  • keyword_count (int): Keyword count in section
  • heat_level (int): Heat level 0-5
  • density (float): Section keyword density

lsi_keywords (list[string])
LSI (Latent Semantic Indexing) keywords: semantically related terms found in the content

recommendations (list[string])
Actionable recommendations for keyword optimization

Keyword Density Analysis

The analyzer calculates both exact matches and variations:
result = analyze_keywords(
    content=article_content,
    primary_keyword="podcast hosting"
)

print(f"Exact matches: {result['primary_keyword']['exact_matches']}")
print(f"Total occurrences: {result['primary_keyword']['total_occurrences']}")
print(f"Density: {result['primary_keyword']['density']}%")
print(f"Target: {result['primary_keyword']['target_density']}%")
print(f"Status: {result['primary_keyword']['density_status']}")
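The exact formula is internal to the module, but keyword density is conventionally computed as keyword occurrences per 100 words. A minimal sketch (illustrative only, assuming case-insensitive exact matching):

```python
import re

def keyword_density(content: str, keyword: str) -> float:
    """Approximate density: exact (case-insensitive) matches per 100 words."""
    words = re.findall(r"\b\w+\b", content)
    matches = len(re.findall(re.escape(keyword), content, flags=re.IGNORECASE))
    return round(matches * 100 / len(words), 2) if words else 0.0

text = "Podcast hosting made easy. Pick a podcast hosting plan."
print(keyword_density(text, "podcast hosting"))  # 2 matches in 9 words -> 22.22
```

The module also counts variations, so its `density` value can differ from this exact-match estimate.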

Density Status Values

  • too_low: < 50% of target density
  • slightly_low: 50-80% of target density
  • optimal: 80-120% of target density ✅
  • slightly_high: 120-150% of target density
  • too_high: > 150% of target density ⚠️
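The thresholds above can be sketched as a small classifier (illustrative only, not the module's internal code; behavior at the exact boundary values may differ):

```python
def density_status(density: float, target: float = 1.5) -> str:
    """Map a measured density to a status bucket relative to the target."""
    ratio = density / target
    if ratio < 0.5:
        return "too_low"
    if ratio < 0.8:
        return "slightly_low"
    if ratio <= 1.2:
        return "optimal"
    if ratio <= 1.5:
        return "slightly_high"
    return "too_high"

print(density_status(1.5), density_status(0.6), density_status(2.4))
# optimal too_low too_high
```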

Critical Placements

Check if keywords appear in strategic locations:
placements = result['primary_keyword']['critical_placements']

print(f"In first 100 words: {placements['in_first_100_words']}")
print(f"In H1: {placements['in_h1']}")
print(f"In H2 headings: {placements['in_h2_headings']}")
print(f"In conclusion: {placements['in_conclusion']}")
print(f"H2 keyword ratio: {placements['h2_keyword_ratio']}")
Output:
{
    'in_first_100_words': True,
    'in_h1': True,
    'in_h2_headings': '2/5',
    'in_conclusion': True,
    'h2_keyword_ratio': 0.4
}
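Two of these checks are straightforward to reproduce yourself. The helper below is a hypothetical sketch (not the module's implementation) of the first-100-words and H1 checks, assuming markdown-style `#` headings:

```python
import re

def check_placements(content: str, keyword: str) -> dict:
    """Illustrative versions of two critical-placement checks."""
    kw = keyword.lower()
    # First 100 words, joined so multi-word keywords can match across spaces
    first_100 = " ".join(re.findall(r"\b\w+\b", content.lower())[:100])
    # First markdown H1 ("# ...") in the document, if any
    h1 = re.search(r"^#\s+(.*)$", content, flags=re.MULTILINE)
    return {
        "in_first_100_words": kw in first_100,
        "in_h1": bool(h1) and kw in h1.group(1).lower(),
    }

doc = "# How to Start a Podcast\n\nLearn how to start a podcast today."
print(check_placements(doc, "start a podcast"))
```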

Section Distribution

Analyze how keywords are distributed across content sections:
for section in result['primary_keyword']['section_distribution']:
    print(f"Section: {section['header']}")
    print(f"  Keyword count: {section['keyword_count']}")
    print(f"  Section density: {section['density']}%")
    print(f"  Word count: {section['word_count']}")

Keyword Stuffing Detection

Detect potential keyword stuffing issues:
stuffing = result['keyword_stuffing']

print(f"Risk level: {stuffing['risk_level']}")
print(f"Safe: {stuffing['safe']}")

if not stuffing['safe']:
    print("Warnings:")
    for warning in stuffing['warnings']:
        print(f"  - {warning}")

Stuffing Detection Criteria

  1. High Density: > 3% triggers high risk, > 2.5% triggers medium risk
  2. Paragraph Clustering: Paragraphs with > 5% density
  3. Consecutive Sentences: Keyword in 5+ consecutive sentences (high risk) or 3+ (low risk)
Example output:
{
    'risk_level': 'medium',
    'warnings': [
        'Keyword density 2.8% is high (over 2.5%)',
        'Paragraph 3 has very high keyword density (6.2%)'
    ],
    'safe': False
}
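The consecutive-sentence criterion can be sketched as follows (a hypothetical helper, not the module's code; its sentence splitting may be more sophisticated):

```python
import re

def max_consecutive_sentences(content: str, keyword: str) -> int:
    """Longest run of consecutive sentences that contain the keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", content)
    best = run = 0
    for s in sentences:
        run = run + 1 if keyword.lower() in s.lower() else 0
        best = max(best, run)
    return best

text = "Podcast tips here. Podcast tools too. Podcast gear next. Now something else."
print(max_consecutive_sentences(text, "podcast"))  # 3 -> would trip the 3+ (low risk) rule
```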

Topic Clustering

Identify content themes using TF-IDF and k-means clustering:
clusters = result['topic_clusters']

print(f"Clusters found: {clusters['clusters_found']}")

for cluster in clusters['clusters']:
    print(f"\nCluster {cluster['cluster_id']}:")
    print(f"  Top terms: {', '.join(cluster['top_terms'])}")
    print(f"  Sections: {cluster['section_count']}")
Example output:
{
    'clusters_found': 3,
    'clusters': [
        {
            'cluster_id': 0,
            'top_terms': ['podcast', 'hosting', 'platform', 'audio', 'upload'],
            'section_count': 4,
            'sections': [0, 2, 5, 8]
        },
        {
            'cluster_id': 1,
            'top_terms': ['equipment', 'microphone', 'recording', 'audio quality'],
            'section_count': 2,
            'sections': [3, 6]
        }
    ]
}
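To build intuition for how TF-IDF groups sections by shared vocabulary, here is a toy stdlib-only sketch (the module's actual pipeline uses k-means on top of TF-IDF and will differ): sections that share distinctive terms score higher cosine similarity and end up in the same cluster.

```python
import math
from collections import Counter

def tfidf_vectors(sections):
    """Naive TF-IDF: term frequency scaled by a smoothed inverse document frequency."""
    docs = [Counter(s.lower().split()) for s in sections]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    vecs = []
    for d in docs:
        total = sum(d.values())
        vecs.append({t: (c / total) * math.log(n / df[t] + 1) for t, c in d.items()})
    return vecs

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sections = [
    "podcast hosting platform upload audio",        # hosting theme
    "microphone recording equipment audio quality", # equipment theme
    "hosting platform pricing upload bandwidth",    # hosting theme
]
vecs = tfidf_vectors(sections)
# Sections 0 and 2 share "hosting platform upload", so they cluster together
print(cosine(vecs[0], vecs[2]) > cosine(vecs[0], vecs[1]))  # True
```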

Distribution Heatmap

Visualize keyword distribution across sections:
for section in result['distribution_heatmap']:
    heat_bar = '█' * section['heat_level']
    print(f"{section['section']:30} {heat_bar:10} ({section['keyword_count']} / {section['density']}%)")
Output:
Introduction                   ███        (3 / 2.1%)
What is Podcast Hosting?       ████       (5 / 2.8%)
Choosing a Platform            ██         (2 / 1.2%)
Pricing and Features           ███        (4 / 2.3%)

Heat Level Scale

  • 0: No keyword mentions
  • 1: < 0.5% density
  • 2: 0.5-1.0% density
  • 3: 1.0-2.0% density
  • 4: 2.0-3.0% density
  • 5: > 3.0% density
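The scale above maps to heat levels roughly like this (illustrative mapping; handling of the exact boundary densities may differ in the module):

```python
def heat_level(density: float) -> int:
    """Map a section's keyword density (%) to the 0-5 heat scale."""
    if density == 0:
        return 0
    for level, upper in ((1, 0.5), (2, 1.0), (3, 2.0), (4, 3.0)):
        if density <= upper:
            return level
    return 5

print(heat_level(0), heat_level(2.1), heat_level(3.5))  # 0 4 5
```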

LSI Keywords

Discover semantically related terms already in your content:
print("LSI Keywords found:")
for keyword in result['lsi_keywords'][:10]:
    print(f"  - {keyword}")
Example output:
[
    'hosting platform',
    'audio quality',
    'podcast episodes',
    'recording software',
    'distribute podcast',
    'podcast directories',
    'monthly listeners',
    'podcast analytics'
]
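How the module discovers these terms is internal, but a rough stand-in is to surface the most frequent multi-word phrases in the content. A toy sketch (hypothetical, ignoring sentence boundaries and stop words):

```python
import re
from collections import Counter

def frequent_bigrams(content: str, top_n: int = 3):
    """Rough LSI-style candidates: the most common two-word phrases."""
    words = re.findall(r"\b[a-z]+\b", content.lower())
    bigrams = Counter(zip(words, words[1:]))
    return [" ".join(b) for b, _ in bigrams.most_common(top_n)]

text = ("Choose a podcast hosting plan. Good podcast hosting improves "
        "audio quality. Audio quality matters.")
print(frequent_bigrams(text, 2))  # ['podcast hosting', 'audio quality']
```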

Secondary Keywords

Analyze multiple secondary keywords:
for secondary in result['secondary_keywords']:
    print(f"\nKeyword: {secondary['keyword']}")
    print(f"  Occurrences: {secondary['total_occurrences']}")
    print(f"  Density: {secondary['density']}%")
    print(f"  Status: {secondary['density_status']}")
Secondary keywords are held to a lower target density (50% of the primary target by default).

Recommendations

Get actionable recommendations based on analysis:
print("Recommendations:")
for rec in result['recommendations']:
    print(f"  {rec}")
Example output:
Recommendations:
  ⚠️ Primary keyword density is too low (0.8%). Target is 1.5%. Add 'start a podcast' naturally in more paragraphs.
  ⚠️ Primary keyword missing from H1 headline - include it in the title
  ℹ️ Primary keyword appears in only 1/6 H2 headings. Aim for 2-3 H2s with keyword variations.
  ℹ️ Secondary keyword 'podcast equipment' not found in content - consider adding it

Real-World Example

Complete analysis workflow:
from data_sources.modules.keyword_analyzer import analyze_keywords

# Your article content
article = """
# How to Start a Podcast: Complete Guide

Starting a podcast has never been easier. In this guide, you'll learn how to start a podcast from scratch.

## Choosing Your Podcast Topic

When you start a podcast, the first step is choosing your topic. Your podcast topic should be something you're passionate about.

## Getting Podcast Equipment

To start a podcast, you need basic equipment. A good microphone is essential for podcast recording.

## Podcast Hosting Platforms

Podcast hosting is crucial. Choose a reliable podcast hosting platform for your show.
"""

# Analyze keywords
result = analyze_keywords(
    content=article,
    primary_keyword="start a podcast",
    secondary_keywords=["podcast hosting", "podcast equipment", "podcast recording"],
    target_density=1.5
)

# Check results
print(f"Word Count: {result['word_count']}")
print(f"\nPrimary Keyword: {result['primary_keyword']['keyword']}")
print(f"Density: {result['primary_keyword']['density']}% ({result['primary_keyword']['density_status']})")
print(f"Exact matches: {result['primary_keyword']['exact_matches']}")

print(f"\nCritical Placements:")
for key, value in result['primary_keyword']['critical_placements'].items():
    print(f"  {key}: {value}")

print(f"\nKeyword Stuffing Risk: {result['keyword_stuffing']['risk_level']}")

if result['keyword_stuffing']['warnings']:
    print("Warnings:")
    for warning in result['keyword_stuffing']['warnings']:
        print(f"  - {warning}")

print(f"\nRecommendations:")
for rec in result['recommendations']:
    print(f"  {rec}")

Best Practices

  1. Target 1.5% density for primary keywords (optimal range: 1.2-1.8%)
  2. Include keyword in H1 - critical for SEO
  3. Add keyword to first 100 words - establishes topic immediately
  4. Use in 2-3 H2 headings - but vary the phrasing
  5. Avoid stuffing - keep density under 2.5%
  6. Analyze secondary keywords - ensure comprehensive coverage
  7. Check LSI keywords - use related terms naturally
  8. Monitor distribution - avoid clustering in one section
