
Overview

The DataAggregator class combines data from Google Analytics 4, Google Search Console, and DataForSEO into unified reports for comprehensive content analysis and opportunity identification.

Installation

from data_sources.modules.data_aggregator import DataAggregator

Authentication

The aggregator automatically initializes all data source clients using their respective environment variables. See the individual module documentation for authentication setup.

Initialization

aggregator = DataAggregator()

The aggregator attempts to initialize all three data sources. If any fail (for example, because credentials are missing), it prints a warning and continues with the sources that are available.
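The source doesn't show the constructor's internals, but the described behavior follows a common pattern: construct each client independently and downgrade failures to warnings so the remaining sources stay usable. A hypothetical sketch (not the actual implementation), with stand-in factories:

```python
def init_sources(factories):
    """Try each factory; keep only the clients that initialized."""
    clients = {}
    for name, factory in factories.items():
        try:
            clients[name] = factory()
        except Exception as exc:  # e.g. missing credentials
            print(f"Warning: could not initialize {name}: {exc}")
    return clients


def _failing_client():
    raise ValueError("missing credentials")  # simulated failure


# Stand-in factories, not the real client classes:
clients = init_sources({
    "ga4": lambda: object(),
    "gsc": lambda: object(),
    "dataforseo": _failing_client,
})
print(sorted(clients))  # ['ga4', 'gsc'] — dataforseo skipped with a warning
```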

Methods

get_comprehensive_page_performance

Get all available data for a specific page from all sources.
performance = aggregator.get_comprehensive_page_performance(
    url="/blog/podcast-monetization",
    days=30
)

Parameters:
- url (str, required): Page path or full URL
- days (int, default 30): Days to analyze

Returns a performance dict with:
- url (str): Page URL
- analyzed_at (str): ISO timestamp of the analysis
- period_days (int): Analysis period in days
- ga4 (dict): Google Analytics data (pageviews, trends)
- gsc (dict): Search Console data (clicks, impressions, keywords)
- dataforseo (dict): DataForSEO rankings for top keywords
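Because a source can fail to initialize, it is safest to consume the result defensively. A sketch against a sample payload shaped like the fields above (the nested values are illustrative, not the documented schema):

```python
# Sample payload with the documented top-level keys (nested values illustrative).
performance = {
    "url": "/blog/podcast-monetization",
    "analyzed_at": "2024-06-01T12:00:00",
    "period_days": 30,
    "ga4": {"pageviews": 1200},
    "gsc": {"clicks": 340, "impressions": 9800},
    "dataforseo": None,  # source unavailable in this run
}

# Use .get() so a missing or failed source doesn't raise KeyError.
for source in ("ga4", "gsc", "dataforseo"):
    data = performance.get(source)
    if data:
        print(f"{source}: {data}")
    else:
        print(f"{source}: no data")
```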

identify_content_opportunities

Identify content opportunities across all data sources.
opportunities = aggregator.identify_content_opportunities(
    days=30,
    min_monthly_pageviews=100
)

Parameters:
- days (int, default 30): Number of days to analyze
- min_monthly_pageviews (int, default 100): Minimum monthly pageviews threshold

Returns an opportunities dict with:
- quick_wins (list): Keywords ranking 11-20 (from GSC)
- declining_content (list): Pages losing traffic (from GA4)
- low_ctr (list): High-impression, low-CTR pages (from GSC)
- a list of rising queries (from GSC)
- competitor_gaps (list): Keywords competitors rank for (from DataForSEO)
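The quick_wins bucket is described as keywords ranking 11-20, i.e. terms sitting on page 2 of the results. A self-contained sketch of that filter over hypothetical GSC rows (the keywords and positions are made up for illustration):

```python
# Hypothetical GSC rows: (keyword, average position).
rows = [
    ("podcast hosting", 12.3),
    ("start a podcast", 4.1),
    ("podcast rss feed", 18.7),
    ("podcast editing software", 27.5),
]

# "Quick wins": keywords already on page 2 (positions 11-20), where a
# small on-page improvement could push them onto page 1.
quick_wins = [(kw, pos) for kw, pos in rows if 11 <= pos <= 20]
print(quick_wins)  # [('podcast hosting', 12.3), ('podcast rss feed', 18.7)]
```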

generate_performance_report

Generate comprehensive performance report with summary metrics, top performers, opportunities, and recommendations.
report = aggregator.generate_performance_report(days=30)

Parameters:
- days (int, default 30): Days to analyze

Returns a report dict with:
- generated_at (str): ISO timestamp
- period_days (int): Analysis period
- summary (dict): Summary metrics from all sources
- top_performers (list): Top 10 pages by pageviews
- opportunities (dict): Content opportunities by category
- recommendations (list): Actionable recommendations

get_priority_queue

Get prioritized list of content tasks.
tasks = aggregator.get_priority_queue(limit=10)

Parameters:
- limit (int, default 10): Number of tasks to return

Returns tasks, a list sorted by priority (high → medium → low). Each task contains:
- priority (str): "high", "medium", or "low"
- type (str): Task type (optimize, update, optimize_meta, create_new)
- action (str): Description of the action
- reason (str): Why this task is recommended
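The high → medium → low ordering is not alphabetical ("high" < "low" < "medium"), so a plain sort on the priority string would not produce it. One way such an ordering could be implemented, with illustrative task dicts matching the fields above:

```python
# Map each priority label to a sortable rank.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

tasks = [
    {"priority": "low", "type": "create_new", "action": "..."},
    {"priority": "high", "type": "optimize", "action": "..."},
    {"priority": "medium", "type": "optimize_meta", "action": "..."},
]

# Sort by rank so high-priority tasks come first.
tasks.sort(key=lambda t: PRIORITY_ORDER[t["priority"]])
print([t["priority"] for t in tasks])  # ['high', 'medium', 'low']
```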

Recommendation Types

The aggregator generates actionable recommendations based on identified opportunities:

Quick Wins

{
    'priority': 'high',
    'type': 'optimize',
    'action': "Optimize for 'podcast hosting'",
    'reason': "Currently ranking #12 with 1,500 impressions. Small improvements could push to page 1.",
    'keyword': 'podcast hosting',
    'current_position': 12
}

Declining Content

{
    'priority': 'high',
    'type': 'update',
    'action': 'Update declining article: How to Monetize Your Podcast',
    'reason': 'Traffic down 35% (1,200 → 780 pageviews). Needs refresh.',
    'url': '/blog/podcast-monetization',
    'change_percent': -35.0
}

Low CTR

{
    'priority': 'medium',
    'type': 'optimize_meta',
    'action': 'Improve meta elements for: /blog/podcast-analytics',
    'reason': 'Getting 2,000 impressions but only 2.1% CTR. Better title/description could add 50 clicks/month.',
    'url': '/blog/podcast-analytics',
    'potential_clicks': 50
}

Trending Topics

{
    'priority': 'medium',
    'type': 'create_new',
    'action': "Create content for trending topic: 'video podcast software'",
    'reason': 'Search interest up 45% with 800 recent impressions. Strike while hot!',
    'query': 'video podcast software',
    'growth': 45.0
}
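The figures in these sample recommendations follow from simple percentage arithmetic. Assuming the fields shown above, the -35% change and the 50-click estimate can be reproduced; note the CTR uplift target below is an assumption chosen to match the sample, not a documented parameter:

```python
# Declining content: percentage change in pageviews (1,200 -> 780).
previous, current = 1200, 780
change_percent = (current - previous) / previous * 100
print(round(change_percent, 1))  # -35.0
# (Trending-topic growth, e.g. +45%, is the same formula applied to impressions.)

# Low CTR: potential extra clicks if CTR improves. A 2.5-point uplift
# (2.1% -> 4.6%) is assumed here to reproduce the "50 clicks/month" figure.
impressions, current_ctr, target_ctr = 2000, 0.021, 0.046
potential_clicks = impressions * (target_ctr - current_ctr)
print(round(potential_clicks))  # 50
```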

Example Usage

from data_sources.modules.data_aggregator import DataAggregator

aggregator = DataAggregator()

# Generate full report
report = aggregator.generate_performance_report(days=30)

print(f"Report Period: Last {report['period_days']} days")
print(f"Generated: {report['generated_at']}")

# Summary
if report['summary']:
    print("\n📊 SUMMARY")
    print("-" * 80)
    if 'total_pageviews' in report['summary']:
        print(f"Total Pageviews: {report['summary']['total_pageviews']:,}")
        print(f"Total Sessions: {report['summary']['total_sessions']:,}")
        print(f"Avg Engagement Rate: {report['summary']['avg_engagement_rate']:.1%}")
    if 'total_clicks' in report['summary']:
        print(f"Total Clicks (GSC): {report['summary']['total_clicks']:,}")
        print(f"Total Impressions: {report['summary']['total_impressions']:,}")
        print(f"Avg CTR: {report['summary']['avg_ctr']:.2%}")

# Top performers
if report.get('top_performers'):
    print("\n🏆 TOP 10 PERFORMERS")
    print("-" * 80)
    for i, page in enumerate(report['top_performers'][:10], 1):
        print(f"{i}. {page['title']}")
        print(f"   {page['pageviews']:,} views | {page['engagement_rate']:.1%} engagement")

# Recommendations
if report.get('recommendations'):
    print("\n✅ TOP RECOMMENDATIONS")
    print("-" * 80)
    for i, rec in enumerate(report['recommendations'][:5], 1):
        print(f"\n{i}. [{rec['priority'].upper()}] {rec['action']}")
        print(f"   {rec['reason']}")

# Priority queue for task management
print("\n📋 PRIORITY QUEUE")
print("-" * 80)
tasks = aggregator.get_priority_queue(limit=10)
for i, task in enumerate(tasks, 1):
    print(f"\n{i}. [{task['priority'].upper()}] {task['type']}")
    print(f"   {task['action']}")
    print(f"   {task['reason']}")

Source Code Reference

Location: data_sources/modules/data_aggregator.py:24

The aggregator handles errors from individual data sources gracefully, continuing with available data if some sources fail.
