
Overview

Retention analysis shows how many visitors return to your site over time. Sparklytics tracks retention cohorts automatically — no additional setup required beyond standard pageview tracking.
Retention heatmap showing weekly cohorts

How Retention Works

Sparklytics groups visitors into cohorts based on their first visit date, then tracks how many return in subsequent periods.

Example: Weekly Cohorts

Cohort Start   Week 0   Week 1   Week 2   Week 3
Feb 3          100%     32%      24%      18%
Feb 10         100%     28%      22%
Feb 17         100%     35%
Feb 24         100%
  • Week 0: Always 100% (all new visitors)
  • Week 1: Percentage who returned 1 week later
  • Week 2: Percentage who returned 2 weeks later
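The grouping above can be sketched in code. The `Visit` type and the epoch-week bucketing below are illustrative assumptions, not the Sparklytics schema:

```typescript
// Sketch of weekly cohort retention from a flat list of visits.
type Visit = { visitorId: string; date: Date };

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function weeklyRetention(visits: Visit[]): Map<number, number[]> {
  // Each visitor's first visit defines their cohort (week index since epoch).
  const firstVisit = new Map<string, number>();
  for (const v of visits) {
    const week = Math.floor(v.date.getTime() / WEEK_MS);
    const prev = firstVisit.get(v.visitorId);
    if (prev === undefined || week < prev) firstVisit.set(v.visitorId, week);
  }
  // Count distinct visitors per (cohort, period offset).
  const counts = new Map<number, Map<number, Set<string>>>();
  for (const v of visits) {
    const cohort = firstVisit.get(v.visitorId)!;
    const period = Math.floor(v.date.getTime() / WEEK_MS) - cohort;
    if (!counts.has(cohort)) counts.set(cohort, new Map());
    const byPeriod = counts.get(cohort)!;
    if (!byPeriod.has(period)) byPeriod.set(period, new Set());
    byPeriod.get(period)!.add(v.visitorId);
  }
  // Convert counts to retention rates (% of cohort size).
  const result = new Map<number, number[]>();
  for (const [cohort, byPeriod] of counts) {
    const size = byPeriod.get(0)!.size; // period 0 always exists
    const maxPeriod = Math.max(...Array.from(byPeriod.keys()));
    const rates: number[] = [];
    for (let p = 0; p <= maxPeriod; p++) {
      rates.push((100 * (byPeriod.get(p)?.size ?? 0)) / size);
    }
    result.set(cohort, rates);
  }
  return result;
}
```

Period 0 is always 100% by construction, matching the table above.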

Granularity Options

Day

Track daily retention for the first 30 days:
GET /api/websites/{website_id}/retention?
  cohort_granularity=day&
  max_periods=30&
  start_date=2026-02-01&
  end_date=2026-03-01
Best for:
  • Short-term engagement analysis
  • Apps with daily active users
  • Quick A/B test validation
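Building the query string programmatically is straightforward. The helper below is illustrative (not part of any Sparklytics SDK); the parameter names match the API reference later on this page:

```typescript
// Builds the retention endpoint query string from typed options.
function buildRetentionQuery(opts: {
  granularity: "day" | "week" | "month";
  maxPeriods: number;
  startDate: string; // YYYY-MM-DD
  endDate: string;   // YYYY-MM-DD
}): string {
  return new URLSearchParams({
    cohort_granularity: opts.granularity,
    max_periods: String(opts.maxPeriods),
    start_date: opts.startDate,
    end_date: opts.endDate,
  }).toString();
}
```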

Week

Track weekly retention for up to 12 weeks:
GET /api/websites/{website_id}/retention?
  cohort_granularity=week&
  max_periods=8
Default granularity if not specified. Best for:
  • SaaS products
  • Content sites with weekly publishing
  • Medium-term engagement trends

Month

Track monthly retention for up to 12 months:
GET /api/websites/{website_id}/retention?
  cohort_granularity=month&
  max_periods=12
Best for:
  • Long-term user behavior
  • Annual subscription products
  • Low-frequency usage patterns

API Reference

Endpoint

GET /api/websites/{website_id}/retention

Query Parameters

start_date
string
required
Start date in YYYY-MM-DD format
end_date
string
required
End date in YYYY-MM-DD format
cohort_granularity
string
default:"week"
Cohort period: day, week, or month
max_periods
number
Number of periods to track. Defaults:
  • Day: 30 (max 30)
  • Week: 8 (max 12)
  • Month: 12 (max 12)
timezone
string
default:"UTC"
IANA timezone for cohort boundaries (e.g., America/New_York)
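A client can mirror the defaults and caps above before sending a request, to fail fast instead of round-tripping a 400. This helper is a client-side sketch, not part of the API:

```typescript
// Mirrors the server's max_periods defaults and caps documented above.
function resolveMaxPeriods(
  granularity: "day" | "week" | "month",
  requested?: number,
): number {
  const limits = {
    day: { def: 30, max: 30 },
    week: { def: 8, max: 12 },
    month: { def: 12, max: 12 },
  };
  const { def, max } = limits[granularity];
  const value = requested ?? def;
  if (value < 1 || value > max) {
    throw new RangeError(`max_periods must be between 1 and ${max}`);
  }
  return value;
}
```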

Response

{
  "data": {
    "cohorts": [
      {
        "cohort_label": "2026-02-03",
        "cohort_size": 1250,
        "periods": [
          { "period": 0, "visitors": 1250, "rate": 100.0 },
          { "period": 1, "visitors": 398, "rate": 31.84 },
          { "period": 2, "visitors": 301, "rate": 24.08 },
          { "period": 3, "visitors": 225, "rate": 18.0 }
        ]
      },
      {
        "cohort_label": "2026-02-10",
        "cohort_size": 1100,
        "periods": [
          { "period": 0, "visitors": 1100, "rate": 100.0 },
          { "period": 1, "visitors": 308, "rate": 28.0 },
          { "period": 2, "visitors": 242, "rate": 22.0 }
        ]
      }
    ],
    "granularity": "week",
    "max_periods": 8
  }
}

Response Fields

cohort_label
string
ISO 8601 date string representing the cohort start
cohort_size
number
Total unique visitors who first visited during this cohort period
period
number
Zero-indexed period offset (0 = first visit period)
visitors
number
Number of cohort members who returned in this period
rate
number
Retention rate as a percentage (visitors / cohort_size × 100)
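As a consumption example, here is a small reader over the response shape above that averages a given period's retention across the cohorts that have reached it. The types mirror the documented fields; the helper itself is illustrative:

```typescript
type Period = { period: number; visitors: number; rate: number };
type Cohort = { cohort_label: string; cohort_size: number; periods: Period[] };

// Average retention rate at `period` across cohorts that have that period.
// Returns null if no cohort has reached it yet.
function averageRate(cohorts: Cohort[], period: number): number | null {
  const rates = cohorts
    .map((c) => c.periods.find((p) => p.period === period)?.rate)
    .filter((r): r is number => r !== undefined);
  if (rates.length === 0) return null;
  return rates.reduce((a, b) => a + b, 0) / rates.length;
}
```

Note that recent cohorts naturally lack later periods, so the denominator shrinks as `period` grows.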

Implementation Details

Granularity Parsing

From crates/sparklytics-server/src/routes/retention.rs:47:
fn parse_granularity(raw: Option<&str>) -> Result<RetentionGranularity, AppError> {
    match raw.map(str::trim) {
        None => Ok(RetentionGranularity::Week),
        Some("day") => Ok(RetentionGranularity::Day),
        Some("week") => Ok(RetentionGranularity::Week),
        Some("month") => Ok(RetentionGranularity::Month),
        Some(_) => Err(AppError::BadRequest(
            "cohort_granularity must be one of: day, week, month".to_string(),
        )),
    }
}

Period Limits

From crates/sparklytics-server/src/routes/retention.rs:67:
fn validate_max_periods(
    granularity: &RetentionGranularity,
    max_periods: u32,
) -> Result<(), AppError> {
    let (min, max, label) = match granularity {
        RetentionGranularity::Day => (1, 30, "daily"),
        RetentionGranularity::Week => (1, 12, "weekly"),
        RetentionGranularity::Month => (1, 12, "monthly"),
    };

    if (min..=max).contains(&max_periods) {
        Ok(())
    } else {
        Err(AppError::BadRequest(format!(
            "max_periods must be between {min} and {max} for {label} granularity"
        )))
    }
}

Concurrency Control

Retention queries are computationally expensive, so they’re rate-limited with a semaphore:
const RETENTION_QUEUE_WAIT_TIMEOUT: Duration = Duration::from_secs(5);

let _permit = tokio::time::timeout(
    RETENTION_QUEUE_WAIT_TIMEOUT,
    state.retention_semaphore.acquire(),
)
.await
.map_err(|_| AppError::RateLimited)?;
If the queue is full, requests return 429 Too Many Requests.
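A client hitting the 429 can back off and retry. Below is a minimal generic retry wrapper; the exponential backoff policy is our assumption, not something the server prescribes:

```typescript
// Retries a failing async operation with exponential backoff.
async function withRetry<T>(
  attempt: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      if (i + 1 >= maxAttempts || !isRetryable(err)) throw err;
      // Back off: baseDelayMs, 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
}
```

Usage would wrap the retention fetch and treat a 429 status as retryable while surfacing other errors immediately.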

Filtering Retention Data

Apply standard filters to analyze retention for specific segments:
GET /api/websites/{website_id}/retention?
  cohort_granularity=week&
  filter_country=US&
  filter_utm_source=google

Available Filters

All standard filters work with retention:
  • filter_country
  • filter_region
  • filter_city
  • filter_browser
  • filter_os
  • filter_device
  • filter_page — Entry page filter
  • filter_referrer
  • filter_utm_source
  • filter_utm_medium
  • filter_utm_campaign

Dashboard Visualization

The retention heatmap component provides an interactive view:

Color Scale

From dashboard/components/retention/RetentionHeatmap.tsx:
  • 0-10%: Light gray (very low retention)
  • 10-25%: Yellow to orange (low retention)
  • 25-50%: Orange to light green (moderate retention)
  • 50-75%: Green (good retention)
  • 75-100%: Dark green (excellent retention)
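The bucketing logic behind that scale can be expressed as a simple threshold function. The bucket names below are illustrative labels, not the component's actual color values:

```typescript
// Maps a retention rate (0-100) to the scale's bucket described above.
function retentionBucket(rate: number): string {
  if (rate < 10) return "very-low";
  if (rate < 25) return "low";
  if (rate < 50) return "moderate";
  if (rate < 75) return "good";
  return "excellent";
}
```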

Tooltip

Hover over any cell to see:
Cohort: Feb 3, 2026
Week 2
301 visitors (24.08%)

Performance Considerations

DuckDB (Self-Hosted)

Retention queries on DuckDB can be slow for large datasets:
  • < 1M events: Sub-second queries
  • 1M-10M events: 1-5 seconds
  • > 10M events: Consider upgrading to ClickHouse

ClickHouse (Cloud)

ClickHouse provides near-constant query time:
  • < 1M events: < 100ms
  • 100M+ events: < 500ms
  • 1B+ events: < 2 seconds
From the README benchmarks:
ClickHouse vs DuckDB speedup: 10–68x at 100k, 47–239x at 1M

Use Cases

Product-Market Fit

Measure whether users come back after their first visit:
# Track weekly retention for new users
GET .../retention?cohort_granularity=week&max_periods=12
Good retention (product-market fit indicators):
  • Week 1: > 30%
  • Week 4: > 20%
  • Week 8: > 15%
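Those indicative thresholds are easy to encode as a quick check over cohort data; this helper is purely illustrative and the thresholds are the rough guides listed above, not industry standards:

```typescript
// Checks a cohort's rates against the indicative PMF thresholds above.
function meetsPmfBenchmarks(rates: {
  week1: number;
  week4: number;
  week8: number;
}): boolean {
  return rates.week1 > 30 && rates.week4 > 20 && rates.week8 > 15;
}
```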

Feature Launch Impact

Compare retention before and after a feature launch:
# Pre-launch cohorts
GET .../retention?start_date=2026-01-01&end_date=2026-01-31

# Post-launch cohorts
GET .../retention?start_date=2026-02-01&end_date=2026-02-28

Onboarding Optimization

Filter by entry page to see if different landing pages affect retention:
# Users who entered via homepage
GET .../retention?filter_page=/

# Users who entered via pricing
GET .../retention?filter_page=/pricing

Campaign Effectiveness

Compare retention across traffic sources:
# Organic search
GET .../retention?filter_utm_source=google&filter_utm_medium=organic

# Paid ads
GET .../retention?filter_utm_source=google&filter_utm_medium=cpc

Best Practices

Choose the Right Granularity

  • Daily: Apps with daily active users (DAU focus)
  • Weekly: Most SaaS products (default choice)
  • Monthly: Enterprise tools, infrequent-use products

Set Realistic Period Counts

Don’t request more periods than you have data for:
# Bad: Requesting 12 weeks when you only have 4 weeks of data
max_periods=12&start_date=2026-02-01&end_date=2026-02-28

# Good: Match periods to available data
max_periods=4&start_date=2026-02-01&end_date=2026-02-28
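One way to avoid over-requesting is to derive a ceiling from the date range itself. This rough helper counts whole weeks spanned by the range; it is illustrative, not part of the API:

```typescript
// Whole weeks covered by a YYYY-MM-DD date range (both parsed as UTC).
function weeksAvailable(startDate: string, endDate: string): number {
  const ms = Date.parse(endDate) - Date.parse(startDate);
  return Math.max(0, Math.floor(ms / (7 * 24 * 60 * 60 * 1000)));
}
```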

Use Filters for Segmentation

Compare retention across:
  • Device types (mobile vs desktop)
  • Geographic regions
  • Traffic sources
  • Landing pages
Retention rates vary widely by industry, so avoid fixating on absolute numbers. Focus on:
  • Trends over time: Is retention improving?
  • Relative comparisons: Which segments retain better?
  • Inflection points: Where does retention stabilize?

Troubleshooting

"Query timeout" Error

If you see a 408 response:
{
  "error": "Query timeout. Retry after 2 seconds."
}
Causes:
  • Dataset is too large for DuckDB
  • Too many concurrent retention queries
Solutions:
  • Reduce date range
  • Reduce max_periods
  • Add more filters to narrow the dataset
  • Upgrade to ClickHouse for large datasets

Sparse Data

If cohorts show 0% retention:
  • Insufficient traffic: Need at least ~100 visitors per cohort for meaningful data
  • Date range too recent: Later cohorts won’t have data for future periods yet
  • Filters too restrictive: Broaden filters or remove them

Next Steps

Sessions

Drill down into individual user sessions

Journey Analysis

Explore what returning users do on subsequent visits
