Overview
Data retention in Sparklytics works as follows:
- Raw events (`events` table) are kept for a configurable number of days
- Events older than the retention period are automatically deleted
- Aggregate data is never deleted (daily/monthly rollups, once implemented)
- Cleanup runs automatically in the background
Default retention is 365 days (1 year). Adjust based on your storage capacity and compliance requirements.
Configuration
`SPARKLYTICS_RETENTION_DAYS` (default: `365`)
Number of days to keep raw event data. Events with `created_at` older than this threshold are automatically deleted.
How Retention Works
Automatic Cleanup
Sparklytics runs a background task that deletes events older than `SPARKLYTICS_RETENTION_DAYS` days.
- Frequency: Currently runs on demand (manual trigger or on startup)
- Query: `DELETE FROM events WHERE created_at < NOW() - INTERVAL 'X days'`, where X is the value of `SPARKLYTICS_RETENTION_DAYS`
- Performance: Runs in a transaction; may take several seconds for large datasets
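The cleanup step can be sketched in Python. This is a minimal illustration, not Sparklytics's implementation: it uses the standard-library `sqlite3` module as a stand-in for DuckDB, stores `created_at` as ISO-8601 UTC strings (so string comparison follows time order), and computes the cutoff in Python instead of with `NOW() - INTERVAL`. The helper name `delete_expired_events` is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def delete_expired_events(conn: sqlite3.Connection, retention_days: int, now: datetime) -> int:
    """Delete events older than `retention_days` and return how many were removed.

    Hypothetical helper; sqlite3 stands in for DuckDB here.
    """
    cutoff = now - timedelta(days=retention_days)
    # `with conn` wraps the DELETE in one transaction: commit on success, rollback on error.
    with conn:
        cur = conn.execute(
            "DELETE FROM events WHERE created_at < ?",
            (cutoff.isoformat(),),  # ISO-8601 UTC strings compare in chronological order
        )
    return cur.rowcount


# Usage: three events aged 1, 100 and 400 days, with a 365-day retention period.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
for days_old in (1, 100, 400):
    conn.execute(
        "INSERT INTO events (created_at) VALUES (?)",
        ((now - timedelta(days=days_old)).isoformat(),),
    )
print(delete_expired_events(conn, retention_days=365, now=now))  # 1
```

Only the 400-day-old event falls before the cutoff, so one row is deleted and the other two are kept.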
What Gets Deleted
Only raw event data is deleted:
| Table | Retention Policy |
|---|---|
| events | Deleted after `SPARKLYTICS_RETENTION_DAYS` |
| sessions | Deleted when all associated events are deleted |
| websites | Never deleted (metadata) |
| api_keys | Never deleted (configuration) |
| goals | Never deleted (configuration) |
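The sessions rule (a session goes once none of its events remain) maps naturally onto a `NOT EXISTS` anti-join. A minimal sketch, again using `sqlite3` as a stand-in for DuckDB; the `session_id` column linking the two tables is an assumption about the schema, and the helper name is hypothetical.

```python
import sqlite3


def delete_orphaned_sessions(conn: sqlite3.Connection) -> int:
    """Delete sessions with no remaining events; returns the number removed.

    Assumes (hypothetically) that events reference sessions via `session_id`.
    """
    with conn:  # single transaction
        cur = conn.execute(
            """
            DELETE FROM sessions
            WHERE NOT EXISTS (
                SELECT 1 FROM events WHERE events.session_id = sessions.session_id
            )
            """
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sessions (session_id TEXT PRIMARY KEY);
    CREATE TABLE events (id INTEGER PRIMARY KEY, session_id TEXT NOT NULL);
    INSERT INTO sessions VALUES ('s1'), ('s2');
    INSERT INTO events (session_id) VALUES ('s1');
    """
)
print(delete_orphaned_sessions(conn))  # 1 ('s2' had no events left)
```

Running a step like this right after the event cleanup keeps the two tables consistent.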
Future versions will include pre-aggregated daily/monthly summaries that are kept indefinitely, independent of raw event retention.
Storage Estimates
Raw event storage grows linearly with traffic. Approximate sizes per 1 million events:
| Backend | Storage per 1M Events |
|---|---|
| DuckDB (self-hosted) | ~278 MB |
| ClickHouse (cloud) | ~48 MB |
Example Calculations
Assuming one event per pageview:
- Small site (100k pageviews/month), 1 year retention: 100k * 12 * 278 MB / 1M = ~334 MB
- 1M pageviews/month, 6 months retention: 1M * 6 * 278 MB / 1M = ~1.7 GB
- 10M pageviews/month, 3 months retention: 10M * 3 * 278 MB / 1M = ~8.3 GB
These are DuckDB estimates. ClickHouse (cloud) is ~5.8x more storage-efficient.
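The same arithmetic as a small Python helper, using the approximate per-1M-event figures from the Storage Estimates table. The function name is hypothetical, and like the examples it treats each pageview as one event.

```python
# Approximate storage per 1 million raw events, in MB (Storage Estimates table).
MB_PER_MILLION_EVENTS = {"duckdb": 278, "clickhouse": 48}


def estimate_storage_mb(events_per_month: int, retention_months: int, backend: str = "duckdb") -> float:
    """Estimate raw-event storage in MB for a given monthly volume and retention window."""
    total_events = events_per_month * retention_months
    return total_events * MB_PER_MILLION_EVENTS[backend] / 1_000_000


print(estimate_storage_mb(100_000, 12))                        # 333.6  (~334 MB)
print(estimate_storage_mb(1_000_000, 6))                       # 1668.0 (~1.7 GB)
print(estimate_storage_mb(100_000, 12, backend="clickhouse"))  # 57.6
```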
Recommended Retention Periods
| Use Case | Recommended Retention | Reasoning |
|---|---|---|
| Personal blog | 180-365 days | Low traffic, storage not a concern |
| SaaS product | 90-180 days | Balance between insights and cost |
| High-traffic media | 30-90 days | Large volume, focus on recent trends |
| Compliance (GDPR) | 30-90 days | Minimize PII retention |
| Long-term analysis | 730+ days | Historical comparisons, trends |
Manual Cleanup
To manually delete old events (e.g., to free up space immediately):
Docker
Bare-Metal
You don’t need to stop Sparklytics to run manual cleanup, but be aware that DuckDB uses a single-writer model. The DELETE may block briefly if the background buffer is flushing events.
Changing Retention Policy
You can change `SPARKLYTICS_RETENTION_DAYS` at any time.
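Retention is driven by this single environment variable. The sketch below shows one way a deployment script might read it, falling back to the documented 365-day default. It is illustrative only: the function name is hypothetical, and the validation rule (rejecting zero, negative, or non-numeric values) is an assumption rather than documented Sparklytics behavior.

```python
import os
from typing import Mapping, Optional

DEFAULT_RETENTION_DAYS = 365  # documented default (1 year)


def retention_days_from_env(env: Optional[Mapping[str, str]] = None) -> int:
    """Read SPARKLYTICS_RETENTION_DAYS, defaulting to 365 when unset or empty.

    Hypothetical helper: Sparklytics's own parsing may differ.
    """
    env = os.environ if env is None else env
    raw = env.get("SPARKLYTICS_RETENTION_DAYS", "").strip()
    if not raw:
        return DEFAULT_RETENTION_DAYS
    days = int(raw)  # raises ValueError for non-numeric input
    if days <= 0:
        raise ValueError(f"SPARKLYTICS_RETENTION_DAYS must be positive, got {days}")
    return days


print(retention_days_from_env({"SPARKLYTICS_RETENTION_DAYS": "90"}))  # 90
print(retention_days_from_env({}))                                    # 365
```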
Monitoring Retention
Check Oldest Event
The oldest `created_at` in `events` should be no earlier than `NOW() - INTERVAL 'RETENTION_DAYS days'`; compare the two to verify cleanup is working.
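That comparison can be scripted. A minimal sketch, assuming `sqlite3` as a stand-in for DuckDB and ISO-8601 UTC `created_at` strings; the helper name is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def oldest_event_within_retention(conn: sqlite3.Connection, retention_days: int, now: datetime) -> bool:
    """True if no event is older than the retention cutoff (an empty table also passes)."""
    (oldest,) = conn.execute("SELECT MIN(created_at) FROM events").fetchone()
    if oldest is None:
        return True
    cutoff = now - timedelta(days=retention_days)
    return datetime.fromisoformat(oldest) >= cutoff


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT NOT NULL)")
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
conn.execute("INSERT INTO events VALUES (?)", ((now - timedelta(days=400)).isoformat(),))
print(oldest_event_within_retention(conn, retention_days=365, now=now))  # False
```

Because cleanup currently runs on demand or at startup, a `False` result may simply mean cleanup has not run since events crossed the cutoff.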
Check Database Size
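A quick way to track the file size from a script. This sketch assumes you supply the database path (the one in the example is hypothetical) and also counts a `<file>.wal` write-ahead log if one sits next to the database, which is how DuckDB names its WAL.

```python
import os


def database_size_mb(db_path: str) -> float:
    """On-disk size of the database file plus its `.wal` file, if any, in MB (10**6 bytes)."""
    total_bytes = 0
    for path in (db_path, db_path + ".wal"):
        if os.path.exists(path):
            total_bytes += os.path.getsize(path)
    return total_bytes / 1_000_000


# Hypothetical location; adjust to wherever your deployment stores the DuckDB file.
print(f"{database_size_mb('/data/sparklytics.db'):.1f} MB")
```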
Check Event Count by Age
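A sketch that buckets events by age, to see how much data each retention window would keep. It uses `sqlite3` as a stand-in for DuckDB and relies on SQLite's `julianday()` to compute ages (DuckDB has its own date functions); the helper name and the 30-day bucket size are illustrative choices.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def event_counts_by_age(conn: sqlite3.Connection, now: datetime, bucket_days: int = 30) -> dict:
    """Map age-bucket start (in days) to the number of events in that bucket."""
    rows = conn.execute(
        """
        SELECT CAST((julianday(?) - julianday(created_at)) / ? AS INTEGER) * ? AS age_bucket,
               COUNT(*)
        FROM events
        GROUP BY age_bucket
        ORDER BY age_bucket
        """,
        (now.isoformat(), bucket_days, bucket_days),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT NOT NULL)")
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
for days_old in (1, 10, 45, 400):
    conn.execute("INSERT INTO events VALUES (?)", ((now - timedelta(days=days_old)).isoformat(),))
print(event_counts_by_age(conn, now))  # {0: 2, 30: 1, 390: 1}
```

Aggregating inside the database avoids pulling every row into the script, which matters once the table holds millions of events.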
Backup Before Cleanup
If you’re unsure about your retention policy, back up your database before letting cleanup run.
DuckDB backups are consistent snapshots; the database remains online during backup.
Retention vs. Aggregates (Future)
Future versions of Sparklytics will include pre-aggregated rollups:
- Daily aggregates: Pageviews, visitors, bounce rate per day
- Monthly aggregates: Same metrics, rolled up by month
- Permanent retention: Aggregates are kept indefinitely, even after raw events are deleted
This will let you:
- Keep raw events for 30-90 days (storage-efficient)
- View historical trends for years (from aggregates)
Aggregates are planned for v1.2. Currently, deleting events removes all associated data.
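To make the idea concrete, here is what a daily rollup computes, sketched in plain Python over (timestamp, visitor ID) pageview records. This is a hypothetical illustration of the planned feature, not its design: it groups by the calendar day of each stored UTC timestamp and omits bounce rate, which needs session data.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


def daily_rollup(pageviews: Iterable[Tuple[str, str]]) -> Dict[date, Dict[str, int]]:
    """Aggregate (created_at ISO string, visitor_id) pageviews into per-day totals."""
    views: Dict[date, int] = defaultdict(int)
    visitors: Dict[date, set] = defaultdict(set)
    for created_at, visitor_id in pageviews:
        day = datetime.fromisoformat(created_at).date()  # calendar day of the stored timestamp
        views[day] += 1
        visitors[day].add(visitor_id)  # a set counts each visitor once per day
    return {day: {"pageviews": views[day], "visitors": len(visitors[day])} for day in sorted(views)}


rollup = daily_rollup([
    ("2025-01-01T09:00:00+00:00", "v1"),
    ("2025-01-01T10:30:00+00:00", "v1"),
    ("2025-01-02T08:15:00+00:00", "v2"),
])
print(rollup)
```

Once rows like these are stored, raw events can be deleted without losing the daily totals.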
Compliance Considerations
GDPR (EU)
- Data minimization: Only keep data as long as necessary for analytics
- Typical retention: 30-90 days for web analytics
- User rights: If a user requests data deletion, you may need to purge events containing their `visitor_id`
CCPA (California)
- Consumer rights: Users can request deletion of their personal information
- Anonymous visitor IDs: Sparklytics visitor IDs are hashed and salted, making them hard to tie to individuals, but may still be considered personal data
Sparklytics does not store IP addresses, emails, or other direct identifiers. However, `visitor_id` is derived from IP + User-Agent and may be considered PII under strict interpretations.
Manual Data Deletion (GDPR/CCPA Requests)
To delete all data for a specific `visitor_id`, delete every event that carries that ID.
If you don’t know the visitor’s ID:
- Look up the visitor ID in recent events matching the IP’s approximate geolocation
- Or regenerate the visitor ID hash using the same salt + IP + User-Agent
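Once you have the ID, the deletion is a single parameterized `DELETE`. A minimal sketch using `sqlite3` as a stand-in for DuckDB; the helper name is hypothetical. Binding the ID as a parameter, rather than formatting it into the SQL string, avoids SQL injection when the value comes from a user request.

```python
import sqlite3


def delete_visitor_events(conn: sqlite3.Connection, visitor_id: str) -> int:
    """Delete every event recorded for `visitor_id`; returns the number of rows removed."""
    with conn:  # single transaction
        cur = conn.execute("DELETE FROM events WHERE visitor_id = ?", (visitor_id,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, visitor_id TEXT NOT NULL)")
conn.executemany("INSERT INTO events (visitor_id) VALUES (?)", [("v1",), ("v1",), ("v2",)])
print(delete_visitor_events(conn, "v1"))  # 2
```

Per the What Gets Deleted table, the visitor's sessions are then removed once none of their events remain.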
Troubleshooting
Retention Cleanup Not Running
Symptom: Events older than `RETENTION_DAYS` are still in the database.
Diagnosis:
- Check the retention setting (`SPARKLYTICS_RETENTION_DAYS`)
- Check the oldest event
- Check whether the cleanup task is enabled (look for log lines mentioning retention)
Database Size Not Decreasing After Deletion
Symptom: `DELETE FROM events WHERE ...` completes, but the file size stays the same.
Cause: DuckDB does not automatically reclaim disk space after deletions (like SQLite, it leaves “holes” in the file).
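Solution: Vacuum the database. Note that in DuckDB a plain `VACUUM` statement does not shrink the file on its own; the DuckDB documentation points to `CHECKPOINT`, or copying the data into a fresh database file, to reclaim space.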
See Also
- Environment Variables — Full configuration reference
- Performance Tuning — Optimization and backup strategies