Data Processing Overview
Processing Architecture
Data Flow Stages
Collection
Browser → Edge Network
- JavaScript tracker captures events
- Data sent via HTTPS POST to basket.databuddy.cc
- Request routed through global CDN for low latency
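The collection step can be sketched as follows. This is illustrative only: the event shape and endpoint path are hypothetical, not the tracker's actual wire format.

```typescript
// Hypothetical event shape; the real tracker's payload fields may differ.
interface AnalyticsEvent {
  name: string;
  path: string;
  timestamp: number;
}

// Serialize a batch of events into a JSON request body.
function buildPayload(events: AnalyticsEvent[]): string {
  return JSON.stringify({ events });
}

// Transport is injected so the sketch stays network-free; in a browser this
// would wrap fetch() or navigator.sendBeacon() against the HTTPS endpoint.
type PostFn = (url: string, body: string) => Promise<unknown>;

function sendEvents(post: PostFn, endpoint: string, events: AnalyticsEvent[]) {
  return post(endpoint, buildPayload(events));
}
```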
Ingestion
Edge → Ingestion API
- Request validated and sanitized
- IP address used for geo-lookup (country/region)
- IP immediately discarded, never stored
- Bot detection applied
Queuing
Ingestion → Message Queue
- Events batched for efficiency
- Processed via Redpanda/Kafka message queue
- Ensures durability and scalability
Storage
Queue → Database
- Events written to ClickHouse (columnar database)
- Optimized for analytical queries
- Anonymous data only
Aggregation
Database → Analytics Engine
- Real-time and batch aggregations
- Pre-computed statistics for dashboard
- Historical trend analysis
Processing Locations
Primary Data Center
United States (Primary)
- Cloud provider: AWS / Google Cloud
- Regions: us-east, us-west
- SOC 2 Type II certified infrastructure
- 99.99% uptime SLA
Edge Network
Global CDN Distribution:
- 50+ edge locations worldwide
- Cloudflare or similar CDN provider
- Routes traffic to nearest edge node
- Reduces latency for all users
Edge regions:
- North America (USA, Canada)
- Europe (UK, Germany, France, Netherlands)
- Asia Pacific (Singapore, Japan, Australia)
- South America (Brazil)
- Middle East (UAE)
Privacy guarantee: Data is collected at edge nodes but all processing and storage happens in primary data centers. Edge nodes only route traffic; they don’t store analytics data.
Database Storage
ClickHouse Database:
- Columnar storage optimized for analytics
- Hosted in US data centers
- Encrypted at rest (AES-256)
- Daily backups with 30-day retention
- Geo-redundant storage
Message Queue
Redpanda/Kafka:
- High-throughput event streaming
- Deployed in same region as database
- Temporary event buffer (minutes, not days)
- Ensures no data loss during spikes
rust/ingestion/src/producer.rs shows the Redpanda/Kafka producer implementation
Data Processing Details
Ingestion Pipeline
Implemented in Rust for performance and security.
Ingestion Service (main.rs)
rust/ingestion/src/main.rs:1-48
Security measures:
- Origin validation (CORS)
- Client ID verification
- Request size limits (prevents abuse)
- Rate limiting per client
- Bot detection
IP Address Handling
IP processing flow documented in rust/ingestion/MIGRATION.mdx:29
GeoIP enrichment:
- Extract IP from request headers (cf-connecting-ip, x-forwarded-for, x-real-ip)
- Perform GeoIP lookup to determine country, region, city
- Add geographic data to event payload
- Immediately discard IP address - not written to database
- Store only:
{"country": "United States", "region": "California"}
Why this is privacy-safe:
- IP addresses can be personal data under GDPR
- We use IP only for immediate geo-enrichment
- IP is not stored, cannot be retrieved
- Only aggregate country/region stored (not personal data)
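The enrich-then-discard flow above can be sketched in TypeScript (the real implementation lives in the Rust ingestion service; the request/event types and the lookup function here are illustrative assumptions):

```typescript
interface IncomingRequest {
  headers: Record<string, string>;
  event: Record<string, unknown>;
}

// Stand-in for a real GeoIP database lookup.
type GeoLookup = (ip: string) => { country: string; region: string };

// Extract the client IP from headers (checked in priority order), enrich the
// event with country/region, and return an event that never contains the IP.
function enrichWithGeo(
  req: IncomingRequest,
  lookup: GeoLookup,
): Record<string, unknown> {
  const ip =
    req.headers["cf-connecting-ip"] ??
    req.headers["x-forwarded-for"]?.split(",")[0].trim() ??
    req.headers["x-real-ip"];
  if (!ip) return { ...req.event };
  const geo = lookup(ip);
  // The IP exists only in this local scope; it is never written to the event
  // payload or the database.
  return { ...req.event, country: geo.country, region: geo.region };
}
```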
Data Validation & Sanitization
All incoming data is validated before processing.
Validation Schema Example
packages/validation/src/schemas/analytics.ts
Protections:
- Type validation (correct data types)
- Length limits (prevent large payloads)
- Schema enforcement (reject malformed data)
- XSS prevention (sanitize strings)
- SQL injection prevention (parameterized queries)
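The real schemas live in packages/validation/src/schemas/analytics.ts; a dependency-free sketch of the kinds of checks applied (field names and limits here are hypothetical):

```typescript
const MAX_NAME_LENGTH = 128;

interface RawEvent {
  name?: unknown;
  path?: unknown;
}

// Returns the sanitized event, or null if the payload is malformed.
function validateEvent(raw: RawEvent): { name: string; path: string } | null {
  // Type validation: reject anything that is not a string.
  if (typeof raw.name !== "string" || typeof raw.path !== "string") return null;
  // Length limit: reject oversized fields.
  if (raw.name.length > MAX_NAME_LENGTH) return null;
  // Simplistic XSS guard for illustration; real sanitization is stricter.
  const clean = (s: string) => s.replace(/[<>]/g, "");
  return { name: clean(raw.name), path: clean(raw.path) };
}
```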
Bot Detection
Automated traffic filtering:
Bot Detection (tracker.ts:112-130)
packages/tracker/src/core/tracker.ts:112-130
Additional bot detection:
- User-agent analysis (server-side)
- Known bot IP ranges
- Behavioral patterns (no mouse movement, instant clicks)
- Request rate anomalies
packages/shared/src/utils/bot-detection/detector.ts
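A minimal sketch of the server-side user-agent signal (the pattern list is illustrative; the actual detector combines this with the other signals listed above):

```typescript
// Common substrings found in automated clients; real lists are much larger.
const BOT_UA_PATTERNS = [/bot/i, /crawler/i, /spider/i, /headless/i];

// User-agent check: one of several signals, never used alone in practice.
function looksLikeBot(userAgent: string): boolean {
  return BOT_UA_PATTERNS.some((pattern) => pattern.test(userAgent));
}
```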
Batching & Performance
Events are batched to reduce network overhead.
Event Batching (tracker.ts:351-363)
packages/tracker/src/core/tracker.ts:351-363
Batching benefits:
- Reduces network requests (10 events → 1 request)
- Improves performance (less overhead)
- Better reliability (automatic retries)
- Lower costs (fewer serverless invocations)
Batching configuration:
- Default batch size: 10 events
- Default timeout: 5 seconds
- Configurable per website
Data Storage
Database Technology
ClickHouse (Columnar Database)
Why ClickHouse?
- Speed: 100-1000x faster than traditional databases for analytics
- Compression: 10x better data compression (lower storage costs)
- Scalability: Handles billions of events
- Real-time: Sub-second query response times
- Cost-effective: Optimized for analytical workloads
Encryption
At Rest:
- AES-256 encryption for all stored data
- Encrypted database volumes
- Encrypted backups
- Key rotation every 90 days
In Transit:
- TLS 1.3 for all connections
- HTTPS only (no HTTP)
- Certificate pinning
- Perfect forward secrecy
HTTPS-Only Client (client.ts:68-76)
packages/tracker/src/core/client.ts:68-76
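The HTTPS-only guarantee can be enforced with a guard like the following (a sketch, not the actual client.ts code):

```typescript
// Reject any non-HTTPS endpoint before a request is ever attempted.
function assertHttpsEndpoint(url: string): URL {
  const parsed = new URL(url);
  if (parsed.protocol !== "https:") {
    throw new Error(`Refusing non-HTTPS endpoint: ${url}`);
  }
  return parsed;
}
```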
Backup & Recovery
Automated Backups:
- Daily full backups
- Hourly incremental backups
- 30-day backup retention
- Geo-redundant storage (multiple regions)
Disaster Recovery:
- RPO (Recovery Point Objective): 1 hour
- RTO (Recovery Time Objective): 4 hours
- Regular disaster recovery drills
- Documented recovery procedures
Data Access Controls
Authentication
Multi-layered authentication:
- API Keys
- OAuth / SSO
API key properties:
- Scoped to specific websites
- Read-only or read-write permissions
- Revocable anytime
- Expire after inactivity
Authorization
Role-Based Access Control (RBAC):
| Role | Permissions |
|---|---|
| Owner | Full access, billing, user management |
| Admin | Manage settings, view all data, manage team |
| Member | View analytics, create reports |
| Viewer | Read-only access to dashboards |
| API | Programmatic access (scope-limited) |
Data isolation:
- Each website has a unique website_id
- All queries filtered by website_id
- Cross-website access impossible
- Tenant isolation enforced at database level
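The per-tenant filtering can be illustrated with a query-builder sketch (table and column names here are hypothetical; the point is that website_id is always bound as a parameter, never interpolated):

```typescript
interface QueryScope {
  websiteId: string;
}

// Every analytics query is scoped by website_id; the value is passed as a
// bound parameter, which also rules out SQL injection.
function buildEventsQuery(
  scope: QueryScope,
  since: string,
): { sql: string; params: string[] } {
  return {
    sql: "SELECT * FROM events WHERE website_id = ? AND timestamp >= ?",
    params: [scope.websiteId, since],
  };
}
```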
Audit Logging
All sensitive actions are logged.
Audit Log Example
- Data exports
- User management changes
- API key creation/revocation
- Settings modifications
- Data deletions
- Login attempts (success and failure)
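A hypothetical shape for an audit log entry covering the actions listed above (field names are illustrative, not the actual schema):

```typescript
// Illustrative audit entry; the real log schema may differ.
interface AuditLogEntry {
  actor: string;       // who performed the action
  action:
    | "data_export"
    | "user_change"
    | "api_key_created"
    | "api_key_revoked"
    | "settings_change"
    | "data_deletion"
    | "login_attempt";
  success: boolean;    // login attempts record failures too
  timestamp: string;   // ISO 8601
}

// Append an entry, stamping it with the current time.
function recordAudit(
  log: AuditLogEntry[],
  entry: Omit<AuditLogEntry, "timestamp">,
): AuditLogEntry {
  const full: AuditLogEntry = { ...entry, timestamp: new Date().toISOString() };
  log.push(full);
  return full;
}
```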
Security Measures
Infrastructure Security
DDoS Protection
- Cloudflare DDoS mitigation
- Rate limiting per IP
- Request size limits
- Bot challenge for suspicious traffic
Network Security
- Private VPC network
- Firewall rules (allow-list only)
- No public database access
- VPN required for admin access
Application Security
- Input validation on all endpoints
- Parameterized queries (no SQL injection)
- XSS prevention (output encoding)
- CSRF tokens on forms
Monitoring
- 24/7 security monitoring
- Intrusion detection system
- Anomaly detection
- Automated alerting
Compliance Certifications
SOC 2 Type II
Status: In progress (expected 2024)
Annual third-party audit of:
- Security controls
- Availability guarantees
- Processing integrity
- Confidentiality measures
- Privacy protections
GDPR
Status: Compliant by design
- Privacy by design and default
- Data minimization
- Purpose limitation
- Storage limitation (configurable retention)
- Data subject rights (opt-out, deletion)
ISO 27001
Status: Planned (2025)
Information security management system certification.
Penetration Testing
- Annual third-party penetration tests
- Continuous security scanning
- Bug bounty program (coming soon)
- Vulnerability disclosure policy
Data Processing Agreements
GDPR Data Processing Agreement (DPA)
For EU customers, Databuddy can provide a DPA that covers:
- Roles and responsibilities (controller vs processor)
- Processing instructions
- Data security measures
- Sub-processor disclosure
- Cross-border data transfers (if applicable)
- Data subject rights assistance
- Breach notification procedures
Important: Because Databuddy only processes anonymous data (not personal data under GDPR), a DPA is often not legally required. However, we provide one upon request for your compliance documentation.
Standard Contractual Clauses (SCCs)
If your organization requires SCCs for cross-border data transfers:
- Available for EU → US data flows
- Compliant with Schrems II decision
- Includes transfer impact assessment
- Updated to 2021 European Commission SCCs
Regional Data Processing
Europe (EU/EEA)
Current setup:
- Data collected in EU is processed in US data centers
- IP addresses discarded immediately (no personal data transfer)
- Only anonymous event data transferred
- GDPR compliant under anonymous data exception
Planned:
- EU-only data residency option (2024)
- Dedicated EU data centers (Germany/Ireland)
- Zero data transfer outside EU (optional)
United Kingdom
- UK GDPR compliant
- Data processed in US (same as EU)
- UK DPA available upon request
- Adequacy decision ensures lawful transfers
California (CCPA/CPRA)
- Anonymous data not subject to CCPA
- No “sale” of personal information
- Consumer rights not applicable (no personal data)
- Privacy policy disclosure recommended
Other Regions
- Brazil (LGPD): Compliant under anonymous data provisions
- Canada (PIPEDA): Compliant
- Australia (Privacy Act): Compliant
- Asia-Pacific: Varies by jurisdiction, generally compliant
Third-Party Services
Databuddy uses minimal third-party services:
| Service | Purpose | Data Shared | Location |
|---|---|---|---|
| Cloudflare | CDN, DDoS protection | Request metadata (IP, headers) | Global |
| AWS/GCP | Infrastructure hosting | Encrypted data at rest | US |
| Stripe | Billing (dashboard only) | Payment info (PCI compliant) | US |
| Resend | Transactional emails | Email addresses | US |
We never share data with:
- Advertising networks
- Data brokers
- Marketing platforms
- AI training datasets
- Any third party without your explicit consent
Transparency & Trust
Open Source Components
Parts of Databuddy are open source:
- JavaScript tracker (client SDK)
- Validation schemas
- Public API specifications
Status Page
Real-time service status:
- Uptime monitoring
- Incident history
- Scheduled maintenance
- Performance metrics
Privacy-First Guarantees
We will never sell your data
We will never share data with advertisers
We will never use your data for AI training
We will never change our privacy model
Frequently Asked Questions
Where is my data physically stored?
Primary storage is in US data centers (AWS/GCP). Data is encrypted at rest and backed up to geo-redundant storage. EU-only storage will be available in 2024.
Can I request EU-only data processing?
Coming 2024. We’re building dedicated EU infrastructure for customers who require data residency in the EU. Contact sales for early access.
Do you use sub-processors?
Yes, minimal sub-processors:
- Cloudflare (CDN/DDoS)
- AWS/GCP (infrastructure)
- Stripe (payments - dashboard only)
How do you handle cross-border data transfers?
Since we only process anonymous data (not personal data under GDPR), cross-border restrictions don’t apply. However, we can provide SCCs upon request for compliance documentation.
What happens to my data if I cancel?
Upon account cancellation:
- Data retained for 30 days (recovery period)
- Automatic deletion after 30 days
- Immediate deletion available upon request
- Export your data before cancellation
Can I export all my data?
Yes. Use the dashboard export feature or API to download:
- Raw events (CSV, JSON, Parquet)
- Aggregated statistics
- Custom reports
How do you ensure data isolation between customers?
- Unique website_id per customer
- Database-level tenant isolation
- All queries filtered by website_id
- Regular security audits
- Penetration testing
Contact & Support
Data Processing Questions
Privacy team: [email protected]
Security team: [email protected]
DPA requests: [email protected]
Security Disclosure
Found a security vulnerability? Report to: [email protected]
- Responsible disclosure appreciated
- 90-day disclosure timeline
- Acknowledgment in security hall of fame
- Bug bounty program (coming soon)
Next Steps
GDPR Compliance
Understand GDPR compliance by default
Data Retention
Configure retention policies
Cookie Policy
Learn about our cookieless approach
API Reference
Access data programmatically