Overview
Pipes transform raw data from datasources into structured, queryable formats. The Web Analytics Starter Kit uses internal pipes for data transformation and endpoint pipes for API responses.
This page focuses on the analytics_hits internal pipe, the core transformation layer that powers all downstream analytics.
analytics_hits
The primary transformation pipe that parses raw events and enriches them with device and browser detection.
Purpose
Transforms raw analytics_events into structured hits with:
- Parsed JSON fields from payload
- Device type detection (desktop, mobile, bot)
- Browser identification
- Domain resolution with fallback logic
- Flexible filtering capabilities
Node Architecture
Node 1: parsed_hits
Extracts and parses fields from the JSON payload.
Key Transformations
Domain Resolution Logic
A multi-tier fallback ensures domain is always populated.
This handles cases where:
- Domain is pre-populated in the event
- Domain needs extraction from the full URL
- Domain is in the JSON payload
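The fallback can be sketched in TypeScript. This is an illustration, not the pipe's actual SQL: the tier order simply follows the bullets above, and the `domain` key inside the payload, the regex-based host extraction, and the final empty-string default are all assumptions.

```typescript
// Hypothetical event shape; the real analytics_events schema may differ.
interface RawEventLike {
  domain?: string | null; // pre-populated domain, if any
  href?: string | null;   // full page URL
  payload: string;        // raw JSON string
}

// Simplified host extraction for scheme://host[:port]/... URLs.
// It does not handle userinfo (user@host) or other edge cases.
function hostFromUrl(url: string | null | undefined): string {
  if (!url) return "";
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^\/:?#]+)/i.exec(url);
  return match?.[1]?.toLowerCase() ?? "";
}

function resolveDomain(ev: RawEventLike): string {
  // Tier 1: the event already carries a domain.
  if (ev.domain) return ev.domain;

  // Tier 2: extract the host from the full URL.
  const fromHref = hostFromUrl(ev.href);
  if (fromHref) return fromHref;

  // Tier 3: look for a domain key inside the JSON payload (assumed key name).
  try {
    const parsed: unknown = JSON.parse(ev.payload);
    if (typeof parsed === "object" && parsed !== null) {
      const candidate = (parsed as { domain?: unknown }).domain;
      if (typeof candidate === "string" && candidate) return candidate;
    }
  } catch {
    // Malformed payload: fall through to the default.
  }

  // Assumed default when no tier yields a value.
  return "";
}
```

Each tier runs only when the previous one yields nothing, so the cheapest source (the pre-populated field) wins whenever it is present.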
URL Hierarchy Handling
Complex URL parsing for edge cases.
Session ID Handling
Null-safe session IDs: ensures session_id is never null for downstream aggregations.
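As a rough sketch of why the null-safe value matters, the snippet below counts hits per session in TypeScript. The placeholder value `"0"` and both function names are assumptions for illustration; the pipe's real default is not shown here.

```typescript
// Assumed placeholder for events that arrive without a session ID.
const FALLBACK_SESSION_ID = "0";

function safeSessionId(sessionId: string | null | undefined): string {
  return sessionId ?? FALLBACK_SESSION_ID;
}

// Count hits per session. With the fallback applied, hits that lack a
// session land in one explicit bucket instead of under a null key.
function hitsPerSession(sessionIds: Array<string | null>): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const id of sessionIds) {
    const key = safeSessionId(id);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```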
Payload Filtering
Flexible text search across payload.
Example usage:
- like_filter: '%utm_campaign%' - Find UTM campaigns
- like_filter: '%referral%' - Find referral traffic
- like_filter: '%mobile%' - Find mobile-specific events
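To make the pattern semantics concrete, here is a small TypeScript sketch that emulates SQL LIKE matching on a payload string. It is an approximation for illustration only: the function names are hypothetical, backslash escapes are not handled, and the real filtering happens inside the pipe's SQL.

```typescript
// Translate a SQL LIKE pattern into an anchored, case-sensitive RegExp.
//   %  matches any sequence of characters (including none)
//   _  matches exactly one character
function likeToRegExp(pattern: string): RegExp {
  const source = pattern
    .split("")
    .map((ch) => {
      if (ch === "%") return "[\\s\\S]*";
      if (ch === "_") return "[\\s\\S]";
      // Escape characters that are special in regular expressions.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return new RegExp("^" + source + "$");
}

function matchesLikeFilter(payload: string, likeFilter: string): boolean {
  return likeToRegExp(likeFilter).test(payload);
}
```

Note that `_` is itself a single-character wildcard in LIKE, so a pattern such as '%utm_campaign%' also matches text like "utm-campaign".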
Node 2: endpoint
Applies device and browser detection logic to parsed hits.
Detection Strategy
Device Detection
Priority order:
- Bot detection (highest priority)
- Android devices
- iOS devices (iPad, iPhone, iPod)
- Desktop (default)
Bot traffic is identified first to filter out non-human visitors.
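The priority order can be sketched as a first-match-wins chain in TypeScript. The bot markers below are illustrative assumptions, not the pipe's actual pattern list; the return values follow the device categories this page documents.

```typescript
type Device = "bot" | "mobile-android" | "mobile-ios" | "desktop";

// Assumed, illustrative bot markers; the real list is likely different.
const BOT_PATTERN = /bot|crawl|spider|curl|wget/;

function detectDevice(userAgent: string): Device {
  const ua = userAgent.toLowerCase();
  // Checked first, so a crawler that mimics a phone is still a bot.
  if (BOT_PATTERN.test(ua)) return "bot";
  if (/android/.test(ua)) return "mobile-android";
  if (/ipad|iphone|ipod/.test(ua)) return "mobile-ios";
  // Anything unmatched falls back to desktop.
  return "desktop";
}
```

Order matters here: some crawlers send a user agent that also contains "Android", and checking for bots first keeps them out of the mobile bucket.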
Browser Detection
Supported browsers:
- Firefox
- Chrome (including CriOS for iOS)
- Opera
- Internet Explorer (MSIE/Trident)
- Safari (including iOS)
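A comparable sketch for browser identification, again in TypeScript. The check order mirrors the list above and is an assumption about the pipe; what is certain is that Chrome must be tested before Safari, because Chrome user agents also contain the token "Safari". The exact patterns are illustrative.

```typescript
type Browser = "firefox" | "chrome" | "opera" | "ie" | "safari" | "Unknown";

function detectBrowser(userAgent: string): Browser {
  const ua = userAgent.toLowerCase();
  if (/firefox/.test(ua)) return "firefox";
  // CriOS is Chrome on iOS; this check must precede the Safari check.
  if (/chrome|crios/.test(ua)) return "chrome";
  if (/opera/.test(ua)) return "opera";
  if (/msie|trident/.test(ua)) return "ie";
  if (/safari|iphone|ipad/.test(ua)) return "safari";
  return "Unknown";
}
```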
Parameters
The analytics_hits pipe accepts flexible parameters for filtering and pagination.
Filter Parameters
Filter events by tenant ID for multi-tenant isolation
Filter events by specific domain
Starting date for time-range filtering
Ending date for time-range filtering
SQL LIKE pattern to filter payload JSON content
Common patterns:
- "%utm_%" - Any UTM parameters
- "%referral%" - Referral traffic
- "%campaign%" - Campaign tracking
Pagination Parameters
Maximum number of results to return
Page number for pagination (0-indexed)
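A short TypeScript sketch of how a zero-indexed page and a limit select a window of rows. The helper names and the validation rules are assumptions added for illustration.

```typescript
// Convert a zero-indexed page number and a page size into a row offset.
function pageToOffset(page: number, limit: number): number {
  if (Math.floor(page) !== page || page < 0) {
    throw new RangeError("page must be a non-negative integer");
  }
  if (Math.floor(limit) !== limit || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  return page * limit;
}

// Apply the same window to an in-memory array, mirroring LIMIT/OFFSET.
function paginate<T>(rows: T[], page: number, limit: number): T[] {
  const offset = pageToOffset(page, limit);
  return rows.slice(offset, offset + limit);
}
```

With a limit of 50, page 0 covers rows 0 through 49 and page 2 starts at row 100.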
Pagination formula:
OFFSET = page * limit
Output Schema
The analytics_hits pipe returns enriched hit records:
Output Type
Field Descriptions
Two-letter country code (e.g., “US”, “GB”, “DE”)
This field is populated by your event ingestion logic, typically via IP geolocation
Detected device category:
- desktop - Desktop/laptop computers
- mobile-android - Android phones/tablets
- mobile-ios - iPhone, iPad, iPod
- bot - Automated crawlers and bots
Detected browser:
- chrome - Google Chrome (including mobile)
- firefox - Mozilla Firefox
- safari - Safari (including iOS)
- opera - Opera browser
- ie - Internet Explorer / Edge Legacy
- Unknown - Unrecognized browser
Domain extracted from the href field using ClickHouse URL functions.
vs. domain field: current_domain is always computed from the URL, while domain may be pre-populated.
Usage in Materialized Views
The analytics_hits pipe serves as the data source for all materialized views:
Example: analytics_pages Materialization
Materialized views read from analytics_hits (not directly from analytics_events) to benefit from the parsing and enrichment logic.
Performance Considerations
Query Optimization
Filter Early
Apply tenant_id and domain filters to leverage the analytics_events sorting key:
Date Range Queries
Use date filters to enable partition pruning:
Limit Results
Always use pagination to avoid large result sets:
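One way to encode these recommendations on the calling side is a small parameter builder, sketched here in TypeScript. The names `date_from` and `date_to`, the default and maximum limits, and the builder itself are hypothetical; only `tenant_id`, `domain`, `like_filter`, `limit`, and `page` appear on this page.

```typescript
interface HitsParamsInput {
  tenant_id: string;
  domain?: string;
  date_from?: string;   // hypothetical parameter name
  date_to?: string;     // hypothetical parameter name
  like_filter?: string;
  limit?: number;
  page?: number;
}

interface HitsParams extends HitsParamsInput {
  limit: number;
  page: number;
}

const DEFAULT_LIMIT = 100; // assumed default
const MAX_LIMIT = 1000;    // assumed upper bound

function buildHitsParams(input: HitsParamsInput): HitsParams {
  // Always paginate: clamp the limit and default to the first page.
  const requested = input.limit ?? DEFAULT_LIMIT;
  const limit = Math.min(Math.max(1, Math.floor(requested)), MAX_LIMIT);
  const page = Math.max(0, Math.floor(input.page ?? 0));

  // tenant_id is required so every query can use the sorting key.
  const params: HitsParams = { tenant_id: input.tenant_id, limit, page };
  if (input.domain) params.domain = input.domain;
  if (input.date_from) params.date_from = input.date_from;
  if (input.date_to) params.date_to = input.date_to;
  if (input.like_filter) params.like_filter = input.like_filter;
  return params;
}
```

Making `tenant_id` mandatory in the type and clamping `limit` in one place keeps unfiltered, unbounded queries from being issued by accident.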
Avoid Payload Scanning
like_filter on payload is expensive; use it sparingly and combine it with other filters.
Real-time vs Historical
- Real-time Queries
- Historical Queries
Use analytics_hits directly for recent data:
✅ Up-to-date data
✅ Flexible filtering
⚠️ Higher query cost
Type Inference
TypeScript types are automatically inferred:
Type Exports
Using Pipe Types
Extending the Pipe
Adding Custom Detection
Extend device or browser detection for your needs:
Custom Device Categories
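As one possible extension, the TypeScript sketch below adds a hypothetical "tablet" category ahead of the mobile rules. The category name, the bot markers, and the heuristic (iPads, plus Android user agents without the "Mobile" token) are assumptions for illustration, not part of the starter kit.

```typescript
type CustomDevice = "bot" | "tablet" | "mobile-android" | "mobile-ios" | "desktop";

function detectCustomDevice(userAgent: string): CustomDevice {
  const ua = userAgent.toLowerCase();
  // Bot detection keeps the highest priority (markers are illustrative).
  if (/bot|crawl|spider/.test(ua)) return "bot";
  // Hypothetical extra rule, placed before the mobile checks so it wins:
  // iPads, and Android user agents that lack the "mobile" token.
  if (/ipad/.test(ua) || (/android/.test(ua) && !/mobile/.test(ua))) return "tablet";
  if (/android/.test(ua)) return "mobile-android";
  if (/iphone|ipod/.test(ua)) return "mobile-ios";
  return "desktop";
}
```

A new category has to sit above any existing rule that would otherwise claim the same user agents, since the first matching branch wins.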
Adding Extracted Fields
Extract additional fields from payload:
Custom Fields
Next Steps
Materialized Views
See how MVs consume analytics_hits
Datasources
Review the underlying datasource schemas
API Endpoints
Explore endpoint pipes that query this data