Streaming Word Count
A basic streaming pipeline that reads from PubSub and processes data in fixed windows.- Python
- Java
- Set
streaming=Truein pipeline options - Use
WindowIntowithFixedWindowsfor 15-second windows - Data is processed continuously as it arrives
Advanced Windowing with Triggers
Demonstrating different trigger types for controlling when results are emitted.Default Trigger (Watermark-based)
The default trigger fires when the watermark passes the end of the window.- Fires once when the watermark passes the window end
- Produces ON_TIME results
- Late data is dropped (zero allowed lateness)
Handling Late Data
Allow late data processing with allowed lateness.- Python
- Java
- Windows stay open for 1 day after watermark passes
- Each late element triggers a new pane (LATE timing)
- Use DISCARDING mode to get incremental updates
Speculative Results (Early Firings)
Get early approximations before all data arrives.- Get quick approximations for dashboards
- Progressive refinement of results
- All panes are marked EARLY (no watermark dependency)
Combined Trigger Strategy
Combine early firings, on-time results, and late data handling.- Python
- Java
- EARLY panes: Every 1 minute before window closes
- ON_TIME pane: When watermark passes window end
- LATE panes: Every 5 minutes after window closes
Windowing with Timestamps
Access window information in your pipeline for metadata enrichment.Session Windows
Group events based on activity sessions with gaps of inactivity.- User session analytics
- Detecting periods of activity
- Grouping related events
Sliding Windows
Create overlapping windows for moving averages and continuous analysis.Best Practices
Choose Appropriate Windows
- Fixed windows for regular intervals
- Session windows for user activity
- Sliding windows for moving calculations
Configure Allowed Lateness
- Balance completeness vs. resource usage
- Consider your data’s lateness characteristics
- Use watermark estimators for better accuracy
Select Accumulation Mode
- DISCARDING for independent updates
- ACCUMULATING for cumulative results
- Consider storage and computation trade-offs
Monitor Watermarks
- Track watermark lag in production
- Adjust allowed lateness based on metrics
- Use custom watermark estimators if needed
Common Patterns
Traffic Analysis Example
Based on examples/java/src/main/java/org/apache/beam/examples/cookbook/TriggerExample.java:160-337Related Resources
Windowing Guide
Learn windowing fundamentals
Triggers Guide
Deep dive into trigger mechanisms
Watermarks
Understanding watermarks and event time
Streaming I/O
Streaming sources and sinks