Basic SQL Transform
Query a PCollection using SQL syntax.- Python
- Java
- Use
beam.Rowto create schema-aware elements - Reference the PCollection as
PCOLLECTIONin SQL - Standard SQL syntax (SELECT, WHERE, ORDER BY, etc.)
Streaming SQL with Windowing
Based on sdks/python/apache_beam/examples/sql_taxi.py:39-78- SQL computes results within existing windows
- Use windowing before SQL transform
- Access window metadata with
beam.DoFn.WindowParam - Combine with streaming sources like PubSub
SQL Aggregations
Perform complex aggregations using SQL.COUNT(*),COUNT(column)SUM(column),AVG(column)MIN(column),MAX(column)STDDEV(column),VAR(column)
Joining PCollections
Join multiple PCollections using SQL.- Python
- Java
Data Catalog Integration
Query external tables using Google Cloud Data Catalog. Based on sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlDataCatalogExample.java:61-83Filtering and Transformations
Use SQL for complex filtering and data transformations.CASEstatements for conditional logicCOALESCEfor null handlingCASTfor type conversionsIN,LIKEfor filtering- Date/time functions
Window Functions
Use SQL window functions for advanced analytics.ROW_NUMBER(),RANK(),DENSE_RANK()LAG(),LEAD()- Aggregate functions with
OVERclause PARTITION BYandORDER BY
Creating Schema-Aware PCollections
Different ways to create PCollections with schemas.Best Practices
Use Schemas
- Define clear schemas for your data
- Use
beam.Rowfor schema-aware elements - Leverage type inference when possible
Optimize Queries
- Filter early to reduce data volume
- Use appropriate indexes in external sources
- Consider query complexity vs. native transforms
Window Before SQL
- Apply windowing before SQL transforms
- SQL operates within window boundaries
- Access window metadata in post-processing
Test SQL Queries
- Validate SQL syntax before running
- Test with sample data locally
- Monitor query performance
SQL Dialect Support
Beam SQL supports multiple SQL dialects:- Calcite SQL (default): Standard SQL with extensions
- ZetaSQL: Google Standard SQL dialect
Related Resources
Beam SQL Overview
Complete SQL reference and capabilities
Schema Guide
Working with schemas in Beam
Joins
Alternative joining methods
Data Catalog
Using external table metadata