Core Concepts
Execution Plans
An execution plan is a directed acyclic graph (DAG) of nodes that process data:- Source nodes: Generate data (e.g., from tables, files, datasets)
- Transform nodes: Modify data (e.g., filter, project, aggregate)
- Sink nodes: Consume data (e.g., collect results, write to storage)
Declarations
Declarations describe execution nodes without actually constructing them:- C++
- Python
Building Query Plans
Filtering Data
- C++
- Python
Aggregation
- C++
- Python
Joins
- C++
- Python
Sorting
- C++
- Python
Execution Modes
Synchronous Execution
- C++
- Python
Asynchronous Execution
- C++
- Python
Streaming Results
- C++
- Python
Query Options
Customize query execution withQueryOptions:
- C++
- Python
Performance Optimization
- Use streaming execution for large datasets to avoid loading everything in memory
- Enable multi-threading with
use_threads=truefor parallel processing - Push down filters early in the pipeline to reduce data volume
- Batch operations to amortize overhead
- Use appropriate join algorithms based on data size and distribution
Next Steps
- Learn about Expressions and Filters for query predicates
- Explore Working with Datasets for large-scale data
- See Compute Functions for available operations