Overview
This guide covers performance optimization strategies for Materialize, including query analysis with EXPLAIN, index design, join optimization, and troubleshooting slow queries.Understanding Query Execution
Execution Strategies
Materialize uses three execution strategies:- Constant: Computed in the control plane without cluster involvement
- Fast-path: Direct read from an in-memory index
- Standard: Computed by a temporary dataflow in the cluster
Using EXPLAIN to Analyze Queries
Basic EXPLAIN Usage
Analyze how a query will execute:EXPLAIN Stages
View different optimization stages:EXPLAIN Modifiers
Get additional information:Verifying Index Usage
Confirm that indexes are used:lookup value=indicates point lookup (fastest)(lookup)in “Used Indexes” section(scan)indicates full index scan (slower)- No index listed means full table scan (slowest)
Query Optimization Techniques
Point Lookup Optimization
Design indexes to support equality-based queries:- Equality conditions (
=orIN) on ALL indexed fields - Only
ANDto combine conditions - Exact match of indexed expressions (including types)
Multi-Column Point Lookups
Expression Indexes
Index computed expressions that appear in queries:Handling OR Clauses
Same fields with OR:Join Optimization
Basic Join Indexing
Index join keys on both sides:Multi-Way Joins
For queries joining multiple tables:Join with Filters
Place filtered input first in FROM clause:Understanding Join Strategies
Use EXPLAIN to see join implementation:U= unique keyK= key lengthA= already arranged (index exists)
Materialized Views vs. Indexes
When to Use Materialized Views
Use materialized views when:- Results need to be shared across multiple clusters
- Query results are large and frequently accessed
- Multiple dependent queries build on the same computation
When to Use Indexes
Use indexes when:- Results are only queried from one cluster
- You want to accelerate queries on existing views
- Memory is available in the target cluster
Combining Both
Use materialized views for shared computation, indexes for fast access:Troubleshooting Slow Queries
Diagnostic Workflow
- Check query history:
- Run EXPLAIN:
- Check for index usage: Look for “Used Indexes” section in EXPLAIN output.
- Verify cluster has capacity:
Common Issues and Solutions
Issue: Query Not Using Index
Symptoms: EXPLAIN shows no indexes used Causes:- Query runs in different cluster than index
- Index keys don’t match query predicates
- Type mismatch in WHERE clause
Issue: Full Index Scan Instead of Point Lookup
Symptoms: EXPLAIN shows “(scan)” instead of “(lookup)” Cause: WHERE clause doesn’t match point lookup criteria Solution: Adjust query or index to match:Issue: High Memory Usage
Symptoms: Cluster running out of memory Diagnosis:- Drop unused indexes
- Resize cluster:
ALTER CLUSTER name SET (SIZE = 'larger-size') - Convert indexes to materialized views for cross-cluster sharing
- Optimize data models to reduce memory footprint
Issue: Lagging Materialized Views
Symptoms: Queries wait for materialized views to catch up Diagnosis:- Increase cluster size
- Optimize materialized view queries
- Reduce input data volume
- Check for resource contention
Advanced Optimization
Temporal Filter Pushdown
Use temporal filters to reduce data scanned:- Skips old data at storage layer
- Reduces memory usage
- Faster query execution
Query Hints for TopK
Optimize top-K queries:Avoiding Cartesian Products
Always specify join conditions:Performance Best Practices
Index Design
- Start minimal: Create indexes based on actual query patterns
- Match query patterns: Index columns used in WHERE clauses
- Exact expressions: Index expressions must match query expressions exactly
- Consider cardinality: High-cardinality columns make better index keys
- Use EXPLAIN: Always verify indexes are used as expected
Query Design
- Filter early: Apply filters before joins when possible
- Use proper types: Ensure WHERE clause types match column types
- **Avoid SELECT ***: Select only needed columns
- Rewrite OR as UNION: For different fields in OR clauses
- Add LIMIT: When full results aren’t needed
Cluster Management
- Right-size clusters: Monitor utilization and adjust
- Isolate workloads: Use separate clusters for different workload types
- Monitor lag: Track materialized view freshness
- Plan for hydration: Account for initial and steady-state memory
Development Workflow
- Profile first: Identify actual bottlenecks before optimizing
- Measure impact: Compare performance before and after changes
- Iterate: Make one change at a time
- Document: Record why indexes and optimizations exist