Overview
Indexes in Materialize represent query results stored in memory within a cluster. Unlike traditional databases, Materialize indexes maintain complete, up-to-date results—not just keys and pointers. This guide covers when to create indexes, how to optimize them, and operational best practices.Understanding Indexes
What Indexes Store
Indexes maintain:- Complete, up-to-date query results in memory
- Incrementally updated data as new information arrives
- Results scoped to a specific cluster
Indexes vs. Materialized Views
| Feature | Index | Materialized View |
|---|---|---|
| Storage | In-memory (cluster-local) | Durable storage (system-wide) |
| Access | Same cluster only | Any cluster |
| Use case | Fast queries within cluster | Share results across clusters |
| Query cost | Computationally free | Computationally free |
| Query speed | Fastest (from memory) | Fast (from storage) |
When to Create Indexes
On Sources
Rarely needed. Consider indexing a source when:- Serving results directly from the source
- Using the source in joins without transformation
On Views
Create indexes on views to maintain up-to-date results in memory:- View is executed and results stored in memory
- As new data arrives, results update incrementally
- Queries are served instantly from memory with no computation
On Materialized Views
Indexing a materialized view loads durable results into memory:- Materialized views are queryable from any cluster
- Indexes provide faster access within the cluster where created
- No additional computation required (results already maintained)
Creating Indexes
Basic Index Creation
Create an index on specific columns:Multi-Column Indexes
Index multiple columns for complex queries:Expression Indexes
Index computed expressions:Default Indexes
Create an index using inferred unique keys:Cluster-Specific Indexes
Explicitly specify the target cluster:Index Usage Patterns
Point Lookups
Materialize performs point lookups (reads only matching records) when the query’sWHERE clause:
- Specifies equality (
=orIN) on all indexed fields - Uses only
ANDto combine conditions - Matches the exact indexed expression (including type)
Multi-Column Point Lookups
Expression Matching
For expression indexes, queries must match exactly:Index Scans
Materialize performs full index scans when:- Not all indexed fields are specified
- Using non-equality conditions (
>,<,LIKE) - Using
ORbetween different fields - Type mismatch in equality conditions
Verifying Index Usage
UseEXPLAIN to verify how queries use indexes:
lookup value=indicates point lookup (fastest)(lookup)in Used Indexes sectionfast pathmeans query executes outside regular dataflow
Index Maintenance
Monitoring Index Memory Usage
Check memory consumption by index:Viewing All Indexes
List indexes with their configuration:Viewing Index Keys
Inspect index column definitions:Dropping Unused Indexes
Remove indexes to free memory:Rebuilding Indexes
To rebuild an index with a different configuration:Optimization Strategies
Point Lookup Optimization
Design indexes for equality-based queries:Join Optimization
Index join keys to accelerate joins:Shared Indexes
Multiple queries can share the same index:Memory Management
Understanding Memory Footprint
Index memory usage is proportional to:- Current size of the indexed data
- Number of columns stored
- Data types (complex types use more memory)
- Compaction rate
Estimating Index Size
Rough estimation:Memory Best Practices
- Index selectively: Don’t index everything—only what queries actually use
- Use narrow indexes: Include only necessary columns in keys
- Share indexes: Design indexes that multiple queries can use
- Monitor usage: Regularly review which indexes are actually used
- Consider materialized views: For cross-cluster access, use materialized views instead
Identifying Large Indexes
Best Practices
Index Design
- Match query patterns: Index columns that appear in
WHEREclauses - Exact expression matching: Ensure indexed expressions match query expressions exactly
- Consider cardinality: High-cardinality columns make better index keys
- Order matters for partial lookups: First columns in multi-column indexes are most important
Operational Guidelines
- Start minimal: Create indexes only when performance testing reveals the need
- Use EXPLAIN: Always verify queries use indexes as expected
- Monitor memory: Track index memory usage relative to cluster capacity
- Cluster sizing: Ensure clusters have sufficient memory for all indexes
- Document purpose: Maintain documentation about why each index exists
Development Workflow
- Develop without indexes: Write queries first
- Identify slow queries: Use query profiling to find bottlenecks
- Create targeted indexes: Add indexes for specific performance problems
- Verify improvement: Measure query performance before and after
- Clean up: Remove indexes that don’t provide measurable benefit
Troubleshooting
Query Not Using Index
Symptoms: Slow query despite having an index Diagnosis:- Query runs in different cluster than index
- Index keys don’t match query conditions
- Type mismatch in WHERE clause
- Query uses OR between different fields
High Memory Usage
Symptoms: Cluster running out of memory Diagnosis:Slow Index Creation
Symptoms:CREATE INDEX takes a long time
Cause: Initial hydration computing all historical results
Solution:
- Create indexes during low-traffic periods
- Ensure sufficient cluster resources
- Consider creating indexes in stages for very large datasets
Advanced Topics
Indexes and Arrangements
Internally, indexes use arrangements—Materialize’s optimized data structures. The system automatically:- Shares arrangements between operators
- Compacts data to reduce memory
- Maintains delta updates for efficiency
Index Ordering Limitations
Materialize does not use index key order for:ORDER BYclauses- Range queries
- Sorted merges