Introduction

Materialize is a real-time data integration platform that creates and continually updates consistent views of transactional data. Built on top of differential dataflow and timely dataflow, Materialize enables SQL-based stream processing with strong consistency guarantees. This section introduces the core architectural concepts that make Materialize unique:

Sources

Connect to external systems like Kafka, PostgreSQL, and MySQL

Materialized Views

Incrementally maintain query results in durable storage

Indexes

Store query results in memory for instant access

Sinks

Push data to external systems like Kafka

Clusters

Isolated compute resources for workload isolation
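
Assuming a Kafka connection named kafka_conn already exists, a minimal sketch of how these five concepts fit together (the object names, cluster size, and JSON field are illustrative):

-- Cluster: isolated compute for this workload (size value is illustrative)
CREATE CLUSTER analytics (SIZE = '50cc');

-- Source: continuously ingest a Kafka topic as JSON
CREATE SOURCE orders
IN CLUSTER analytics
FROM KAFKA CONNECTION kafka_conn (TOPIC 'orders')
FORMAT JSON;

-- Materialized view: incrementally maintained, durably stored
-- (a JSON source exposes its payload as a jsonb column named data)
CREATE MATERIALIZED VIEW order_counts AS
SELECT data->>'customer_id' AS customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY 1;

-- Index: keep the results in memory for fast reads
CREATE INDEX ON order_counts (customer_id);

-- Sink: push the maintained results back out to Kafka
CREATE SINK order_counts_sink
FROM order_counts
INTO KAFKA CONNECTION kafka_conn (TOPIC 'order-counts')
FORMAT JSON
ENVELOPE DEBEZIUM;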

Architecture Overview

Materialize’s architecture consists of three main layers:
SQL Front-end  <─┐
SQL Front-end  <─┼──>  Coordinator       <──>  Dataflow Layer
SQL Front-end  <─┘     • Optimizer             • Sources
                       • Catalog               • Sinks
                       • Timestamp             • Compute

Key Architectural Components

Coordinator

The coordinator is the “brains” of Materialize. It manages:
  • Metadata and catalog information
  • Query execution and optimization
  • Timestamp selection for linearizability
  • Frontier tracking for data completeness

Storage

The storage layer is responsible for:
  • Data persistence
  • Sources and sinks
  • Reclocking for timestamp alignment
  • Change data capture ingestion

Compute

The compute layer handles:
  • Dataflow execution using differential dataflow
  • Horizontal scaling across replicas
  • Incremental view maintenance
  • Active replication for fault tolerance

Differential Dataflow

At its core, Materialize uses differential dataflow to process data as streams of updates. Each update is represented as a triple:
(data, time, diff)
  • data: The actual row data
  • time: A logical timestamp (e.g., transaction ID, milliseconds since epoch)
  • diff: An integer representing the change (+1 for insert, -1 for delete)
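
You can observe these triples directly with SUBSCRIBE, which emits each update alongside its timestamp and diff (order_counts here is a hypothetical view):

-- Each output row carries mz_timestamp (time) and mz_diff (diff)
SUBSCRIBE (SELECT * FROM order_counts);
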
This representation enables:
  1. Incremental Computation: Only the changes are processed, not entire datasets
  2. Multiversion State: Historical data access at any timestamp
  3. Efficient Joins: Delta joins avoid intermediate state blowup
  4. Consistency: All operations maintain transactional consistency

Reaction Time

Materialize optimizes for reaction time — the total delay from data change to queryable result:
Reaction Time = Freshness + Query Latency

OLTP Systems

High reaction time:
  • Excellent freshness
  • Poor query latency for analytics

Data Warehouses

High reaction time:
  • Poor freshness (batch ingestion)
  • Excellent query latency

Materialize

Low reaction time:
  • Excellent freshness (streaming)
  • Excellent query latency (incremental)

Consistency Guarantees

Materialize provides linearizability — the strongest consistency guarantee possible. This means:
  • Query results reflect a consistent snapshot across all data sources
  • Operations within a transaction maintain atomicity
  • Cross-source joins produce consistent results
  • No eventual consistency or approximate answers

How It Works

The coordinator determines timestamps for queries by tracking:
  1. Lower bounds: The earliest timestamps for which valid data is still available
  2. Upper bounds: The timestamps up to which data is known to be complete
  3. Logical compaction frontiers: The oldest timestamps that remain queryable after historical detail is consolidated
Every query executes at a specific timestamp, ensuring consistent reads even when data is constantly changing.
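
You can inspect this timestamp selection for any query with EXPLAIN TIMESTAMP, which reports the chosen query timestamp along with each input's read and write frontiers (the table name is hypothetical):

EXPLAIN TIMESTAMP FOR SELECT * FROM orders;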

Data Flow Model

Data flows through Materialize in this sequence:
  1. Ingestion: Sources continuously read from external systems (Kafka, PostgreSQL, MySQL)
  2. Transformation: Materialized views apply SQL transformations incrementally
  3. Storage: Results persist in durable storage with incremental updates
  4. Indexing: Indexes load results into memory for fast queries
  5. Serving: Queries read from indexes, or sinks push results to external systems

Common Use Cases

Materialize excels at three primary patterns:

Query Offload (CQRS)

Scale complex read queries more efficiently than read replicas:
-- Create a materialized view for complex aggregations
CREATE MATERIALIZED VIEW customer_metrics AS
SELECT 
    customer_id,
    COUNT(*) as order_count,
    SUM(total_amount) as lifetime_value,
    AVG(total_amount) as avg_order_value
FROM orders
GROUP BY customer_id;

-- Index for fast lookups
CREATE INDEX idx_customer ON customer_metrics (customer_id);
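
With the index in place, point lookups are served directly from memory (the customer_id value is illustrative):

SELECT * FROM customer_metrics WHERE customer_id = 42;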

Integration Hub (ODS)

Combine data from multiple sources into unified views:
-- Join across PostgreSQL and Kafka sources
CREATE MATERIALIZED VIEW enriched_events AS
SELECT 
    e.event_id,
    e.user_id,
    u.email,
    u.segment,
    e.event_type,
    e.timestamp
FROM kafka_events e
JOIN postgres_users u ON e.user_id = u.id;
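
Downstream consumers can query the unified view directly, or follow its changes as they happen (the segment value is illustrative):

-- Stream ongoing changes for one customer segment
SUBSCRIBE (SELECT * FROM enriched_events WHERE segment = 'enterprise');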

Operational Data Mesh (ODM)

Create real-time data products for downstream consumers:
-- Maintain real-time inventory levels
CREATE MATERIALIZED VIEW inventory_status AS
SELECT 
    product_id,
    SUM(CASE WHEN type = 'purchase' THEN quantity ELSE 0 END) -
    SUM(CASE WHEN type = 'sale' THEN quantity ELSE 0 END) as current_stock
FROM inventory_events
GROUP BY product_id;

-- Stream to Kafka for downstream services
CREATE SINK inventory_sink
FROM inventory_status
INTO KAFKA CONNECTION kafka_conn (TOPIC 'inventory-status')
FORMAT JSON
ENVELOPE DEBEZIUM;

Next Steps

Dive deeper into specific concepts:

Learn About Sources

Discover how to connect Materialize to your data sources

Explore Materialized Views

Understand incremental view maintenance

Understand Indexes

Learn about in-memory query acceleration

Configure Clusters

Optimize compute resources and isolation