Skip to main content

Overview

CREATE CLUSTER creates a new cluster, which provides isolated compute resources (CPU, memory, and disk) for maintaining dataflows and serving queries.

Syntax

CREATE CLUSTER [IF NOT EXISTS] <cluster_name> (
  SIZE = '<size>',
  REPLICATION FACTOR = <count>,
  [SCHEDULE = { ON REFRESH | MANUAL }]
);

Key Concepts

Resource Isolation

Clusters provide dedicated compute resources:
-- Development cluster
CREATE CLUSTER dev (
  SIZE = 'M.1-xsmall',
  REPLICATION FACTOR = 1
);

-- Production cluster
CREATE CLUSTER prod (
  SIZE = 'M.1-large',
  REPLICATION FACTOR = 2
);
Workloads on different clusters don’t compete for resources.

Cluster Sizes

Materialize offers M.1 cluster sizes:
SizeCredits/HourUse Case
M.1-xsmall0.5Development, testing
M.1-small1Small workloads
M.1-medium2Medium workloads
M.1-large4Production workloads
M.1-xlarge8Large production workloads
M.1-2xlarge16Very large workloads
M.1-3xlarge32Extremely large workloads
M.1-4xlarge64Maximum performance
Larger sizes provide:
  • More CPU cores
  • More memory
  • More disk space

Parameters

cluster_name
identifier
required
Name of the cluster to create
SIZE
string
required
Cluster size (e.g., 'M.1-large')
REPLICATION FACTOR
integer
required
Number of replicas (1 or more for availability, 0 to pause)
SCHEDULE
keyword
Scheduling strategy: ON REFRESH for scheduled clusters, MANUAL for always-on (default)

Replication Factor

The replication factor determines fault tolerance:
-- Single replica (no fault tolerance)
CREATE CLUSTER single (
  SIZE = 'M.1-medium',
  REPLICATION FACTOR = 1
);

-- High availability (can tolerate 1 failure)
CREATE CLUSTER ha (
  SIZE = 'M.1-medium',
  REPLICATION FACTOR = 2
);
Replication increases fault tolerance, not capacity.Each replica does the same work. To increase capacity, use a larger SIZE.
Replica Properties:
  • Automatically named (r1, r2, etc.)
  • Never on same physical hardware
  • Spread across availability zones
  • Process identical computations

Active Cluster

Set the active cluster for operations:
-- Show current cluster
SHOW cluster;

-- Switch to different cluster
SET cluster = prod;

-- Create objects in specific cluster
CREATE MATERIALIZED VIEW mv IN CLUSTER prod AS ...;

Pausing Clusters

Set replication factor to 0 to pause:
-- Pause cluster (stop all work)
ALTER CLUSTER dev SET (REPLICATION FACTOR = 0);

-- Resume cluster
ALTER CLUSTER dev SET (REPLICATION FACTOR = 1);
Paused clusters:
  • Consume no credits
  • Don’t maintain indexes or materialized views
  • Can’t serve queries

Scheduled Clusters

For materialized views with refresh strategies:
CREATE CLUSTER scheduled (
  SIZE = 'M.1-large',
  SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour')
);

CREATE MATERIALIZED VIEW daily_report
  IN CLUSTER scheduled
  WITH (REFRESH EVERY '1 day' ALIGNED TO '2024-01-01 00:00:00')
  AS SELECT * FROM data;
Scheduled clusters:
  • Turn on automatically before refreshes
  • Turn off after refreshes complete
  • Only consume credits when active
  • Require HYDRATION TIME ESTIMATE

Examples

Development Cluster

CREATE CLUSTER development (
  SIZE = 'M.1-xsmall',
  REPLICATION FACTOR = 1
);

SET cluster = development;

CREATE MATERIALIZED VIEW test_view AS
  SELECT * FROM source WHERE amount > 100;

Production Cluster with High Availability

CREATE CLUSTER production (
  SIZE = 'M.1-large',
  REPLICATION FACTOR = 3  -- Can tolerate 2 failures
);

SET cluster = production;

CREATE MATERIALIZED VIEW critical_metrics AS
  SELECT
    date_trunc('minute', event_time) AS minute,
    COUNT(*) AS event_count
  FROM events
  GROUP BY minute;

Separate Clusters for Different Workloads

-- Cluster for data ingestion
CREATE CLUSTER ingestion (
  SIZE = 'M.1-medium',
  REPLICATION FACTOR = 2
);

-- Cluster for analytics queries  
CREATE CLUSTER analytics (
  SIZE = 'M.1-xlarge',
  REPLICATION FACTOR = 1
);

-- Cluster for serving APIs
CREATE CLUSTER api (
  SIZE = 'M.1-large',
  REPLICATION FACTOR = 3
);

Scheduled Cluster for Reports

CREATE CLUSTER nightly_reports (
  SIZE = 'M.1-xlarge',
  SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '30 minutes')
);

CREATE MATERIALIZED VIEW daily_summary
  IN CLUSTER nightly_reports
  WITH (
    REFRESH AT CREATION,
    REFRESH EVERY '1 day' ALIGNED TO '2024-01-01 02:00:00'
  )
  AS
    SELECT
      date_trunc('day', order_date) AS day,
      COUNT(*) AS order_count,
      SUM(amount) AS total_revenue
    FROM orders
    GROUP BY day;

Cluster Sizing Guide

When to Size Up

  • Queries are slow
  • High memory usage (check system catalogs)
  • Dataflow hydration takes too long
  • Cluster can’t keep up with data ingestion

When to Add Replicas

  • Need fault tolerance
  • Availability is critical
  • Can tolerate failures
Adding replicas does NOT increase query performance or throughput. It only increases fault tolerance.

When to Create Multiple Clusters

  • Isolate workloads (dev/staging/prod)
  • Separate ingestion from queries
  • Different SLAs for different users
  • Cost optimization (pause unused clusters)

Credit Usage

Clusters consume credits based on size and replicas:
Credits/hour = Size credits × Replication factor
Examples:
  • M.1-small (1 credit/hr) × 2 replicas = 2 credits/hr
  • M.1-large (4 credits/hr) × 3 replicas = 12 credits/hr
  • Paused cluster (0 replicas) = 0 credits/hr
Credit usage is metered per second.

Managing Clusters

Resize a Cluster

ALTER CLUSTER prod SET (SIZE = 'M.1-xlarge');

Change Replication Factor

ALTER CLUSTER prod SET (REPLICATION FACTOR = 3);

Drop a Cluster

DROP CLUSTER dev;
Dropping a cluster drops all indexes hosted on it. Materialized views are not dropped.

Best Practices

  1. Start Small: Begin with small clusters and size up as needed
    CREATE CLUSTER test (SIZE = 'M.1-xsmall', REPLICATION FACTOR = 1);
    
  2. Isolate Workloads: Use separate clusters for different purposes
    CREATE CLUSTER ingestion (...);
    CREATE CLUSTER analytics (...);
    
  3. Pause When Idle: Set replication factor to 0 for unused clusters
    ALTER CLUSTER dev SET (REPLICATION FACTOR = 0);
    
  4. Monitor Usage: Check cluster utilization regularly
    SELECT * FROM mz_internal.mz_cluster_replica_utilization;
    
  5. Use Scheduled Clusters: For periodic reporting workloads
  6. Plan for Growth: Leave headroom for data volume increases

Build docs developers (and LLMs) love