
What is metaflow-dagster?

metaflow-dagster generates a self-contained Dagster definitions file from any Metaflow flow, letting you schedule, monitor, and launch your pipelines through Dagster while keeping all your existing Metaflow code unchanged.

Why metaflow-dagster?

Metaflow and Dagster are both powerful data workflow tools, but they serve different purposes. Metaflow excels at writing data science workflows with excellent local development and artifact management, while Dagster provides a robust orchestration UI, scheduling, and observability features. metaflow-dagster bridges these two worlds:

Keep Your Metaflow Code

No changes needed to your existing Metaflow flows. Write and test locally as you always have.

Dagster UI & Scheduling

Get Dagster’s powerful web UI, run history, scheduling, and monitoring capabilities.

All Graph Shapes Supported

Linear, branching, conditional, and foreach flows are all supported.

Full Feature Support

Parameters, retries, timeouts, environment variables, and step decorators are carried over to the generated Dagster definitions without extra configuration.

How It Works

metaflow-dagster compiles your Metaflow flow’s DAG into a self-contained Dagster definitions file. Each Metaflow step becomes a Dagster @op. The generated file:
  • Runs each step as a subprocess via the standard metaflow step CLI
  • Passes --input-paths correctly for joins and foreach splits
  • Emits Metaflow artifact keys and a retrieval snippet to the Dagster UI after each step
  • Forwards @resources hints to the compute backend via --with=resources:...
  • Emits SensorDefinitions for @trigger and @trigger_on_finish decorators
The compiled DAG is fully visible in Dagster — typed inputs, fan-out branches, and fan-in joins are all represented correctly in the UI.
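To make the subprocess model above more concrete, here is a minimal sketch of how a command line for a single step might be assembled. The helper name build_step_command is hypothetical, and the exact arguments the generated file passes are an assumption; only the use of the metaflow step CLI, --input-paths, and --with comes from the description above.

```python
import sys
from typing import List, Optional


def build_step_command(
    flow_file: str,
    step_name: str,
    run_id: str,
    task_id: str,
    input_paths: List[str],
    with_specs: Optional[List[str]] = None,
) -> List[str]:
    """Assemble an argv list for one `step` invocation (illustrative only)."""
    cmd = [sys.executable, flow_file]
    # Decorator specs such as "resources:cpu=4" are passed as --with options
    # before the subcommand.
    for spec in with_specs or []:
        cmd.append(f"--with={spec}")
    cmd += [
        "step", step_name,
        "--run-id", run_id,
        "--task-id", task_id,
        # A join receives one path per incoming branch, comma-separated
        # (assumed path layout: run_id/step_name/task_id).
        "--input-paths", ",".join(input_paths),
    ]
    return cmd


if __name__ == "__main__":
    # A join step fed by two upstream branches "a" and "b".
    print(build_step_command("linear_flow.py", "join", "42", "5",
                             ["42/a/3", "42/b/4"], ["resources:cpu=4"]))
```

A generated op could then hand such a list to subprocess.run(cmd, check=True), so that a failing step surfaces as a failed op in Dagster.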

Quick Example

Here’s a simple Metaflow flow:
from metaflow import FlowSpec, step

class LinearFlow(FlowSpec):
    @step
    def start(self):
        self.message = "hello from start"
        self.next(self.process)

    @step
    def process(self):
        self.result = self.message + " -> process"
        self.next(self.end)

    @step
    def end(self):
        print("Flow completed:", self.result)

if __name__ == "__main__":
    LinearFlow()
Deploy it to Dagster with two commands:
python linear_flow.py dagster create dagster_defs.py
dagster dev -f dagster_defs.py
That’s it! Your flow now appears in the Dagster UI, where you can launch, schedule, and monitor it.

Key Features

All Graph Shapes Supported

  • Linear: Simple start → process → end flows
  • Split/Join: Static branches that run in parallel
  • Conditional: Dynamic branches where only one path runs at runtime
  • Foreach: Fan-out patterns for parallel processing

Parameters & Configuration

Metaflow Parameter definitions are automatically converted to typed Dagster Config classes, giving you a type-safe configuration UI in the Dagster launchpad.
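As a rough illustration of this conversion, the sketch below renders the source text of a typed Config class from a set of parameter names, types, and defaults. The function render_config_class and the spec format are hypothetical, and the real generator's output may look different; the only thing taken from the description above is that each Parameter ends up as a typed field on a Dagster Config class.

```python
from typing import Any, Dict, Tuple

# Hypothetical parameter spec: name -> (Python type, default value).
ParamSpec = Dict[str, Tuple[type, Any]]


def render_config_class(class_name: str, params: ParamSpec) -> str:
    """Render source text for a Config subclass with one typed field per parameter."""
    lines = [f"class {class_name}(Config):"]
    for name, (py_type, default) in params.items():
        # repr() keeps string defaults quoted in the generated source.
        lines.append(f"    {name}: {py_type.__name__} = {default!r}")
    if not params:
        # A flow without parameters still needs a valid class body.
        lines.append("    pass")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_config_class(
        "LinearFlowConfig",
        {"alpha": (float, 0.5), "name": (str, "demo")},
    ))
```

The typed fields are what would let the launchpad validate values before a run starts.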

Retries & Timeouts

@retry and @timeout decorators on any step are picked up automatically. The generated op gets a Dagster RetryPolicy and an op_execution_timeout tag — no extra configuration needed.
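The mapping can be sketched as follows. Metaflow's @retry takes times and minutes_between_retries, and @timeout takes seconds, minutes, and hours, while Dagster's RetryPolicy takes max_retries and a delay in seconds. The helper names below are hypothetical, and the exact tag key and value format used by the generated file are assumptions.

```python
from typing import Any, Dict


def retry_policy_kwargs(times: int, minutes_between_retries: float) -> Dict[str, Any]:
    """Translate @retry arguments into RetryPolicy-style keyword arguments."""
    # RetryPolicy expresses its delay in seconds, @retry in minutes.
    return {"max_retries": times, "delay": minutes_between_retries * 60}


def timeout_tag(seconds: int = 0, minutes: int = 0, hours: int = 0) -> Dict[str, str]:
    """Collapse @timeout arguments into a single timeout tag, in seconds."""
    total = seconds + 60 * minutes + 3600 * hours
    # Tag key taken from the text above; the real key may carry a prefix.
    return {"op_execution_timeout": str(total)}


if __name__ == "__main__":
    print(retry_policy_kwargs(times=3, minutes_between_retries=2))
    print(timeout_tag(minutes=5))
```

In a generated file, values like these would presumably feed RetryPolicy(...) and the tags argument of the op.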

Step Decorators

Inject Metaflow step decorators at deploy time without modifying the flow source using the --with flag:
python my_flow.py dagster create my_flow_dagster.py \
  --with=sandbox \
  --with='resources:cpu=4,memory=8000'
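Each --with value names a decorator, optionally followed by a colon and comma-separated key=value attributes. The sketch below shows one simplified way to read such a spec; parse_with_spec is a hypothetical helper for illustration, not Metaflow's own parser, and it ignores quoting and nested values.

```python
from typing import Dict, Tuple


def parse_with_spec(spec: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name:key=value,key=value' into a decorator name and its attributes."""
    name, _, attr_text = spec.partition(":")
    attrs: Dict[str, str] = {}
    # filter(None, ...) skips the empty string produced when no attributes are given.
    for pair in filter(None, attr_text.split(",")):
        key, _, value = pair.partition("=")
        attrs[key.strip()] = value.strip()
    return name, attrs


if __name__ == "__main__":
    print(parse_with_spec("sandbox"))
    print(parse_with_spec("resources:cpu=4,memory=8000"))
```

Read this way, the command above attaches a sandbox decorator with no attributes and a resources decorator requesting cpu=4 and memory=8000.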

Event-Driven Sensors

Decorate your flow with @trigger or @trigger_on_finish to emit a SensorDefinition in the generated file automatically.

Next Steps

Installation

Install metaflow-dagster and verify your setup

Quickstart

Create and deploy your first flow in minutes