dvc stage

Synopsis

dvc stage <subcommand>

Description

Stages are the building blocks of DVC pipelines. Each stage represents a data processing step with defined inputs (dependencies), outputs, and a command to execute. The dvc stage command provides subcommands to create and list stages in your project. Stages are typically defined in dvc.yaml files and can track:

Dependencies: Input files, directories, or parameters that the stage depends on
Outputs: Files or directories produced by the stage
Metrics: Numeric outputs for model evaluation
Plots: Data files for visualization
Commands: Shell commands that process data

Subcommands

dvc stage add

Create a new stage in dvc.yaml.

dvc stage add -n <name> [options] command

-n, --name

string

required

Name of the stage to add. This will be used to reference the stage in dvc.yaml and in other DVC commands.

command

string

required

Command to execute for this stage. Can be any shell command.

Options

-d, --deps

path

Declare a dependency of the stage command. Can be specified multiple times for multiple dependencies. DVC tracks these files, and the stage re-runs during reproduction when any of them changes. Example: -d data.csv -d config.json

-p, --params

string

Declare parameter to use as additional dependency. Format: [<filename>:]<params_list>. Parameters are read from params.yaml by default, or from a specified file. Example: -p train.epochs,train.lr -p config.yaml:model.layers

-o, --outs

path

Declare output file or directory. Outputs will be cached by DVC and tracked in dvc.lock. Example: -o model.pkl -o results/

-O, --outs-no-cache

path

Declare output file or directory that will not be put into the DVC cache. Useful for logs or for intermediate files that you do not need to version. Example: -O logs/training.log

-m, --metrics

path

Declare output metrics file. Metrics are tracked as outputs but are optimized for displaying numeric values. Example: -m metrics.json

-M, --metrics-no-cache

path

Declare output metrics file (do not put into DVC cache).

--plots

path

Declare output plot file. Plots are special outputs used for visualization. Example: --plots plots/accuracy.csv

--plots-no-cache

path

Declare output plot file (do not put into DVC cache).

--outs-persist

path

Declare output file or directory that will NOT be removed upon reproduction. Useful for checkpoint files.

--outs-persist-no-cache

path

Declare output file or directory that will not be removed upon reproduction (do not put into DVC cache).

-f, --force

boolean

Overwrite existing stage with the same name.

-w, --wdir

path

Directory within your repo to run your command in. The command will be executed from this working directory. Example: --wdir src/

--always-changed

boolean

Always consider this stage as changed. The stage will always run during reproduction, regardless of whether dependencies changed.

--desc

string

User description of the stage (optional). This doesn’t affect any DVC operations but helps document your pipeline. Example: --desc "Train neural network model"

--run

boolean

Execute the stage immediately after generating it.

dvc stage list

List stages from one or more pipelines.

dvc stage list [options] [targets...]

targets

path[]

Show stages from a dvc.yaml/.dvc file or a directory. Defaults to dvc.yaml in the current directory. Example: dvc stage list pipelines/train/dvc.yaml

Options

--all

boolean

List all of the stages in the repository.

-R, --recursive

boolean

List all stages inside the specified directory recursively.

--name-only

boolean

List only stage names without descriptions.

--fail

boolean

Fail immediately, do not suppress any syntax errors in stage files.

Examples

Creating a simple data preparation stage

dvc stage add -n prepare \
  -d raw_data.csv \
  -o data/prepared.csv \
  python prepare.py

This creates a stage named prepare in dvc.yaml:

stages:
  prepare:
    cmd: python prepare.py
    deps:
      - raw_data.csv
    outs:
      - data/prepared.csv
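
When stages are created from a script rather than typed by hand, the same command line can be assembled programmatically. The following is a minimal Python sketch, not part of DVC: the helper name stage_add_args is hypothetical, and it only uses the -n, -d, and -o flags documented above.

```python
import shlex


def stage_add_args(name, command, deps=(), outs=()):
    """Build the argument list for `dvc stage add` (hypothetical helper)."""
    args = ["dvc", "stage", "add", "-n", name]
    for dep in deps:
        args += ["-d", dep]  # one -d per dependency
    for out in outs:
        args += ["-o", out]  # one -o per cached output
    # The stage command goes last, split into separate arguments.
    args += shlex.split(command)
    return args


args = stage_add_args(
    "prepare",
    "python prepare.py",
    deps=["raw_data.csv"],
    outs=["data/prepared.csv"],
)
# Prints: dvc stage add -n prepare -d raw_data.csv -o data/prepared.csv python prepare.py
print(shlex.join(args))
```

To actually create the stage, the list could be passed to subprocess.run(args, check=True) from inside a DVC repository.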

Creating a training stage with parameters and metrics

dvc stage add -n train \
  -d data/prepared.csv \
  -d src/train.py \
  -p train.epochs,train.lr \
  -o models/model.pkl \
  -m metrics/train.json \
  --desc "Train ML model with hyperparameters" \
  python src/train.py

The -p flag tells DVC to track specific parameters from params.yaml. The stage will re-run if these parameter values change.

Resulting dvc.yaml:

stages:
  train:
    cmd: python src/train.py
    deps:
      - data/prepared.csv
      - src/train.py
    params:
      - train.epochs
      - train.lr
    outs:
      - models/model.pkl
    metrics:
      - metrics/train.json
    desc: Train ML model with hyperparameters
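
The -p value format [<filename>:]<params_list> can be read as an optional file name followed by a comma-separated list of parameter names. The Python sketch below only illustrates that format. It is not DVC's own parser, and the function name parse_params_arg is hypothetical.

```python
def parse_params_arg(value, default_file="params.yaml"):
    """Split a -p value of the form [<filename>:]<params_list>."""
    filename, sep, names = value.rpartition(":")
    if not sep:
        # No "<filename>:" prefix, so the default params file applies.
        filename = default_file
    return filename, [name for name in names.split(",") if name]


# Parameters taken from the default params.yaml
print(parse_params_arg("train.epochs,train.lr"))
# Parameters taken from an explicitly named file
print(parse_params_arg("config.yaml:model.layers"))
```

The first call returns ("params.yaml", ["train.epochs", "train.lr"]), matching the params section in the dvc.yaml above. The second returns ("config.yaml", ["model.layers"]).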

Creating a stage with multiple outputs

dvc stage add -n evaluate \
  -d models/model.pkl \
  -d data/test.csv \
  -m metrics/scores.json \
  --plots plots/confusion_matrix.csv \
  --plots plots/roc_curve.csv \
  -O logs/eval.log \
  python evaluate.py

The -O flag (uppercase) creates outputs that aren’t cached. This is useful for log files or intermediate results you don’t need to version.
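
A stage command is expected to write every output it declares. As a rough sketch of what evaluate.py might do, the Python below writes the metrics, plots, and log files named in the command above. The function names, the column names, and all values are placeholders chosen for illustration, not real results or a prescribed layout.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write a small CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def write_eval_outputs(root, accuracy, confusion, roc):
    """Write every output declared by the evaluate stage under `root`."""
    root = Path(root)
    for sub in ("metrics", "plots", "logs"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # -m metrics/scores.json: numeric values stored as JSON
    (root / "metrics" / "scores.json").write_text(json.dumps({"accuracy": accuracy}))
    # --plots: tabular data, one file per plot
    write_csv(root / "plots" / "confusion_matrix.csv", ["actual", "predicted"], confusion)
    write_csv(root / "plots" / "roc_curve.csv", ["fpr", "tpr"], roc)
    # -O logs/eval.log: written like any other file, just not cached by DVC
    (root / "logs" / "eval.log").write_text(f"evaluated {len(confusion)} samples\n")


# Demo in a temporary directory with placeholder values (not real results).
with tempfile.TemporaryDirectory() as tmp:
    write_eval_outputs(tmp, 0.5, [("cat", "cat"), ("dog", "cat")], [(0.0, 0.0), (1.0, 1.0)])
    scores = json.loads((Path(tmp) / "metrics" / "scores.json").read_text())
    written = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file())

print(scores)
print(written)
```

In a real pipeline the script would write relative to the stage's working directory instead of a temporary directory.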

Listing all stages in current pipeline

dvc stage list

Output:

prepare    Outputs data/prepared.csv
train      Outputs models/model.pkl; Reports metrics/train.json
evaluate   Reports metrics/scores.json, plots/confusion_matrix.csv, plots/roc_curve.csv

Listing stages with names only

dvc stage list --name-only

Output:

prepare
train
evaluate

Listing all stages recursively

dvc stage list -R pipelines/

This lists all stages in all dvc.yaml files within the pipelines/ directory.

Creating a stage with custom working directory

dvc stage add -n preprocess \
  -d ../data/raw.csv \
  -o processed.csv \
  -w preprocessing \
  python process.py

The command python process.py will be executed from the preprocessing/ directory. Dependency and output paths are interpreted relative to that working directory, not the repository root, which is why the raw data file is given as ../data/raw.csv.
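
To see where the paths in this example point, the sketch below resolves each one against the working directory. It assumes that stage paths are interpreted relative to --wdir, as the ../data/raw.csv dependency implies. The helper name resolve_from_wdir is hypothetical and is not a DVC function.

```python
import posixpath


def resolve_from_wdir(wdir, path):
    """Return `path` (given relative to `wdir`) as a repo-root-relative path."""
    return posixpath.normpath(posixpath.join(wdir, path))


# Dependency declared with -d ../data/raw.csv
print(resolve_from_wdir("preprocessing", "../data/raw.csv"))  # data/raw.csv
# Output declared with -o processed.csv
print(resolve_from_wdir("preprocessing", "processed.csv"))  # preprocessing/processed.csv
```

Under that assumption, the stage reads data/raw.csv and writes preprocessing/processed.csv, both relative to the repository root.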

Overwriting an existing stage

dvc stage add -n train -f \
  -d data/prepared.csv \
  -p train.epochs \
  -o models/model.pkl \
  python train.py

The -f flag allows you to replace an existing stage without error.

Creating and running a stage immediately

dvc stage add -n transform \
  -d input.csv \
  -o output.csv \
  --run \
  python transform.py

Using --run will execute the stage command immediately. Make sure all dependencies are available and the command is correct.

Notes

Migration from dvc run: The dvc stage add command replaces the deprecated dvc run command. Unlike dvc run, dvc stage add only writes the stage definition to dvc.yaml; pass --run to also execute the stage immediately.

Stage addressing: Stages can be referenced by name (e.g., train) or by the path to their dvc.yaml file followed by a colon and the stage name (e.g., pipelines/ml/dvc.yaml:train).
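
The two address forms can be told apart by the colon. The Python sketch below is an illustration only, not DVC's own target parsing. The function name parse_stage_address is hypothetical, and it assumes a bare target without a colon is a stage name in the default dvc.yaml.

```python
def parse_stage_address(address, default_file="dvc.yaml"):
    """Split a stage address into (pipeline file, stage name)."""
    path, sep, name = address.rpartition(":")
    if not sep:
        # No colon: treat the whole target as a stage name (sketch assumption).
        return default_file, name
    return path, name


print(parse_stage_address("train"))
print(parse_stage_address("pipelines/dvc.yaml:train"))
```

The first call returns ("dvc.yaml", "train") and the second returns ("pipelines/dvc.yaml", "train").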

Pipeline files: Stages are defined in dvc.yaml files. When you run stages, DVC generates dvc.lock files that capture the exact state of dependencies and outputs, ensuring reproducibility.
