Synopsis

dvc stage <subcommand>

Description

Stages are the building blocks of DVC pipelines. Each stage represents a data processing step with defined inputs (dependencies), outputs, and a command to execute. The dvc stage command provides subcommands to create and list stages in your project.
Stages are typically defined in dvc.yaml files and can track:
  • Dependencies: Input files, directories, or parameters that the stage depends on
  • Outputs: Files or directories produced by the stage
  • Metrics: Numeric outputs for model evaluation
  • Plots: Data files for visualization
  • Commands: Shell commands that process data

Subcommands

dvc stage add

Create a new stage in dvc.yaml.
dvc stage add -n <name> [options] command
-n, --name
string
required
Name of the stage to add. This will be used to reference the stage in dvc.yaml and in other DVC commands.
command
string
required
Command to execute for this stage. Can be any shell command.

Options

-d, --deps
path
Declare a dependency of the stage command. Can be specified multiple times for multiple dependencies. DVC tracks these files and re-runs the stage during reproduction when any of them changes. Example: -d data.csv -d config.json
-p, --params
string
Declare parameters to use as additional dependencies. Format: [<filename>:]<params_list>. Parameters are read from params.yaml by default, or from the specified file. Example: -p train.epochs,train.lr -p config.yaml:model.layers
-o, --outs
path
Declare output file or directory. Outputs will be cached by DVC and tracked in dvc.lock. Example: -o model.pkl -o results/
-O, --outs-no-cache
path
Declare output file or directory that will NOT be put into DVC cache. Useful for large intermediate files or logs. Example: -O logs/training.log
-m, --metrics
path
Declare output metrics file. Metrics are tracked as outputs but are optimized for displaying numeric values. Example: -m metrics.json
-M, --metrics-no-cache
path
Declare output metrics file (do not put into DVC cache).
--plots
path
Declare output plot file. Plots are special outputs used for visualization. Example: --plots plots/accuracy.csv
--plots-no-cache
path
Declare output plot file (do not put into DVC cache).
--outs-persist
path
Declare output file or directory that will NOT be removed upon reproduction. Useful for checkpoint files.
--outs-persist-no-cache
path
Declare output file or directory that will NOT be removed upon reproduction (do not put into DVC cache).
-f, --force
boolean
Overwrite existing stage with the same name.
-w, --wdir
path
Directory within your repo to run your command in. The command will be executed from this working directory. Example: --wdir src/
--always-changed
boolean
Always consider this stage as changed. The stage will always run during reproduction, regardless of whether dependencies changed.
--desc
string
User description of the stage (optional). This doesn’t affect any DVC operations but helps document your pipeline. Example: --desc "Train neural network model"
--run
boolean
Execute the stage immediately after generating it.

dvc stage list

List stages from one or more pipelines.
dvc stage list [options] [targets...]
targets
path[]
Show stages from a dvc.yaml/.dvc file or a directory. Defaults to dvc.yaml in the current directory. Example: dvc stage list pipelines/train/dvc.yaml

Options

--all
boolean
List all of the stages in the repository.
-R, --recursive
boolean
List all stages inside the specified directory recursively.
--name-only
boolean
List only stage names without descriptions.
--fail
boolean
Fail immediately instead of suppressing syntax errors in stage files.

Examples

Creating a simple data preparation stage

dvc stage add -n prepare \
  -d raw_data.csv \
  -o data/prepared.csv \
  python prepare.py
This creates a stage named prepare in dvc.yaml:
stages:
  prepare:
    cmd: python prepare.py
    deps:
      - raw_data.csv
    outs:
      - data/prepared.csv
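
To make the mapping from flags to dvc.yaml fields concrete, here is a minimal Python sketch that builds the same entry as a plain dictionary. StageSpec and to_entry are hypothetical names used only for illustration; this is not DVC's internal representation, and it covers only the cmd, deps, and outs fields shown above.

```python
from dataclasses import dataclass, field


@dataclass
class StageSpec:
    """Hypothetical stand-in for the fields written by `dvc stage add`."""
    name: str                                  # -n / --name
    cmd: str                                   # trailing command
    deps: list = field(default_factory=list)   # -d / --deps
    outs: list = field(default_factory=list)   # -o / --outs

    def to_entry(self):
        """Return the mapping that would sit under `stages:` in dvc.yaml."""
        entry = {"cmd": self.cmd}
        if self.deps:
            entry["deps"] = list(self.deps)
        if self.outs:
            entry["outs"] = list(self.outs)
        return {self.name: entry}


prepare = StageSpec(
    name="prepare",
    cmd="python prepare.py",
    deps=["raw_data.csv"],
    outs=["data/prepared.csv"],
)
pipeline = {"stages": prepare.to_entry()}
```

Each flag corresponds to one key of the stage entry, so pipeline has the same structure as the YAML above.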

Creating a training stage with parameters and metrics

dvc stage add -n train \
  -d data/prepared.csv \
  -d src/train.py \
  -p train.epochs,train.lr \
  -o models/model.pkl \
  -m metrics/train.json \
  --desc "Train ML model with hyperparameters" \
  python src/train.py
The -p flag tells DVC to track specific parameters from params.yaml. The stage will re-run if these parameter values change.
Resulting dvc.yaml:
stages:
  train:
    cmd: python src/train.py
    deps:
      - data/prepared.csv
      - src/train.py
    params:
      - train.epochs
      - train.lr
    outs:
      - models/model.pkl
    metrics:
      - metrics/train.json
    desc: Train ML model with hyperparameters
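
The -p value format, [<filename>:]<params_list>, can be illustrated with a small Python sketch. parse_params_spec is a hypothetical helper written for this page, not DVC's own parser; it only shows how the optional file prefix and the comma-separated parameter names split apart.

```python
def parse_params_spec(spec, default_file="params.yaml"):
    """Split a -p value of the form [<filename>:]<params_list>.

    Returns (filename, [param_names]). When no "<filename>:" prefix is
    present, the default parameters file is assumed.
    """
    filename, sep, names = spec.rpartition(":")
    if not sep:
        # No colon at all: the whole value is the parameter list.
        filename = default_file
    keys = [name for name in names.split(",") if name]
    return filename, keys


print(parse_params_spec("train.epochs,train.lr"))
print(parse_params_spec("config.yaml:model.layers"))
```

The first call resolves to params.yaml with two parameter names; the second names config.yaml explicitly.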

Creating a stage with multiple outputs

dvc stage add -n evaluate \
  -d models/model.pkl \
  -d data/test.csv \
  -m metrics/scores.json \
  --plots plots/confusion_matrix.csv \
  --plots plots/roc_curve.csv \
  -O logs/eval.log \
  python evaluate.py
The -O flag (uppercase) creates outputs that aren’t cached. This is useful for log files or intermediate results you don’t need to version.
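
For context, this is roughly what the output-writing part of a script like evaluate.py could look like. It is a sketch under assumptions: write_metrics and write_plot_csv are hypothetical helpers, and the metric names and CSV columns in the usage note are made up for illustration. The source does not show the real script.

```python
import csv
import json
from pathlib import Path


def write_metrics(path, scores):
    """Write a flat dict of numeric scores as JSON (a file declared with -m)."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(scores, indent=2))


def write_plot_csv(path, header, rows):
    """Write tabular data as CSV with a header row (a file declared with --plots)."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

Inside the script, calls such as write_metrics("metrics/scores.json", {"accuracy": acc}) and write_plot_csv("plots/roc_curve.csv", ["fpr", "tpr"], points) would produce the files declared in the stage, where acc and points stand for values computed by your own evaluation code.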

Listing all stages in current pipeline

dvc stage list
Output:
prepare    Outputs data/prepared.csv
train      Outputs models/model.pkl; Reports metrics/train.json
evaluate   Reports metrics/scores.json, plots/confusion_matrix.csv, plots/roc_curve.csv

Listing stages with names only

dvc stage list --name-only
Output:
prepare
train
evaluate

Listing all stages recursively

dvc stage list -R pipelines/
This lists all stages in all dvc.yaml files within the pipelines/ directory.

Creating a stage with custom working directory

dvc stage add -n preprocess \
  -d ../data/raw.csv \
  -o processed.csv \
  -w preprocessing \
  python process.py
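The command python process.py will be executed from the preprocessing/ directory. Dependency and output paths are interpreted relative to the working directory, not the repository root, which is why the raw data is referenced as ../data/raw.csv and processed.csv ends up in preprocessing/.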
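
The ../data/raw.csv dependency in this example implies that paths are written from the point of view of the working directory. Assuming that interpretation, the following Python sketch shows where each path lands relative to the repository root. resolve_from_wdir is a hypothetical helper for illustration, not part of DVC.

```python
import posixpath


def resolve_from_wdir(wdir, path):
    """Return the repo-root-relative form of a path written relative to wdir."""
    return posixpath.normpath(posixpath.join(wdir, path))


print(resolve_from_wdir("preprocessing", "../data/raw.csv"))  # data/raw.csv
print(resolve_from_wdir("preprocessing", "processed.csv"))    # preprocessing/processed.csv
```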

Overwriting an existing stage

dvc stage add -n train -f \
  -d data/prepared.csv \
  -p train.epochs \
  -o models/model.pkl \
  python train.py
The -f flag allows you to replace an existing stage without error.

Creating and running a stage immediately

dvc stage add -n transform \
  -d input.csv \
  -o output.csv \
  --run \
  python transform.py
Using --run will execute the stage command immediately. Make sure all dependencies are available and the command is correct.

Notes

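Migration from dvc run: The dvc stage add command replaces the deprecated dvc run command. All functionality from dvc run is available in dvc stage add. One difference to keep in mind: dvc stage add only writes the stage definition, so pass --run if you want the command executed immediately.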
Stage addressing: Stages can be referenced by name (e.g., train) or by the path to their dvc.yaml file followed by a colon and the stage name (e.g., pipelines/ml/dvc.yaml:train).
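
The two addressing forms can be told apart by the presence of a colon. The Python sketch below illustrates the idea. parse_stage_address is a hypothetical helper, the example path is illustrative, and DVC's real target parsing handles more cases than this.

```python
def parse_stage_address(address, default_file="dvc.yaml"):
    """Split "<path>:<name>" into (path, name); a bare name uses default_file."""
    path, sep, name = address.rpartition(":")
    if not sep:
        # No colon: the address is just a stage name in the default file.
        return default_file, address
    return path, name


print(parse_stage_address("train"))
print(parse_stage_address("pipelines/ml/dvc.yaml:train"))
```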
Pipeline files: Stages are defined in dvc.yaml files. When you run stages, DVC generates dvc.lock files that capture the exact state of dependencies and outputs, ensuring reproducibility.
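
As a rough illustration of why recorded state makes re-runs predictable, the Python sketch below compares the current hashes of a stage's dependencies with previously recorded ones. It is deliberately simplified: stage_changed, the recorded mapping, and the choice of SHA-256 are assumptions for this example. The actual dvc.lock layout and DVC's own checks (which also cover the command, parameters, outputs, and directories) are not reproduced here.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Hash a file's contents in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_changed(deps, recorded):
    """Return True if any dependency is missing or its hash differs from `recorded`.

    `recorded` maps dependency path -> hash captured on the last run
    (a stand-in for what a lock file would store).
    """
    for dep in deps:
        if not Path(dep).is_file():
            return True
        if recorded.get(dep) != file_sha256(dep):
            return True
    return False
```

With hashes recorded after a successful run, stage_changed returns False until a dependency's content changes, which is the condition under which a re-run is needed.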
