Synopsis
Description
Stages are the building blocks of DVC pipelines. Each stage represents a data processing step with defined inputs (dependencies), outputs, and a command to execute. The dvc stage command provides subcommands to create and list stages in your project.
Stages are typically defined in dvc.yaml files and can track:
- Dependencies: Input files, directories, or parameters that the stage depends on
- Outputs: Files or directories produced by the stage
- Metrics: Numeric outputs for model evaluation
- Plots: Data files for visualization
- Commands: Shell commands that process data
Subcommands
dvc stage add
Create a new stage in dvc.yaml.
- -n, --name: Name of the stage to add. It is used to reference the stage in dvc.yaml and in other DVC commands.
- command: Command to execute for this stage. Can be any shell command.
Options
- -d, --deps: Declare a dependency of the stage command. Can be specified multiple times for multiple dependencies. These files are tracked, and the stage reruns when they change. Example: -d data.csv -d config.json
- -p, --params: Declare parameters to use as additional dependencies. Format: [<filename>:]<params_list>. Parameters are read from params.yaml by default, or from a specified file. Example: -p train.epochs,train.lr -p config.yaml:model.layers
- -o, --outs: Declare an output file or directory. Outputs are cached by DVC and tracked in dvc.lock. Example: -o model.pkl -o results/
- -O, --outs-no-cache: Declare an output file or directory that is NOT put into the DVC cache. Useful for logs or intermediate files that you do not need DVC to version. Example: -O logs/training.log
- -m, --metrics: Declare an output metrics file. Metrics are tracked as outputs but are intended for displaying and comparing numeric values. Example: -m metrics.json
- -M, --metrics-no-cache: Declare an output metrics file (do not put into DVC cache).
- --plots: Declare an output plot file. Plots are special outputs used for visualization. Example: --plots plots/accuracy.csv
- --plots-no-cache: Declare an output plot file (do not put into DVC cache).
- --outs-persist: Declare an output file or directory that will NOT be removed upon reproduction. Useful for checkpoint files.
- --outs-persist-no-cache: Declare an output file or directory that will NOT be removed upon reproduction (do not put into DVC cache).
- -f, --force: Overwrite an existing stage with the same name.
- -w, --wdir: Directory within your repo to run your command in. The command is executed from this working directory. Example: --wdir src/
- --always-changed: Always consider this stage as changed. The stage always runs during reproduction, regardless of whether its dependencies changed.
- --desc: User description of the stage (optional). This doesn’t affect any DVC operations but helps document your pipeline. Example: --desc "Train neural network model"
- --run: Execute the stage immediately after generating it.
dvc stage list
List stages from one or more pipelines.
- targets: Show stages from a dvc.yaml/.dvc file or a directory. Defaults to dvc.yaml in the current directory. Example: dvc stage list pipelines/train/dvc.yaml
Options
- --all: List all of the stages in the repository.
- -R, --recursive: List all stages inside the specified directory recursively.
- --name-only: List only stage names without descriptions.
- --fail: Fail immediately, do not suppress any syntax errors in stage files.
Examples
Creating a simple data preparation stage
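A minimal sketch of such a command, assuming it is run inside an initialized DVC repository. The script prepare.py and the files data.csv and prepared.csv are hypothetical names chosen for illustration:

```shell
# Define a stage named "prepare". This only writes the stage definition;
# nothing is executed yet.
# prepare.py, data.csv and prepared.csv are placeholder names.
dvc stage add -n prepare \
  -d prepare.py -d data.csv \
  -o prepared.csv \
  python prepare.py data.csv prepared.csv
```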
The new stage is recorded under the name prepare in dvc.yaml.
Creating a training stage with parameters and metrics
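One way this could look, reusing the -p and -m examples from the options above. The script train.py and the input prepared.csv are assumed names, and params.yaml is assumed to contain train.epochs and train.lr:

```shell
# Define a "train" stage that reruns when train.epochs or train.lr change
# in params.yaml, and that writes a model and a metrics file.
# train.py and prepared.csv are placeholder names.
dvc stage add -n train \
  -d train.py -d prepared.csv \
  -p train.epochs,train.lr \
  -o model.pkl \
  -m metrics.json \
  python train.py
```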
The stage, including its params and metrics entries, is written to dvc.yaml.
Creating a stage with multiple outputs
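A sketch of a stage with one cached and one uncached output. The stage name evaluate, the script evaluate.py and the log path are hypothetical:

```shell
# results/ is cached by DVC (-o); the log file is declared as an output
# but kept out of the cache (-O).
# evaluate.py and logs/evaluate.log are placeholder names.
dvc stage add -n evaluate \
  -d evaluate.py -d model.pkl \
  -o results/ \
  -O logs/evaluate.log \
  python evaluate.py
```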
The -O flag (uppercase) creates outputs that aren’t cached. This is useful for log files or intermediate results you don’t need to version.
Listing all stages in current pipeline
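With no target, the command reads dvc.yaml in the current directory. A minimal invocation, assuming the current directory contains a dvc.yaml:

```shell
# List the stages defined in ./dvc.yaml (the default target).
dvc stage list
```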
Listing stages with names only
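A short sketch, assuming the installed DVC version accepts the --name-only spelling of this option:

```shell
# Print only the stage names, without their descriptions.
dvc stage list --name-only
```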
Listing all stages recursively
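A possible invocation, assuming the repository keeps its pipeline files under a pipelines/ directory:

```shell
# Walk pipelines/ recursively and list the stages of every pipeline
# file found below it.
dvc stage list -R pipelines/
```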
With a directory target such as pipelines/, the recursive option lists stages from all dvc.yaml files within that directory.
Creating a stage with custom working directory
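A sketch of a stage that runs from a subdirectory. The stage name preprocess and the files raw.csv and clean.csv are hypothetical. The dependency and output paths are written relative to the working directory, which is how DVC interprets them when wdir is set:

```shell
# Run the command from preprocessing/. The -d and -o paths below are
# relative to that directory, not to the repository root.
# raw.csv and clean.csv are placeholder names.
dvc stage add -n preprocess \
  -w preprocessing \
  -d process.py -d raw.csv \
  -o clean.csv \
  python process.py
```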
The command python process.py is executed from the preprocessing/ directory. Paths given for dependencies and outputs are interpreted relative to this working directory, not the repository root.
Overwriting an existing stage
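For illustration, suppose a stage named prepare already exists and should gain an extra dependency. The file names here are hypothetical:

```shell
# Redefine the existing "prepare" stage. Without -f, DVC refuses to
# overwrite a stage that already exists under the same name.
# prepare.py, data.csv, config.json and prepared.csv are placeholder names.
dvc stage add -f -n prepare \
  -d prepare.py -d data.csv -d config.json \
  -o prepared.csv \
  python prepare.py data.csv prepared.csv
```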
The -f flag allows you to replace an existing stage without error.
Creating and running a stage immediately
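A sketch using the --run option. Because the command executes right away, the declared dependencies must already exist. The stage name featurize and all file names are hypothetical:

```shell
# Write the stage definition and execute it in one step.
# featurize.py and prepared.csv must exist before this is run;
# both names, and features.csv, are placeholders.
dvc stage add --run -n featurize \
  -d featurize.py -d prepared.csv \
  -o features.csv \
  python featurize.py prepared.csv features.csv
```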
Notes
Migration from dvc run: The dvc stage add command replaces the deprecated dvc run command. All functionality from dvc run is available in dvc stage add.
Pipeline files: Stages are defined in dvc.yaml files. When you run stages, DVC generates dvc.lock files that capture the exact state of dependencies and outputs, ensuring reproducibility.
See Also
- dvc repro - Reproduce stages or pipelines
- dvc dag - Visualize pipeline structure
- dvc params - Commands to display parameters