Synopsis
Description
The dvc repro command reproduces (executes) stages in your DVC pipeline. It’s the primary way to run your ML workflows after defining stages with dvc stage add.
DVC automatically determines which stages need to be run by:
- Checking if dependencies have changed
- Checking if outputs are missing
- Checking if stage commands have changed
- Checking if parameters have changed
Smart execution: DVC uses checksums to detect changes and only runs stages when necessary. This is similar to how make works but optimized for data pipelines.
Arguments
Stages to reproduce. Defaults to dvc.yaml in the current directory.
Targets can be:
- Path to a dvc.yaml or .dvc file
- Stage name from dvc.yaml in the current directory
- Path with stage name: path/to/dvc.yaml:stage_name
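The path-with-stage-name form can be sketched with two small helpers. The helper names are hypothetical, and path/to/dvc.yaml and stage_name are the placeholders from the list above, not names from a real project.

```shell
# Join a dvc.yaml path and a stage name into the "path:stage" target form.
# Both arguments are placeholders supplied by the caller.
stage_target() {
    printf '%s:%s\n' "$1" "$2"
}

# Reproduce one stage addressed by file path plus stage name.
repro_stage_in() {
    dvc repro "$(stage_target "$1" "$2")"
}
```

Calling repro_stage_in path/to/dvc.yaml stage_name runs dvc repro path/to/dvc.yaml:stage_name. A bare stage name, as in dvc repro train (hypothetical stage), would address the dvc.yaml in the current directory instead.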
Options
Execution Control
Reproduce even if dependencies were not changed. Forces execution of specified stages regardless of whether DVC detects changes.
Only print the commands that would be executed without actually executing them. Useful for previewing what will run.
Ask for confirmation before reproducing each stage. DVC will prompt you before executing each stage command.
Pipeline Selection
Reproduce only a single data item without a recursive dependency check. Runs only the specified stage(s) without checking or running their dependencies.
Reproduce the whole pipeline that the specified targets belong to. Executes all stages from the beginning of the pipeline.
Reproduce all pipelines in the repository. Useful for ensuring the entire project is up to date.
Reproduce all stages in the specified directory recursively. Finds all dvc.yaml files in subdirectories.
Start from the specified stages when reproducing pipelines. Runs the specified stage and all stages that depend on it.
Reproduce all descendants of a changed stage even if their direct dependencies didn’t change. Useful when you want to ensure all downstream stages are updated after modifying a stage.
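The extracted text above no longer shows the option names. As a hedged sketch, three of the selection options might be invoked as follows, assuming the flag names from upstream DVC's dvc repro --help (--single-item, --recursive, --force-downstream). The wrapper names are hypothetical.

```shell
# Flag names are assumptions taken from upstream DVC's CLI help,
# not from this page. Each wrapper takes one target.
repro_single()    { dvc repro --single-item "$1"; }       # only this stage, no dependency check
repro_recursive() { dvc repro --recursive "$1"; }         # every dvc.yaml under a directory
repro_cascade()   { dvc repro --force-downstream "$1"; }  # also rerun descendants of changed stages
```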
Data Management
Try automatically pulling missing data before reproduction. If dependencies are missing, DVC attempts to download them from remote storage.
Skip stages with missing data but no other changes. Continues execution even if some dependencies are unavailable.
Don’t put files/directories into cache. Runs stages but doesn’t cache outputs.
Advanced Options
Execute stage commands even if they have already been run with the same command/dependencies/outputs/etc before. DVC maintains a run cache to avoid re-executing identical commands. This flag disables that optimization.
Allows targets containing shell-style wildcards.
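A minimal sketch of the two advanced options, assuming they are spelled --no-run-cache and --glob as in upstream DVC's CLI help (the names are missing from the text above). The wrapper names and the train-* pattern are hypothetical.

```shell
# Assumed flag names from upstream DVC; wrappers are illustrative only.
repro_no_run_cache() { dvc repro --no-run-cache "$1"; }  # ignore previously recorded identical runs
repro_glob()         { dvc repro --glob "$1"; }          # target is a shell-style wildcard pattern
```

Quote the pattern when calling, for example repro_glob 'train-*', so that the shell passes it to DVC unexpanded.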
Error Handling
Continue executing, skipping stages that depend on the failed stages. If a stage fails, DVC continues with independent stages.
Ignore errors from stages. Pipeline execution continues even when stages fail.
Examples
Basic reproduction
Reproduce all stages in the default dvc.yaml by running dvc repro with no targets.
If no stages need to run, DVC will output: “Data and pipelines are up to date.”
Reproduce a specific stage
Passing train as the target runs the train stage and any of its dependencies that have changed.
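Both forms described so far (no target, and a named stage such as train) can be wrapped in one hypothetical helper that also reports the outcome. This is a sketch, assuming it is run inside a DVC project.

```shell
# Reproduce the given stages, or the default dvc.yaml when called without
# arguments, and print a one-line summary. Helper name is hypothetical.
repro_and_report() {
    if dvc repro "$@"; then
        echo "ok: ${*:-default dvc.yaml}"
    else
        echo "failed: ${*:-default dvc.yaml}" >&2
        return 1
    fi
}
```

repro_and_report with no arguments corresponds to plain dvc repro, and repro_and_report train corresponds to dvc repro train.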
Force reproduction
Run a stage even if DVC considers it up to date by passing -f (or --force).
Dry run to preview execution
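A dry run might look like the sketch below. It assumes the option is spelled --dry, as in upstream DVC's CLI help. The wrapper name is hypothetical.

```shell
# Print the commands that would run for the given targets without running them.
# --dry is an assumed flag name taken from upstream DVC.
preview() {
    dvc repro --dry "$@"
}
```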
Interactive reproduction
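As a sketch, interactive mode could be invoked like this, assuming the option is --interactive (short form -i) as in upstream DVC. The wrapper name is hypothetical.

```shell
# Ask for confirmation before each stage command is executed.
# --interactive is an assumed flag name taken from upstream DVC.
confirm_each() {
    dvc repro --interactive "$@"
}
```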
Reproduce entire pipeline
Even if you specify a single stage, you can reproduce its whole pipeline from the beginning.
Reproduce all pipelines in project
Reproduce the stages in all dvc.yaml files in your repository.
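The two pipeline-wide runs described above might be sketched as follows, assuming the flag names --pipeline and --all-pipelines from upstream DVC's CLI help. The wrapper names are hypothetical.

```shell
# Assumed flag names from upstream DVC; wrappers are illustrative only.
repro_whole_pipeline() { dvc repro --pipeline "$1"; }  # full pipeline that contains the given stage
repro_everything()     { dvc repro --all-pipelines; }  # every pipeline in the repository
```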
Reproduce with automatic data pull
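A hedged sketch of the data-related options, assuming they are spelled --pull and --allow-missing as in upstream DVC, and assuming a remote is already configured. The wrapper names are hypothetical.

```shell
# Assumed flag names from upstream DVC; wrappers are illustrative only.
repro_with_pull()    { dvc repro --pull "$@"; }           # fetch missing data before running
repro_skip_missing() { dvc repro --allow-missing "$@"; }  # skip stages whose only problem is missing data
```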
Reproduce downstream stages
Run a stage and everything that depends on it. If prepare produces data used by train and evaluate, all three will run.
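The downstream run described above can be sketched like this, assuming the option is spelled --downstream as in upstream DVC. The wrapper name is hypothetical, and prepare is the stage name used in the text.

```shell
# Start at the given stage and continue with every stage that depends on it.
# --downstream is an assumed flag name taken from upstream DVC.
repro_from() {
    dvc repro --downstream "$1"
}
```

For the pipeline in the text, repro_from prepare would correspond to dvc repro --downstream prepare.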
Continue on failure
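A sketch of the two error-handling modes, assuming the flag names --keep-going and --ignore-errors from upstream DVC's CLI help. The wrapper names are hypothetical.

```shell
# Assumed flag names from upstream DVC; wrappers are illustrative only.
repro_keep_going()    { dvc repro --keep-going "$@"; }     # skip only stages downstream of a failure
repro_ignore_errors() { dvc repro --ignore-errors "$@"; }  # carry on even when stages fail
```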
Reproduce without caching
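As a sketch, a run that leaves outputs out of the cache might use --no-commit, and the outputs could be stored later with dvc commit. Both names are assumptions based on upstream DVC rather than this page, and the wrapper names are hypothetical.

```shell
# Run stages without putting their outputs into the DVC cache.
# --no-commit is an assumed flag name taken from upstream DVC.
repro_no_commit() {
    dvc repro --no-commit "$@"
}

# Store the outputs afterwards, once you decide to keep them.
# dvc commit is assumed to be available, as in upstream DVC.
save_outputs() {
    dvc commit "$@"
}
```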
Complex example: Force reproduce with downstream
Combining a forced run of prepare with forced downstream reproduction makes prepare run first, then forces all downstream stages (train, evaluate, etc.) to run regardless of whether their direct dependencies changed.
Working with dvc.lock
When you run dvc repro, DVC updates dvc.lock to record:
- Checksums of dependencies
- Checksums of outputs
- Parameter values used
- Commands executed
Always commit dvc.lock to Git. It ensures reproducibility by capturing the exact state of your pipeline.
Understanding Stage Execution
When does a stage run?
A stage is executed if:
- Dependencies changed: Input files have different checksums
- Outputs missing: Output files don’t exist or are missing from cache
- Command changed: The stage command was modified in dvc.yaml
- Parameters changed: Tracked parameters have different values
- Forced execution: You used -f or --force
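The first condition, a dependency whose checksum changed, can be illustrated with a toy model. This is not DVC's implementation: it uses the POSIX cksum utility and a plain record file where DVC uses its own hashes and dvc.lock, and both function names are hypothetical.

```shell
# Toy model of checksum-based change detection (not DVC's actual code).
# needs_run DEP RECORD succeeds (exit 0) when DEP should trigger a rerun.
needs_run() {
    dep=$1
    record=$2
    [ -f "$record" ] || return 0                   # nothing recorded yet: must run
    [ "$(cksum < "$dep")" != "$(cat "$record")" ]  # content differs from the record
}

# Store the current checksum of DEP in RECORD after a successful run.
record_run() {
    cksum < "$1" > "$2"
}
```

A stage runner built on this idea would call needs_run before executing a command and record_run afterwards, which is roughly the role dvc.lock plays for real pipelines.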
Execution order
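Topological order means that a stage never runs before the stages it depends on. The idea can be tried out with the standard tsort utility. The sketch below is an illustration only, not how DVC computes its order, and it assumes a hypothetical chain in which prepare feeds train and train feeds evaluate.

```shell
# Each input line is "dependency dependent"; tsort prints a valid run order.
# The stage names and the chain between them are assumptions for illustration.
stage_order() {
    printf '%s\n' 'prepare train' 'train evaluate' | tsort
}
```

For this chain there is only one valid order, so stage_order lists prepare, then train, then evaluate.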
DVC analyzes the dependency graph and executes stages in topological order, so every stage runs after the stages it depends on.
Common Workflows
Development workflow
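The original commands for this workflow are missing from the page. One plausible loop, offered only as a sketch, is to rerun what changed, commit the updated dvc.lock, and push outputs. It assumes a Git repository with DVC initialised and a default remote configured. The helper name is hypothetical.

```shell
# Hypothetical helper: rerun changed stages, then version and share the result.
# $1 is the Git commit message.
iterate() {
    dvc repro || return 1                   # rerun only the stages affected by your edits
    git add dvc.yaml dvc.lock || return 1   # stage the pipeline definition and lock file
    git commit -m "$1" || return 1
    dvc push                                # upload new outputs to remote storage
}
```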
Reproducing on a different machine
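The original commands here were lost as well. A hedged sketch of one way to reproduce a project on a fresh machine is to clone the Git repository, pull the tracked data, and run the pipeline. The helper name is hypothetical, and the repository URL and directory are supplied by the caller.

```shell
# Hypothetical helper: $1 = Git repository URL, $2 = directory to clone into.
reproduce_elsewhere() {
    git clone "$1" "$2" || return 1
    # Work inside the clone without changing the caller's directory.
    ( cd "$2" && dvc pull && dvc repro )
}
```

If everything recorded in dvc.lock matches the pulled data, dvc repro should find nothing to run.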
Debugging pipeline issues
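No commands survive for this workflow, so the following is only one possible inspection sequence. It relies on dvc status and dvc dag, which are listed under See Also, and on a dry run whose flag name --dry is assumed from upstream DVC. The helper name is hypothetical.

```shell
# Hypothetical helper: inspect a stage before rerunning anything.
debug_stage() {
    dvc status            # which stages DVC considers changed
    dvc dag               # how the stages depend on each other
    dvc repro --dry "$1"  # what would run for this stage (assumed flag)
}
```

If the preview looks wrong, rerunning the stage with -f forces execution regardless of what DVC detected.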
Updating after parameter changes
When you modify params.yaml:
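A possible sequence after editing parameters, given as a sketch because the original commands are missing. It assumes the dvc params diff and dvc metrics diff subcommands from upstream DVC and a project that tracks params and metrics. The helper name is hypothetical.

```shell
# Hypothetical helper: review parameter edits, rerun, then compare results.
after_param_change() {
    dvc params diff    # which tracked parameters differ from the last commit
    dvc repro          # rerun the stages that depend on those parameters
    dvc metrics diff   # how the metrics moved as a result
}
```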
Performance Tips
Troubleshooting
Pipeline appears up to date but shouldn’t be
Use -f to force execution.
Missing dependencies error
Try pulling data first with dvc pull.
Stage keeps running unnecessarily
Check whether files are being modified by the stage command itself. Use --outs-persist for outputs that shouldn’t be removed between runs.
Lock file conflicts
If you have merge conflicts in dvc.lock:
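The original resolution steps are missing. One possible approach, offered as an assumption and not as an official procedure, is to keep one side of the conflicted file, let dvc repro bring it back in line with the workspace, and stage the result. The helper name is hypothetical.

```shell
# Hypothetical helper, to be run while a Git merge is stopped on dvc.lock.
resolve_lock_conflict() {
    git checkout --ours dvc.lock || return 1  # or --theirs; either side is only a starting point
    dvc repro || return 1                     # rerun stages whose recorded state no longer matches
    git add dvc.lock                          # mark the conflict as resolved
}
```

After this, the merge can be concluded with a normal git commit.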
See Also
- dvc stage add - Create pipeline stages
- dvc dag - Visualize pipeline structure
- dvc push - Upload outputs to remote storage
- dvc pull - Download outputs from remote storage
- dvc status - Show pipeline status