dvc repro

Synopsis

dvc repro [options] [targets...]

Description

The dvc repro command reproduces (executes) stages in your DVC pipeline. It’s the primary way to run your ML workflows after defining stages with dvc stage add. DVC automatically determines which stages need to be run by:

Checking if dependencies have changed
Checking if outputs are missing
Checking if stage commands have changed
Checking if parameters have changed

Only stages that need updating are executed, making pipeline reproduction efficient. DVC respects the dependency graph and executes stages in the correct order.

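Smart execution: DVC compares file checksums, parameter values, and stage commands against what is recorded in dvc.lock, and runs a stage only when something differs. This is similar to how make works, except that make compares file timestamps while DVC compares file contents.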
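As a rough illustration of content-based change detection, the sketch below hashes a file and compares the result with a previously recorded value. The helper names (file_md5, has_changed, demo) are hypothetical and this is not DVC's internal code; it only shows why rewriting a file with identical content does not count as a change.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_md5):
    """True if the file is missing or its content hash differs from the record."""
    return not path.exists() or file_md5(path) != recorded_md5


def demo():
    """Return change flags for: untouched file, same-content rewrite, real edit."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "prepared.csv"
        data.write_text("a,b\n1,2\n")
        recorded = file_md5(data)        # stands in for the value stored in a lock file
        untouched = has_changed(data, recorded)
        data.write_text("a,b\n1,2\n")    # rewritten with identical content
        rewritten = has_changed(data, recorded)
        data.write_text("a,b\n1,3\n")    # content actually edited
        edited = has_changed(data, recorded)
        return untouched, rewritten, edited
```

Calling demo() reports a change only for the last case, where the file content itself was edited.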

Arguments

targets

string[]

Stages to reproduce. Defaults to dvc.yaml in the current directory. Targets can be:

Path to a dvc.yaml or .dvc file
Stage name from dvc.yaml in the current directory
Path with stage name: path/to/dvc.yaml:stage_name

Examples:

dvc repro                    # Reproduce all stages in dvc.yaml
dvc repro train              # Reproduce specific stage
dvc repro ml/dvc.yaml:train  # Reproduce stage in specific file
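The three target forms can be told apart with a small amount of string handling. The sketch below is a hypothetical parser (Target and parse_target are illustrative names, not part of DVC) and it deliberately ignores edge cases such as Windows drive letters.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    path: Optional[str]   # dvc.yaml or .dvc file, if one was given
    stage: Optional[str]  # stage name, if one was given


def parse_target(target):
    """Split a target string into (file path, stage name).

    Simplified on purpose: a colon is always treated as the separator
    between file and stage, so paths like C:\\repo\\dvc.yaml are not handled.
    """
    if ":" in target:
        path, stage = target.rsplit(":", 1)
        return Target(path or None, stage or None)
    if target.endswith(".dvc") or target.endswith("dvc.yaml"):
        return Target(target, None)
    # Anything else is read as a stage name in ./dvc.yaml
    return Target(None, target)
```

For example, parse_target("ml/dvc.yaml:train") yields the file ml/dvc.yaml and the stage train, while parse_target("train") yields only a stage name.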

Options

Execution Control

-f, --force

boolean

Reproduce even if dependencies were not changed. Forces execution of specified stages regardless of whether DVC detects changes.

dvc repro -f train

--dry

boolean

Only print the commands that would be executed without actually executing them. Useful for previewing what will run.

dvc repro --dry

Output:

Running stage 'prepare':
> python prepare.py
Running stage 'train':
> python train.py

-i, --interactive

boolean

Ask for confirmation before reproducing each stage. DVC will prompt you before executing each stage command.

dvc repro -i

Pipeline Selection

-s, --single-item

boolean

Reproduce only the specified stage(s), without recursively checking their dependencies. Upstream stages are neither checked nor run.

dvc repro -s evaluate

Using -s may result in inconsistent outputs if dependencies have changed.

-p, --pipeline

boolean

Reproduce the whole pipeline that the specified targets belong to, not only the targets and the stages before them. Stages that are already up to date are still skipped unless you also pass -f.

dvc repro -p train  # Reproduces the entire pipeline that contains train

-P, --all-pipelines

boolean

Reproduce all pipelines in the repository. Useful for ensuring the entire project is up to date.

dvc repro -P

-R, --recursive

boolean

Reproduce all stages in the specified directory recursively. Finds all dvc.yaml files in subdirectories.

dvc repro -R pipelines/

--downstream

boolean

Start from the specified stages when reproducing pipelines. Runs the specified stage and all stages that depend on it.

dvc repro --downstream prepare

--force-downstream

boolean

Reproduce all descendants of a changed stage even if their direct dependencies didn’t change. Useful when you want to ensure all downstream stages are updated after modifying a stage.

dvc repro --force-downstream
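The selection flags above differ mainly in which part of the dependency graph they consider. The sketch below models that on a hypothetical three-stage pipeline: the default selects the target plus everything upstream, -s selects the target alone, and --downstream selects the target plus everything that depends on it. The graph and function names are assumptions for illustration, not DVC internals.

```python
# Hypothetical pipeline: each stage maps to the stages it depends on.
DEPENDS_ON = {
    "prepare": set(),
    "train": {"prepare"},
    "evaluate": {"train"},
}


def ancestors(graph, stage):
    """All stages that `stage` depends on, directly or transitively."""
    found = set()
    stack = list(graph[stage])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(graph[current])
    return found


def descendants(graph, stage):
    """All stages that depend on `stage`, directly or transitively."""
    return {other for other in graph if stage in ancestors(graph, other)}


def select(graph, target, single_item=False, downstream=False):
    """Return the set of stages considered for reproduction."""
    if single_item:                       # like -s: the target only
        return {target}
    if downstream:                        # like --downstream: target and dependents
        return {target} | descendants(graph, target)
    return {target} | ancestors(graph, target)   # default: target and upstream
```

With this graph, select(DEPENDS_ON, "train") covers prepare and train, while select(DEPENDS_ON, "prepare", downstream=True) covers all three stages.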

Data Management

--pull

boolean

Try automatically pulling missing data before reproduction. If dependencies are missing, DVC attempts to download them from remote storage.

dvc repro --pull

--allow-missing

boolean

Skip stages with missing data but no other changes. Continues execution even if some dependencies are unavailable.

dvc repro --allow-missing

--no-commit

boolean

Don’t put files/directories into cache. Stages run and dvc.lock is still updated, but outputs are not stored in the cache; run dvc commit afterwards to cache them.

dvc repro --no-commit

Useful for testing pipeline changes without polluting the cache.

Advanced Options

--no-run-cache

boolean

Execute stage commands even if they have already been run with the same command/dependencies/outputs/etc. before. DVC maintains a run cache to avoid re-executing identical commands. This flag disables that optimization.

dvc repro --no-run-cache
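Conceptually, a run cache is a lookup keyed by the command and the hashes of its inputs. The sketch below is a minimal in-memory model under that assumption; RunCache, run_key, and the use_cache switch are hypothetical names, and DVC's actual on-disk format is not shown.

```python
import hashlib
import json


def run_key(cmd, dep_hashes):
    """Derive a stable key from a command and its dependency hashes."""
    payload = json.dumps({"cmd": cmd, "deps": dep_hashes}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RunCache:
    """In-memory stand-in for a run cache."""

    def __init__(self):
        self._store = {}

    def run(self, cmd, dep_hashes, execute, use_cache=True):
        """Return (result, executed). Skips execute() on a cache hit."""
        key = run_key(cmd, dep_hashes)
        if use_cache and key in self._store:
            return self._store[key], False
        result = execute()
        self._store[key] = result
        return result, True
```

Passing use_cache=False plays the role of --no-run-cache in this model: the command is executed again even though an identical run was recorded earlier.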

--glob

boolean

Allows targets containing shell-style wildcards.

dvc repro --glob "**/train*"
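Shell-style wildcard matching can be previewed with Python's standard fnmatch module. This is only an approximation for experimenting with patterns; whether DVC applies exactly these matching rules is an assumption, and the stage names are made up.

```python
from fnmatch import fnmatchcase


def match_stages(pattern, stage_names):
    """Return the stage names that match a shell-style wildcard pattern."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    return [name for name in stage_names if fnmatchcase(name, pattern)]


STAGES = ["prepare", "train", "train_baseline", "evaluate"]
```

Here match_stages("train*", STAGES) selects train and train_baseline and leaves the other stages out.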

Error Handling

-k, --keep-going

boolean

Continue executing after a stage fails, skipping only the stages that depend on the failed stage. Independent stages still run.

dvc repro -k

--ignore-errors

boolean

Ignore errors from stages. Pipeline execution continues even when stages fail; unlike --keep-going, stages that depend on a failed stage are still executed.

dvc repro --ignore-errors

Use with caution. This can result in incomplete or incorrect outputs.
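The skip behavior described for -k can be modeled in a few lines. The sketch below walks a hypothetical graph in dependency order and marks a stage as skipped when anything it depends on did not succeed. The function name, the graph, and the status strings are illustrative assumptions; graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_keep_going(depends_on, failing):
    """Simulate a keep-going run.

    depends_on maps each stage to the set of stages it depends on.
    failing is the set of stages whose command would fail.
    Returns a dict of stage -> "ok", "failed", or "skipped".
    """
    status = {}
    for stage in TopologicalSorter(depends_on).static_order():
        deps = depends_on.get(stage, ())
        if any(status[dep] != "ok" for dep in deps):
            status[stage] = "skipped"   # something upstream failed or was skipped
        elif stage in failing:
            status[stage] = "failed"
        else:
            status[stage] = "ok"
    return status


# Hypothetical pipeline with two branches after prepare.
PIPELINE = {
    "prepare": set(),
    "train": {"prepare"},
    "evaluate": {"train"},
    "report": {"prepare"},
}
```

If train fails in this model, evaluate is skipped because it depends on train, while report still runs because it only depends on prepare.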

Examples

Basic reproduction

Reproduce all stages in the default dvc.yaml:

dvc repro

Output:

Running stage 'prepare':
> python prepare.py
Updating lock file 'dvc.lock'

Running stage 'train':
> python train.py
Updating lock file 'dvc.lock'

Use `dvc push` to send your updates to remote storage.

If no stages need to run, DVC will output: “Data and pipelines are up to date.”

Reproduce specific stage

dvc repro train

This checks the train stage and every stage upstream of it, and runs only those that are out of date. If nothing has changed, train itself is skipped.

Force reproduction

Run a stage even if DVC thinks it’s up to date:

dvc repro -f train

Useful when you’ve made code changes that don’t affect tracked dependencies, or when debugging.

Dry run to preview execution

dvc repro --dry

Output:

Running stage 'prepare':
> python prepare.py
Running stage 'featurize':
> python featurize.py
Running stage 'train':
> python train.py --config params.yaml

Interactive reproduction

dvc repro -i

Output:

Run stage 'prepare' with command 'python prepare.py'? [y/n] y
Running stage 'prepare'...

Run stage 'train' with command 'python train.py'? [y/n] n
Skipping stage 'train'.

Reproduce entire pipeline

Even if you specify a single stage, reproduce from the beginning:

dvc repro -p evaluate

This checks every stage in the pipeline (prepare, train, evaluate) in order and runs those that are out of date. Add -f if every stage must be re-executed.

Reproduce all pipelines in project

dvc repro -P

This finds every dvc.yaml file in your repository and reproduces the pipelines defined in them.

Reproduce with automatic data pull

dvc repro --pull

If any dependencies are missing, DVC tries to download them from remote storage before running.

Reproduce downstream stages

Run a stage and everything that depends on it:

dvc repro --downstream prepare

If prepare produces data used by train and evaluate, all three will run.

Continue on failure

dvc repro -k

If you have independent pipeline branches and one fails, others will continue.

Reproduce without caching

dvc repro --no-commit

Runs stages but doesn’t cache outputs. Useful during development.

Complex example: Force reproduce with downstream

dvc repro -f --force-downstream prepare

This forces prepare to run, then forces all downstream stages (train, evaluate, etc.) to run regardless of whether their direct dependencies changed.

Working with dvc.lock

When you run dvc repro, DVC updates dvc.lock to record:

Checksums of dependencies
Checksums of outputs
Parameter values used
Commands executed

Example dvc.lock:

schema: '2.0'
stages:
  train:
    cmd: python train.py
    deps:
    - path: data/prepared.csv
      md5: 9a0d8f5e13de2c60f8c0f0b6c5aef8e3
      size: 150000
    - path: src/train.py
      md5: 4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a
      size: 2048
    params:
      params.yaml:
        train.epochs: 10
        train.lr: 0.001
    outs:
    - path: models/model.pkl
      md5: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d
      size: 1048576

Always commit dvc.lock to Git. It ensures reproducibility by capturing the exact state of your pipeline.

Understanding Stage Execution

When does a stage run?

A stage is executed if:

Dependencies changed: Input files have different checksums
Outputs missing: Output files don’t exist or are missing from cache
Command changed: The stage command was modified in dvc.yaml
Parameters changed: Tracked parameters have different values
Forced execution: You used -f or --force
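The conditions above amount to comparing the recorded state of a stage with its current state. The sketch below expresses that as a function that returns the list of reasons a stage would run. StageState and reasons_to_run are hypothetical names and the hash values are shortened placeholders; this is a model of the rules, not DVC's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class StageState:
    cmd: str
    deps: dict = field(default_factory=dict)     # path -> content hash
    params: dict = field(default_factory=dict)   # name -> value
    outs_present: bool = True                    # outputs exist in workspace or cache


def reasons_to_run(recorded, current, force=False):
    """Return the reasons a stage needs to run; an empty list means up to date."""
    reasons = []
    if force:
        reasons.append("forced")
    if recorded.cmd != current.cmd:
        reasons.append("command changed")
    if recorded.deps != current.deps:
        reasons.append("dependencies changed")
    if recorded.params != current.params:
        reasons.append("parameters changed")
    if not current.outs_present:
        reasons.append("outputs missing")
    return reasons


recorded = StageState(
    cmd="python train.py",
    deps={"data/prepared.csv": "9a0d", "src/train.py": "4d5e"},
    params={"train.epochs": 10, "train.lr": 0.001},
)
```

A current state identical to recorded gives an empty list, which corresponds to the stage being skipped.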

Execution order

DVC analyzes the dependency graph and executes stages in topological order:

prepare → featurize → train → evaluate
                        ↓
                    train_baseline

Stages with no dependencies run first. Stages run only after their dependencies complete.
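Topological ordering can be tried out with Python's standard graphlib module (Python 3.9 or later). The graph below is one reading of the diagram above, with train_baseline assumed to depend on train; the dictionary and variable names are illustrative only.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it depends on (assumed from the diagram).
DEPENDS_ON = {
    "prepare": set(),
    "featurize": {"prepare"},
    "train": {"featurize"},
    "evaluate": {"train"},
    "train_baseline": {"train"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
position = {stage: index for index, stage in enumerate(order)}
```

The relative order of evaluate and train_baseline is not fixed by the graph, since neither depends on the other; only the constraint that each comes after train is guaranteed.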

Common Workflows

Development workflow

# Make code changes
vim src/train.py

# Test without caching
dvc repro --no-commit train

# When satisfied, run with caching
dvc repro train

# Commit changes
git add dvc.yaml dvc.lock src/train.py
git commit -m "Update training script"

# Push outputs to remote storage
dvc push

Reproducing on a different machine

# Clone repository
git clone <repo-url>
cd <repo>

# Pull data from remote storage
dvc pull

# Reproduce pipeline
dvc repro

Debugging pipeline issues

# Preview what will run
dvc repro --dry

# Run interactively
dvc repro -i

# Run single stage without dependencies
dvc repro -s problematic_stage

Updating after parameter changes

When you modify params.yaml:

# Edit parameters
vim params.yaml

# Reproduce - only affected stages run
dvc repro

DVC automatically detects which stages depend on the changed parameters.
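The idea of mapping changed parameters to stages can be sketched as a diff over flattened parameter names. Everything below is a hypothetical illustration: the parameter names, the stage-to-parameter mapping, and the helper functions are assumptions, not DVC code.

```python
_MISSING = object()


def flatten(params, prefix=""):
    """Flatten nested dicts into dotted names, e.g. {"train": {"lr": 1}} -> {"train.lr": 1}."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def changed_keys(old, new):
    """Dotted names whose value was added, removed, or modified."""
    old_flat, new_flat = flatten(old), flatten(new)
    return {
        name
        for name in old_flat.keys() | new_flat.keys()
        if old_flat.get(name, _MISSING) != new_flat.get(name, _MISSING)
    }


def affected_stages(old, new, stage_params):
    """Stages that track at least one changed parameter, sorted by name."""
    changed = changed_keys(old, new)
    return sorted(stage for stage, names in stage_params.items() if changed & set(names))


OLD = {"prepare": {"split": 0.2}, "train": {"epochs": 10, "lr": 0.001}}
NEW = {"prepare": {"split": 0.2}, "train": {"epochs": 10, "lr": 0.01}}
STAGE_PARAMS = {
    "prepare": ["prepare.split"],
    "train": ["train.epochs", "train.lr"],
    "evaluate": [],
}
```

In this example only train tracks the changed parameter. In a real pipeline, stages downstream of train would typically run as well, because the outputs they depend on change once train is re-executed.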

Performance Tips

Use run cache: DVC’s run cache prevents re-running identical commands. Keep it enabled unless you have a specific reason to disable it.

Incremental execution: DVC only runs what’s necessary. Structure your pipeline with granular stages to maximize cache hits.

Parallel execution: While dvc repro executes stages sequentially, independent pipeline branches can be run in parallel manually using job schedulers or multiple terminals.

Troubleshooting

Pipeline appears up to date but shouldn’t be

Use -f to force execution:

dvc repro -f

Missing dependencies error

Try pulling data first:

dvc repro --pull

Or allow missing dependencies:

dvc repro --allow-missing

Stage keeps running unnecessarily

Check if files are being modified by the command:

dvc repro --dry

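Consider declaring outputs that shouldn’t be removed between runs as persistent, using --outs-persist with dvc stage add or persist: true in dvc.yaml. This is a stage definition setting, not a dvc repro option.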

Lock file conflicts

If you have merge conflicts in dvc.lock:

# Resolve Git conflict
git checkout --theirs dvc.lock
# Or: git checkout --ours dvc.lock

# Reproduce to regenerate lock file
dvc repro
