dvc commit

Synopsis

dvc commit [options] [<targets>...]

Description

The dvc commit command records changes to files or directories tracked by DVC by storing the current versions in the cache. It updates the hash values in .dvc files or dvc.lock to match the current workspace state. Use dvc commit when:

You’ve modified tracked data files and want to save the new version
You used dvc add --no-commit and now want to commit the data to cache
You’ve updated outputs of a DVC pipeline stage and want to persist them
You need to update .dvc file metadata to reflect current file state

The commit operation:

Computes hashes of the current workspace files
Moves/copies the files into the DVC cache
Updates .dvc or dvc.lock files with new hash values
Creates links from cache back to workspace
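
The four steps above can be imitated with plain POSIX tools to make the idea concrete. This is only an illustrative sketch: the function name `sketch_commit`, the `.meta` file, the two-character prefix layout, and the use of `md5sum` are assumptions made for this example, not DVC's actual implementation.

```shell
# Illustrative sketch only; it does not call DVC.
# sketch_commit <file> <cache-dir>
sketch_commit() {
    file=$1    # workspace file to "commit"
    cache=$2   # directory standing in for the DVC cache

    # 1. Compute the hash of the current workspace file.
    hash=$(md5sum "$file" | cut -d' ' -f1)

    # 2. Copy the file into the cache under a path derived from its hash.
    prefix=$(printf '%s' "$hash" | cut -c1-2)
    rest=$(printf '%s' "$hash" | cut -c3-)
    mkdir -p "$cache/$prefix"
    cp "$file" "$cache/$prefix/$rest"

    # 3. Record the new hash in a metadata file (stand-in for a .dvc file).
    printf 'md5: %s\n' "$hash" > "$file.meta"

    # 4. Replace the workspace copy with a symlink to the cached copy.
    ln -sf "$(cd "$cache/$prefix" && pwd)/$rest" "$file"
}
```

Calling `sketch_commit data.txt ./cache` on a scratch file leaves `data.txt` as a symlink into `./cache` and writes the hash to `data.txt.meta`. Real DVC chooses the link type from its cache configuration, so a symlink is just one possible outcome.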

Unlike Git’s commit, dvc commit doesn’t create a new version history entry. It updates the current tracking state. Version history is maintained through Git commits of the .dvc files.

Options

targets

path

Limit command scope to specific tracked files/directories, .dvc files, or stage names. If not specified, commits all changed tracked data.

dvc commit data/raw.csv models/

-f, --force

boolean

default: false

Commit data even if the hash values of dependencies or outputs did not change, and skip the confirmation prompt.

dvc commit --force train.dvc

Use this when you want to ensure data is in cache even if DVC thinks nothing changed.

-d, --with-deps

boolean

default: false

Also commit everything the specified targets depend on. Only meaningful when targets are given; useful for pipeline stages that depend on other stages.

dvc commit --with-deps evaluate.dvc

-R, --recursive

boolean

default: false

Search the specified directory and its subdirectories for .dvc files and stages, and commit the data they track.

dvc commit --recursive data/
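
Before a recursive commit it can help to see which metafiles sit under the directory. The sketch below approximates that with `find`. The helper name `list_metafiles` is hypothetical, and the assumption that only `*.dvc` and `dvc.yaml` files matter is a simplification; DVC's own discovery may differ, for example by honoring `.dvcignore`.

```shell
# list_metafiles [dir]: print .dvc files and dvc.yaml files under dir, sorted.
# Approximation only; this does not call DVC.
list_metafiles() {
    find "${1:-.}" -type f \( -name '*.dvc' -o -name 'dvc.yaml' \) | sort
}
```

For example, `list_metafiles data/` prints one path per line for every metafile found below `data/`.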

--no-relink

boolean

default: false

Don’t recreate links from cache to workspace after committing.

By default, after committing files to cache, DVC recreates the workspace files as links to the cache to save space.

Examples

Basic commit

Commit all modified tracked files:

dvc commit

Committing data/train.csv
Committing models/model.pkl

Commit specific files

Commit only specific targets:

dvc commit data/processed.csv

Commit after modifying data

Typical workflow when updating tracked data:

# Modify your data file
echo "new data" >> data/dataset.csv

# Commit the changes to DVC
dvc commit data/dataset.csv.dvc

# The .dvc file now has updated hashes
git add data/dataset.csv.dvc
git commit -m "Update dataset"

Commit after add --no-commit

When you used --no-commit during add:

# Add file without caching
dvc add --no-commit large-file.bin

# Later, commit it to cache
dvc commit large-file.bin.dvc

Force commit

Force a commit even when DVC thinks nothing changed:

dvc commit --force data/dataset.csv.dvc

This will recompute hashes and move data to cache even if the file appears unchanged.

Commit pipeline outputs

Commit outputs from a DVC pipeline stage:

# After running a pipeline manually or with modified code
dvc commit train.dvc

Recursive commit

Commit the data tracked by all .dvc files found in a directory and its subdirectories:

dvc commit --recursive experiments/

Example workflows

Workflow 1: Update tracked data

# 1. Modify your data
python process_data.py  # Updates data/processed.csv

# 2. Commit changes to DVC
dvc commit data/processed.csv.dvc

# 3. Commit the updated .dvc file to Git
git add data/processed.csv.dvc
git commit -m "Update processed data with new logic"

# 4. Push both Git and DVC changes
git push
dvc push

Workflow 2: Pipeline development

# 1. Modify your training script
vim train.py

# 2. Run the modified pipeline
dvc repro

# 3. If you modified outputs manually after repro
dvc commit train.dvc

# 4. Commit to Git
git add train.dvc dvc.lock
git commit -m "Update training pipeline"

Workflow 3: Batch operations

# Process multiple files
python batch_process.py  # Modifies multiple tracked files

# Commit all changes
dvc commit

# Review what changed (the quoted pattern lets Git match .dvc files in subdirectories too)
git diff -- '*.dvc'

# Commit to Git
git add -- '*.dvc'
git commit -m "Batch update all datasets"

Understanding the difference from Git

| Operation | Git commit | DVC commit |
|---|---|---|
| What it does | Creates a new commit in Git history | Updates hashes in `.dvc` files or `dvc.lock` and stores data in the cache |
| Version control | Yes, creates a history entry | No, only updates the current tracking state |
| What's tracked | Text files, code, `.dvc` files | Large data files, models, datasets |
| Where data goes | Git repository | DVC cache (use `dvc push` to upload it to remote storage) |

Best practice: After dvc commit, always git commit the updated .dvc files to create a version history entry.

Handling changes

When you commit, DVC may prompt you to confirm changes:

dependencies ['data/raw.csv'] and outputs ['data/processed.csv'] of train.dvc changed. 
Are you sure you want to commit it? [y/n]

To skip the prompt, use --force:

dvc commit --force
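
In scripts or CI jobs, where nobody can answer the prompt, one option is to force the commit only when something actually changed. The sketch below assumes that `dvc status --quiet` exits with a non-zero status when the workspace is not up to date, as its documentation describes. The function name `commit_if_changed` is hypothetical.

```shell
# commit_if_changed [targets...]: run `dvc commit --force` only when
# `dvc status` reports changes for the given targets.
commit_if_changed() {
    # --quiet prints nothing; the exit status carries the result.
    if dvc status --quiet "$@"; then
        echo "nothing to commit"
    else
        dvc commit --force "$@"
    fi
}
```

A call such as `commit_if_changed data/processed.csv.dvc` then never blocks on a prompt. Because `--force` skips the confirmation, review the changes with `dvc status` first if you are unsure they are intended.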

Performance considerations

Commit specific targets - Instead of committing everything with dvc commit, specify targets to avoid unnecessary hash computations for unchanged files.

Use --no-relink for speed - If you don’t need workspace files updated, use --no-relink to skip the relinking step.
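
Both tips can be combined in a small wrapper. This is a minimal sketch; the function name `commit_targets` is hypothetical, and refusing to run without targets is a design choice of this example, not DVC behavior.

```shell
# commit_targets <target>...: commit only the named targets and skip relinking.
commit_targets() {
    if [ "$#" -eq 0 ]; then
        # Require explicit targets so unchanged data is never rehashed by accident.
        echo "usage: commit_targets <target>..." >&2
        return 2
    fi
    dvc commit --no-relink "$@"
}
```

For example, `commit_targets data/processed.csv.dvc models/` commits just those two targets and leaves the workspace files as they are.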

Related commands

dvc add - Start tracking new files
dvc checkout - Update workspace from cache
dvc push - Upload committed data to remote storage
dvc status - Check which files have changed
