Synopsis
Description
The `dvc commit` command records changes to files or directories tracked by DVC by storing their current versions in the cache. It updates the hash values in `.dvc` files or `dvc.lock` to match the current workspace state.
Use `dvc commit` when:
- You've modified tracked data files and want to save the new version
- You used `dvc add --no-commit` and now want to commit the data to cache
- You've updated outputs of a DVC pipeline stage and want to persist them
- You need to update `.dvc` file metadata to reflect current file state
When run, the command:
- Computes hashes of the current workspace files
- Moves/copies the files into the DVC cache
- Updates `.dvc` or `dvc.lock` files with new hash values
- Creates links from cache back to workspace
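The steps above can be mimicked with ordinary shell tools to show the idea behind a content-addressed cache. This is only an illustrative sketch, not DVC's implementation: the `toy_commit` function, the `.toy-cache` directory, the `.ptr` pointer file, and the two-character directory split are all assumptions made for the example, and it relies on `md5sum` from GNU coreutils.

```shell
# Toy model of committing a file into a content-addressed cache.
# NOT DVC's real code; names and layout are assumptions for illustration.
# Usage: toy_commit <file> [cache_dir]
toy_commit() {
  file=$1
  cache=${2:-.toy-cache}

  # 1. Compute a hash of the current workspace file.
  digest=$(md5sum < "$file" | cut -d' ' -f1)
  prefix=$(printf '%s' "$digest" | cut -c1-2)
  rest=$(printf '%s' "$digest" | cut -c3-)

  # 2. Copy the file into the cache under a path derived from its hash.
  mkdir -p "$cache/$prefix"
  [ -e "$cache/$prefix/$rest" ] || cp "$file" "$cache/$prefix/$rest"

  # 3. Record the hash in a small pointer file (a stand-in for a .dvc file).
  printf 'md5: %s\n' "$digest" > "$file.ptr"

  # 4. Replace the workspace copy with a link back to the cached copy.
  rm -f "$file"
  ln "$cache/$prefix/$rest" "$file"   # hard link; assumes the same filesystem
}
```

Calling `toy_commit data.txt` leaves `data.txt` readable as before, while its contents now live in the cache and its hash is recorded in `data.txt.ptr`. Only the small pointer file would need to go into Git.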
Unlike Git's commit, `dvc commit` doesn't create a new version history entry. It updates the current tracking state. Version history is maintained through Git commits of the `.dvc` files.
Options
- `targets` (optional): Limit command scope to specific tracked files or directories, `.dvc` files, or stage names. If not specified, all changed tracked data is committed.
- `-f`, `--force`: Commit data even if hash values for dependencies or outputs did not change. Forces a recommit of the data.
- `-d`, `--with-deps`: Commit all dependencies of the specified target. Useful for pipeline stages that depend on other stages.
- `-R`, `--recursive`: Commit cache for subdirectories of the specified directory.
- `--no-relink`: Don't recreate links from cache to workspace after committing. By default, after committing files to the cache, DVC recreates the workspace files as links to the cache to save space.
Examples
Basic commit
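Commit all modified tracked files by running `dvc commit` with no arguments.
Commit specific files
Commit only specific targets by passing them as arguments to `dvc commit`.
Commit after modifying data
A typical workflow when updating tracked data: modify the file, run `dvc commit`, then `git commit` the updated `.dvc` file.
Commit after add --no-commit
If you used `--no-commit` during `dvc add`, run `dvc commit` afterwards to store the data in the cache.
Force commit
Force a commit even when DVC thinks nothing changed by passing `--force`.
Commit pipeline outputs
Commit outputs from a DVC pipeline stage by passing the stage name as the target.
Recursive commit
Commit all `.dvc` files in a directory by passing the directory together with `--recursive`.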
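For reference, the examples in this section might look like the following on the command line. The file `data/raw.csv`, the stage name `train`, and the `data/` directory are hypothetical, and the `run` helper is only an illustration aid: it prints each command by default so the sketch is safe to paste anywhere, and executes it for real when `RUN=1` is set inside a DVC repository.

```shell
# Dry run by default: print each command. Set RUN=1 to execute for real.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

run dvc commit                          # all changed tracked data
run dvc commit data/raw.csv.dvc         # one specific .dvc file
run dvc add --no-commit data/raw.csv    # track a file without caching it ...
run dvc commit data/raw.csv.dvc         # ... and store it in the cache later
run dvc commit --force                  # commit even if DVC sees no hash changes
run dvc commit train                    # outputs of the stage named "train"
run dvc commit --recursive data/        # everything tracked under data/
```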
Example workflows
Workflow 1: Update tracked data
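One way to script this workflow is sketched below. It follows the pattern described in this page (cache the data with `dvc commit`, then record the updated `.dvc` file with Git). The helper name `update_tracked_data` and its arguments are hypothetical, and it assumes the file was originally tracked with `dvc add`, so that `<path>.dvc` exists.

```shell
# Hypothetical helper: cache the new version of a tracked file, then record
# the updated pointer file in Git.
# Usage: update_tracked_data <path> <git commit message>
update_tracked_data() {
  path=$1
  message=$2
  dvc commit "$path.dvc" &&     # store the current contents in the cache
    git add "$path.dvc" &&      # stage the updated hash
    git commit -m "$message"    # this is what creates the history entry
}
```

After editing `data/raw.csv`, for example, the call would be `update_tracked_data data/raw.csv "Update raw data"`.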
Workflow 2: Pipeline development
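A possible shape for this workflow, offered as an assumption rather than a prescribed procedure: run a stage with `dvc repro --no-commit` while experimenting, so its outputs stay out of the cache, and call `dvc commit` on the stage once the result is worth keeping. The helper names `try_stage` and `keep_stage_outputs` are hypothetical, and the sketch assumes `dvc.lock` sits in the current directory.

```shell
# Hypothetical helpers for iterating on one pipeline stage.
# "$1" is the name of a stage defined in dvc.yaml.
try_stage() {
  dvc repro --no-commit "$1"    # run the stage, keep outputs out of the cache
}

keep_stage_outputs() {
  dvc commit "$1" &&            # store the stage outputs in the cache
    git add dvc.lock &&         # stage the recorded hashes
    git commit -m "Update outputs of stage $1"
}
```

With a stage called `train`, this would be `try_stage train` as often as needed, followed by a single `keep_stage_outputs train`.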
Workflow 3: Batch operations
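For batch operations, one option is to commit whole directories in a loop. The sketch below is an assumption about what such a script could look like. `commit_dirs` is a hypothetical helper, and it passes `--force` so that a long batch is not interrupted by a confirmation prompt for every changed file.

```shell
# Hypothetical helper: commit everything tracked under each given directory.
# Stops at the first directory that fails.
commit_dirs() {
  for dir in "$@"; do
    dvc commit --recursive --force "$dir" || return 1
  done
}
```

For example, `commit_dirs data models` commits both directories in turn. The updated `.dvc` files still need to be committed to Git afterwards.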
Understanding the difference from Git
| Operation | Git Commit | DVC Commit |
|---|---|---|
| What it does | Creates a new commit in Git history | Updates hash in .dvc files and moves data to cache |
| Version control | Yes, creates history entry | No, just updates current state |
| What’s tracked | Text files, code, .dvc files | Large data files, models, datasets |
| Where data goes | Git repository | DVC cache (local or remote) |
Best practice: After `dvc commit`, always `git commit` the updated `.dvc` files to create a version history entry.
Handling changes
When you commit, DVC may prompt you to confirm changes. To commit without being prompted, use `--force`.
Performance considerations
Related commands
- `dvc add` - Start tracking new files
- `dvc checkout` - Update workspace from cache
- `dvc push` - Upload committed data to remote storage
- `dvc status` - Check which files have changed