Data Version Control for Machine Learning
Version your data and models, build reproducible ML pipelines, and track experiments — all with Git-like commands you already know.
Quick start
Get up and running with DVC in minutes
Initialize a DVC repository
Navigate to your Git repository and initialize DVC by running dvc init. This creates a .dvc directory with DVC configuration files.
Track your first data file
Start versioning your data by running dvc add with the path to a data file. DVC creates a small .dvc file, named after the data file, that tracks your data, while the actual data is stored in the DVC cache.
Explore by topic
Learn about DVC’s core features and capabilities
Data versioning
Store and version your datasets and models alongside your Git repository
ML pipelines
Build reproducible workflows that connect data, code, and models
Experiment tracking
Run, compare, and manage hundreds of ML experiments locally
Remote storage
Back up data to S3, GCS, Azure, SSH servers, and other storage backends
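The quick start says that a .dvc file tracks your data while the actual data lives in the DVC cache. The idea behind this is content addressing: each file is stored under its MD5 hash, so Git only needs to version a small metafile that records that hash. The Python sketch below models the idea under simplifying assumptions: file_md5 and add_to_cache are hypothetical helpers, not DVC's implementation, and the cache layout only resembles DVC's local cache (a directory named after the first two hex characters of the hash, holding a file named after the remaining characters). The returned dictionary mirrors the md5, size, and path fields of a real .dvc file.

```python
import hashlib
import os
import shutil
import tempfile


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(path: str, cache_dir: str) -> dict:
    """Copy a file into a content-addressed cache (hypothetical, simplified).

    The file is stored at <cache_dir>/<first 2 hex chars>/<remaining chars>,
    so identical content is stored only once. The returned dict stands in
    for the small metafile that would be committed to Git.
    """
    md5 = file_md5(path)
    target = os.path.join(cache_dir, md5[:2], md5[2:])
    if not os.path.exists(target):  # skip the copy if this content is cached
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)
    return {
        "outs": [
            {"md5": md5, "size": os.path.getsize(path), "path": os.path.basename(path)}
        ]
    }


if __name__ == "__main__":
    # Illustrative run in a throwaway directory.
    with tempfile.TemporaryDirectory() as workspace:
        data = os.path.join(workspace, "data.csv")
        with open(data, "w") as f:
            f.write("id,label\n1,cat\n")
        print(add_to_cache(data, os.path.join(workspace, "cache")))
```

Because the storage location is derived from the content, adding an unchanged file a second time copies nothing, and restoring an older version only requires the hash recorded in that version's metafile.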
User guides
Practical guides for common DVC workflows
Track data
Learn how to version datasets and models with DVC
Build pipelines
Create reproducible ML pipelines with stages
Run experiments
Track hyperparameters and compare experiment results
Remote storage
Set up cloud storage for data and model sharing
Collaboration
Work with your team using Git and DVC together
Python API
Access DVC functionality programmatically
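The Build pipelines guide covers how DVC makes workflows reproducible: it records the hashes of each stage's dependencies and outputs in a dvc.lock file, and dvc repro skips stages whose recorded state still matches. The sketch below models that up-to-date check in Python. The Stage dataclass and the record and needs_rerun functions are hypothetical stand-ins rather than DVC's API, files are held in an in-memory dictionary instead of on disk, and parameters, which real DVC also tracks, are omitted.

```python
import hashlib
from dataclasses import dataclass


def content_hash(data: bytes) -> str:
    """MD5 hex digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


@dataclass
class Stage:
    """A hypothetical, simplified pipeline stage."""
    name: str
    cmd: str
    deps: list[str]
    outs: list[str]


def record(stage: Stage, files: dict[str, bytes], lock: dict) -> None:
    """Store the command and current hashes, as a lock file would after a run."""
    lock[stage.name] = {
        "cmd": stage.cmd,
        "hashes": {p: content_hash(files[p]) for p in stage.deps + stage.outs},
    }


def needs_rerun(stage: Stage, files: dict[str, bytes], lock: dict) -> bool:
    """Return True if the stage has never run or its recorded state is stale."""
    entry = lock.get(stage.name)
    if entry is None or entry["cmd"] != stage.cmd:
        return True  # never run, or the command itself changed
    for path in stage.deps + stage.outs:
        if path not in files:
            return True  # a missing dependency or output forces a rerun
        if entry["hashes"].get(path) != content_hash(files[path]):
            return True  # contents differ from the recorded hash
    return False


if __name__ == "__main__":
    files = {"data.csv": b"a,b\n1,2\n", "train.py": b"print('train')", "model.pkl": b"weights"}
    train = Stage("train", "python train.py", deps=["data.csv", "train.py"], outs=["model.pkl"])
    lock: dict = {}
    print(needs_rerun(train, files, lock))  # True: the stage has never run
    record(train, files, lock)
    print(needs_rerun(train, files, lock))  # False: nothing changed
    files["data.csv"] = b"a,b\n1,3\n"
    print(needs_rerun(train, files, lock))  # True: the data changed
```

Keying the decision on content hashes rather than timestamps means that, in this sketch, rewriting a file with identical bytes does not trigger a rerun, while any real edit to code or data does.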
Command reference
Complete reference for all DVC commands
Data management
Commands for tracking, pushing, pulling, and managing data files
Pipeline commands
Build and run reproducible ML pipelines and workflows
Experiment commands
Track, compare, and manage ML experiments locally
Metrics & parameters
Track and compare hyperparameters, metrics, and plots
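The metrics and parameters commands let you compare values across revisions; for example, dvc metrics diff and dvc params diff compare the last commit (HEAD) with the workspace by default. The sketch below shows the core of such a comparison for flat dictionaries of metric values. The diff_metrics function and its row format are hypothetical; real DVC reads the values from tracked metrics and params files (such as JSON or YAML) rather than from in-memory dictionaries, and the example values are illustrative.

```python
from typing import Optional

# (metric name, old value, new value, change)
Row = tuple[str, Optional[float], Optional[float], Optional[float]]


def diff_metrics(old: dict[str, float], new: dict[str, float]) -> list[Row]:
    """Return one row per metric whose value differs between two revisions.

    change is new - old when the metric exists in both revisions, and None
    when it exists in only one of them. Unchanged metrics are skipped.
    """
    rows: list[Row] = []
    for name in sorted(old.keys() | new.keys()):
        a, b = old.get(name), new.get(name)
        if a == b:
            continue  # unchanged metric, nothing to report
        change = b - a if a is not None and b is not None else None
        rows.append((name, a, b, change))
    return rows


if __name__ == "__main__":
    head = {"accuracy": 0.87, "loss": 0.31}  # illustrative values
    workspace = {"accuracy": 0.91, "loss": 0.31, "f1": 0.88}
    for name, a, b, change in diff_metrics(head, workspace):
        delta = "-" if change is None else f"{change:+.2f}"
        print(f"{name:<10} {a!s:<6} {b!s:<6} {delta}")
```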
Ready to get started?
Start versioning your data and models, building reproducible ML pipelines, and tracking experiments with DVC today.