Data Version Control for Machine Learning

Version your data and models, build reproducible ML pipelines, and track experiments — all with Git-like commands you already know.

Quick Start

Get up and running with DVC in minutes

Install DVC

Install DVC using your preferred package manager:

pip

pip install dvc

brew

brew install dvc

conda

conda install -c conda-forge dvc

Initialize a DVC repository

Navigate to your project directory and initialize DVC. Run git init only if the directory is not already a Git repository:

cd your-ml-project
git init
dvc init
git commit -m "Initialize DVC"

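dvc init creates a .dvc directory with DVC configuration files, plus a .dvcignore file at the project root, and stages them in Git so the commit above records them.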
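To see what initialization produces without touching a real project, the sketch below runs the same commands in a throwaway directory and lists the result. The temporary directory is an assumption for illustration; exact file listings may differ between DVC versions.

```shell
# Work in a throwaway sandbox so nothing here touches a real project.
cd "$(mktemp -d)"
git init -q
dvc init -q

# Show the contents of the new .dvc directory.
ls -A .dvc

# Show the files DVC staged for the first commit.
git status --short
```

Running git status before committing is a quick way to confirm which files will end up in the "Initialize DVC" commit.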

Track your first data file

Start versioning your data with DVC:

dvc add data/dataset.csv
git add data/dataset.csv.dvc data/.gitignore
git commit -m "Add dataset to DVC"

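DVC creates a small data/dataset.csv.dvc file that records the dataset's hash and path. Git tracks this pointer file, while the actual data is stored in the DVC cache and listed in data/.gitignore so Git ignores it.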
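The split between the pointer file and the data can be inspected directly. The following is a minimal sketch in a throwaway directory; the CSV contents are made up for illustration, and the exact fields in the .dvc file depend on the DVC version.

```shell
# Throwaway sandbox with a tiny, made-up dataset.
cd "$(mktemp -d)"
git init -q
dvc init -q
mkdir data
printf 'id,label\n1,cat\n2,dog\n' > data/dataset.csv

# Track the file with DVC.
dvc add -q data/dataset.csv

# The pointer file that Git versions (a hash plus the file's path).
cat data/dataset.csv.dvc

# The ignore rule DVC wrote so Git skips the data file itself.
cat data/.gitignore
```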

Set up remote storage

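Configure remote storage to share data with your team. This example uses an S3 bucket, which requires DVC's S3 support (for example, pip install "dvc[s3]"):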

dvc remote add -d myremote s3://mybucket/dvcstore
dvc push

Now your data is backed up. Commit the updated .dvc/config to Git so that anyone on your team with access to the bucket can fetch the data with dvc pull.
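The push and pull round trip can be tried without cloud credentials. In this sketch a local directory stands in for the S3 bucket; the remote name localremote, the temporary paths, and the sample data are all assumptions made for illustration.

```shell
# Throwaway project with one tracked file.
cd "$(mktemp -d)"
git init -q
dvc init -q
mkdir data
printf 'id,label\n1,cat\n' > data/dataset.csv
dvc add -q data/dataset.csv

# A local directory plays the role of the bucket (hypothetical stand-in).
STORE="$(mktemp -d)"
dvc remote add -d localremote "$STORE"
dvc push -q

# Simulate a fresh clone: remove the working copy and the local cache.
rm -rf data/dataset.csv .dvc/cache

# Restore the data from the remote using only the .dvc pointer file.
dvc pull -q
cat data/dataset.csv
```

A teammate would do the equivalent after cloning the Git repository: the .dvc files and .dvc/config come from Git, and dvc pull downloads the matching data.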

Explore by topic

Learn about DVC’s core features and capabilities

Data versioning

Store and version your datasets and models alongside your Git repository

ML pipelines

Build reproducible workflows that connect data, code, and models

Experiment tracking

Run, compare, and manage hundreds of ML experiments locally

Remote storage

Back up data to S3, GCS, Azure, SSH servers, and other storage backends
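As a taste of the ML pipelines topic above, here is a minimal sketch of a one-stage pipeline. The stage name count, the file names, and the wc command are hypothetical placeholders for a real preprocessing or training step.

```shell
# Throwaway project with a small input file.
cd "$(mktemp -d)"
git init -q
dvc init -q
printf 'a\nb\nc\n' > raw.txt

# Declare a stage: its dependency, its output, and the command that links them.
dvc stage add -q -n count -d raw.txt -o count.txt 'wc -l < raw.txt > count.txt'

# Run the pipeline; DVC writes dvc.yaml (the definition) and dvc.lock (the recorded hashes).
dvc repro -q
cat count.txt

# Run it again: with no changed dependencies, DVC should skip the stage.
dvc repro
```

Because dvc.yaml and dvc.lock are plain files tracked by Git, a pipeline and the exact data it ran on can be versioned together with the code.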

User guides

Practical guides for common DVC workflows

Track data

Learn how to version datasets and models with DVC

Build pipelines

Create reproducible ML pipelines with stages

Run experiments

Track hyperparameters and compare experiment results

Remote storage

Set up cloud storage for data and model sharing

Collaboration

Work with your team using Git and DVC together

Python API

Access DVC functionality programmatically

Command reference

Complete reference for all DVC commands

Data management

Commands for tracking, pushing, pulling, and managing data files

Pipeline commands

Build and run reproducible ML pipelines and workflows

Experiment commands

Track, compare, and manage ML experiments locally

Metrics & parameters

Track and compare hyperparameters, metrics, and plots

Ready to get started?

Start versioning your data and models, building reproducible ML pipelines, and tracking experiments with DVC today.
