dvc add

Synopsis

dvc add [options] <targets>...

Description

The dvc add command is used to start tracking data files or directories with DVC. When you add a file or directory, DVC:

Computes the hash of the file/directory contents
Moves the data to the DVC cache (unless --no-commit is used)
Creates a .dvc file that references the cached data
Adds the original file to .gitignore (so Git doesn’t track it)
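
The first two steps can be sketched in Python. This is a minimal illustration, not DVC's code. It assumes DVC 3.x behavior: MD5 over a file's raw bytes, and a cache layout of .dvc/cache/files/md5/<first two hex characters>/<remaining characters>. The helper names stream_md5, file_md5, and cache_path are hypothetical.

```python
import hashlib
import io
from pathlib import Path
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files never sit in memory whole


def stream_md5(stream: BinaryIO) -> str:
    """Return the hex MD5 digest of everything readable from a binary stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path: Path) -> str:
    """Hash a file's raw bytes (no newline normalization)."""
    with open(path, "rb") as f:
        return stream_md5(f)


def cache_path(cache_dir: Path, md5: str) -> Path:
    """Content-addressed cache location for a digest."""
    # Assumed DVC 3.x layout: <cache>/files/md5/<first 2 hex chars>/<other 30>
    return cache_dir / "files" / "md5" / md5[:2] / md5[2:]


# Usage: identical content always maps to the same cache entry.
digest = stream_md5(io.BytesIO(b"abc"))
print(digest)  # 900150983cd24fb0d6963f7d28e17f72
print(cache_path(Path(".dvc/cache"), digest))
```

Because the cache is content-addressed, adding two files with identical bytes stores the data only once.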

This is the primary way to start versioning your data with DVC. The .dvc files should be committed to Git, while the actual data files remain in your workspace but are linked to the cache.
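
A .dvc file is a small YAML document that records the hash, size, and path of the tracked data. The sketch below builds roughly what dvc add writes for a single file. It assumes the DVC 3.x field set (md5, size, hash, path); render_dvc_file is a hypothetical helper, and real .dvc files may carry additional fields (directory outputs, for example, also record nfiles).

```python
def render_dvc_file(md5: str, size: int, path: str) -> str:
    """Return the text of a single-output .dvc file (approximate DVC 3.x layout)."""
    # `path` is relative to the .dvc file's own directory, so data/raw.csv.dvc
    # refers to its data simply as "raw.csv".
    return (
        "outs:\n"
        f"- md5: {md5}\n"
        f"  size: {size}\n"
        "  hash: md5\n"
        f"  path: {path}\n"
    )


print(render_dvc_file("900150983cd24fb0d6963f7d28e17f72", 3, "raw.csv"))
```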

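Depending on the cache.type setting, DVC uses reflinks, hardlinks, symlinks, or plain copies to place cached data in the workspace. Links avoid duplicating data between the cache and the workspace; the default setting (reflink,copy) uses reflinks where the file system supports them and otherwise falls back to copying.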
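
The link-with-fallback idea can be illustrated as follows. This is a simplified sketch under stated assumptions: link_or_copy is a hypothetical helper that tries strategies in the order given, and it skips reflinks entirely because the Python standard library has no portable reflink call.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_or_copy(cache_file: Path, workspace_file: Path,
                 strategies: tuple[str, ...] = ("hardlink", "copy")) -> str:
    """Place a cached file in the workspace using the first strategy that works.

    Returns the name of the strategy that succeeded.
    """
    workspace_file.parent.mkdir(parents=True, exist_ok=True)
    for strategy in strategies:
        try:
            if strategy == "hardlink":
                os.link(cache_file, workspace_file)
            elif strategy == "symlink":
                os.symlink(cache_file.resolve(), workspace_file)
            elif strategy == "copy":
                shutil.copy2(cache_file, workspace_file)
            else:
                continue  # unsupported in this sketch (e.g. "reflink")
            return strategy
        except OSError:
            continue  # link type not available here (e.g. across devices); try the next
    raise OSError(f"none of {strategies} worked for {workspace_file}")


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "cache" / "obj"
    cached.parent.mkdir(parents=True)
    cached.write_bytes(b"abc")
    print(link_or_copy(cached, Path(tmp) / "data" / "raw.csv"))
```

Hardlinks and symlinks point the workspace file at the cached bytes, so editing such a file in place can also change the cache; copies avoid that at the cost of disk space.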

Options

targets (path, required)

Input files or directories to add. You can specify multiple targets separated by spaces.

dvc add data/raw.csv models/model.pkl

--no-commit (boolean, default: false)

Don’t put files or directories into the cache; only create the .dvc file.

Useful when you want to create the tracking structure but defer the actual caching operation.

--glob (boolean, default: false)

Allows targets containing shell-style wildcards (e.g., *.csv, data/**/*.txt).

dvc add --glob "data/*.csv"

-o, --out (path)

Destination path to put the added data to. This changes where the tracked output, and therefore its .dvc file, is created.

dvc add data.csv --out processed/data.csv

Cannot be used with multiple targets or with --glob.

--to-remote (boolean, default: false)

Transfer the target directly to remote storage instead of the local cache.

This is useful for handling large files that don’t fit in your local cache. The file is tracked by DVC but stored only in remote storage.

-r, --remote (string)

Name of the remote storage to transfer the data to. Only used with --to-remote.

dvc add large-file.bin --to-remote --remote myremote

--remote-jobs (integer, default: 4 * cpu_count())

Number of jobs to run simultaneously when pushing data to remote. Only used with --to-remote.

-f, --force (boolean, default: false)

Overwrite the local file or directory if it already exists.

--no-relink (boolean, default: false)

Don’t recreate links from cache to workspace after adding.

Examples

Basic usage

Track a single data file:

dvc add data/raw.csv

Adding...
100% Adding...|████████████████████████████████|1/1 [00:00,  1.23file/s]

This creates data/raw.csv.dvc and adds an entry for raw.csv (/raw.csv) to data/.gitignore.
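
The .gitignore update can be sketched as an idempotent append. add_to_gitignore is a hypothetical helper that assumes DVC writes an entry such as /raw.csv to the .gitignore in the same directory as the data; DVC's own handling of existing rules may differ.

```python
import tempfile
from pathlib import Path


def add_to_gitignore(tracked: Path) -> Path:
    """Ignore `tracked` via the .gitignore in its parent directory; idempotent."""
    gitignore = tracked.parent / ".gitignore"
    entry = f"/{tracked.name}"  # leading "/" anchors the rule to this directory only
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry not in lines:
        lines.append(entry)
        gitignore.write_text("\n".join(lines) + "\n")
    return gitignore


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "data" / "raw.csv"
    raw.parent.mkdir()
    raw.write_text("id,value\n")
    add_to_gitignore(raw)
    add_to_gitignore(raw)  # second call is a no-op
    print((raw.parent / ".gitignore").read_text())  # /raw.csv
```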

Track a directory

Track an entire directory of data:

dvc add data/images/

Adding...
100% Adding...|████████████████████████████████|1/1 [00:03,  3.45s/file]

Creates data/images.dvc that tracks all files in the directory.
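
A directory is tracked under a single hash. As we understand it, DVC hashes each file, records the results in a manifest, and hashes that manifest, which is why a directory's .dvc file shows an md5 value ending in .dir along with an nfiles count. The sketch shows the idea only: dir_manifest_hash is hypothetical and does not reproduce DVC's exact manifest format, so its digests will not match DVC's.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def dir_manifest_hash(root: Path) -> tuple[str, int]:
    """Hash a directory via a sorted manifest of per-file MD5s.

    Returns (digest + ".dir", number of files). Simplified: DVC's actual
    manifest serialization is not reproduced here.
    """
    entries = [
        {"md5": hashlib.md5(p.read_bytes()).hexdigest(),
         "relpath": p.relative_to(root).as_posix()}
        for p in root.rglob("*") if p.is_file()
    ]
    entries.sort(key=lambda e: e["relpath"])  # stable order => stable hash
    manifest = json.dumps(entries, sort_keys=True).encode()
    return hashlib.md5(manifest).hexdigest() + ".dir", len(entries)


with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    (images / "cats").mkdir(parents=True)
    (images / "cats" / "1.jpg").write_bytes(b"\xff\xd8fake")
    (images / "2.jpg").write_bytes(b"\xff\xd8other")
    print(dir_manifest_hash(images))
```

Sorting the entries by relative path makes the result independent of file-system listing order, so the same directory contents always produce the same hash, and changing any single file changes it.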

Track multiple files

Add multiple files at once:

dvc add data/train.csv data/test.csv models/baseline.pkl

Adding...
100% Adding...|████████████████████████████████|3/3 [00:01,  2.15file/s]

Using wildcards

Track all CSV files in a directory:

dvc add --glob "data/*.csv"

Add without committing to cache

Create .dvc file without moving data to cache:

dvc add --no-commit large-dataset/

You can commit the data later with dvc commit.

Add directly to remote storage

For very large files, add directly to remote storage:

dvc add huge-file.bin --to-remote --remote s3storage

This bypasses the local cache, so the file won’t be available locally unless you run dvc pull.

Example workflow

A typical workflow when adding data:

# Add your data file
dvc add data/raw.csv

# Commit the .dvc file to Git
git add data/raw.csv.dvc data/.gitignore
git commit -m "Add raw dataset"

# Push data to remote storage
dvc push

# Push Git commits
git push

Related commands

dvc commit - Record changes to tracked files
dvc push - Upload tracked files to remote storage
dvc checkout - Checkout data files from cache
dvc remove - Stop tracking files/directories
