
Synopsis

dvc add [options] <targets>...

Description

The dvc add command starts tracking data files or directories with DVC. When you add a file or directory, DVC:
  1. Computes the hash of the file/directory contents
  2. Moves the data to the DVC cache (unless --no-commit is used)
  3. Creates a .dvc file that references the cached data
  4. Adds the original file to .gitignore (so Git doesn’t track it)
This is the primary way to start versioning your data with DVC. The .dvc files should be committed to Git, while the actual data files remain in your workspace but are linked to the cache.
DVC uses file links (reflinks, hardlinks, or symlinks depending on your system) to avoid duplicating data between the cache and workspace.
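The four steps above can be imitated with ordinary shell tools to make the idea concrete. This is only a sketch under stated assumptions: the cache directory name, the two-character prefix layout, and the reduced metadata file are simplifications for illustration and do not reproduce DVC's actual cache layout or .dvc schema. It also assumes md5sum (GNU coreutils) is installed.

```shell
#!/bin/sh
# Illustration only: imitates the four steps with plain shell tools.
# The "cache" directory and the reduced metadata file are simplifications.
set -eu

workdir=$(mktemp -d)              # throwaway project directory
cd "$workdir"
mkdir -p data cache
printf 'id,value\n1,42\n' > data/raw.csv

# 1. Compute a hash of the file contents (assumes md5sum is available).
hash=$(md5sum data/raw.csv | cut -d ' ' -f 1)

# 2. Move the content into a content-addressed cache keyed by the hash,
#    then link it back into the workspace (copy if hard links fail).
prefix=$(printf '%s' "$hash" | cut -c 1-2)
rest=$(printf '%s' "$hash" | cut -c 3-)
mkdir -p "cache/$prefix"
mv data/raw.csv "cache/$prefix/$rest"
ln "cache/$prefix/$rest" data/raw.csv 2>/dev/null \
    || cp "cache/$prefix/$rest" data/raw.csv

# 3. Write a small metadata file that references the cached content.
cat > data/raw.csv.dvc <<EOF
outs:
- md5: $hash
  path: raw.csv
EOF

# 4. Keep Git from tracking the data file itself.
echo '/raw.csv' >> data/.gitignore

cat data/raw.csv.dvc
```

The point of the sketch is that Git only ever sees the small metadata file and the .gitignore entry, while the data itself lives in the cache and is linked into the workspace.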

Options

targets
path
required
Input files or directories to add. You can specify multiple targets separated by spaces.
dvc add data/raw.csv models/model.pkl
--no-commit
boolean
default:"false"
Don’t store the files or directories in the cache. Only the .dvc file is created; the data is not moved to the cache.
Useful when you want to create the tracking structure but defer the actual caching operation.
--glob
boolean
default:"false"
Allows targets containing shell-style wildcards (e.g., *.csv, data/**/*.txt).
dvc add --glob "data/*.csv"
-o, --out
path
Destination path for the added data. This option changes where the output file is created.
dvc add data.csv --out processed/data.csv
This option cannot be used with multiple targets or with --glob.
--to-remote
boolean
default:"false"
Transfer the data directly to remote storage instead of storing it in the local cache.
This is useful for large files that don’t fit in your local cache. The file is tracked by DVC but stored only in remote storage.
-r, --remote
string
Name of the remote storage to transfer the data to. Only used with --to-remote.
dvc add large-file.bin --to-remote --remote myremote
--remote-jobs
integer
default:"4 * cpu_count()"
Number of jobs to run simultaneously when pushing data to remote. Only used with --to-remote.
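To see what the documented default of 4 * cpu_count() works out to on a given machine, the arithmetic can be reproduced in the shell. This is a rough sketch: it assumes getconf _NPROCESSORS_ONLN is available (it is on common Linux and macOS systems) and that its CPU count matches the one DVC uses, which may not hold in containers or with CPU limits.

```shell
#!/bin/sh
set -eu

# Number of online CPUs as reported by the system (assumption: getconf
# supports _NPROCESSORS_ONLN on this platform).
cpus=$(getconf _NPROCESSORS_ONLN)

# The documented default for --remote-jobs is 4 * cpu_count().
default_jobs=$((4 * cpus))

echo "CPUs: $cpus, default --remote-jobs: $default_jobs"
```

If that number is too high for your network or remote, pass a smaller value explicitly, for example dvc add large-file.bin --to-remote --remote myremote --remote-jobs 2.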
-f, --force
boolean
default:"false"
Override local file or folder if it exists.
--no-relink
boolean
default:"false"
Don’t recreate links from cache to workspace after adding.

Examples

Basic usage

Track a single data file:
dvc add data/raw.csv
Adding...
100% Adding...|████████████████████████████████|1/1 [00:00,  1.23file/s]
This creates data/raw.csv.dvc and adds data/raw.csv to .gitignore.

Track a directory

Track an entire directory of data:
dvc add data/images/
Adding...
100% Adding...|████████████████████████████████|1/1 [00:03,  3.45s/file]
This creates data/images.dvc, which tracks all files in the directory.

Track multiple files

Add multiple files at once:
dvc add data/train.csv data/test.csv models/baseline.pkl
Adding...
100% Adding...|████████████████████████████████|3/3 [00:01,  2.15file/s]

Using wildcards

Track all CSV files in a directory:
dvc add --glob "data/*.csv"
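The pattern in the command above is quoted, presumably so that it reaches DVC as a single unexpanded argument instead of being expanded by the shell first. The following sketch shows the difference using only the shell itself; the file names are made up for the demonstration and no DVC command is run.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)              # throwaway directory with sample files
cd "$workdir"
mkdir data
touch data/a.csv data/b.csv data/notes.txt

# Unquoted: the shell expands the pattern before the command sees it,
# so the command receives one argument per matching file.
set -- data/*.csv
unquoted_count=$#

# Quoted: the pattern is passed through as one literal argument, which is
# what a command that does its own wildcard matching expects.
set -- "data/*.csv"
quoted_count=$#
quoted_arg=$1

echo "unquoted: $unquoted_count argument(s)"
echo "quoted:   $quoted_count argument(s): $quoted_arg"
```

With the two sample CSV files, the unquoted form yields two arguments and the quoted form yields the single literal argument data/*.csv.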

Add without committing to cache

Create the .dvc file without moving data to the cache:
dvc add --no-commit large-dataset/
You can commit the data later with dvc commit.

Add directly to remote storage

For very large files, add directly to remote storage:
dvc add huge-file.bin --to-remote --remote s3storage
This bypasses the local cache, so the file won’t be available locally unless you run dvc pull.

Example workflow

A typical workflow when adding data:
# Add your data file
dvc add data/raw.csv

# Commit the .dvc file to Git
git add data/raw.csv.dvc data/.gitignore
git commit -m "Add raw dataset"

# Push data to remote storage
dvc push

# Push Git commits
git push
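
The git add step in the workflow stages two files: the .dvc file and the .gitignore in the target's directory. A small helper can derive both paths from a target. Note that dvc_git_paths is a hypothetical function written for this page, not part of DVC, and it assumes the .dvc file and the .gitignore are created next to the target, as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): print the files to stage in Git
# for a target that was tracked with `dvc add`.
dvc_git_paths() {
    target=${1%/}                               # drop a trailing slash, e.g. data/images/
    printf '%s\n' "$target.dvc"                 # the .dvc file next to the target
    printf '%s\n' "$(dirname "$target")/.gitignore"
}

dvc_git_paths data/raw.csv
dvc_git_paths data/images/
```

For data/raw.csv this prints data/raw.csv.dvc and data/.gitignore; for data/images/ it prints data/images.dvc and data/.gitignore. The output can be fed to git add, keeping in mind that unquoted command substitution breaks on paths containing spaces.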

Related commands

  • dvc commit - Record changes to tracked files
  • dvc push - Upload tracked files to remote storage
  • dvc checkout - Checkout data files from cache
  • dvc remove - Stop tracking files/directories
