Synopsis

dvc push [options] [<targets>...]

Description

The dvc push command uploads DVC-tracked files and directories from your local cache to remote storage (such as S3, GCS, Azure, or SSH storage). This is analogous to git push but for your data files. It ensures that:
  • Your data is safely backed up in remote storage
  • Team members can access the data with dvc pull
  • CI/CD systems can fetch the necessary data
  • Different environments can sync the same data versions
dvc push only uploads files that don’t already exist in the remote storage, making it efficient for incremental updates.
You must configure a remote storage location before using dvc push. Use dvc remote add to set up a remote.

Options

targets
path
Limit command scope to specific tracked files/directories, .dvc files, or stage names. If not specified, pushes all tracked data.
dvc push data/train.csv models/model.pkl
-r, --remote
string
Remote storage to push to. If not specified, uses the default remote configured in .dvc/config.
dvc push --remote s3storage
-j, --jobs
integer
default:"4 * cpu_count()"
Number of jobs to run simultaneously. Higher values increase parallelism but use more resources.
dvc push --jobs 8
-a, --all-branches
boolean
default:"false"
Push cache for all Git branches. Useful for backing up all experiments.
dvc push --all-branches
This can upload a lot of data if you have many branches with different datasets.
-T, --all-tags
boolean
default:"false"
Push cache for all Git tags.
dvc push --all-tags
-A, --all-commits
boolean
default:"false"
Push cache for all Git commits.
dvc push --all-commits
This can be very slow and upload large amounts of data. Use with caution.
-d, --with-deps
boolean
default:"false"
Push cache for all dependencies of the specified target.
dvc push --with-deps train.dvc
-R, --recursive
boolean
default:"false"
Recursively push cache for DVC-tracked files found in the target directory and its subdirectories.
dvc push --recursive experiments/
--run-cache
boolean
default:"false"
Push run history for all stages. This includes execution metadata and can help reproduce pipeline runs.
dvc push --run-cache
--glob
boolean
default:"false"
Allows targets containing shell-style wildcards.
dvc push --glob "data/*.csv"

Examples

Basic push

Push all tracked data to the default remote:
dvc push
Everything is up to date.
Or if there are files to push:
2 files pushed

Push specific files

Push only specific targets:
dvc push data/train.csv.dvc
1 file pushed

Push to specific remote

Push to a named remote:
dvc push --remote backup

Push with higher parallelism

Speed up push with more concurrent jobs:
dvc push --jobs 16

Push all branches

Back up data from all branches:
dvc push --all-branches
15 files pushed
This is useful for ensuring all experimental branches are backed up before cleanup.

Push with dependencies

Push a pipeline stage and all its dependencies:
dvc push --with-deps evaluate.dvc

Push with wildcards

Push all .dvc files matching a shell-style pattern:
dvc push --glob "experiments/exp-*/*.dvc"

Example workflows

Workflow 1: Regular development

# 1. Add or modify data
dvc add data/new_dataset.csv

# 2. Commit to Git
git add data/new_dataset.csv.dvc data/.gitignore
git commit -m "Add new dataset"

# 3. Push data to remote
dvc push

# 4. Push Git commits
git push
Always dvc push before git push to ensure data is backed up before code references are published.
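This ordering can also be enforced automatically with a Git pre-push hook. A minimal sketch (note that dvc install can generate equivalent hooks for you, so writing one by hand is optional):

```shell
#!/bin/sh
# .git/hooks/pre-push -- run dvc push before Git publishes commits.
# If the data upload fails, the hook exits non-zero and Git aborts
# the push, so published .dvc files always have their data available
# in remote storage.
exec dvc push
```

Remember to make the hook executable (chmod +x .git/hooks/pre-push) for Git to run it.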

Workflow 2: After running pipeline

# Run your pipeline
dvc repro

# Check what changed
dvc status --cloud

# Push new outputs
dvc push

# Commit pipeline changes
git add dvc.lock
git commit -m "Update pipeline outputs"
git push

Workflow 3: Backup all experiments

# Back up all branch data before cleanup
dvc push --all-branches

# Now safe to delete local branches
git branch -d old-experiment

# Clean local cache
dvc gc --workspace

Workflow 4: Team collaboration

# You: Update dataset
python update_data.py
dvc commit data/dataset.csv.dvc

# Push to remote
dvc push

# Commit and push to Git
git add data/dataset.csv.dvc
git commit -m "Update dataset with new samples"
git push

# Teammate: Pull changes
git pull
dvc pull

Setting up remotes

Before using dvc push, configure a remote:

S3

dvc remote add -d myremote s3://mybucket/path

Google Cloud Storage

dvc remote add -d myremote gs://mybucket/path

Azure Blob Storage

dvc remote add -d myremote azure://mycontainer/path

SSH/SFTP

dvc remote add -d myremote ssh://user@host/path

Local or Network Drive

dvc remote add -d myremote /mnt/shared/dvc-storage
Set as default:
dvc remote default myremote

Checking what needs to be pushed

Before pushing, check status:
dvc status --cloud
new:            data/train.csv
new:            models/model.pkl
This shows files in local cache that haven’t been pushed to remote.

Understanding push output

2 files pushed
Or if everything is synced:
Everything is up to date.
With multiple branches:
main:
        2 files pushed
experiment-1:
        3 files pushed

Error handling

No remote configured

ERROR: no remote provided and no default remote set
Solution: Add a remote storage location:
dvc remote add -d myremote <url>

Authentication errors

ERROR: failed to push data to the cloud
Solution: Configure credentials for your remote storage. Example for S3 (the --local flag writes to .dvc/config.local, which is Git-ignored, so secrets stay out of version control):
dvc remote modify --local myremote access_key_id YOUR_ACCESS_KEY
dvc remote modify --local myremote secret_access_key YOUR_SECRET_KEY

Network issues

If push fails due to network issues, simply run dvc push again. Files that already reached the remote are skipped, so the transfer effectively resumes where it left off.
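Because completed uploads are never repeated, it is safe to wrap the retry in a small script. A sketch, where the attempt limit and delay are arbitrary choices:

```shell
#!/bin/sh
# Retry dvc push a few times on failure. Already-uploaded files are
# skipped by DVC, so each attempt picks up where the last one stopped.
attempts=0
until dvc push; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 3 ]; then
        echo "dvc push failed after $attempts attempts" >&2
        exit 1
    fi
    sleep 10
done
```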

Performance tips

Increase parallelism - Use --jobs to speed up uploads, especially for many small files:
dvc push --jobs 16
Push specific targets - Instead of pushing everything, push only what changed:
dvc status --cloud  # Check what needs pushing
dvc push data/changed_file.csv.dvc
Use cloud-native storage - For best performance, use storage in the same cloud region as your compute.

Best practices

  1. Always push before git push: Ensure data is backed up before publishing code
  2. Use --all-branches periodically: Back up experiment data before cleaning up branches
  3. Configure credentials securely: Use environment variables or IAM roles instead of storing credentials in config
  4. Monitor costs: Cloud storage and transfer costs can add up with large datasets
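For practice 3, DVC's S3 remote reads the standard AWS environment variables through the AWS SDK, so credentials never have to be written to config files. A sketch with placeholder values:

```shell
# Provide credentials via standard AWS environment variables for the
# current shell session only; DVC's S3 remote picks them up through
# the AWS SDK, so no secret is written to .dvc/config.
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"         # placeholder value
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"     # placeholder value
# then push as usual: dvc push --remote s3storage
```

On AWS compute, an attached IAM role removes even this step: the SDK obtains credentials automatically.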

Related commands

  • dvc pull - Download data from remote storage
  • dvc fetch - Download to cache without checking out
  • dvc status - Check sync status with remote
  • dvc remote - Manage remote storage locations
