dvc fetch

Synopsis

dvc fetch [options] [<targets>...]

Description

The dvc fetch command downloads DVC-tracked files from remote storage to your local cache without updating your workspace. It’s one part of what dvc pull does (the other being dvc checkout). Use dvc fetch when you want to:
  • Pre-download data without immediately checking it out
  • Prepare cache for multiple branch checkouts
  • Download data for later use
  • Populate a shared cache location
  • Backup all remote data locally
Unlike dvc pull, which both downloads and updates workspace files, dvc fetch only populates the cache. To make the files available in your workspace, you need to run dvc checkout afterward.
Think of dvc fetch like git fetch: it downloads data but doesn’t change your working directory. Use dvc pull (like git pull) to download and update your workspace in one step.

Options

targets (path)
Limit command scope to specific tracked files/directories, .dvc files, or stage names. If not specified, fetches all tracked data.
dvc fetch data/train.csv models/
-r, --remote (string)
Remote storage to fetch from. If not specified, uses the default remote configured in .dvc/config.
dvc fetch --remote s3storage
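For context, the default remote lives in .dvc/config. A minimal sketch of that file (the remote name s3storage matches the command above; the bucket URL is illustrative):

```ini
; .dvc/config — a named remote plus the default-remote setting
['remote "s3storage"']
    url = s3://my-bucket/dvc-store
[core]
    remote = s3storage
```

With core.remote set, a bare dvc fetch uses s3storage; --remote overrides it per invocation.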
-j, --jobs (integer, default: 4 * cpu_count())
Number of jobs to run simultaneously. Higher values increase parallelism but use more resources.
dvc fetch --jobs 16
-a, --all-branches (boolean, default: false)
Fetch cache for all Git branches. Downloads data for every branch in the repository.
dvc fetch --all-branches
Useful for populating a shared cache or preparing for rapid branch switching.
-T, --all-tags (boolean, default: false)
Fetch cache for all Git tags.
dvc fetch --all-tags
-A, --all-commits (boolean, default: false)
Fetch cache for all Git commits.
dvc fetch --all-commits
This can download a massive amount of data. Use only when you need complete history.
-d, --with-deps (boolean, default: false)
Fetch cache for all dependencies of the specified target.
dvc fetch --with-deps evaluate.dvc
-R, --recursive (boolean, default: false)
Fetch cache for subdirectories of the specified directory.
dvc fetch --recursive experiments/
--run-cache (boolean, default: false)
Fetch run history for all stages.
dvc fetch --run-cache
--max-size (integer)
Fetch only files/directories that are each below the specified size in bytes.
# Fetch only files smaller than 100MB
dvc fetch --max-size 104857600
Useful for CI/CD environments with limited storage or bandwidth.
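Byte thresholds are easy to miscompute, so letting the shell expand the arithmetic avoids magic numbers. A small sketch (100 MiB, matching the example above):

```shell
# 100 MiB expressed in bytes, for use with --max-size
MAX_SIZE=$((100 * 1024 * 1024))
echo "$MAX_SIZE"   # 104857600
```

Then `dvc fetch --max-size "$MAX_SIZE"` fetches only objects below that threshold.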
--type (string, may be repeated)
Only fetch data files/directories of a particular type. Can be specified multiple times. Choices: metrics, plots.
dvc fetch --type metrics --type plots

Examples

Basic fetch

Fetch all tracked data to local cache:
dvc fetch
3 files fetched
Or if cache is up to date:
Everything is up to date.

Fetch then checkout

The two-step equivalent of dvc pull:
# Download to cache
dvc fetch

# Update workspace
dvc checkout

Fetch specific files

Fetch only specific targets:
dvc fetch data/train.csv.dvc models/model.pkl.dvc
2 files fetched

Fetch from specific remote

dvc fetch --remote backup-storage

Fetch all branches

Download data for all branches (great for shared caches):
dvc fetch --all-branches
main:
        2 files fetched
experiment-1:
        3 files fetched
experiment-2:
        1 file fetched
        
Total: 6 files fetched

Fetch with dependencies

Fetch a pipeline stage and all its dependencies:
dvc fetch --with-deps train.dvc

Fetch small files only

Fetch only files under 50MB:
dvc fetch --max-size 52428800
Useful in CI/CD to skip large model files when only running unit tests.

Fetch only metrics and plots

dvc fetch --type metrics --type plots

Parallel fetch

Speed up with more jobs:
dvc fetch --jobs 16

Example workflows

Workflow 1: Shared cache setup

Set up a shared cache for your team:
# On shared server/machine
cd /shared/ml-project

# Fetch all data for all branches
dvc fetch --all-branches

# Configure team members to use this cache
# In each member's workspace:
dvc cache dir /shared/ml-project/.dvc/cache
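Running dvc cache dir persists the setting as a cache.dir entry in each member's .dvc/config. The resulting fragment looks roughly like this (path taken from the command above):

```ini
; .dvc/config after `dvc cache dir /shared/ml-project/.dvc/cache`
[cache]
    dir = /shared/ml-project/.dvc/cache
```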

Workflow 2: Branch switching optimization

Pre-fetch data for branches you’ll be working on:
# Fetch data for multiple branches
dvc fetch --all-branches

# Now you can switch branches quickly
git checkout experiment-1
dvc checkout  # Fast - already in cache

git checkout experiment-2
dvc checkout  # Also fast

Workflow 3: CI/CD with selective fetch

#!/bin/bash
# ci-test.sh
set -euo pipefail

# Clone repo into a known directory
git clone "$REPO_URL" project
cd project

# Fetch only small test files
dvc fetch --type metrics data/test-sample.csv.dvc

# Checkout to workspace
dvc checkout

# Run tests
pytest tests/

Workflow 4: Disaster recovery

Backup remote storage to local:
# Fetch everything from remote
dvc fetch --all-branches --all-tags --run-cache

# Now local cache has complete backup
# Can re-upload to a different remote if needed
dvc remote add new-backup s3://backup-bucket/
dvc push --remote new-backup --all-branches --all-tags

Workflow 5: Prepare for offline work

# Before losing internet connection
dvc fetch --all-branches

# Later, offline:
git checkout feature-branch
dvc checkout  # Works - data in cache

git checkout main
dvc checkout  # Also works

Understanding fetch vs pull vs checkout

Command      | Downloads from remote | Updates workspace | Use case
dvc fetch    | Yes                   | No                | Pre-download data
dvc checkout | No                    | Yes               | Update workspace from cache
dvc pull     | Yes                   | Yes               | Download and update

Visual flow

Remote Storage → [dvc fetch] → Local Cache → [dvc checkout] → Workspace

Remote Storage → [dvc pull] → Local Cache → Workspace
                               (does both)

Example comparison

Using fetch + checkout:

dvc fetch    # Downloads to cache
ls data/     # Files not yet in workspace
dvc checkout # Now files appear in workspace
ls data/     # Files now visible

Using pull:

dvc pull     # Downloads to cache AND workspace
ls data/     # Files immediately visible

When to use fetch instead of pull

Use dvc fetch when:
  • Setting up shared cache
  • Pre-downloading for multiple branches
  • Populating cache for CI/CD
  • You want to review what will be checked out before doing it
  • Working in a script that separates download and checkout steps
Use dvc pull when:
  • You want data immediately in workspace
  • Doing regular development work
  • Simplicity is more important than control
  • You’re syncing after git pull

Performance tips

Maximize parallelism - Use more jobs for faster downloads:
dvc fetch --jobs 32
Fetch selectively - Use filters to avoid downloading unnecessary data:
dvc fetch --type metrics --max-size 10485760
Use shared cache - Configure a shared cache directory to avoid duplicate downloads across team members:
dvc cache dir /shared/cache
Fetch overnight - For large datasets, fetch all branches in the background and capture the log:
nohup dvc fetch --all-branches > fetch.log 2>&1 &

Error handling

Missing remote

ERROR: no remote provided and no default remote set
Solution: Configure a remote:
dvc remote add -d origin <remote-url>

Authentication errors

ERROR: failed to fetch data from the cloud
Solution: Set up credentials for your storage backend.
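As one sketch for an S3 remote, credentials can live in .dvc/config.local, which DVC keeps out of Git (the remote name s3storage and the placeholder values are illustrative):

```ini
; .dvc/config.local — not committed, safe for secrets
['remote "s3storage"']
    access_key_id = <your-key-id>
    secret_access_key = <your-secret-key>
```

Equivalently, `dvc remote modify --local s3storage access_key_id <your-key-id>` writes this file for you.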

Disk space issues

ERROR: not enough disk space
Solution: Either:
  1. Free up space
  2. Use --max-size to limit what’s fetched
  3. Use --type to fetch only specific file types

Related commands

  • dvc pull - Fetch and checkout in one command
  • dvc push - Upload data to remote storage
  • dvc checkout - Update workspace from cache
  • dvc status - Check sync status with remote
  • dvc cache - Manage local cache
