Synopsis
Description
The `dvc pull` command downloads DVC-tracked files from remote storage to your local cache and checks them out to your workspace. It’s a combination of `dvc fetch` and `dvc checkout`.
This is analogous to git pull but for your data files. Use dvc pull to:
- Get data after cloning a repository
- Sync data after pulling Git changes
- Download data for a specific branch or experiment
- Restore missing or deleted data files
When it runs, `dvc pull`:
- Downloads missing files from remote storage to the local cache
- Creates links (or copies) from the cache to the workspace
- Updates your workspace to match the `.dvc` file specifications
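Because `dvc pull` is described here as a combination of `dvc fetch` and `dvc checkout`, the sequence can be sketched as a two-step shell function. The name `pull_in_two_steps` is hypothetical and only illustrates the order of operations; it is not a claim about how DVC implements the command internally.

```shell
# Hypothetical helper: approximates `dvc pull` with its two constituent commands.
# Arguments are targets only; remote-related options apply to fetch, not checkout.
pull_in_two_steps() {
  # Step 1: download missing files from remote storage into the local cache.
  dvc fetch "$@" || return 1
  # Step 2: link (or copy) the cached files into the workspace.
  dvc checkout "$@"
}

# Usage (the target name is illustrative):
#   pull_in_two_steps
#   pull_in_two_steps data.dvc
```

Splitting the steps this way is mainly useful when the download and the workspace update need to happen at different times.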
You must configure a remote storage location before using `dvc pull`. Use `dvc remote add` to set up a remote, or check `.dvc/config` if your team has already configured one.
Options
- `targets` (optional): Limit command scope to specific tracked files/directories, `.dvc` files, or stage names. If not specified, pulls all tracked data.
- `-r <name>`, `--remote <name>`: Remote storage to pull from. If not specified, uses the default remote configured in `.dvc/config`.
- `-j <number>`, `--jobs <number>`: Number of jobs to run simultaneously. Higher values increase parallelism but use more resources.
- `-a`, `--all-branches`: Fetch cache for all Git branches.
- `-T`, `--all-tags`: Fetch cache for all Git tags.
- `-A`, `--all-commits`: Fetch cache for all Git commits.
- `-f`, `--force`: Do not prompt when removing working directory files. Forces overwrite of modified files.
- `-d`, `--with-deps`: Fetch cache for all dependencies of the specified target.
- `-R`, `--recursive`: Pull cache for subdirectories of the specified directory.
- `--run-cache`: Fetch run history for all stages.
- `--allow-missing`: Ignore errors if some of the files or directories are missing from remote.
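As a rough illustration of how the remote, job-count, and target options combine, the sketch below wraps them in one function. It uses the long option names `--remote` and `--jobs`; the function name `pull_from` and the example remote name and path are hypothetical.

```shell
# Hypothetical helper: pull optional targets from a named remote with a chosen job count.
#   $1 = remote name, $2 = number of parallel jobs, remaining args = targets (optional)
pull_from() {
  remote="$1"
  njobs="$2"
  shift 2
  dvc pull --remote "$remote" --jobs "$njobs" "$@"
}

# Usage (remote name and path are illustrative):
#   pull_from myremote 8 data/images
```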
Examples
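Basic pull
Pull all tracked data from the default remote: `dvc pull`
Initial setup after cloning
Common workflow after cloning a repository: clone with Git, change into the repository directory, then run `dvc pull`.
Pull after Git changes
Sync data after pulling Git changes: `git pull` followed by `dvc pull`
Pull specific files
Pull only specific targets by passing them as arguments: `dvc pull <target> [<target> ...]`
Pull from specific remote
Pull from a named remote: `dvc pull -r <name>`
Pull with higher parallelism
Speed up pull with more concurrent jobs: `dvc pull -j <number>`
Force pull
Overwrite local changes: `dvc pull -f`
Pull with dependencies
Pull a pipeline stage and all its dependencies: `dvc pull --with-deps <stage>`
Pull all branches
Fetch data for all branches (useful for caching): `dvc pull --all-branches`
Example workflows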
Workflow 1: New team member
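A minimal sketch of what onboarding might look like, assuming the repository already has a default remote recorded in `.dvc/config`. The function name `onboard` and its arguments are hypothetical.

```shell
# Hypothetical onboarding helper: clone the Git repository, then download its DVC-tracked data.
#   $1 = Git repository URL, $2 = directory to clone into
onboard() {
  git clone "$1" "$2" || return 1
  cd "$2" || return 1
  # Assumes a default remote is already configured in the cloned repository.
  dvc pull
}

# Usage (URL and directory are placeholders):
#   onboard <repository-url> my-project
```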
Workflow 2: Switch branches
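One way to keep data aligned with the checked-out branch is to pull right after switching. The sketch below is illustrative; `switch_branch` is a hypothetical name.

```shell
# Hypothetical helper: switch Git branches, then bring the data in line with that branch.
#   $1 = branch name
switch_branch() {
  git checkout "$1" || return 1
  # The .dvc files now describe the new branch's data; download it and check it out.
  dvc pull
}

# Usage (branch name is illustrative):
#   switch_branch feature-branch
```

If the branch's data is already in the local cache, `dvc checkout` alone would update the workspace without contacting the remote.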
Workflow 3: Sync with team changes
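A sketch of the usual pairing of the two pull commands, so that code and data move together. The function name `sync_with_team` is hypothetical.

```shell
# Hypothetical helper: update code and .dvc files first, then the data they point to.
sync_with_team() {
  # Stop if the Git update fails, so data is never pulled against stale .dvc files.
  git pull || return 1
  dvc pull
}

# Usage:
#   sync_with_team
```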
Workflow 4: CI/CD pipeline
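In CI it is often enough to pull only the data a job needs. The sketch below assumes the job knows its targets and that credentials are supplied by the CI environment; `ci_pull` is a hypothetical name.

```shell
# Hypothetical CI step: pull only the targets the job needs, and fail loudly otherwise.
#   args = targets required by the job
ci_pull() {
  if [ "$#" -eq 0 ]; then
    echo "ci_pull: no targets given" >&2
    return 2
  fi
  # A non-zero exit status from dvc propagates and fails the CI step.
  dvc pull "$@"
}

# Usage (paths are illustrative):
#   ci_pull data/test models/model.pkl
```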
Workflow 5: Selective data loading
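Selective loading can be sketched as pulling targets one at a time and reporting the ones that fail, rather than aborting on the first problem. The function name `pull_each` is hypothetical.

```shell
# Hypothetical helper: pull each target separately and report any that fail.
#   args = targets to pull
pull_each() {
  failed=0
  for target in "$@"; do
    if ! dvc pull "$target"; then
      echo "could not pull: $target" >&2
      failed=1
    fi
  done
  # Non-zero if at least one target could not be pulled.
  return "$failed"
}

# Usage (paths are illustrative):
#   pull_each data/train data/validation
```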
Setting up remotes
Before using `dvc pull`, ensure a remote is configured:
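A minimal sketch of remote setup using `dvc remote add`, where `-d` marks the new remote as the default. The function name `setup_remote` is hypothetical, and the remote name and URL must be replaced with your own values.

```shell
# Hypothetical helper: register a remote as the default and show the result.
#   $1 = remote name, $2 = remote URL
setup_remote() {
  # -d makes this the default remote, so a plain `dvc pull` will use it.
  dvc remote add -d "$1" "$2" || return 1
  # List the configured remotes to confirm the change.
  dvc remote list
}

# Usage (name and URL are placeholders):
#   setup_remote myremote s3://my-bucket/dvc-storage
```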
Understanding pull output
File status indicators:
| Symbol | Meaning |
|---|---|
| `A` | Added (new file created) |
| `M` | Modified (file was updated) |
| `D` | Deleted (file was removed) |
Error handling
No remote configured
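One way to catch this case early is a pre-flight check before pulling. The sketch below assumes that `dvc remote list` prints nothing when no remotes are configured; `pull_if_remote_configured` is a hypothetical name.

```shell
# Hypothetical pre-flight check: refuse to pull when no remote is listed.
pull_if_remote_configured() {
  # Assumption: `dvc remote list` produces no output when no remote exists.
  if [ -z "$(dvc remote list)" ]; then
    echo "No DVC remote configured. Add one with: dvc remote add -d <name> <url>" >&2
    return 1
  fi
  dvc pull "$@"
}

# Usage:
#   pull_if_remote_configured
```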
Authentication errors
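Authentication failures depend on the storage backend. As one illustration only, the sketch below assumes an S3 remote that authenticates through the standard AWS environment variables; it does not apply to IAM roles or other backends, and `check_aws_credentials` is a hypothetical name.

```shell
# Hypothetical check, assuming an S3 remote that reads credentials from the environment.
check_aws_credentials() {
  missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Indirect variable lookup via eval keeps this POSIX-compatible.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  # Non-zero if either variable is unset or empty.
  return "$missing"
}

# Usage:
#   check_aws_credentials && dvc pull
```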
Missing files in remote
- Ask a teammate to push: `dvc push`
- Use `--allow-missing` to skip missing files: `dvc pull --allow-missing`
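For scripted use, one possible pattern is to attempt a strict pull first and fall back to `--allow-missing` only when it fails, so that missing data is at least reported. The function name `pull_tolerating_missing` is hypothetical.

```shell
# Hypothetical helper: strict pull first, then a tolerant retry.
#   args = targets (optional)
pull_tolerating_missing() {
  # A strict pull surfaces missing data as an error.
  if dvc pull "$@"; then
    return 0
  fi
  echo "strict pull failed; retrying with --allow-missing" >&2
  # The retry skips files that are absent from the remote.
  dvc pull --allow-missing "$@"
}

# Usage (path is illustrative):
#   pull_tolerating_missing data
```

Note that the retry will still fail if the first attempt failed for another reason, such as missing credentials.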
Network interruption
If a pull is interrupted, simply run it again: `dvc pull`
Difference between pull, fetch, and checkout
| Command | Downloads from remote | Updates workspace |
|---|---|---|
| `dvc pull` | ✓ | ✓ |
| `dvc fetch` | ✓ | ✗ |
| `dvc checkout` | ✗ | ✓ |
Use `dvc pull` for most cases: it does both fetch and checkout.
Use `dvc fetch` when you want to pre-download data without changing the workspace.
Use `dvc checkout` when data is already in the cache and you just need to update the workspace.
Performance tips
Best practices
- Always pull after git pull: Keep data in sync with code
- Pull before starting work: Ensure you have latest data
- Use specific targets in CI: Only pull data needed for tests
- Configure credentials securely: Use environment variables or IAM roles
Related commands
- `dvc push` - Upload data to remote storage
- `dvc fetch` - Download to cache only
- `dvc checkout` - Update workspace from cache
- `dvc status` - Check sync status with remote