
Overview

Remote storage allows you to store your data, models, and pipeline outputs outside your Git repository. This enables team collaboration, backup, and access from different machines or environments.
DVC supports many storage types: Amazon S3, Google Cloud Storage, Azure Blob Storage, SSH, HTTP, and more.

Setting Up Remote Storage

1. Add a remote

Configure a remote storage location:
dvc remote add -d myremote s3://my-bucket/dvc-storage
The -d flag sets this as the default remote.
You can add multiple remotes and switch between them as needed.
2. Configure credentials

Set up authentication for your storage:
# Using AWS CLI profiles
dvc remote modify myremote profile myprofile

# Or set credentials directly (not recommended)
dvc remote modify myremote access_key_id YOUR_KEY
dvc remote modify myremote secret_access_key YOUR_SECRET
3. Commit configuration

Save remote configuration to Git:
git add .dvc/config
git commit -m "Configure DVC remote storage"
Never commit credentials to Git. Use environment variables or separate credential files.
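After the steps above, the committed .dvc/config might look like the following sketch (using the bucket and profile names from the examples; your values will differ):

```ini
[core]
    remote = myremote
['remote "myremote"']
    url = s3://my-bucket/dvc-storage
    profile = myprofile
```

Anything you would not want in Git, such as access keys, belongs in .dvc/config.local instead.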

Supported Storage Types

For example, an Amazon S3 remote:
dvc remote add -d myremote s3://bucket/path
Configuration options:
dvc remote modify myremote region us-west-2
dvc remote modify myremote profile myprofile
dvc remote modify myremote endpointurl https://s3.custom-endpoint.com
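The same pattern applies to the other backends mentioned above; only the URL scheme changes. A sketch with placeholder bucket, container, and host names:

```shell
# Google Cloud Storage
dvc remote add -d myremote gs://bucket/path

# Azure Blob Storage
dvc remote add -d myremote azure://container/path

# SSH
dvc remote add -d myremote ssh://user@example.com/path/to/storage
```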

Pushing and Pulling Data

Push to Remote

Upload tracked data to remote storage:
dvc push

Pull from Remote

Download tracked data from remote storage:
dvc pull

Fetch (Download to Cache Only)

Download data to cache without checking out to workspace:
dvc fetch
Then checkout when needed:
dvc checkout
Use fetch + checkout when you want to download data without immediately using it in your workspace; dvc pull is equivalent to dvc fetch followed by dvc checkout.

Managing Remotes

List Remotes

dvc remote list
Example output:
myremote	s3://my-bucket/dvc-storage	(default)
backup		/mnt/backup/dvc-storage
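In a script you may want just the default remote's name; the (default) marker can be parsed out of the listing. The sketch below pipes the example output shown above rather than calling dvc live:

```shell
# Extract the name of the remote marked (default) from `dvc remote list` output
printf 'myremote\ts3://my-bucket/dvc-storage\t(default)\nbackup\t/mnt/backup/dvc-storage\n' \
  | awk '/\(default\)/ { print $1 }'   # prints: myremote
```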

Set Default Remote

dvc remote default myremote

Modify Remote Settings

# Modify URL
dvc remote modify myremote url s3://new-bucket/path

# Modify options
dvc remote modify myremote region us-east-1

# Unset option
dvc remote modify myremote --unset region

Remove Remote

dvc remote remove myremote

Rename Remote

dvc remote rename myremote production

Remote Configuration Levels

DVC supports multiple configuration levels, selected with the --local, --global, and --system flags:
dvc remote add myremote s3://bucket/path
By default, settings are stored at the project level in .dvc/config (committed to Git, shared with the team).
Store credentials in .dvc/config.local (the local level, written with --local) to avoid committing them to Git.
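As a sketch, the split between the two files might look like this (values taken from the earlier examples):

```ini
# .dvc/config — committed to Git, shared with the team
['remote "myremote"']
    url = s3://my-bucket/dvc-storage

# .dvc/config.local — ignored by Git, per-machine
['remote "myremote"']
    profile = myprofile
```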

Advanced Remote Options

Parallel Jobs

Control how many files are transferred simultaneously:
dvc remote modify myremote jobs 16
Or per-command:
dvc push -j 16

Bandwidth Limit

# Limit to 10MB/s
dvc remote modify myremote bandwidth_limit 10485760
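The limit is given in bytes per second, so the 10 MB/s in the comment works out to:

```shell
# 10 MiB/s expressed in bytes per second, as passed to bandwidth_limit
limit=$((10 * 1024 * 1024))
echo "$limit"   # prints 10485760
```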

Connection Timeout

dvc remote modify myremote timeout 3600

SSL Verification

# Disable SSL verification (not recommended for production)
dvc remote modify myremote ssl_verify false

Custom Endpoint

# For S3-compatible storage
dvc remote modify myremote endpointurl https://minio.example.com

Server-Side Encryption

# For S3
dvc remote modify myremote sse AES256

# For S3 with KMS
dvc remote modify myremote sse aws:kms
dvc remote modify myremote sse_kms_key_id your-kms-key-id

Checking Storage Status

Compare local cache with remote:
dvc status --cloud
Example output:
Data and pipelines are up to date.

Remote storage status:
    new:        data/train.csv
    deleted:    models/old_model.pkl
Use dvc status -c to see what needs to be pushed or pulled.

Best Practices

Separate credentials

Store credentials in .dvc/config.local (not committed) or use environment variables

Use cloud IAM

Prefer IAM roles and instance profiles over access keys when possible

Enable versioning

Turn on bucket versioning in S3/GCS to protect against accidental deletions

Set lifecycle policies

Configure cloud storage lifecycle rules to archive or delete old data

Use multiple remotes

Configure backup remotes for disaster recovery

Optimize transfers

Adjust -j (jobs) based on network bandwidth and file count
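One hypothetical heuristic is to scale the job count with the number of CPU cores; nproc may be unavailable on some systems, hence the fallback:

```shell
# Twice the core count is a common starting point for I/O-bound transfers;
# fall back to 4 cores if nproc is unavailable
jobs=$(( $(nproc 2>/dev/null || echo 4) * 2 ))
echo "$jobs"
```

The resulting value would then be passed as dvc push -j "$jobs".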

Complete Examples

AWS S3 Setup

1. Create S3 bucket

aws s3 mb s3://my-dvc-storage
2. Configure DVC remote

dvc remote add -d storage s3://my-dvc-storage/project-data
dvc remote modify storage region us-west-2
3. Set credentials (local)

dvc remote modify --local storage profile myawsprofile
4. Push data

dvc push

Google Cloud Storage Setup

1. Create GCS bucket

gsutil mb gs://my-dvc-storage
2. Configure DVC remote

dvc remote add -d storage gs://my-dvc-storage/project-data
dvc remote modify storage projectname my-gcp-project
3. Set credentials (local)

dvc remote modify --local storage credentialpath ~/.config/gcloud/credentials.json
4. Push data

dvc push

Multi-Remote Setup

Configure primary and backup remotes:
# Primary remote (cloud)
dvc remote add -d primary s3://production-bucket/dvc
dvc remote modify primary region us-east-1

# Backup remote (local NAS)
dvc remote add backup /mnt/nas/dvc-backup

# Push to both
dvc push -r primary
dvc push -r backup
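If you routinely mirror to every remote, the two pushes can be wrapped in a loop (remote names as configured above):

```shell
# Push the cache to each configured remote in turn
for r in primary backup; do
    dvc push -r "$r"
done
```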

Troubleshooting

Authentication errors: check the credentials configuration:
dvc remote list
dvc config -l
Verify cloud provider credentials:
# AWS
aws s3 ls s3://your-bucket/

# GCP
gsutil ls gs://your-bucket/
Slow transfers: increase parallel jobs:
dvc push -j 16
Or configure permanently:
dvc remote modify myremote jobs 16
Timeout errors: increase the timeout:
dvc remote modify myremote timeout 7200
Access denied: check bucket permissions and IAM policies. Ensure your credentials have:
  • S3: s3:GetObject, s3:PutObject, s3:ListBucket
  • GCS: storage.objects.get, storage.objects.create, storage.objects.list
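For S3, a minimal IAM policy granting those actions might look like the following sketch (the bucket name is a placeholder; note that ListBucket attaches to the bucket ARN, while the object actions attach to the objects):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::your-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::your-bucket/*"
    }
  ]
}
```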

Next Steps

Collaboration

Share data and pipelines with your team using remote storage

Remote Config

Explore advanced remote configuration options
