
Description

Returns the complete contents of a file tracked by DVC or Git. This is a convenience function that reads the entire file at once without requiring a context manager. For Git repositories, HEAD is used unless a rev argument is supplied. The default remote is tried unless a remote argument is supplied.
For large files, consider using dvc.api.open() instead to stream data and avoid loading the entire file into memory.

Signature

dvc.api.read(
    path: str,
    repo: Optional[str] = None,
    rev: Optional[str] = None,
    remote: Optional[str] = None,
    mode: str = "r",
    encoding: Optional[str] = None,
    config: Optional[dict[str, Any]] = None,
    remote_config: Optional[dict[str, Any]] = None,
) -> Union[str, bytes]

Parameters

path
str
required
Location and filename of the target file, relative to the root of the repository.
path="data/train.csv"
path="configs/params.yaml"
path="models/weights.pkl"
repo
str
default: None
Location of the DVC or Git repository. Defaults to the current project (found by walking up from the current working directory). Can be:
  • A URL to a Git repository (HTTP or SSH)
  • A local file system path
  • None to use the current repository
# Remote repository
repo="https://github.com/iterative/example-get-started"

# SSH URL
repo="git@github.com:user/repo.git"

# Local path
repo="/home/user/my-dvc-project"
rev
str
default: None
Any Git revision such as a branch name, tag name, commit hash, or DVC experiment name.
  • Defaults to HEAD for Git repositories
  • For local repositories, uses the working directory if not specified
  • Ignored if repo is not a Git repository
rev="main"               # Branch
rev="v2.0.0"             # Tag
rev="a3f5c2d"            # Commit hash  
rev="exp-best-model"     # Experiment
remote
str
default: None
Name of the DVC remote to use for fetching data. Defaults to the repository’s default remote. For local projects, the cache is checked before the default remote.
remote="myremote"
remote="aws-s3-storage"
mode
str
default: "r"
Mode in which to open the file. Defaults to "r" (read text mode). Only reading modes are supported:
  • "r" - Read text mode (returns str)
  • "rb" - Read binary mode (returns bytes)
mode="r"   # For text files
mode="rb"  # For binary files
encoding
str
default: None
Text encoding to use (e.g., "utf-8", "latin-1"). Only applicable in text mode (mode="r"). Mirrors the encoding parameter in Python’s built-in open().
encoding="utf-8"
encoding="iso-8859-1"
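When the encoding is not known in advance, one option (a plain-Python pattern, not part of dvc.api) is to read in binary mode and attempt decodings in order:

```python
# Stand-in bytes, representing what dvc.api.read(path, mode="rb")
# would return for a UTF-8 encoded file.
raw = b"caf\xc3\xa9"

# Try UTF-8 first, then fall back to Latin-1 (which accepts any byte).
try:
    text = raw.decode("utf-8")
except UnicodeDecodeError:
    text = raw.decode("latin-1")
```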
config
dict
default: None
DVC config dictionary to pass to the repository.
config={"cache": {"dir": "/tmp/dvc-cache"}}
remote_config
dict
default: None
Remote configuration dictionary to pass to the repository.
remote_config={"url": "s3://my-bucket/dvc-storage"}

Returns

contents
Union[str, bytes]
The complete contents of the file:
  • Returns str when mode="r" (text mode)
  • Returns bytes when mode="rb" (binary mode)
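Because the return type depends on mode, downstream code that accepts either can normalize with a small helper (a sketch; as_text is not part of dvc.api):

```python
from typing import Union

def as_text(contents: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Coerce read() output to str: decode bytes, pass str through."""
    if isinstance(contents, bytes):
        return contents.decode(encoding)
    return contents
```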

Raises

FileMissingError
exception
Raised when the specified file does not exist in the repository.
OutputNotFoundError
exception
Raised when the file is not tracked by DVC.
ValueError
exception
Raised when a non-read mode is specified.

Examples

Basic Text File Reading

import dvc.api

# Read a CSV file
data = dvc.api.read(
    'data/train.csv',
    repo='https://github.com/iterative/example-get-started'
)
print(data)

Read Configuration File

import dvc.api
import yaml

# Read YAML parameters
params_yaml = dvc.api.read('params.yaml')
params = yaml.safe_load(params_yaml)

print(f"Learning rate: {params['train']['lr']}")
print(f"Epochs: {params['train']['epochs']}")

Read JSON Metrics

import dvc.api
import json

# Read metrics from a specific branch
metrics_json = dvc.api.read(
    'metrics/accuracy.json',
    rev='experiment-branch'
)
metrics = json.loads(metrics_json)

print(f"Accuracy: {metrics['accuracy']}")
print(f"F1 Score: {metrics['f1_score']}")

Binary File Reading

import dvc.api
import pickle

# Read a pickled model (binary mode)
model_bytes = dvc.api.read(
    'models/classifier.pkl',
    mode='rb',
    rev='production'
)
model = pickle.loads(model_bytes)

predictions = model.predict(X_test)  # X_test: your evaluation features

Read from Specific Tag

import dvc.api

# Get data from a released version
data_v1 = dvc.api.read(
    'data/dataset.csv',
    repo='https://github.com/user/ml-project',
    rev='v1.0.0'
)

data_v2 = dvc.api.read(
    'data/dataset.csv',
    repo='https://github.com/user/ml-project',
    rev='v2.0.0'
)

print(f"V1 length: {len(data_v1)} characters")
print(f"V2 length: {len(data_v2)} characters")

Private Repository with SSH

import dvc.api

# Access private repository (requires SSH keys configured)
data = dvc.api.read(
    'sensitive/data.txt',
    repo='git@github.com:company/private-repo.git',
    rev='main'
)

Read with Custom Encoding

import dvc.api

# Read file with specific encoding
data = dvc.api.read(
    'data/international.txt',
    encoding='utf-16'
)
print(data)

Read NumPy Array

import dvc.api
import numpy as np
from io import BytesIO

# Read binary NumPy file
array_bytes = dvc.api.read(
    'data/features.npy',
    mode='rb'
)
array = np.load(BytesIO(array_bytes))

print(f"Shape: {array.shape}")
print(f"Dtype: {array.dtype}")

Read from Local Repository

import dvc.api

# Read from local project by path
data = dvc.api.read(
    'data/processed.csv',
    repo='/path/to/my/project'
)

Error Handling

import dvc.api
from dvc.exceptions import FileMissingError, OutputNotFoundError

try:
    data = dvc.api.read(
        'data/missing.csv',
        repo='https://github.com/user/repo'
    )
except OutputNotFoundError:
    print("File is not tracked by DVC")
except FileMissingError:
    print("File does not exist in the repository")
except Exception as e:
    print(f"Unexpected error: {e}")

Use Cases

Configuration Loading

Load parameters, configs, or metadata files for experiments.

Small Data Files

Read datasets that fit comfortably in memory.

Model Loading

Load serialized models for inference or evaluation.

Metrics Retrieval

Fetch experiment metrics for analysis and comparison.

Comparison with dvc.api.open()

read() is a convenience wrapper around open() that reads the entire file and returns its contents.
Feature     dvc.api.read()                    dvc.api.open()
Usage       Simple function call              Context manager (with statement)
Returns     Complete file contents            File object for streaming
Memory      Loads entire file                 Streams incrementally
Best for    Small files                       Large files
Code        data = dvc.api.read('file.csv')   with dvc.api.open('file.csv') as f: ...
# Using read() - Simpler for small files
data = dvc.api.read('small_config.json')
config = json.loads(data)

# Using open() - Better for large files
with dvc.api.open('large_dataset.csv') as f:
    for line in f:
        process(line)

Performance Considerations

read() loads the entire file into memory. For large files (>100MB), use dvc.api.open() to stream data instead.
# Efficient for small files
data = dvc.api.read('config.json')

Best Practices

read() is ideal for configuration files, parameters, and small datasets:
# ✅ Good - Small config file
params = yaml.safe_load(dvc.api.read('params.yaml'))

# ❌ Bad - Large dataset (use open() instead)
data = dvc.api.read('huge_dataset.csv')  # May cause memory issues
Use text mode for text files and binary mode for binary data:
# Text files
text = dvc.api.read('data.txt', mode='r')

# Binary files
data = dvc.api.read('model.pkl', mode='rb')
Remember to parse the returned string/bytes:
# JSON
json_str = dvc.api.read('data.json')
data = json.loads(json_str)

# YAML
yaml_str = dvc.api.read('config.yaml')
config = yaml.safe_load(yaml_str)

# CSV (use open() for large CSVs)
csv_str = dvc.api.read('small.csv')
lines = csv_str.splitlines()
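For structured parsing of a small CSV string, the standard-library csv module with io.StringIO avoids manual splitting (the sample content below is illustrative, standing in for a read() result):

```python
import csv
import io

# Illustrative content, representing dvc.api.read('small.csv').
csv_str = "name,score\nalice,0.91\nbob,0.87\n"

# DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_str)))
```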
Always catch potential exceptions:
from dvc.exceptions import FileMissingError, OutputNotFoundError

try:
    data = dvc.api.read('data.csv')
except OutputNotFoundError:
    print("Not tracked by DVC")
except FileMissingError:
    print("File not found")

open()

Stream files with context manager

get_url()

Get remote storage URL

DVCFileSystem

Low-level file system access
