
Overview

The Dataset class represents a dataset in SyftBox with both mock (public) and private data components. It provides methods for accessing data files, metadata, and README documentation.

Dataset Class

Properties

uid
UUID
Unique identifier for the dataset (auto-generated)
created_at
datetime
Timestamp when the dataset was created (UTC)
updated_at
datetime
Timestamp when the dataset was last updated (UTC)
name
str
Unique name of the dataset
summary
str | None
Short description of the dataset
tags
list[str]
Tags for categorizing the dataset
location
str | None
Location identifier for remote datasets requiring manual syncing
mock_url
SyftBoxURL
URL to the mock (public) dataset location
private_url
SyftBoxURL
URL to the private dataset location
readme_url
SyftBoxURL | None
URL to the README file if provided
mock_files_urls
list[SyftBoxURL]
URLs to uploaded mock data files (excludes metadata files)
private_files_paths
list[Path]
Local paths to private data files (excludes metadata files)

Computed Properties

owner
str
Email address of the dataset owner (extracted from mock_url)
mock_dir
Path
Local filesystem path to the mock data directory
private_dir
Path
Local filesystem path to the private data directory
readme_path
Path | None
Local filesystem path to the README file
mock_files
list[Path]
Absolute paths to all mock files (excludes dataset.yaml and README)
private_files
list[Path]
Absolute paths to all private files (excludes private_metadata.yaml)
files
list[Path]
Combined list of all mock and private file paths
private_config
PrivateDatasetConfig
Private dataset configuration (cached property)
private_config_path
Path
Path to the private metadata YAML file
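
The exclusion rule behind mock_files and private_files can be illustrated without the library. The sketch below lists the regular files in a directory and skips metadata files. The helper name list_data_files, the README filename README.md, and the non-recursive listing are assumptions made for illustration, not the library's implementation.

```python
import tempfile
from pathlib import Path

# Assumed metadata filenames: the docs name dataset.yaml; the README name is a guess.
METADATA_NAMES = {"dataset.yaml", "README.md"}


def list_data_files(directory: Path) -> list[Path]:
    """Return absolute paths of regular files in `directory`, skipping metadata files."""
    return sorted(
        p.resolve()
        for p in directory.iterdir()
        if p.is_file() and p.name not in METADATA_NAMES
    )


def demo_listing() -> list[str]:
    """Build a throwaway directory and return the file names that survive the filter."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("dataset.yaml", "README.md", "a.csv", "b.csv"):
            (root / name).write_text("placeholder")
        return [p.name for p in list_data_files(root)]
```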

Methods

get_readme

Retrieve the content of the README file.
readme_content = dataset.get_readme()
if readme_content:
    print(readme_content)
return
str | None
The README content as a string, or None if no README exists

describe

Display a formatted description of the dataset (Jupyter/IPython only).
dataset.describe()
Shows an HTML-formatted view with:
  • Dataset name and metadata
  • Creation date
  • Summary and tags
  • Paths to mock and private data
  • Link to README if available

save

Save dataset metadata to a YAML file.
dataset.save(filepath="/path/to/dataset.yaml")
filepath
PathLike
required
Path where the YAML file should be saved. Must have a .yaml extension.
Raises:
  • ValueError: If filepath doesn’t have a .yaml extension
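
Callers who want to fail early can check the extension themselves before calling save. The following is a minimal sketch of such a check; require_yaml_suffix is a hypothetical helper, and the library's own check may differ (for example in how it treats case).

```python
from pathlib import Path


def require_yaml_suffix(filepath) -> Path:
    """Return `filepath` as a Path, or raise ValueError if it does not end in .yaml."""
    path = Path(filepath)
    if path.suffix != ".yaml":
        raise ValueError(f"Expected a .yaml file, got: {path.name}")
    return path
```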

load

Load a dataset from a YAML metadata file.
dataset = Dataset.load(
    filepath="/path/to/dataset.yaml",
    syftbox_config=config
)
filepath
PathLike
required
Path to the dataset YAML metadata file
syftbox_config
SyftBoxConfig | None
SyftBox configuration for resolving paths
return
Dataset
Loaded Dataset instance
Raises:
  • FileNotFoundError: If the metadata file doesn’t exist

Usage Examples

Accessing Dataset Files

# Get all mock (public) data files
for file_path in dataset.mock_files:
    print(f"Mock file: {file_path}")

# Get all private data files
for file_path in dataset.private_files:
    print(f"Private file: {file_path}")

# Get all files
for file_path in dataset.files:
    print(f"File: {file_path}")

Reading Dataset Content

# Read README
readme = dataset.get_readme()
if readme:
    print(readme)

# Access mock data directory
mock_dir = dataset.mock_dir
print(f"Mock data at: {mock_dir}")

# Access private data directory
private_dir = dataset.private_dir
print(f"Private data at: {private_dir}")

Dataset Metadata

print(f"Dataset: {dataset.name}")
print(f"Owner: {dataset.owner}")
print(f"Created: {dataset.created_at}")
print(f"Summary: {dataset.summary}")
print(f"Tags: {', '.join(dataset.tags)}")
print(f"UID: {dataset.uid}")

Jupyter/IPython Display

# In a Jupyter notebook
dataset.describe()  # Shows rich HTML display

# Or just display the dataset directly
dataset  # Triggers _repr_html_()
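
For readers unfamiliar with the hook mentioned above: Jupyter and IPython call an object's _repr_html_ method, when present, to render it as HTML. The sketch below shows the mechanism on a hypothetical DatasetCard class; it is not the Dataset implementation.

```python
import html


class DatasetCard:
    """Hypothetical stand-in showing the _repr_html_ display hook."""

    def __init__(self, name: str, summary: str) -> None:
        self.name = name
        self.summary = summary

    def _repr_html_(self) -> str:
        # Escape user-provided text so it cannot inject markup into the notebook.
        return (
            f"<h3>{html.escape(self.name)}</h3>"
            f"<p>{html.escape(self.summary)}</p>"
        )


card = DatasetCard("census <2020>", "Synthetic sample")
rendered = card._repr_html_()
```

Outside a notebook the method can still be called directly, as in the last line, which is convenient for testing the generated markup.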

PrivateDatasetConfig Class

Stores private dataset metadata outside the sync folder.

Properties

uid
UUID
Dataset unique identifier (matches parent dataset)
data_dir
Path
Path to the private data directory

Methods

save

Save private configuration to a YAML file.
private_config.save(filepath="/path/to/private_metadata.yaml")

load

Load private configuration from a YAML file.
private_config = PrivateDatasetConfig.load(
    filepath="/path/to/private_metadata.yaml",
    syftbox_config=config
)

SyftBoxURL Type

Custom URL type for addressing resources in SyftBox.

Format

Supports two URL formats:
  1. Email-based (for public data):
    syft://[email protected]/path/to/resource
    
  2. Simple path (for private/local data):
    syft://private/path/to/resource
    

Properties

protocol
str
The protocol (always “syft://”)
host
str
The host/email component
path
str
The path component
query
dict[str, str]
Query parameters as a dictionary
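
To show how these components relate to a URL string, here is a rough sketch using the standard library's urllib.parse rather than SyftBoxURL itself. The function split_syft_url and the example URL are hypothetical, and details such as whether the library keeps the leading slash on path are not confirmed by this page.

```python
from urllib.parse import parse_qsl, urlsplit


def split_syft_url(url: str) -> dict:
    """Split a syft:// URL into protocol, host, path and query parts."""
    parts = urlsplit(url)
    return {
        "protocol": f"{parts.scheme}://",
        "host": parts.netloc,  # for email-based URLs this is the full email
        "path": parts.path,
        "query": dict(parse_qsl(parts.query)),
    }


# Hypothetical URL, used only for illustration.
parsed = split_syft_url("syft://alice@example.org/public/data?version=2")
```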

Methods

to_local_path

Convert the URL to a local filesystem path.
url = SyftBoxURL("syft://[email protected]/public/data")
local_path = url.to_local_path(syftbox_folder="/path/to/syftbox")
# Returns: /path/to/syftbox/[email protected]/public/data
syftbox_folder
PathLike
required
Base SyftBox directory
return
Path
Resolved local filesystem path
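
Judging by the example above, the mapping appears to be syftbox_folder / host / path. A standalone sketch of that mapping, assuming this layout and using a hypothetical function name and email, might look like this:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def sketch_to_local_path(url: str, syftbox_folder: str) -> PurePosixPath:
    """Map syft://<host>/<path> to <syftbox_folder>/<host>/<path>."""
    parts = urlsplit(url)
    # Strip the leading slash so the URL path is joined beneath the host
    # directory instead of replacing everything before it.
    return PurePosixPath(syftbox_folder) / parts.netloc / parts.path.lstrip("/")


local = sketch_to_local_path("syft://alice@example.org/public/data", "/path/to/syftbox")
```

PurePosixPath is used here only to keep the result identical on every platform; the library returns a concrete Path.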

from_path

Create a SyftBoxURL from a local filesystem path.
url = SyftBoxURL.from_path(
    path="/path/to/syftbox/[email protected]/public/data",
    syftbox_folder="/path/to/syftbox"
)
# Returns: syft://[email protected]/public/data
path
PathLike
required
Local filesystem path
syftbox_folder
PathLike
required
Base SyftBox directory
return
SyftBoxURL
SyftBoxURL instance
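
The reverse mapping can be sketched the same way: take the path relative to the SyftBox folder and prefix it with the protocol. The helper sketch_from_path and the email below are hypothetical, and the result is a plain string rather than a SyftBoxURL.

```python
from pathlib import PurePosixPath


def sketch_from_path(path: str, syftbox_folder: str) -> str:
    """Turn <syftbox_folder>/<host>/<rest> into syft://<host>/<rest>."""
    # relative_to raises ValueError when `path` is not inside `syftbox_folder`.
    relative = PurePosixPath(path).relative_to(PurePosixPath(syftbox_folder))
    return "syft://" + relative.as_posix()


url = sketch_from_path(
    "/path/to/syftbox/alice@example.org/public/data",
    "/path/to/syftbox",
)
```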

is_valid

Validate a URL string.
if SyftBoxURL.is_valid("syft://[email protected]/data"):
    url = SyftBoxURL("syft://[email protected]/data")
url
str
required
URL string to validate
return
bool
True if valid, False otherwise
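
This page does not spell out the validation rules. As a loose approximation only, a check for the syft scheme and a non-empty host could be written as below; looks_like_syft_url is a hypothetical helper and may accept or reject strings differently from is_valid.

```python
from urllib.parse import urlsplit


def looks_like_syft_url(url: str) -> bool:
    """Loose check: syft scheme plus a non-empty host. Not the library's actual rule."""
    parts = urlsplit(url)
    return parts.scheme == "syft" and bool(parts.netloc)
```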

PathLike Type

Type alias for path-like objects.
PathLike = Union[str, Path, os.PathLike]
Accepts:
  • String paths: "/path/to/file"
  • pathlib.Path objects
  • Any object implementing os.PathLike
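
The three accepted forms can be demonstrated with a small self-contained sketch. ReportLocation and as_posix_string are hypothetical names introduced for the example; only the PathLike alias comes from this page.

```python
import os
from pathlib import Path
from typing import Union

PathLike = Union[str, Path, os.PathLike]


class ReportLocation:
    """Hypothetical class that opts into os.PathLike via __fspath__."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __fspath__(self) -> str:
        return f"/reports/{self.name}"


def as_posix_string(value: PathLike) -> str:
    """Normalize any accepted path-like value to a POSIX-style string."""
    return Path(value).as_posix()


results = [
    as_posix_string("/data/file.csv"),
    as_posix_string(Path("/data") / "file.csv"),
    as_posix_string(ReportLocation("q1.csv")),
]
```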

Helper Function: to_path

Convert a PathLike value to a resolved Path object.
from syft_datasets.types import to_path

path = to_path("~/data/file.csv")
# Returns: Path with expanded home directory and resolved symlinks
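
Based on the description above, the behavior is roughly equivalent to the following pathlib sketch. This is an assumption about what to_path does, and sketch_to_path is a hypothetical name, not the library function.

```python
from pathlib import Path


def sketch_to_path(value) -> Path:
    """Expand a leading ~ and resolve symlinks, returning an absolute Path."""
    return Path(value).expanduser().resolve()


resolved = sketch_to_path("~/data/file.csv")
```

Because resolve() is not strict by default, the file does not need to exist for the call to succeed.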
