Overview
The Dataset class represents a dataset in SyftBox with both mock (public) and private data components. It provides methods for accessing data files, metadata, and README documentation.
Dataset Class
Properties
Unique identifier for the dataset (auto-generated)
Timestamp when the dataset was created (UTC)
Timestamp when the dataset was last updated (UTC)
Unique name of the dataset
Short description of the dataset
Tags for categorizing the dataset
Location identifier for remote datasets requiring manual syncing
URL to the mock (public) dataset location
URL to the private dataset location
URL to the README file if provided
URLs to uploaded mock data files (excludes metadata files)
Local paths to private data files (excludes metadata files)
Computed Properties
Email address of the dataset owner (extracted from mock_url)
Local filesystem path to the mock data directory
Local filesystem path to the private data directory
Local filesystem path to the README file
Absolute paths to all mock files (excludes dataset.yaml and README)
Absolute paths to all private files (excludes private_metadata.yaml)
Combined list of all mock and private file paths
Private dataset configuration (cached property)
Path to the private metadata YAML file
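The exclusion rules for the file lists above can be sketched as a simple directory filter. This is an illustration only, not the library's implementation: the helper name `list_data_files` is hypothetical, `README.md` is a guess at the README's file name, and recursing into subdirectories is an assumption.

```python
from pathlib import Path

# dataset.yaml and private_metadata.yaml are named in the text above.
# "README.md" is an assumed file name for the README.
MOCK_EXCLUDED = {"dataset.yaml", "README.md"}
PRIVATE_EXCLUDED = {"private_metadata.yaml"}


def list_data_files(directory: Path, excluded: set[str]) -> list[Path]:
    """Return absolute paths of all files under `directory`, skipping excluded names."""
    return sorted(
        p.resolve()
        for p in directory.rglob("*")  # assumption: nested files are included
        if p.is_file() and p.name not in excluded
    )
```

Under these assumptions, the combined file list would be the mock list followed by the private list.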
Methods
get_readme
Retrieve the content of the README file.
The README content as a string, or None if no README exists
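The "string or None" contract described for get_readme can be sketched with the standard library. The function name `read_readme` and its signature are hypothetical stand-ins; the real method reads the path from the dataset itself.

```python
from pathlib import Path
from typing import Optional


def read_readme(readme_path: Optional[Path]) -> Optional[str]:
    """Return the README text, or None when no README is available."""
    # Treat both "no path configured" and "file missing" as "no README".
    if readme_path is None or not readme_path.is_file():
        return None
    return readme_path.read_text(encoding="utf-8")
```

Callers should check for None before rendering or printing the result.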
describe
Display a formatted description of the dataset (Jupyter/IPython only).
- Dataset name and metadata
- Creation date
- Summary and tags
- Paths to mock and private data
- Link to README if available
save
Save dataset metadata to a YAML file.
Path where the YAML file should be saved. Must have .yaml extension.
ValueError: If filepath doesn’t have .yaml extension
load
Load a dataset from a YAML metadata file.
Path to the dataset YAML metadata file
SyftBox configuration for resolving paths
Loaded Dataset instance
FileNotFoundError: If metadata file doesn’t exist
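The two error conditions documented for save and load can be sketched as path guards. The helper names `check_save_path` and `check_load_path` are hypothetical, and the error messages are illustrative; only the exception types and the conditions come from the descriptions above.

```python
from pathlib import Path


def check_save_path(filepath: str) -> Path:
    """Reject any save target that does not end in .yaml."""
    path = Path(filepath)
    if path.suffix != ".yaml":
        raise ValueError(f"Expected a .yaml file, got: {path.name}")
    return path


def check_load_path(filepath: str) -> Path:
    """Reject a metadata path that does not point to an existing file."""
    path = Path(filepath)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset metadata not found: {path}")
    return path
```

Running such checks before any file I/O keeps a failed call from leaving a partially written file behind.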
Usage Examples
Accessing Dataset Files
Reading Dataset Content
Dataset Metadata
Jupyter/IPython Display
PrivateDatasetConfig Class
Stores private dataset metadata outside the sync folder.
Properties
Dataset unique identifier (matches parent dataset)
Path to the private data directory
Methods
save
Save private configuration to a YAML file.
load
Load private configuration from a YAML file.
SyftBoxURL Type
Custom URL type for addressing resources in SyftBox.
Format
Supports two URL formats:
- Email-based (for public data)
- Simple path (for private/local data)
Properties
The protocol (always “syft://”)
The host/email component
The path component
Query parameters as a dictionary
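The four components listed above map naturally onto a standard URL split. The sketch below uses `urllib.parse` rather than the SyftBoxURL type itself; the function name `split_syft_url`, the dictionary keys, and the example URL are all hypothetical, and only the email-based format is covered.

```python
from urllib.parse import parse_qsl, urlsplit


def split_syft_url(url: str) -> dict:
    """Break an email-based syft:// URL into protocol, host, path, and query."""
    parts = urlsplit(url)
    return {
        "protocol": f"{parts.scheme}://",
        "host": parts.netloc,  # for email-based URLs this is the owner's email
        "path": parts.path,
        "query": dict(parse_qsl(parts.query)),
    }


# Example URL invented for illustration.
info = split_syft_url("syft://alice@example.com/public/datasets/demo?version=1")
```

The same host component is what an owner email lookup on a mock URL would read.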
Methods
to_local_path
Convert the URL to a local filesystem path.
Base SyftBox directory
Resolved local filesystem path
from_path
Create a SyftBoxURL from a local filesystem path.
Local filesystem path
Base SyftBox directory
SyftBoxURL instance
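The relationship between to_local_path and from_path can be sketched as a round trip. The directory layout used here (host as the first folder under the base directory) is an assumption made for illustration and may not match the real SyftBox layout; the function names and paths are hypothetical, and only email-based URLs are handled.

```python
from pathlib import Path
from urllib.parse import urlsplit


def url_to_local_path(url: str, syftbox_dir: Path) -> Path:
    """Map syft://<host>/<path> to <syftbox_dir>/<host>/<path> (assumed layout)."""
    parts = urlsplit(url)
    return syftbox_dir / parts.netloc / parts.path.lstrip("/")


def local_path_to_url(path: Path, syftbox_dir: Path) -> str:
    """Inverse mapping; raises ValueError if `path` is outside `syftbox_dir`."""
    relative = path.relative_to(syftbox_dir)
    return "syft://" + relative.as_posix()


base = Path("/home/alice/SyftBox")  # hypothetical base directory
url = "syft://alice@example.com/public/datasets/demo"
local = url_to_local_path(url, base)
```

Both directions take the base SyftBox directory explicitly, so the same URL can resolve to different local paths on different machines.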
is_valid
Validate a URL string.
URL string to validate
True if valid, False otherwise
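A boolean validity check of this kind can be sketched with a regular expression. The pattern below is an assumption that covers only the email-based format; the rules that is_valid actually applies, including how it treats the simple-path format, are not stated here. The function name is hypothetical.

```python
import re

# Assumed shape: syft://<user>@<domain>[/<path>]
EMAIL_URL = re.compile(r"syft://[^/@\s]+@[^/@\s]+(/\S*)?")


def looks_like_email_syft_url(url: str) -> bool:
    """Return True if `url` matches the assumed email-based syft:// shape."""
    return EMAIL_URL.fullmatch(url) is not None
```

Returning a boolean instead of raising lets callers filter untrusted input without a try/except block.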
PathLike Type
Type alias for path-like objects.
- String paths: "/path/to/file"
- pathlib.Path objects
- Any object implementing os.PathLike
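An alias covering the three cases above could be defined as follows. This is a plausible definition, not necessarily the library's exact one, and the helper `to_path` and the `DatasetDir` class are hypothetical.

```python
import os
from pathlib import Path
from typing import Union

# pathlib.Path already implements os.PathLike, so two members suffice.
PathLike = Union[str, os.PathLike]


def to_path(value: PathLike) -> Path:
    """Normalize any accepted path-like value to a pathlib.Path."""
    # os.fspath calls __fspath__ on PathLike objects and passes str through.
    return Path(os.fspath(value))


class DatasetDir:
    """Minimal custom object implementing the os.PathLike protocol."""

    def __fspath__(self) -> str:
        return "/path/to/file"
```

Normalizing at the function boundary lets the rest of the code work with pathlib.Path only.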