SyftDatasetManager

Overview

SyftDatasetManager is the primary interface for creating, retrieving, and managing datasets in SyftBox. It handles dataset storage, permissions, and synchronization across datasites.

Constructor

from syft_datasets import SyftDatasetManager

manager = SyftDatasetManager(
    syftbox_folder_path="/path/to/syftbox",
    email="[email protected]"
)

syftbox_folder_path

PathLike

required

Path to the SyftBox folder on the local filesystem

email

str

required

Email address associated with the datasite

Class Methods

from_config

Create a SyftDatasetManager from an existing SyftBoxConfig.

manager = SyftDatasetManager.from_config(config)

config

SyftBoxConfig

required

SyftBox configuration object

return

SyftDatasetManager

Configured dataset manager instance

Methods

create

Create a new dataset with mock and private data.

dataset = manager.create(
    name="my_dataset",
    mock_path="./data/mock",
    private_path="./data/private",
    summary="Sample dataset for analysis",
    readme_path="./README.md",
    tags=["healthcare", "research"],
    users=["[email protected]"]
)

name

str

required

Unique identifier for the dataset. Only alphanumeric characters, underscores, and hyphens are allowed.

mock_path

PathLike

required

Path to the mock data (file or directory) that will be shared publicly

private_path

PathLike

required

Path to the private data (file or directory) that remains local

summary

str | None

Short summary describing the dataset

readme_path

Path | None

Path to a markdown README file to include in the dataset

location

str | None

Location identifier for datasets hosted on remote locations requiring manual syncing (e.g., ‘high-side-1234’)
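The naming rule for name can be checked before calling create. The following is a minimal sketch, not the library's own validation: the helper is_valid_dataset_name and its pattern are hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Hypothetical pattern mirroring the documented rule (ASCII letters, digits,
# underscores, hyphens). The library's own check may differ.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if name contains only letters, digits, '_' and '-'."""
    # fullmatch requires the whole string to match, so an empty string,
    # spaces, and path separators are all rejected.
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["my_dataset", "patient-records-2024", "my dataset", "a/b"]:
        print(candidate, is_valid_dataset_name(candidate))
```

Checking the name up front lets a script reject a bad name before any files are copied.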

get

Retrieve a dataset by name.

dataset = manager.get("my_dataset")

# Get dataset from another datasite
dataset = manager.get("my_dataset", datasite="[email protected]")

name

str

required

Name of the dataset to retrieve

datasite

str | None

Email of the datasite owner. Defaults to the current user’s email

return

Dataset

The requested Dataset object

Raises:

FileNotFoundError: If dataset doesn’t exist

get_all

Retrieve all accessible datasets with optional filtering and pagination.

# Get all datasets
all_datasets = manager.get_all()

# Get datasets from specific datasite
datasets = manager.get_all(datasite="[email protected]")

# Get datasets with pagination and sorting
datasets = manager.get_all(
    limit=10,
    offset=0,
    order_by="created_at",
    sort_order="desc"
)

datasite

str | None

Filter datasets by datasite owner email

limit

int | None

Maximum number of datasets to return

offset

int | None

Number of datasets to skip (for pagination)

order_by

str | None

Field name to sort by (e.g., “created_at”, “name”)

sort_order

Literal['asc', 'desc']

default:"asc"

Sort order: ascending or descending

return

list[Dataset]

List of Dataset objects, returned as a TableList for readable display

delete

Delete a dataset from the datasite.

# Delete with confirmation prompt
manager.delete("my_dataset")

# Delete without confirmation
manager.delete("my_dataset", require_confirmation=False)

name

str

required

Name of the dataset to delete

datasite

str | None

Email of the datasite owner. Defaults to current user. Must be your own datasite.

require_confirmation

bool

default:"True"

Whether to prompt for confirmation before deleting

Raises:

ValueError: If attempting to delete another user’s dataset
FileNotFoundError: If dataset doesn’t exist

Deleting a dataset removes both mock and private metadata directories. Private data files are only deleted if they’re managed by SyftBox.

share_dataset

Share an existing dataset with users.

# Share with specific users
manager.share_dataset("my_dataset", users=["[email protected]", "[email protected]"])

# Share with everyone
manager.share_dataset("my_dataset", users="any")

name

str

required

Name of the dataset to share

users

list[str] | str

required

List of email addresses or “any” to share with all users

Raises:

ValueError: If dataset doesn’t exist
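Since users accepts either a list of emails or the string "any", a bare email string passed by mistake is easy to miss. The following sketch shows one way to check the argument before calling share_dataset. normalize_users and the constant name SHARE_WITH_ANY are hypothetical; only the value "any" comes from this page.

```python
from typing import List, Union

# Hypothetical constant name; the documented value is the string "any".
SHARE_WITH_ANY = "any"


def normalize_users(users: Union[List[str], str]) -> Union[List[str], str]:
    """Validate a users argument: either "any" or a list of email addresses."""
    if isinstance(users, str):
        if users != SHARE_WITH_ANY:
            # A single email must be wrapped in a list.
            raise ValueError('pass a list of emails, or "any" to share with all users')
        return users
    cleaned: List[str] = []
    for email in users:
        email = email.strip()
        if "@" not in email:
            raise ValueError(f"not an email address: {email!r}")
        if email not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(email)
    return cleaned


if __name__ == "__main__":
    print(normalize_users("any"))
    print(normalize_users([" a@example.com ", "a@example.com", "b@example.com"]))
```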

Special Methods

Indexing

Access datasets by name or index.

# Access by name
dataset = manager["my_dataset"]

# Access by index
first_dataset = manager[0]

Iteration

Iterate over all datasets.

for dataset in manager:
    print(dataset.name)

Length

Get the total number of datasets.

total = len(manager)
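Indexing, iteration, and len() correspond to Python's container protocol. The sketch below shows how one class can support all three. NamedCollection is a hypothetical illustration and does not reflect how SyftDatasetManager is implemented internally.

```python
from typing import Dict, Iterator, Union


class NamedCollection:
    """Toy container supporting lookup by name or position, iteration, and len()."""

    def __init__(self, items: Dict[str, str]):
        self._items = dict(items)  # dicts preserve insertion order

    def __getitem__(self, key: Union[str, int]) -> str:
        if isinstance(key, str):
            return self._items[key]  # lookup by name
        if isinstance(key, int):
            return list(self._items.values())[key]  # lookup by position
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    def __iter__(self) -> Iterator[str]:
        return iter(self._items.values())

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    collection = NamedCollection({"first": "A", "second": "B"})
    print(collection["first"], collection[0], len(collection))
    for item in collection:
        print(item)
```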

Properties

syftbox_config

SyftBoxConfig

The SyftBox configuration used by this manager

Usage Example

from syft_datasets import SyftDatasetManager

# Initialize manager
manager = SyftDatasetManager(
    syftbox_folder_path="~/SyftBox",
    email="[email protected]"
)

# Create a new dataset
dataset = manager.create(
    name="patient_records",
    mock_path="./synthetic_data",
    private_path="./real_data",
    summary="Synthetic patient records for model training",
    tags=["healthcare", "synthetic"],
    users=["[email protected]"]
)

print(f"Created dataset: {dataset.name}")
print(f"Mock data location: {dataset.mock_dir}")

# List all datasets
all_datasets = manager.get_all()
print(f"Total datasets: {len(all_datasets)}")

# Retrieve a specific dataset
my_dataset = manager.get("patient_records")
print(f"Dataset summary: {my_dataset.summary}")

Constants

FOLDER_NAME

str

Default folder name for storing datasets

METADATA_FILENAME

str

Filename for dataset metadata

Constant for sharing datasets with all users
