syft-dataset provides configuration utilities for managing SyftBox datasets and folder structures. Note that the package is installed as syft-dataset (not syft-datasets), while the module imported throughout this page is syft_datasets.

Installation

pip install syft-dataset

When to Use

Use syft-dataset when you need to:
  • Configure SyftBox folder paths
  • Access private and public directories in your SyftBox
  • Manage dataset locations in your datasite
  • Build applications that need to know the SyftBox folder structure

API Reference

Main Export

from syft_datasets import SyftBoxConfig

SyftBoxConfig

The SyftBoxConfig class provides a structured way to access SyftBox directories.
from syft_datasets import SyftBoxConfig
from pathlib import Path

# Create configuration
config = SyftBoxConfig(
    syftbox_folder=Path("~/SyftBox").expanduser(),
    email="[email protected]"
)

# Access directories (comments use ~ for brevity; the printed paths are fully expanded)
print(config.private_dir)  # ~/SyftBox/private
print(config.public_dir)   # ~/SyftBox/[email protected]/public

Configuration Properties

syftbox_folder

The root path to your SyftBox folder on the local filesystem.
config.syftbox_folder  # Path object

email

The email address associated with your SyftBox.
config.email  # str

private_dir

Path to your private directory where sensitive data is stored.
config.private_dir  # Path: {syftbox_folder}/private

public_dir

Path to your public directory where shared data is stored.
config.public_dir  # Path: {syftbox_folder}/{email}/public
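
The two derived paths follow directly from the layout described above. The following is a minimal sketch of that derivation, using a hypothetical stand-in class named SketchConfig so it runs without the package installed. It is not the library's implementation, and the email address is a placeholder.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in that mirrors the documented path layout."""

    syftbox_folder: Path
    email: str

    @property
    def private_dir(self) -> Path:
        # Documented layout: {syftbox_folder}/private
        return self.syftbox_folder / "private"

    @property
    def public_dir(self) -> Path:
        # Documented layout: {syftbox_folder}/{email}/public
        return self.syftbox_folder / self.email / "public"


# Placeholder values, chosen only for illustration
sketch = SketchConfig(syftbox_folder=Path("/tmp/SyftBox"), email="user@example.com")
print(sketch.private_dir)
print(sketch.public_dir)
```

Note that only the public directory is scoped by email; the private directory sits directly under the SyftBox root.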

Basic Usage

Accessing SyftBox Directories

from syft_datasets import SyftBoxConfig
from pathlib import Path

# Initialize config
config = SyftBoxConfig(
    syftbox_folder=Path.home() / "SyftBox",
    email="[email protected]"
)

# Read from private directory
private_file = config.private_dir / "my_data.csv"
if private_file.exists():
    print(f"Found private data at {private_file}")

# Write to public directory
public_file = config.public_dir / "shared_results.json"
public_file.parent.mkdir(parents=True, exist_ok=True)
public_file.write_text('{"result": "success"}')
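
Hand-written JSON strings become error-prone as results grow. A minimal sketch of the same write using the standard json module is shown below. The helper name write_public_json is hypothetical, and a temporary directory stands in for config.public_dir so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def write_public_json(public_dir: Path, name: str, payload: dict) -> Path:
    """Serialize payload as JSON into public_dir/name and return the file path."""
    target = public_dir / name
    # Create the public directory (and any parents) if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for config.public_dir; the email is a placeholder
    public_dir = Path(tmp) / "SyftBox" / "user@example.com" / "public"
    written = write_public_json(public_dir, "shared_results.json", {"result": "success"})
    # Read the file back to confirm it contains valid JSON
    round_trip = json.loads(written.read_text(encoding="utf-8"))
```

In an application, config.public_dir would be passed in place of the temporary path.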

Using with Applications

from syft_datasets import SyftBoxConfig
from pathlib import Path
import os

# Get SyftBox path from environment or use default
syftbox_path = Path(os.getenv("SYFTBOX_ROOT", "~/SyftBox")).expanduser()

config = SyftBoxConfig(
    syftbox_folder=syftbox_path,
    email="[email protected]"
)

# Use config throughout your application
def load_dataset(filename: str) -> Path:
    # Returns the path to the dataset file; reading it is left to the caller
    dataset_path = config.private_dir / "datasets" / filename
    return dataset_path

def save_results(filename: str, data: str):
    output_path = config.public_dir / "results" / filename
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(data)
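
The environment lookup above can be factored into a small helper. This is a sketch only: the function name resolve_syftbox_root is hypothetical, and treating an empty SYFTBOX_ROOT as unset is a design choice made here, not documented behavior of the package.

```python
import os
from pathlib import Path


def resolve_syftbox_root(env_var: str = "SYFTBOX_ROOT", default: str = "~/SyftBox") -> Path:
    """Return the SyftBox root from the environment, falling back to the default."""
    # `or` makes an empty variable fall back to the default as well,
    # which avoids silently resolving to the current directory
    raw = os.environ.get(env_var) or default
    return Path(raw).expanduser()


root = resolve_syftbox_root()
print(root)
```

The resulting path can then be passed as syftbox_folder when constructing the config.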

SyftBox Directory Structure

The package helps navigate the standard SyftBox structure:
SyftBox/
├── private/              # Private data (config.private_dir)
│   ├── datasets/
│   └── credentials/
└── [email protected]/     # User-specific folder
    └── public/           # Public shared data (config.public_dir)
        ├── results/
        └── shared/
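
For local testing it can help to recreate this tree in a scratch location. The sketch below does so with pathlib in a temporary directory. The function name scaffold_layout and the email are placeholders, and it assumes the subfolders shown in the tree (datasets, credentials, results, shared) are illustrative rather than created by SyftBox itself.

```python
import tempfile
from pathlib import Path


def scaffold_layout(root: Path, email: str) -> list[Path]:
    """Create the directory tree shown above under root and return the leaf folders."""
    dirs = [
        root / "private" / "datasets",
        root / "private" / "credentials",
        root / email / "public" / "results",
        root / email / "public" / "shared",
    ]
    for directory in dirs:
        # parents=True also creates private/ and {email}/public/ as needed
        directory.mkdir(parents=True, exist_ok=True)
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_layout(Path(tmp) / "SyftBox", "user@example.com")
    all_exist = all(directory.is_dir() for directory in created)
```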

Dependencies

  • pyyaml>=6.0.3 - YAML configuration parsing
  • syft-notebook-ui - UI utilities for notebooks
  • syft-perm - Permission management
