This tutorial guides you through setting up a local SyftBox environment for running federated learning (FL) projects with low latency and full privacy features.

What is SyftBox?

SyftBox is a file-based communication layer that enables:
  • Offline-capable FL: Asynchronous model training across participants
  • Privacy-first: Data never leaves the owner’s machine
  • Zero servers: Communication happens through synced folders
  • End-to-end encryption: Secure message passing with RPC and crypto

Architecture

A local SyftBox network consists of:
~/SyftBox/                    # Your SyftBox directory
├── datasites/                 # Synced datasites from peers
│   ├── [email protected]/
│   ├── [email protected]/
│   └── [email protected]/
├── apps/                      # Installed SyftBox apps
├── apis/                      # FL message exchange happens here
└── sync/                      # File sync metadata
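
The layout above can be inspected with a few lines of standard-library Python. This is a minimal sketch, assuming the default ~/SyftBox location and that each peer appears as one subdirectory of datasites/. The helper names list_datasites and missing_datasites are hypothetical and not part of SyftBox.

```python
from pathlib import Path


def list_datasites(syftbox_dir: Path) -> list[str]:
    """Return the names of peer datasite folders found under datasites/."""
    datasites_dir = syftbox_dir / "datasites"
    if not datasites_dir.is_dir():
        return []  # nothing synced yet, or a different directory layout
    return sorted(p.name for p in datasites_dir.iterdir() if p.is_dir())


def missing_datasites(syftbox_dir: Path, expected: list[str]) -> list[str]:
    """Return the expected peers whose folders have not appeared locally."""
    present = set(list_datasites(syftbox_dir))
    return [peer for peer in expected if peer not in present]


if __name__ == "__main__":
    root = Path.home() / "SyftBox"  # assumed default location
    print(list_datasites(root))
```

Calling missing_datasites with the emails of the data owners you expect is a quick way to see whether a peer has synced before you try to connect to it.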

Install SyftBox

Install SyftBox using the official installer.

On macOS and Linux:

    curl -fsSL https://syftbox.net/install.sh | sh

On Windows, download and run the installer from https://www.syftbox.net/.

Verify Installation

    syftbox --version

You should see output like:

    SyftBox version 0.1.0

You can also install the SyftBox UI app from syftbox.net for a graphical interface.

Create Your Account

Run the SyftBox client for the first time:

    syftbox client

You’ll be prompted to:

  • Enter your email address (this becomes your datasite ID)
  • Choose a SyftBox directory location (default: ~/SyftBox)
  • Configure sync settings

The client will:

  • Create your local SyftBox directory
  • Generate encryption keys
  • Start the sync service
  • Connect to the SyftBox network

Your email serves as your unique identifier across the SyftBox network. Use a valid email that you control.

Explore the Directory Structure

After initialization, examine your SyftBox directory:
    ls -la ~/SyftBox
    
You should see:
    ~/SyftBox/
    ├── .syftbox/              # Configuration and keys
    │   ├── config.json        # Client configuration
    │   └── client_config.json # Connection settings
    ├── datasites/             # Peer datasites appear here
    ├── public/                # Your public folder (readable by all)
    ├── private/               # Your private data (only you can read)
    ├── api_data/              # API and app data
    └── sync/                  # Sync metadata
    
Configure as Data Owner

If you’re a data owner, set up your datasite:
    import syft_rds as sy
    
    # Initialize as admin on your own datasite
    do_email = "[email protected]"  # Your email
    do_client = sy.init_session(host=do_email, email=do_email)
    
    # Verify admin access
    print(f"Admin access: {do_client.is_admin}")
    
Create a Dataset

Register your private dataset with SyftBox:
    from pathlib import Path
    
    # Prepare your dataset
    DATASET_DIR = Path("./my-dataset").absolute()
    
    # Ensure it has the required structure:
    # my-dataset/
    # ├── private/
    # │   ├── train.csv
    # │   └── test.csv
    # └── mock/
    #     ├── train.csv
    #     └── test.csv
    
    # Create the dataset in SyftBox
    do_client.dataset.create(
        name="pima-indians-diabetes-database",
        asset_path=DATASET_DIR,
        description="Diabetes prediction dataset",
    )
    
    print("✅ Dataset created successfully")
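
Before registering, it can help to confirm that the directory matches the layout described in the comment above. The following is a minimal sketch using only the standard library. The file list is taken from that comment, and missing_dataset_files is a hypothetical helper, not a syft_rds function.

```python
from pathlib import Path

# Layout taken from the comment in the block above.
REQUIRED_FILES = [
    Path("private") / "train.csv",
    Path("private") / "test.csv",
    Path("mock") / "train.csv",
    Path("mock") / "test.csv",
]


def missing_dataset_files(dataset_dir: Path) -> list[Path]:
    """Return the required files that are absent from dataset_dir."""
    return [rel for rel in REQUIRED_FILES if not (dataset_dir / rel).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files(Path("./my-dataset"))
    if missing:
        print("Missing files:", ", ".join(str(p) for p in missing))
    else:
        print("Dataset layout looks complete")
```

Checking the mock/ files in particular is worthwhile, since they are the only part of the dataset that data scientists can develop against.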
    
Private data (private/) stays on your machine and is never shared. Only mock data (mock/) is accessible to data scientists for development.

Verify Dataset Registration
    # List all datasets
    datasets = do_client.dataset.get_all()
    for ds in datasets:
        print(f"Dataset: {ds.name}")
        print(f"  Private path: {ds.get_private_path()}")
        print(f"  Mock path: {ds.get_mock_path()}")
    
Configure as Data Scientist

If you’re a data scientist, connect to data owners:
    import syft_rds as sy
    
    # Initialize your session
    ds_email = "[email protected]"
    ds = sy.init_session(host=ds_email, email=ds_email)
    
    print(f"Data scientist logged in: {ds_email}")
    
Connect to Data Owners
    # Connect to first data owner
    do1_email = "[email protected]"
    do1_client = sy.init_session(
        host=do1_email,
        email=ds_email,  # Login as yourself (guest)
        start_syft_event_server=False,
    )
    
    print(f"Connected to {do1_email}")
    print(f"Admin access: {do1_client.is_admin}")  # Should be False
    
    # Connect to second data owner
    do2_email = "[email protected]"
    do2_client = sy.init_session(
        host=do2_email,
        email=ds_email,
        start_syft_event_server=False,
    )
    
    print(f"Connected to {do2_email}")
    
Explore Available Datasets

    # Get DO1's datasets
    do1_datasets = do1_client.dataset.get_all()
    for dataset in do1_datasets:  # not `ds`, which is the data scientist session
        print(f"\nDataset: {dataset.name}")
        print(f"  Description: {dataset.description}")

        # You can access mock data
        mock_path = dataset.get_mock_path()
        print(f"  Mock data: {mock_path}")

        # But NOT private data (will raise an error)
        try:
            private_path = dataset.get_private_path()
        except Exception as e:
            print(f"  Private data: ❌ Access denied ({e})")
    
As a guest user, you can only access mock data. Private data requires admin privileges.

Set Up Multi-Machine Network

For a realistic FL setup across multiple machines:

Machine 1 (Data Owner 1)
    # On first machine
    syftbox client
    # Enter: [email protected]
    
Machine 2 (Data Owner 2)
    # On second machine  
    syftbox client
    # Enter: [email protected]
    
Machine 3 (Data Scientist)
    # On third machine
    syftbox client
    # Enter: [email protected]
    
All machines will automatically sync through the SyftBox network.

Configure Environment Variables

Set up environment variables for FL execution:
    import os

    import syft_rds as sy

    # Point to your SyftBox config
    ds = sy.init_session(host="[email protected]", email="[email protected]")
    os.environ["SYFTBOX_CLIENT_CONFIG_PATH"] = str(ds.syftbox_client.config_path)
    
    # Configure logging
    os.environ["LOGURU_LEVEL"] = "DEBUG"
    
    # Set message timeout (in seconds)
    os.environ["SYFT_FLWR_MSG_TIMEOUT"] = "30"
    
    print("✅ Environment configured")
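
Settings such as DEBUG logging are often wanted for a single run only. One way to keep them from leaking into the rest of a session is a small context manager that restores the previous environment on exit. This is a sketch under stated assumptions: temporary_env is a hypothetical helper, and this guide does not say whether syft_flwr reads these variables at import time or at call time, so set them before importing it if in doubt.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides: str):
    """Set environment variables for the duration of a with-block, then restore them."""
    previous = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old_value in previous.items():
            if old_value is None:
                os.environ.pop(name, None)  # variable did not exist before
            else:
                os.environ[name] = old_value


if __name__ == "__main__":
    with temporary_env(LOGURU_LEVEL="DEBUG", SYFT_FLWR_MSG_TIMEOUT="30"):
        # Both values are visible only inside this block.
        print(os.environ["LOGURU_LEVEL"], os.environ["SYFT_FLWR_MSG_TIMEOUT"])
```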
    
Test Local Simulation

Before running distributed FL, test with local simulation:
    import syft_flwr
    from pathlib import Path
    
    # Your FL project directory
    PROJECT_PATH = Path("./fl-diabetes-prediction")
    
    # Mock dataset paths from data owners
    mock_paths = [
        do1_client.dataset.get(name="pima-indians-diabetes-database").get_mock_path(),
        do2_client.dataset.get(name="pima-indians-diabetes-database").get_mock_path(),
    ]
    
    print(f"Mock paths: {mock_paths}")
    
    # Run simulation
    syft_flwr.run(PROJECT_PATH, mock_paths)
    
This runs a local simulation with:

  • 2 client threads (simulating data owners)
  • 1 server thread (data scientist)
  • Communication via local files

Check logs in {PROJECT_PATH}/simulation_logs/:
    simulation_logs/
    ├── client_0.log
    ├── client_1.log
    └── server.log
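
Reading three log files by hand gets tedious after a few runs. The sketch below collects suspicious lines from every *.log file in that directory. It assumes failures show up as lines containing "ERROR" or "Traceback", which is a guess about the log format rather than something this guide specifies, and find_error_lines is a hypothetical helper.

```python
from pathlib import Path


def find_error_lines(log_dir: Path, markers=("ERROR", "Traceback")) -> dict[str, list[str]]:
    """Map each *.log file name to the lines that contain any of the markers."""
    hits: dict[str, list[str]] = {}
    for log_file in sorted(log_dir.glob("*.log")):
        lines = log_file.read_text(errors="replace").splitlines()
        matched = [line for line in lines if any(m in line for m in markers)]
        if matched:
            hits[log_file.name] = matched
    return hits
```

For example, find_error_lines(PROJECT_PATH / "simulation_logs") returns an empty dict when none of the logs contain a marker.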
    
Enable End-to-End Encryption

By default, SyftBox uses end-to-end encryption for FL messages:
    import os
    
    # Encryption is enabled by default
    # To explicitly control:
    os.environ["SYFT_FLWR_ENCRYPTION_ENABLED"] = "true"  # Enable
    # os.environ["SYFT_FLWR_ENCRYPTION_ENABLED"] = "false"  # Disable (dev only)
    
Encryption bootstrap happens automatically:

    from syft_crypto.x3dh_bootstrap import ensure_bootstrap

    # This runs automatically in syft_flwr; it is shown here for reference.
    # `syftbox_client` is your SyftBox client object, e.g. ds.syftbox_client.
    client = ensure_bootstrap(syftbox_client)
    print("🔐 End-to-end encryption enabled")
    
Only disable encryption for local development. Production FL should always use encryption.

Run Distributed FL

Now run FL across your distributed network:

1. Bootstrap the Project
    from pathlib import Path

    import syft_flwr

    PROJECT_PATH = Path("./fl-diabetes-prediction")
    do_emails = ["[email protected]", "[email protected]"]
    ds_email = "[email protected]"
    
    syft_flwr.bootstrap(
        PROJECT_PATH,
        aggregator=ds_email,
        datasites=do_emails,
        transport="syftbox",  # Use local SyftBox
    )
    
2. Submit Jobs to Data Owners

    # Clean before submitting (the leading "!" runs a shell command in Jupyter/IPython)
    !rm -rf {PROJECT_PATH / "fl_diabetes_prediction" / "__pycache__"}
    !rm -rf {PROJECT_PATH / "simulation_logs"}
    
    # Submit to DO1
    do1_client.job.submit(
        name="fl-diabetes-prediction",
        user_code_path=PROJECT_PATH,
        dataset_name="pima-indians-diabetes-database",
        entrypoint="main.py",
    )
    
    # Submit to DO2
    do2_client.job.submit(
        name="fl-diabetes-prediction",
        user_code_path=PROJECT_PATH,
        dataset_name="pima-indians-diabetes-database",
        entrypoint="main.py",
    )
    
    print("✅ Jobs submitted, waiting for approval...")
    
3. Data Owners Approve Jobs

On each data owner machine:

    # Data owner reviews and approves, using the admin session (do_client)
    # created in "Configure as Data Owner"
    jobs = do_client.job.get_all()
    pending_job = jobs[0]  # assumes the submitted job is first in the list

    print(f"Job from: {pending_job.created_by}")
    print(f"Code path: {pending_job.user_code_path}")

    # Approve the job
    do_client.job.approve(pending_job)
    print("✅ Job approved")
    
4. Start the FL Server
    # Data scientist submits server job to themselves
    server_job = ds.job.submit(
        name="fl-diabetes-prediction-server",
        user_code_path=PROJECT_PATH,
        entrypoint="main.py",
    )
    
    # Auto-approve own job
    ds.job.approve(server_job)
    
    # Run the server (blocking)
    ds.run_private(server_job, blocking=True)
    
5. Monitor Results
    # View logs
    ds.job.show_logs(server_job)
    
    # Check output directory
    output_dir = ds.job.get_output_dir(server_job)
    print(f"Results saved to: {output_dir}")
    
    # List model checkpoints
    weights_dir = output_dir / "weights"
    for weight_file in weights_dir.glob("*.safetensors"):
        print(f"  - {weight_file.name}")
    
Monitor SyftBox Sync

Check that files are syncing properly:
    # View sync status
    syftbox status
    
    # Check recent sync activity
    tail -f ~/SyftBox/.syftbox/logs/sync.log
    
Clean Up

After completing your FL experiment:
    # Remove temporary files
    !rm -rf {PROJECT_PATH / "main.py"}
    !rm -rf {PROJECT_PATH / "**/__pycache__"}
    !rm -rf {PROJECT_PATH / "simulation_logs"}
    
    print("✅ Cleanup complete")
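
The shell commands above only work inside a notebook, and how `**` expands depends on the shell's globbing settings. A pure-Python alternative is sketched below. clean_project is a hypothetical helper, and it deliberately leaves main.py alone, so remove that file separately if you want to match the commands above exactly.

```python
import shutil
from pathlib import Path


def clean_project(project_path: Path) -> list[Path]:
    """Delete every __pycache__ directory and simulation_logs/ under project_path.

    Returns the directories that were removed.
    """
    targets = list(project_path.rglob("__pycache__"))
    targets.append(project_path / "simulation_logs")
    removed = []
    for target in targets:
        if target.is_dir():  # skip anything already gone or never created
            shutil.rmtree(target)
            removed.append(target)
    return removed
```

Because it returns the list of removed directories, the function is safe to call repeatedly: a second call on a clean project returns an empty list.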
    

SyftBox vs Google Drive Transport

Feature    | SyftBox (Local)    | Google Drive (P2P)
Latency    | Low (less than 1s) | High (30-60s)
Encryption | ✅ End-to-end      | ❌ No encryption
Setup      | Install client     | Just browser
Offline    | ✅ Full support    | Limited
Networking | None required      | None required
Best for   | Production FL      | Quick experiments

Folder Permissions

SyftBox enforces strict folder permissions:
    private/          # Only you (admin) can read/write
    public/           # Everyone can read, only you can write  
    api_data/         # Apps can read/write with permissions
    datasites/{peer}/ # Read-only view of peer's public data
    

Advanced Configuration

Customize your SyftBox setup:

Change Sync Directory

    syftbox client --path /custom/path/to/SyftBox
    

Adjust Sync Frequency

Edit ~/SyftBox/.syftbox/config.json:
    {
      "sync_interval_seconds": 5,
      "message_timeout_seconds": 30
    }
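
Hand-editing JSON makes it easy to drop a comma or overwrite unrelated keys. The sketch below merges new values into an existing file while keeping everything else intact. update_config is a hypothetical helper, and whether the client picks up changes without a restart is not covered here, so restarting it afterwards is the cautious choice.

```python
import json
from pathlib import Path


def update_config(config_path: Path, **changes) -> dict:
    """Merge changes into a JSON config file, keeping all other keys intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.update(changes)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

With the path from this section, a call would look like update_config(Path.home() / "SyftBox" / ".syftbox" / "config.json", sync_interval_seconds=5).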
    

Use Custom Network

For private networks:
    syftbox client --network-url https://your-syftbox-server.com
    

Troubleshooting

If the client fails to start, check that port 8080 isn’t in use:

    lsof -i :8080

Kill any conflicting process or configure a different port.
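
On systems without lsof (Windows, for instance), the same check can be approximated from Python. This is a minimal sketch: port_in_use is a hypothetical helper that only tests whether something accepts TCP connections on the local port, and it does not identify which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, an error code otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("Port 8080 in use:", port_in_use(8080))
```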

If files aren’t syncing, verify:

  • SyftBox client is running (syftbox status)
  • You’re connected to the network
  • The sync logs show recent activity: tail -f ~/SyftBox/.syftbox/logs/sync.log

If you can’t access a dataset, ensure:

  • You’re using the correct client (admin for private, guest for mock)
  • Dataset is properly registered
  • Paths exist and have correct permissions

If FL messages time out:

  • Increase timeout: os.environ["SYFT_FLWR_MSG_TIMEOUT"] = "60"
  • Check app_name matches in all participants
  • Verify encryption keys are bootstrapped
