SyftBox Network Deployment

SyftBox network deployment enables production-grade federated learning across geographically distributed nodes. Each participant runs the SyftBox client on their own infrastructure, creating a true peer-to-peer privacy-preserving network.

Overview

SyftBox is a decentralized platform for privacy-preserving computation. It provides:

Decentralized Architecture: No central server or trusted third party
Data Sovereignty: Data owners maintain full control over their data
Consent-Based Computation: All jobs require explicit approval
Secure Communication: Encrypted data exchange between nodes
Production Ready: Designed for real-world federated learning deployments

Architecture

┌─────────────────────────────┐
│   Data Scientist Node       │
│   ┌─────────────────────┐   │
│   │  SyftBox Client     │   │
│   │  - FL Aggregator    │   │
│   │  - Job Submission   │   │
│   └─────────────────────┘   │
└──────────────┬──────────────┘
               │
               │ SyftBox P2P Network
               │ (Encrypted Communication)
               │
       ┌───────┴────────┬────────────────┐
       │                │                │
┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼───────┐
│ DO1 Node    │  │ DO2 Node    │  │ DO3 Node     │
│ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐  │
│ │ SyftBox │ │  │ │ SyftBox │ │  │ │ SyftBox │  │
│ │ Client  │ │  │ │ Client  │ │  │ │ Client  │  │
│ ├─────────┤ │  │ ├─────────┤ │  │ ├─────────┤  │
│ │ Private │ │  │ │ Private │ │  │ │ Private │  │
│ │ Dataset │ │  │ │ Dataset │ │  │ │ Dataset │  │
│ └─────────┘ │  │ └─────────┘ │  │ └─────────┘  │
│ Hospital A  │  │ Hospital B  │  │ Hospital C   │
└─────────────┘  └─────────────┘  └──────────────┘

Setup

Prerequisites

Operating System: Linux, macOS, or Windows (WSL recommended)
Python: >= 3.12
Email: Valid email address for SyftBox account
Network: Stable internet connection
Storage: Sufficient disk space for datasets and models
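The Python version and disk-space prerequisites can be checked with a short script before installing anything. This is a minimal sketch using only the standard library; the helper name `check_prerequisites` and the 5 GB free-space threshold are illustrative assumptions, not SyftBox requirements.

```python
import shutil
import sys
from pathlib import Path


def check_prerequisites(
    data_dir: Path,
    min_python: tuple[int, int] = (3, 12),
    min_free_gb: float = 5.0,  # assumed threshold; size it to your datasets
) -> list[str]:
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []

    # Python version check, matching the ">= 3.12" prerequisite
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")

    # Free disk space on the volume that holds data_dir
    free_gb = shutil.disk_usage(data_dir).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"Only {free_gb:.1f} GB free in {data_dir}")

    return problems
```

Calling `check_prerequisites(Path.home())` returns an empty list when both checks pass.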

Install SyftBox Client

Each participant installs the SyftBox client on their machine:

# Install using pip
pip install syftbox

# Or using uv
uv pip install syftbox

Initialize SyftBox

Run the client for the first time:

syftbox client

You’ll be prompted to:

Enter your email address
Verify your email (check inbox for verification link)
Choose a datasite directory (default: ~/.syftbox/)

The client will:

Create your local datasite
Generate cryptographic keys
Connect to the SyftBox network
Start syncing with peers

Directory Structure

After initialization, you’ll have:

~/.syftbox/
├── client_config.json      # Client configuration
├── datasites/
│   └── <your-email>/
│       ├── public/         # Publicly readable files
│       ├── private/        # Private datasets
│       ├── api_data/       # Shared with approved peers
│       └── sync/           # Sync state
├── logs/                   # Client logs
└── plugins/                # Installed plugins
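
To confirm that initialization produced the layout shown in the tree above, the check can be scripted. This is a sketch that assumes the directory names from that tree; `missing_paths` is a hypothetical helper, not part of SyftBox.

```python
from pathlib import Path

# Relative paths taken from the tree above; "{email}" is filled in per user
EXPECTED = [
    "client_config.json",
    "datasites/{email}/public",
    "datasites/{email}/private",
    "datasites/{email}/api_data",
    "logs",
]


def missing_paths(root: Path, email: str) -> list[str]:
    """Return the expected entries that do not exist under root."""
    missing = []
    for template in EXPECTED:
        relative = template.format(email=email)
        if not (root / relative).exists():
            missing.append(relative)
    return missing
```

For example, `missing_paths(Path.home() / ".syftbox", "<your-email>")` returns an empty list when every expected entry is present.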

Deployment Modes

Mode 1: Interactive Notebooks

Use Jupyter notebooks with SyftBox client running in the background.

Setup

Start SyftBox Client:

# Terminal 1: Run SyftBox client
syftbox client

Start Jupyter:

# Terminal 2: Start Jupyter
cd notebooks/fl-diabetes-prediction/distributed/
jupyter notebook

Follow Notebook Instructions:

Data Owners: Run do1.ipynb, do2.ipynb
Data Scientist: Run ds.ipynb

Data Owner Workflow

# In do1.ipynb or do2.ipynb
import syft_client as sc

# Connect to running SyftBox client
do_email = "[email protected]"  # Your SyftBox email
do_client = sc.login_do(email=do_email)

# Register dataset
do_client.create_dataset(
    name="diabetes-data",
    private_path="/path/to/private/data/",
    mock_path="/path/to/mock/data/",
    summary="Private diabetes dataset"
)

# Later: Check for incoming jobs
do_client.jobs

# Approve job
do_client.jobs[0].approve()

# Execute approved jobs
do_client.process_approved_jobs()

Data Scientist Workflow

# In ds.ipynb
import syft_client as sc
import syft_flwr

# Connect to SyftBox
ds_email = "[email protected]"
ds_client = sc.login_ds(email=ds_email)

# Add data owners as peers
ds_client.add_peer("[email protected]")
ds_client.add_peer("[email protected]")

# Explore datasets
do1_datasets = ds_client.datasets.get_all(datasite="[email protected]")

# Submit FL job
ds_client.submit_python_job(
    user="[email protected]",
    code_path="./fl_diabetes_prediction/",
    job_name="diabetes-fl-training"
)

# Run aggregation server
syft_flwr.run_aggregator(
    project_path="./fl_diabetes_prediction/",
    num_rounds=3
)

Mode 2: Automated Deployment

Run federated learning as a background service.

Setup

Install FL Project:

git clone https://github.com/OpenMined/syft-flwr.git
cd syft-flwr/notebooks/fl-diabetes-prediction/fl-diabetes-prediction/
uv sync

Configure SyftBox Integration:

Edit pyproject.toml:

[tool.syft_flwr]
datasites = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
]
aggregator = "[email protected]"

Run on Each Node:

# Set environment variables
export SYFTBOX_EMAIL="<your-email>"
export SYFTBOX_FOLDER="$HOME/.syftbox"  # "~" is not expanded inside double quotes

# Run main entry point
python main.py

Each node automatically determines whether to run as an FL client or as the aggregation server, based on its configured email.

Mode 3: Docker Deployment

Deploy SyftBox and FL apps using Docker.

Build SyftBox Container

# Clone SyftBox repository
git clone https://github.com/OpenMined/syftbox.git
cd syftbox/docker/

# Build image
docker build -t syftbox-client .

# Run container
docker run -d \
  --name syftbox-do1 \
  -v /local/data:/data \
  -e SYFTBOX_EMAIL="[email protected]" \
  syftbox-client

Attach VSCode to Container

Install the “Dev Containers” extension (formerly “Remote - Containers”) in VS Code
Open the Command Palette and run Dev Containers: Attach to Running Container
Select syftbox-do1 container
Open Jupyter notebooks inside container

Multi-Container Setup

Run 3 clients in separate containers (for testing):

# Data Owner 1
docker run -d --name syftbox-do1 \
  -e SYFTBOX_EMAIL="[email protected]" \
  syftbox-client

# Data Owner 2
docker run -d --name syftbox-do2 \
  -e SYFTBOX_EMAIL="[email protected]" \
  syftbox-client

# Data Scientist
docker run -d --name syftbox-ds \
  -e SYFTBOX_EMAIL="[email protected]" \
  syftbox-client
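
When the number of test containers grows, the commands above can be generated rather than typed by hand. This sketch builds the same `docker run` arguments from a mapping of container names to emails; the addresses are hypothetical placeholders and `docker_run_command` is an illustrative helper.

```python
import shlex


def docker_run_command(name: str, email: str, image: str = "syftbox-client") -> list[str]:
    """Build the argv for one detached SyftBox container."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "-e", f"SYFTBOX_EMAIL={email}",
        image,
    ]


# Hypothetical addresses for illustration only
participants = {
    "syftbox-do1": "do1@example.org",
    "syftbox-do2": "do2@example.org",
    "syftbox-ds": "ds@example.org",
}

for name, email in participants.items():
    # Print the command; pass the list to subprocess.run(...) to execute it
    print(shlex.join(docker_run_command(name, email)))
```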

Production Best Practices

1. Data Governance

Data Owner Checklist:

Review all submitted job code before approval
Verify job submitter identity
Check privacy implications of requested computations
Ensure compliance with data protection regulations (GDPR, HIPAA)
Monitor job execution and resource usage
Audit job results before sharing

Code Review Example:

# Before approving, inspect the job code
job = do_client.jobs[0]
print(job.code_summary)  # High-level summary
print(job.code_path)     # Path to submitted code

# Review actual code files
import os
for root, dirs, files in os.walk(job.code_path):
    for file in files:
        if file.endswith('.py'):
            print(f"\n=== {file} ===")
            with open(os.path.join(root, file)) as f:
                print(f.read())

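# Only approve after a manual review of the code printed above
code_looks_safe = input("Approve this job? [y/N] ").strip().lower() == "y"
if code_looks_safe:
    job.approve()
else:
    job.reject(reason="Suspicious data access patterns detected")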
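
Manual review can be supplemented with an automated first pass that flags imports commonly associated with network or process access. The sketch below uses Python's `ast` module; the module list and the helper names are illustrative assumptions, and a clean report does not mean the code is safe.

```python
import ast
from pathlib import Path

# Modules that often indicate network or process access (an illustrative list)
FLAGGED_MODULES = {"socket", "subprocess", "requests", "urllib", "http", "ftplib"}


def flagged_imports(source: str) -> set[str]:
    """Return the flagged top-level modules imported by a piece of source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in FLAGGED_MODULES:
                found.add(top_level)
    return found


def scan_job(code_path: str) -> dict[str, set[str]]:
    """Map each .py file under code_path to the flagged modules it imports."""
    report = {}
    for path in Path(code_path).rglob("*.py"):
        hits = flagged_imports(path.read_text())
        if hits:
            report[str(path)] = hits
    return report
```

A data owner could call `scan_job(job.code_path)` and read the flagged files first. Imports hidden behind `exec` or `importlib` are not detected, so this only prioritizes the review.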

2. Security

Network Security:

# Run SyftBox behind firewall
# Only expose necessary ports
sudo ufw allow from <trusted-ip> to any port 8080

Data Encryption: SyftBox automatically encrypts:

Data in transit (TLS)
Peer-to-peer communication
Job submissions

For additional security:

# Encrypt datasets before registering
# (secret_key must be defined first, e.g. loaded from a secrets manager; never hard-code it)
from syft_flwr.crypto import encrypt_dataset

encrypt_dataset(
    source="/path/to/data/",
    destination="/path/to/encrypted/",
    key=secret_key
)

do_client.create_dataset(
    name="encrypted-data",
    private_path="/path/to/encrypted/"
)

3. Monitoring

SyftBox Logs:

# View client logs
tail -f ~/.syftbox/logs/client.log

# Monitor network activity
grep "peer_sync" ~/.syftbox/logs/client.log

# Track job submissions
grep "job_submit" ~/.syftbox/logs/client.log
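
The same counts can be collected in Python, for example to feed a dashboard. This sketch only assumes that the marker strings used in the grep commands above appear somewhere in the relevant log lines; nothing else about the log format is assumed.

```python
from collections import Counter
from pathlib import Path

MARKERS = ("peer_sync", "job_submit")  # the strings grepped for above


def count_events(lines) -> Counter:
    """Count how many lines mention each marker."""
    counts = Counter()
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


log_path = Path.home() / ".syftbox" / "logs" / "client.log"
if log_path.exists():
    with log_path.open() as log_file:
        print(dict(count_events(log_file)))
```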

Custom Monitoring:

import time
from datetime import datetime

import syft_client as sc

def monitor_jobs():
    client = sc.login_do(email="[email protected]")
    
    while True:
        jobs = client.jobs
        pending = [j for j in jobs if j.status == "pending"]
        
        if pending:
            print(f"[{datetime.now()}] {len(pending)} pending jobs")
            for job in pending:
                print(f"  - {job.name} from {job.submitter}")
        
        time.sleep(60)  # Check every minute

monitor_jobs()

4. Fault Tolerance

Handle Client Failures:

# In pyproject.toml
[tool.flwr.app.config]
min-available-clients = 2     # Can start with 2 out of 3 clients
min-fit-clients = 2            # Need 2 clients for training
fraction-fit = 0.66            # Sample 66% of available clients
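
It helps to work out how many clients these settings select per round. The sketch below assumes the strategy follows the sampling rule used by Flower's FedAvg (truncate `available * fraction-fit`, then apply `min-fit-clients` as a floor); other strategies may differ.

```python
def num_fit_clients(
    num_available: int,
    fraction_fit: float,
    min_fit_clients: int,
) -> int:
    """Number of clients sampled for one training round."""
    sampled = int(num_available * fraction_fit)  # truncates toward zero
    return max(sampled, min_fit_clients)


for available in (2, 3):
    print(available, "available ->", num_fit_clients(available, 0.66, 2), "sampled")
```

With three clients, `int(3 * 0.66)` truncates to 1, so it is the `min-fit-clients = 2` floor that guarantees two participants per round under this rule.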

Automatic Reconnection:

# Use systemd to restart SyftBox on failure (Linux)
sudo nano /etc/systemd/system/syftbox.service

[Unit]
Description=SyftBox Client
After=network.target

[Service]
Type=simple
User=<your-user>
ExecStart=/usr/local/bin/syftbox client
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

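# Reload systemd so it picks up the new unit, then enable and start it
sudo systemctl daemon-reload
sudo systemctl enable syftbox
sudo systemctl start syftbox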

5. Resource Management

Limit Resource Usage:

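# Limit CPU/GPU usage per job.
# Set these before importing NumPy, PyTorch, or similar libraries,
# which read them at import or initialization time.
import os
os.environ["OMP_NUM_THREADS"] = "4"  # Limit CPU threads
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Use only GPU 0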

Job Scheduling:

# Process jobs during off-peak hours
import schedule
import time

def process_jobs():
    do_client.process_approved_jobs()

# Run jobs at 2 AM daily
schedule.every().day.at("02:00").do(process_jobs)

while True:
    schedule.run_pending()
    time.sleep(60)  # Poll every minute so the 02:00 job starts on time
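
If adding the third-party `schedule` package is not an option, the wait until the next run can be computed with the standard library. This is a minimal sketch; `seconds_until` is a hypothetical helper, and it uses naive local datetimes, so daylight-saving transitions are not handled.

```python
from datetime import datetime, timedelta


def seconds_until(hour: int, minute: int, now: datetime) -> float:
    """Seconds from now until the next occurrence of hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return (target - now).total_seconds()
```

A loop would then call `time.sleep(seconds_until(2, 0, datetime.now()))` before each batch of approved jobs.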

Example: Multi-Hospital Deployment

Scenario

Three hospitals want to collaboratively train a diabetes prediction model:

Hospital A: 500 patient records
Hospital B: 300 patient records
Hospital C: 400 patient records
Research Institute: Coordinates the study

Deployment

Hospital A (Data Owner):

# Install SyftBox
pip install syftbox

# Start client
syftbox client
# Email: [email protected]

# Register dataset
python register_dataset.py \
  --name diabetes-data \
  --private-path /secure/storage/diabetes/ \
  --summary "Hospital A diabetes records (n=500)"

Hospitals B & C: Repeat the same process with their data.

Research Institute (Data Scientist):

# Install SyftBox
pip install syftbox

# Start client
syftbox client
# Email: [email protected]

# Run federated learning
python run_federated_study.py \
  --participants [email protected] [email protected] [email protected] \
  --rounds 5 \
  --model diabetes-prediction

Results

Privacy: No hospital shares patient records
Compliance: Raw patient records never leave hospital infrastructure, which supports (but does not by itself guarantee) HIPAA compliance
Performance: Model trained on 1,200 total records
Governance: Each hospital approved all computation
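
The record counts in this scenario determine how much each hospital influences the shared model if the aggregator uses FedAvg-style weighting. The document does not state which strategy is used, so the sketch below is an assumption; the two-parameter updates are made-up values for illustration.

```python
def fedavg(updates: list[list[float]], num_examples: list[int]) -> list[float]:
    """Average parameter vectors, weighting each by its number of examples."""
    total = sum(num_examples)
    dimension = len(updates[0])
    return [
        sum(update[i] * n for update, n in zip(updates, num_examples)) / total
        for i in range(dimension)
    ]


# Record counts from the scenario: Hospitals A, B, C
records = [500, 300, 400]
weights = [n / sum(records) for n in records]
print([round(w, 3) for w in weights])  # [0.417, 0.25, 0.333]

# Made-up two-parameter updates, one per hospital
updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fedavg(updates, records))
```

Under this weighting, Hospital A contributes roughly 42% of each aggregated parameter, Hospital B 25%, and Hospital C 33%.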

Troubleshooting

Client Won’t Connect

# Check network connectivity
ping syftbox.net

# Verify email
syftbox verify-email

# Restart client
syftbox client --reset
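
Some networks block ICMP, so `ping` can fail even when the service is reachable. A TCP connection test is a useful complement. This sketch uses only the standard library; `can_connect` is a hypothetical helper, and port 443 in the usage note is an assumption about how the service is exposed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False
```

For example, `can_connect("syftbox.net", 443)` returning `False` while other HTTPS sites work points to a firewall or DNS problem rather than a client misconfiguration.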

Peers Not Syncing

# Check peer status
syftbox peers list

# Manually sync
syftbox sync --force

Job Stuck in Pending

# Check job status
job = do_client.jobs[0]
print(job.status)
print(job.error_message)  # If any

# Re-submit if needed
job.resubmit()

Next Steps

Run Local Simulation First

Try Google Colab

API Reference

Join Community