syft-flwr run

Synopsis

syft-flwr run PROJECT_DIR [OPTIONS]

Description

The run command executes a bootstrapped syft-flwr project in simulation mode using mock datasets. This allows you to:

Test your federated learning workflow locally
Validate model training with sample data
Debug issues before deploying to distributed datasites
Simulate multiple clients with different datasets

The simulation creates a temporary SyftBox network with all clients and the aggregator server, runs the federated learning workflow, and cleans up automatically when complete.

Arguments

PROJECT_DIR

path

required

Path to a bootstrapped syft-flwr project directory.

The directory must contain:

pyproject.toml with syft-flwr configuration
main.py (created by syft-flwr bootstrap)

Both relative and absolute paths are supported. Paths with ~ are expanded.
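The path handling described above can be illustrated with the standard library. This is a minimal sketch of how relative paths and paths starting with ~ resolve to absolute paths; the helper name resolve_project_dir is hypothetical and is not part of syft-flwr.

```python
from pathlib import Path


def resolve_project_dir(raw: str) -> Path:
    """Expand a leading ~ and convert the result to an absolute path."""
    return Path(raw).expanduser().resolve()


# A relative path resolves against the current working directory.
print(resolve_project_dir("./my-fl-project"))
# A path starting with ~ resolves against the user's home directory.
print(resolve_project_dir("~/projects/fl/diabetes-prediction"))
```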

Options

--mock-dataset-paths

string

Comma-separated list of paths to mock datasets, one per client.

Aliases: -m

Format: Comma-separated file system paths (e.g., ./data/client1,./data/client2)

Requirements:

Number of paths must match the number of datasites configured in pyproject.toml
All paths must exist on the file system
Each path should contain the dataset for one client

If not provided via flag, you’ll be prompted interactively:

Enter comma-separated paths to mock datasets:
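To make the expected format concrete, here is a small sketch of turning the comma-separated value into a list of paths. The function name is hypothetical, and trimming whitespace around each entry is an assumption of this sketch, not documented behavior of the CLI.

```python
from __future__ import annotations


def parse_mock_dataset_paths(raw: str) -> list[str]:
    """Split a comma-separated string into trimmed, non-empty path strings."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [part.strip() for part in raw.split(",") if part.strip()]


print(parse_mock_dataset_paths("./data/client1,./data/client2"))
```

The resulting list should have one entry per datasite configured in pyproject.toml.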

--help

flag

Display help information for the run command.

Aliases: -h

Examples

Run with All Flags

syft-flwr run ./my-fl-project \
  --mock-dataset-paths ./data/hospital1,./data/hospital2,./data/hospital3

Output:

Running syft_flwr project at '/path/to/my-fl-project'
Mock dataset paths: ['./data/hospital1', './data/hospital2', './data/hospital3']
[Simulation logs...]
Simulation completed successfully ✅

Run with Interactive Prompts

syft-flwr run ./my-fl-project

Interactive session:

Enter comma-separated paths to mock datasets: ./data/client1,./data/client2
Running syft_flwr project at '/path/to/my-fl-project'
Mock dataset paths: ['./data/client1', './data/client2']
[Simulation logs...]
Simulation completed successfully ✅

Run with Short Flag

syft-flwr run ./diabetes-fl -m ~/datasets/do1,~/datasets/do2

Run with Absolute Paths

syft-flwr run ~/projects/fl/diabetes-prediction \
  --mock-dataset-paths /data/samples/client1,/data/samples/client2

Run with Current Directory

cd my-fl-project
syft-flwr run . -m ../test-data/client1,../test-data/client2

How It Works

Simulation Process

Validation: Validates project structure and dataset paths
Setup: Creates temporary SyftBox network directory
Client Creation: Initializes mock RDS clients for each datasite
Encryption Bootstrap: Sets up E2E encryption keys (if enabled)
Parallel Execution: Runs server and all clients concurrently
Logging: Captures output from each participant
Cleanup: Removes temporary directories and configuration

Temporary Environment

During simulation, the following temporary directories are created:

/tmp/
  ├── my-fl-project/           # Simulated SyftBox network
  │   ├── [email protected]/      # Aggregator workspace
  │   ├── [email protected]/     # Client 1 workspace  
  │   └── [email protected]/     # Client 2 workspace
  └── .syftbox/                # Encryption keys and config
      ├── [email protected]/
      ├── [email protected]/
      └── [email protected]/

All temporary files are automatically deleted when simulation completes.
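The create-then-clean-up lifecycle can be sketched with Python's tempfile module. This illustrates the general pattern only and is not the syft-flwr implementation; the function name, the directory prefix, and the example.org addresses are hypothetical placeholders.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def simulate_in_temp_network(participants: list[str]) -> Path:
    """Create one workspace per participant, then remove everything on exit."""
    with tempfile.TemporaryDirectory(prefix="my-fl-project-") as network_root:
        root = Path(network_root)
        for email in participants:
            (root / email).mkdir()
        # The simulation itself would run here, while the workspaces exist.
        print("workspaces:", sorted(p.name for p in root.iterdir()))
    # Leaving the with-block deletes the whole directory tree.
    return root


root = simulate_in_temp_network(["server@example.org", "client1@example.org"])
print("still exists after cleanup:", root.exists())
```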

Simulation Logs

Logs are written to PROJECT_DIR/simulation_logs/:

my-fl-project/
  └── simulation_logs/
      ├── [email protected]      # Aggregator logs
      ├── [email protected]     # Client 1 logs
      └── [email protected]     # Client 2 logs

These logs persist after simulation for debugging.
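Because the logs persist, they can be inspected programmatically after a run. The following sketch collects the last lines of every file in PROJECT_DIR/simulation_logs/; the helper name is hypothetical, and it makes no assumption about log file names or extensions.

```python
from __future__ import annotations

from pathlib import Path


def tail_simulation_logs(project_dir: str, lines: int = 20) -> dict[str, list[str]]:
    """Return the last `lines` lines of each file in PROJECT_DIR/simulation_logs/."""
    log_dir = Path(project_dir).expanduser() / "simulation_logs"
    tails: dict[str, list[str]] = {}
    if not log_dir.is_dir():
        return tails  # No simulation has been run yet, or logs were removed.
    for log_file in sorted(log_dir.iterdir()):
        if log_file.is_file():
            text = log_file.read_text(encoding="utf-8", errors="replace")
            tails[log_file.name] = text.splitlines()[-lines:] if lines > 0 else []
    return tails


for name, tail in tail_simulation_logs("./my-fl-project").items():
    print(f"== {name} ==")
    print("\n".join(tail))
```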

Environment Variables

The run command sets environment variables for each participant:

SYFTBOX_CLIENT_CONFIG_PATH

string

Path to the participant’s SyftBox client configuration file.

Set automatically for each client and server process.

DATA_DIR

string

Path to the mock dataset for each client.

Set from the corresponding --mock-dataset-paths value.

SYFT_FLWR_ENCRYPTION_ENABLED

boolean

default:"true"

Enable or disable E2E encryption during simulation.

Set to false to disable encryption:

SYFT_FLWR_ENCRYPTION_ENABLED=false syft-flwr run ./my-project -m ./data1,./data2

SYFT_FLWR_SKIP_MODULE_CHECK

boolean

default:"false"

Skip module validation during project loading.

Useful for testing. Not recommended for production use.
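Inside a participant process, these variables can be read with os.environ. The sketch below shows one way client code might pick them up. The helper names are hypothetical, and the rule for which strings count as true is an assumption of this sketch; syft-flwr may parse the values differently.

```python
import os
from pathlib import Path


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean environment variable, falling back to `default` if unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Assumption: "1", "true", and "yes" (any case) mean enabled.
    return raw.strip().lower() in {"1", "true", "yes"}


def load_participant_settings() -> dict:
    """Collect the variables documented above into one dictionary."""
    data_dir = os.environ.get("DATA_DIR")
    return {
        "data_dir": Path(data_dir) if data_dir else None,
        "config_path": os.environ.get("SYFTBOX_CLIENT_CONFIG_PATH"),
        "encryption_enabled": env_flag("SYFT_FLWR_ENCRYPTION_ENABLED", True),
        "skip_module_check": env_flag("SYFT_FLWR_SKIP_MODULE_CHECK", False),
    }


print(load_participant_settings())
```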

Validation

The command validates:

Project directory exists and is a directory
pyproject.toml exists with valid syft-flwr configuration
main.py exists in the project directory
All mock dataset paths exist on the file system
Number of datasets matches number of configured datasites
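The checks listed above can be approximated in a pre-flight script. This is a sketch, not the command's actual validation code: the function name is hypothetical, the expected datasite count is passed in explicitly rather than read from pyproject.toml, and the contents of pyproject.toml are not inspected.

```python
from __future__ import annotations

from pathlib import Path


def validate_run_inputs(
    project_dir: str, dataset_paths: list[str], expected_datasites: int
) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []
    project = Path(project_dir).expanduser()
    if not project.is_dir():
        problems.append(f"{project} is not a directory")
    else:
        for required in ("pyproject.toml", "main.py"):
            if not (project / required).is_file():
                problems.append(f"{required} not found at {project}")
    for raw in dataset_paths:
        if not Path(raw).expanduser().exists():
            problems.append(f"Mock dataset path {raw} does not exist")
    if len(dataset_paths) != expected_datasites:
        problems.append(
            f"Expected {expected_datasites} dataset paths, got {len(dataset_paths)}"
        )
    return problems


for problem in validate_run_inputs("./my-project", ["./data/client1"], 2):
    print("Error:", problem)
```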

Error Handling

Project Not Bootstrapped

$ syft-flwr run ./my-project
Error: main.py not found at ./my-project

Solution: Bootstrap the project first:

syft-flwr bootstrap ./my-project --aggregator [email protected] --datasites [email protected],[email protected]

Dataset Path Does Not Exist

$ syft-flwr run ./my-project -m ./nonexistent,./data
Error: Mock dataset path ./nonexistent does not exist

Solution: Ensure all dataset paths exist:

mkdir -p ./data/client1 ./data/client2
syft-flwr run ./my-project -m ./data/client1,./data/client2

Dataset Count Mismatch

$ syft-flwr run ./my-project -m ./data/client1
# When pyproject.toml has 2 datasites configured
Error: Expected 2 dataset paths, got 1

Solution: Provide one dataset path per datasite.

Simulation Failed

$ syft-flwr run ./my-project -m ./data1,./data2
Running syft_flwr project at './my-project'
Mock dataset paths: ['./data1', './data2']
[Simulation logs...]
Simulation failed ❌
Error: [error details]

Solution: Check simulation logs in ./my-project/simulation_logs/ for details.

Async Execution

When called from Python in an environment that already has a running event loop (e.g., Jupyter notebooks), the run function returns an asyncio Task instead of blocking:

import asyncio
from syft_flwr.run_simulation import run

# Returns a Task in Jupyter
task = run("./my-project", ["./data1", "./data2"])

# Await the task
result = await task
print(f"Success: {result}")

In regular scripts, execution is synchronous and returns a boolean.
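The dual behavior follows a common asyncio pattern: check for a running event loop, then either schedule a Task or run to completion. The sketch below illustrates that pattern in isolation. It is not the code of syft_flwr.run_simulation, and the names run_sketch and _simulate are hypothetical.

```python
import asyncio


async def _simulate() -> bool:
    """Stand-in for the real simulation coroutine."""
    await asyncio.sleep(0)
    return True


def run_sketch():
    """Return a Task if a loop is already running; otherwise block and return a bool."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running loop (plain script): run the coroutine synchronously.
        return asyncio.run(_simulate())
    # A loop is running (e.g. Jupyter): schedule the work and return the Task.
    return loop.create_task(_simulate())


# In a plain script this prints a boolean.
print(run_sketch())
```

Callers that must work in both settings can check whether the return value is an asyncio.Task and await it only in that case.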

Encryption

By default, the simulation uses end-to-end encryption with:

Automatic key generation for all participants
DID (Decentralized Identifier) document creation
Encrypted message passing between clients and server

To disable encryption for debugging:

SYFT_FLWR_ENCRYPTION_ENABLED=false syft-flwr run ./my-project -m ./data1,./data2

You’ll see:

⚠️ Encryption disabled - skipping key bootstrap

Performance

Parallel Execution

All clients run in parallel using asyncio, with the aggregator server coordinating the workflow. The simulation completes when the server process finishes.
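As a rough illustration of running participants concurrently with asyncio, the sketch below starts one Python subprocess per participant and waits for all of them. It is a simplification under stated assumptions: the function names and placeholder workloads are hypothetical, and unlike the real command it waits for every process rather than completing when the server finishes.

```python
from __future__ import annotations

import asyncio
import sys


async def run_participant(code: str) -> int:
    """Start one participant as a Python subprocess and return its exit code."""
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", code)
    return await proc.wait()


async def run_all(n_clients: int) -> dict[str, int]:
    """Run a server and n_clients clients concurrently; map each name to its exit code."""
    names = ["server"] + [f"client{i}" for i in range(1, n_clients + 1)]
    # Placeholder workloads: each participant only prints its own name.
    jobs = [run_participant(f"print('{name} done')") for name in names]
    codes = await asyncio.gather(*jobs)
    return dict(zip(names, codes))


if __name__ == "__main__":
    print(asyncio.run(run_all(2)))
```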

Resource Usage

Each client runs as a separate Python subprocess. For N clients:

Processes: N + 1 (clients + server)
Memory: Depends on model size and dataset
Disk: Temporary files for communication and logs

Exit Codes

0: Simulation succeeded
1: Simulation failed (validation error, runtime error, etc.)
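The exit code makes the command easy to use from scripts and CI jobs. Here is a minimal sketch of checking it from Python; the helper name is hypothetical, and stand-in commands are used so the example runs without syft-flwr installed.

```python
from __future__ import annotations

import subprocess
import sys


def succeeded(command: list[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    return subprocess.run(command).returncode == 0


# Stand-ins for a passing and a failing run. In practice the command would be
# something like ["syft-flwr", "run", "./my-project", "-m", "./data1,./data2"].
print(succeeded([sys.executable, "-c", "raise SystemExit(0)"]))
print(succeeded([sys.executable, "-c", "raise SystemExit(1)"]))
```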

Best Practices

Dataset Preparation

Organize mock datasets clearly:

my-fl-project/
  ├── mock-data/
  │   ├── client1/
  │   │   └── dataset.csv
  │   ├── client2/
  │   │   └── dataset.csv
  │   └── client3/
  │       └── dataset.csv
  ├── main.py
  └── pyproject.toml

Run from project root:

syft-flwr run . -m mock-data/client1,mock-data/client2,mock-data/client3
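The layout above can be generated with a short script. This sketch writes a tiny placeholder dataset.csv for each client; the function name, column names, and values are hypothetical and should be replaced with data that resembles the real datasets.

```python
from __future__ import annotations

import csv
from pathlib import Path


def create_mock_layout(project_dir: str, n_clients: int = 3) -> list[Path]:
    """Create mock-data/clientN/dataset.csv for each client and return the directories."""
    created = []
    for i in range(1, n_clients + 1):
        client_dir = Path(project_dir) / "mock-data" / f"client{i}"
        client_dir.mkdir(parents=True, exist_ok=True)
        with open(client_dir / "dataset.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["feature", "label"])  # hypothetical columns
            writer.writerows([[i * 10 + row, row % 2] for row in range(5)])
        created.append(client_dir)
    return created


dirs = create_mock_layout(".")
# Build the value to pass to -m, relative to the project root.
print(",".join(str(d) for d in dirs))
```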

Testing Before Deployment

Always run simulations before deploying to distributed datasites:

Create representative mock datasets
Run simulation multiple times
Review logs for errors
Verify model training converges
Check aggregation results

Debugging

Enable detailed logging in your main.py:

import logging
logging.basicConfig(level=logging.DEBUG)

Logs will appear in simulation_logs/.
