
Overview

Simulations allow you to test your federated learning workflow locally before deploying to real datasites. Syft-Flwr simulates a multi-party FL environment using mock datasets and temporary client instances.

Prerequisites

  • A bootstrapped Syft-Flwr project (see Bootstrapping Projects)
  • Mock datasets prepared for testing
  • Python 3.9 or higher

Quick Start

Using the CLI

syft_flwr run /path/to/project \
  --mock-dataset-paths /data/client1,/data/client2

Interactive Mode

If you don’t provide dataset paths, the CLI will prompt you:
syft_flwr run ./my-fl-project
# Enter comma-separated paths to mock datasets: /data/hospital1,/data/hospital2
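The prompt expects a single comma-separated string. Here is a minimal sketch of turning that answer into a list of paths. It assumes that surrounding whitespace and empty entries, such as a trailing comma, should be ignored. `parse_dataset_paths` is a hypothetical helper and is not part of the syft_flwr API:

```python
from pathlib import Path


def parse_dataset_paths(raw: str) -> list[Path]:
    """Split a comma-separated string into Path objects (hypothetical helper)."""
    # Strip whitespace around each entry and drop empty entries such as a trailing comma
    return [Path(part.strip()) for part in raw.split(",") if part.strip()]


paths = parse_dataset_paths("/data/hospital1, /data/hospital2,")
print(paths)
```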

Using Python API

from pathlib import Path
from syft_flwr.run_simulation import run

project_dir = Path("./my-fl-project")
mock_datasets = [
    "/data/hospital1",
    "/data/hospital2"
]

success = run(project_dir, mock_datasets)
if success:
    print("Simulation completed successfully!")

How Simulations Work

1. Mock RDS Client Setup

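Simulations create a temporary RDS (Remote Data Store) client for the aggregator and one for each data owner. All of them share a single simulated SyftBox network directory, which is named after the project and placed under the system temporary directory: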
# Excerpt from run_simulation.py
def _setup_mock_rds_clients(
    project_dir: Path, aggregator: str, datasites: list[str]
) -> tuple[Path, list[RDSClient], RDSClient]:
    """Setup mock RDS clients for the given project directory"""
    simulated_syftbox_network_dir = Path(tempfile.gettempdir(), project_dir.name)
    
    # Create aggregator client
    ds_syftbox_client = create_temp_client(
        email=aggregator, workspace_dir=simulated_syftbox_network_dir
    )
    ds_rds_client = init_session(
        host=aggregator, email=aggregator, syftbox_client=ds_syftbox_client
    )
    
    # Create data owner clients
    do_rds_clients = []
    for datasite in datasites:
        do_syftbox_client = create_temp_client(
            email=datasite, workspace_dir=simulated_syftbox_network_dir
        )
        do_rds_client = init_session(
            host=datasite, email=datasite, syftbox_client=do_syftbox_client
        )
        do_rds_clients.append(do_rds_client)
    
    return simulated_syftbox_network_dir, do_rds_clients, ds_rds_client

2. Encryption Bootstrap

By default, simulations use end-to-end encryption:
# Excerpt from run_simulation.py (abridged)
def _bootstrap_encryption_keys(
    do_clients: list[RDSClient], ds_client: RDSClient
) -> None:
    """Bootstrap the encryption keys for all clients if encryption is enabled."""
    encryption_enabled = (
        os.environ.get(SYFT_FLWR_ENCRYPTION_ENABLED, "true").lower() != "false"
    )
    
    if not encryption_enabled:
        logger.warning("⚠️ Encryption disabled - skipping key bootstrap")
        return
    
    logger.info("🔐 Bootstrapping encryption keys for all participants...")
    
    # Bootstrap server and clients
    # Verify DID documents are accessible
    # ...
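To disable encryption for testing, set SYFT_FLWR_ENCRYPTION_ENABLED to false (case-insensitive). Any other value, or leaving the variable unset, keeps encryption on: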
export SYFT_FLWR_ENCRYPTION_ENABLED=false
syft_flwr run ./my-fl-project -m /data/client1,/data/client2
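The check in `_bootstrap_encryption_keys` treats only the literal string `false` (in any letter case) as "off". Values such as `0` or `no` leave encryption enabled. Below is a minimal stand-alone sketch that repeats the same comparison, so you can see which values disable encryption. `encryption_enabled` is a hypothetical name. The sketch also assumes that the `SYFT_FLWR_ENCRYPTION_ENABLED` constant holds the variable name shown above:

```python
import os
from typing import Mapping, Optional

# Assumption: the constant in run_simulation.py holds this exact variable name
SYFT_FLWR_ENCRYPTION_ENABLED = "SYFT_FLWR_ENCRYPTION_ENABLED"


def encryption_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Mirror the comparison used by the simulation runner (hypothetical helper)."""
    env = os.environ if env is None else env
    # Default is "true"; only the string "false" (case-insensitive) disables encryption
    return env.get(SYFT_FLWR_ENCRYPTION_ENABLED, "true").lower() != "false"


for value in (None, "false", "FALSE", "0", "no"):
    env = {} if value is None else {SYFT_FLWR_ENCRYPTION_ENABLED: value}
    print(repr(value), encryption_enabled(env))
```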

3. Concurrent Execution

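The server and the clients run concurrently as asyncio tasks. The server's exit code determines the result of the simulation. When the server finishes, any client that is still running is cancelled:
# Excerpt from run_simulation.py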
async def _run_simulated_flwr_project(
    project_dir: Path,
    do_clients: list[RDSClient],
    ds_client: RDSClient,
    mock_dataset_paths: list[Union[str, Path]],
) -> bool:
    """Run all clients and server concurrently"""
    log_dir = project_dir / "simulation_logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    
    main_py_path = project_dir / "main.py"
    
    # Start server
    ds_task = asyncio.create_task(
        _run_main_py(
            main_py_path,
            ds_client._syftbox_client.config_path,
            ds_client.email,
            log_dir,
        )
    )
    
    # Start clients
    client_tasks = []
    for client, mock_dataset_path in zip(do_clients, mock_dataset_paths):
        client_tasks.append(
            asyncio.create_task(
                _run_main_py(
                    main_py_path,
                    client._syftbox_client.config_path,
                    client.email,
                    log_dir,
                    mock_dataset_path,
                )
            )
        )
    
    # Wait for server to complete
    ds_return_code = await ds_task
    
    # Cancel client tasks when server completes
    for task in client_tasks:
        if not task.done():
            task.cancel()
    
    return ds_return_code == 0
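The coordination pattern above can be reproduced with the standard library alone: start every participant as a task, take the server's exit status as the result, then cancel any client that is still running. In the minimal sketch below, `fake_participant` is a hypothetical stand-in for `_run_main_py`. It sleeps instead of launching `main.py`, and its delays and return codes are illustrative assumptions. Unlike the excerpt, the sketch also awaits the cancelled tasks, so no task is left pending when the event loop closes:

```python
import asyncio


async def fake_participant(name: str, delay: float, return_code: int) -> int:
    """Stand-in for _run_main_py: pretend to run main.py, then return an exit code."""
    await asyncio.sleep(delay)
    return return_code


async def run_simulation(num_clients: int = 2) -> tuple[bool, list[bool]]:
    # Start the server (aggregator) task
    server = asyncio.create_task(fake_participant("server", 0.05, 0))
    # Start client tasks that would keep running long after the server finishes
    clients = [
        asyncio.create_task(fake_participant(f"client{i}", 10.0, 0))
        for i in range(num_clients)
    ]

    # The server's exit code decides the outcome of the whole simulation
    server_code = await server

    # Cancel clients that are still running, then wait for the cancellations to finish
    for task in clients:
        if not task.done():
            task.cancel()
    await asyncio.gather(*clients, return_exceptions=True)

    return server_code == 0, [task.cancelled() for task in clients]


success, cancelled = asyncio.run(run_simulation())
print(success, cancelled)
```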

Mock Dataset Configuration

Dataset Structure

Each mock dataset path should contain the data for one client. Paths are assigned to the client datasites in the order you list them:
/data/
├── hospital1/
│   ├── train.csv
│   └── test.csv
└── hospital2/
    ├── train.csv
    └── test.csv
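Your client code relies on this layout, so it can be worth checking before you start a run. The minimal sketch below reports which expected files are missing from each mock dataset directory. `missing_files` is a hypothetical helper. The `train.csv`/`test.csv` names come from the example layout above, not from any requirement of syft_flwr:

```python
import tempfile
from pathlib import Path

# File names taken from the example layout; adjust them to what your client code reads
EXPECTED_FILES = ("train.csv", "test.csv")


def missing_files(dataset_dirs: list[Path]) -> dict[Path, list[str]]:
    """Map each dataset directory to the expected files it lacks (hypothetical helper)."""
    report = {}
    for directory in dataset_dirs:
        absent = [name for name in EXPECTED_FILES if not (directory / name).is_file()]
        if absent:
            report[directory] = absent
    return report


# Usage: build a throwaway layout where hospital2 lacks test.csv
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for site in ("hospital1", "hospital2"):
        (root / site).mkdir()
        (root / site / "train.csv").write_text("x,y\n1,0\n")
    (root / "hospital1" / "test.csv").write_text("x,y\n2,1\n")
    report = missing_files([root / "hospital1", root / "hospital2"])
    print({path.name: names for path, names in report.items()})
```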

Accessing Datasets in Client Code

Clients access their dataset via the DATA_DIR environment variable:
# client_app.py
import pandas as pd
from syft_flwr.utils import get_syftbox_dataset_path

def load_data():
    # Automatically uses DATA_DIR environment variable
    data_dir = get_syftbox_dataset_path()
    
    df_train = pd.read_csv(data_dir / "train.csv")
    df_test = pd.read_csv(data_dir / "test.csv")
    
    return pd.concat([df_train, df_test], ignore_index=True)
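This page does not show the internals of `get_syftbox_dataset_path`. To illustrate the idea on its own, here is a minimal stand-in that assumes only what this page states: the runner exports `DATA_DIR` for each client. `dataset_dir_from_env` is hypothetical, and the real helper may resolve paths differently:

```python
import os
import tempfile
from pathlib import Path


def dataset_dir_from_env(var: str = "DATA_DIR") -> Path:
    """Return the dataset directory named by an environment variable (hypothetical helper)."""
    value = os.environ.get(var)
    if not value:
        raise FileNotFoundError(f"Environment variable {var} is not set")
    path = Path(value).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Path {path} does not exist")
    return path


# Usage: point DATA_DIR at a temporary directory, as the simulation runner would
with tempfile.TemporaryDirectory() as tmp:
    os.environ["DATA_DIR"] = tmp
    try:
        print(dataset_dir_from_env() == Path(tmp).resolve())
    finally:
        del os.environ["DATA_DIR"]
```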

Dataset Path Validation

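Before execution, the simulation resolves each dataset path (expanding ~ and relative paths) and checks that it exists. The contents of each directory are not checked:
# Excerpt from run_simulation.py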
def _validate_mock_dataset_paths(mock_dataset_paths: list[str]) -> list[Path]:
    """Validate the mock dataset paths"""
    resolved_paths = []
    for path in mock_dataset_paths:
        path = Path(path).expanduser().resolve()
        if not path.exists():
            raise ValueError(f"Mock dataset path {path} does not exist")
        resolved_paths.append(path)
    return resolved_paths
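The check above only confirms that each path exists. In the excerpts on this page, clients are paired with dataset paths using `zip`, which stops at the shorter list, and nothing shown compares the two counts. Below is a minimal stricter variant that also checks the count and requires each path to be a directory. It is a sketch, not a drop-in replacement, and `validate_for_datasites` is hypothetical:

```python
import tempfile
from pathlib import Path
from typing import Union


def validate_for_datasites(
    mock_dataset_paths: list[Union[str, Path]], datasites: list[str]
) -> list[Path]:
    """Resolve paths and check count, existence, and type (hypothetical helper)."""
    # zip() would silently drop extra datasites or paths, so compare counts explicitly
    if len(mock_dataset_paths) != len(datasites):
        raise ValueError(
            f"Got {len(mock_dataset_paths)} dataset paths for {len(datasites)} datasites"
        )
    resolved = []
    for raw in mock_dataset_paths:
        path = Path(raw).expanduser().resolve()
        if not path.is_dir():
            raise ValueError(f"Mock dataset path {path} is not an existing directory")
        resolved.append(path)
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    print(validate_for_datasites([tmp], ["datasite1@example.com"]))
```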

Simulation Logs

Logs are saved to <project_dir>/simulation_logs/, one file per participant, named after that participant's email address:
my-fl-project/
└── simulation_logs/
    ├── <aggregator_email>.log  # Server logs
    ├── <datasite1_email>.log   # Client 1 logs
    └── <datasite2_email>.log   # Client 2 logs

Viewing Logs

# View server logs
cat my-fl-project/simulation_logs/<aggregator_email>.log

# View all logs
tail -f my-fl-project/simulation_logs/*.log
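With several participants, `tail -f` interleaves their output. As an alternative, the minimal sketch below lists which log files mention a Python traceback or an error line. It assumes the logs are plain-text `*.log` files in `simulation_logs/`, as shown above. `find_failing_logs` and the marker strings are illustrative choices and are not provided by syft_flwr:

```python
import tempfile
from pathlib import Path

# Strings treated as signs of failure; extend this tuple to match your own logging
FAILURE_MARKERS = ("Traceback (most recent call last)", "ERROR")


def find_failing_logs(log_dir: Path) -> list[str]:
    """Return names of *.log files containing a failure marker (hypothetical helper)."""
    failing = []
    for log_file in sorted(log_dir.glob("*.log")):
        text = log_file.read_text(errors="replace")
        if any(marker in text for marker in FAILURE_MARKERS):
            failing.append(log_file.name)
    return failing


# Usage with throwaway log files standing in for simulation_logs/
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    (log_dir / "server.log").write_text("Round 1 finished\n")
    (log_dir / "client1.log").write_text("Traceback (most recent call last):\n  ...\n")
    failing = find_failing_logs(log_dir)
    print(failing)
```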

Running in Different Environments

# Standard execution
syft_flwr run ./my-fl-project \
  --mock-dataset-paths /data/c1,/data/c2
The command exits with code 0 on success and 1 on failure.
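Because success and failure are reported through the exit code, the command is easy to wrap in a test harness. The minimal sketch below uses `subprocess.run` with a timeout, so a hung simulation fails the harness instead of blocking it. `run_cli` is hypothetical, and the usage line runs a stand-in Python command rather than `syft_flwr` itself:

```python
import subprocess
import sys


def run_cli(cmd: list[str], timeout_s: float = 600.0) -> bool:
    """Run a command and report success from its exit code (hypothetical helper)."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child on timeout; processes that the
        # child started itself may need separate cleanup
        return False
    return completed.returncode == 0


# In practice cmd would be something like:
# ["syft_flwr", "run", "./my-fl-project", "--mock-dataset-paths", "/data/c1,/data/c2"]
# Here a stand-in Python process exits with code 1 to show the failure path.
print(run_cli([sys.executable, "-c", "raise SystemExit(1)"]))
```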

Complete Example

Here’s a full simulation workflow:
1. Prepare Mock Data

mkdir -p /tmp/mock_data/{hospital1,hospital2}

# Copy or generate mock datasets
cp hospital1_train.csv /tmp/mock_data/hospital1/train.csv
cp hospital1_test.csv /tmp/mock_data/hospital1/test.csv
cp hospital2_train.csv /tmp/mock_data/hospital2/train.csv
cp hospital2_test.csv /tmp/mock_data/hospital2/test.csv
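If you have no files to copy, you can generate synthetic ones instead. The minimal sketch below uses only the standard library and writes to the same `/tmp/mock_data` layout. The column names (`age`, `bmi`, `glucose`, `outcome`) and value ranges are made-up placeholders, so replace them with whatever schema your client code actually reads:

```python
import csv
import random
from pathlib import Path

# Hypothetical schema; use the columns your client_app.py expects
COLUMNS = ["age", "bmi", "glucose", "outcome"]


def write_mock_csv(path: Path, rows: int, seed: int) -> None:
    """Write `rows` random records to a CSV file (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so repeated runs produce identical files
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for _ in range(rows):
            writer.writerow([
                rng.randint(20, 80),
                round(rng.uniform(18.0, 40.0), 1),
                rng.randint(70, 200),
                rng.randint(0, 1),
            ])


root = Path("/tmp/mock_data")
for i, site in enumerate(["hospital1", "hospital2"]):
    write_mock_csv(root / site / "train.csv", rows=80, seed=2 * i)
    write_mock_csv(root / site / "test.csv", rows=20, seed=2 * i + 1)
```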
2. Run Simulation

syft_flwr run ./fed-analytics-diabetes \
  --mock-dataset-paths /tmp/mock_data/hospital1,/tmp/mock_data/hospital2
3. Check Results

# Check the exit status immediately after the run command
echo $?  # Should be 0

# Review logs
ls -la ./fed-analytics-diabetes/simulation_logs/
cat ./fed-analytics-diabetes/simulation_logs/*.log

Advanced Configuration

Skipping Module Validation

Setting SYFT_FLWR_SKIP_MODULE_CHECK=true skips module validation, which can be useful when running tests in parallel:
export SYFT_FLWR_SKIP_MODULE_CHECK=true
syft_flwr run ./my-fl-project -m /data/c1,/data/c2

Custom Temporary Directory

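Simulations create their network directory under the system temporary directory returned by Python's tempfile.gettempdir() (usually /tmp on Linux). Python checks the TMPDIR environment variable first, so setting TMPDIR before launching the CLI should relocate the directory. The directory is removed automatically when the run ends, whether it succeeds or fails:
# Excerpt from run_simulation.py (abridged)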
async def main():
    try:
        run_success = await _run_simulated_flwr_project(...)
        if run_success:
            logger.success("Simulation completed successfully ✅")
        else:
            logger.error("Simulation failed ❌")
    finally:
        # Clean up the RDS stack
        remove_rds_stack_dir(simulated_syftbox_network_dir)
        # Remove config files and private keys
        remove_rds_stack_dir(simulated_syftbox_network_dir.parent / ".syftbox")
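The cleanup relies on `finally`, which runs whether the simulation succeeds, fails, or raises. The minimal sketch below shows the same pattern with the same location rule (`<temp dir>/<project name>`). `shutil.rmtree` stands in for `remove_rds_stack_dir`, whose internals are not shown here. `simulate` and `run_with_cleanup` are hypothetical placeholders:

```python
import shutil
import tempfile
from pathlib import Path


def simulate(network_dir: Path) -> bool:
    """Hypothetical placeholder for the real run: write a file, then fail."""
    (network_dir / "state.json").write_text("{}")
    raise RuntimeError("simulated failure")


def run_with_cleanup(project_name: str) -> Path:
    # Same location rule as the excerpt: <system temp dir>/<project name>
    network_dir = Path(tempfile.gettempdir(), project_name)
    network_dir.mkdir(parents=True, exist_ok=True)
    try:
        simulate(network_dir)
    except RuntimeError as exc:
        print(f"Simulation failed: {exc}")
    finally:
        # Runs on success, failure, or exception; stand-in for remove_rds_stack_dir
        shutil.rmtree(network_dir, ignore_errors=True)
    return network_dir


leftover = run_with_cleanup("my-fl-project-sketch")
print(leftover.exists())
```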

Troubleshooting

“Project directory does not exist”

Ensure the project is bootstrapped:
ls my-fl-project/main.py  # Must exist
ls my-fl-project/pyproject.toml

“Mock dataset path does not exist”

Verify all dataset paths:
ls /data/hospital1/  # Should contain train.csv, test.csv
ls /data/hospital2/

Simulation Hangs

Check logs for errors:
tail -f ./my-fl-project/simulation_logs/*.log
Common issues:
  • Client code has infinite loop
  • Server waiting for more clients than provided
  • Dataset loading errors

“FileNotFoundError: Path .data/ does not exist”

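Clients resolve their dataset from the DATA_DIR environment variable. The simulation runner sets DATA_DIR (along with SYFTBOX_CLIENT_CONFIG_PATH) for each client process. If you see this error, check that the mock dataset paths were passed correctly. If you run main.py outside the simulation, set DATA_DIR yourself:
# Excerpt from run_simulation.py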
env = os.environ.copy()
env["SYFTBOX_CLIENT_CONFIG_PATH"] = str(config_path)
env["DATA_DIR"] = str(dataset_path)  # Set by simulation
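The runner copies the parent environment and adds per-process variables, so each `main.py` sees its own `DATA_DIR`. The minimal sketch below shows that pattern with `subprocess`. A tiny inline Python program stands in for `main.py` and echoes the variable back. `launch_with_dataset` is a hypothetical helper, and the process launch is simplified to a blocking `subprocess.run` call:

```python
import os
import subprocess
import sys
from pathlib import Path


def launch_with_dataset(dataset_path: Path) -> str:
    """Start a child process with DATA_DIR set and return what it reports (hypothetical)."""
    env = os.environ.copy()  # inherit PATH, virtualenv settings, etc.
    env["DATA_DIR"] = str(dataset_path)  # per-client dataset location
    child = [sys.executable, "-c", "import os; print(os.environ['DATA_DIR'])"]
    result = subprocess.run(child, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(launch_with_dataset(Path("/data/hospital1")))
```

Copying `os.environ` rather than building a fresh mapping keeps the child on the same interpreter and virtual environment as the parent. The parent's own environment is left unchanged.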

Next Steps

  • Multi-Client Setup: deploy to real datasites
  • Offline Training: asynchronous FL patterns
