This example demonstrates a complete federated learning workflow for diabetes prediction. Multiple data owners collaboratively train a neural network on the PIMA Indians Diabetes Database without sharing their raw data.

[Figure: FL training process]

Overview

The diabetes prediction example trains a deep neural network to predict diabetes onset based on medical measurements. The data is distributed across multiple data owners (hospitals, clinics) who want to collaborate on model training while keeping their patient data private.

Key Features

  • Federated Learning: Decentralized training across multiple clients using Flower framework
  • Privacy-Preserving: Data remains with data owners; only model updates are shared
  • Imbalanced Data Handling: Uses SMOTE (Synthetic Minority Over-sampling Technique) for class balancing
  • Advanced Neural Architecture: Deep neural network with batch normalization and dropout
  • Multiple Deployment Modes: Local simulation, Google Colab, and SyftBox distributed deployment

Architecture

Model Structure

The neural network consists of:
  • Input Layer: 6 features (after preprocessing)
  • Hidden Layers:
    • Layer 1: 32 units with BatchNorm, LeakyReLU, and Dropout (0.2)
    • Layer 2: 24 units with BatchNorm, LeakyReLU, and Dropout (0.25)
    • Layer 3: 16 units with BatchNorm and LeakyReLU
  • Output Layer: Single unit with Sigmoid activation (binary classification)
import torch.nn as nn

class Net(nn.Module):
    """Feed-forward network for binary diabetes classification."""

    def __init__(self, input_dim=6):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Linear(input_dim, 32),
            nn.BatchNorm1d(32),
            nn.LeakyReLU(0.1),
            nn.Dropout(0.2),
        )
        self.layer2 = nn.Sequential(
            nn.Linear(32, 24),
            nn.BatchNorm1d(24),
            nn.LeakyReLU(0.1),
            nn.Dropout(0.25),
        )
        self.layer3 = nn.Sequential(
            nn.Linear(24, 16),
            nn.BatchNorm1d(16),
            nn.LeakyReLU(0.1),
        )
        self.output_layer = nn.Sequential(
            nn.Linear(16, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        return self.output_layer(x)

Dataset

Source: PIMA Indians Diabetes Database

Features:
  • Pregnancies
  • Glucose
  • Blood Pressure
  • BMI (Body Mass Index)
  • Diabetes Pedigree Function
  • Age
Preprocessing:
  • Removed SkinThickness and Insulin features
  • Imputed zero values with mean/median
  • Applied SMOTE for class balancing
  • Standardized features using StandardScaler
Partitioning: IID (Independent and Identically Distributed) across clients

Setup

Clone the Project

git clone https://github.com/OpenMined/syft-flwr.git _tmp \
    && mv _tmp/notebooks/fl-diabetes-prediction . \
    && rm -rf _tmp && cd fl-diabetes-prediction

Install Dependencies

Assuming you have Python and uv installed:
uv sync
This installs all required dependencies:
  • flwr-datasets>=0.5.0 - Federated dataset utilities
  • torch>=2.8.0 - Deep learning framework
  • scikit-learn==1.6.1 - Machine learning utilities
  • imblearn - Imbalanced data handling (SMOTE)
  • syft_flwr - SyftBox integration

Running the Example

Local Simulation

Run federated learning locally with simulated clients:
flwr run .
This will:
  • Simulate 2 supernodes (clients) locally
  • Run 2 federated learning rounds
  • Save model weights to ./weights/ directory
Configuration (pyproject.toml):
[tool.flwr.app.config]
num-server-rounds = 2        # Number of training rounds
partition-id = 0             # Client partition ID
num-partitions = 1           # Total number of partitions

[tool.flwr.federations.local-simulation.options]
num-supernodes = 2          # Number of simulated clients

Jupyter Notebooks

For interactive exploration, use the included notebooks:

Local Setup

The local/ directory contains notebooks for running on a local SyftBox network:
  1. Start with local/do1.ipynb (Data Owner 1)
  2. Then run local/do2.ipynb (Data Owner 2)
  3. Finally open local/ds.ipynb (Data Scientist)
Switch between notebooks as indicated to simulate the complete workflow.

Distributed Setup

The distributed/ directory contains the same workflow but for real distributed deployment where each party runs on different machines using the SyftBox client.

Client Implementation

The Flower client handles local training and evaluation:
class FlowerClient(NumPyClient):
    def __init__(self, net, trainloader, testloader):
        self.net = net
        self.trainloader = trainloader
        self.testloader = testloader

    def fit(self, parameters, config):
        set_weights(self.net, parameters)
        train(self.net, self.trainloader)
        return get_weights(self.net), len(self.trainloader), {}

    def evaluate(self, parameters, config):
        set_weights(self.net, parameters)
        loss, accuracy = evaluate(self.net, self.testloader)
        return loss, len(self.testloader), {"accuracy": accuracy}
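The client above relies on get_weights and set_weights helpers defined elsewhere in the project (task.py). A minimal sketch of how such helpers typically round-trip PyTorch parameters through NumPy arrays, which is Flower's wire format:

```python
from collections import OrderedDict

import numpy as np
import torch

def get_weights(net):
    # Export every tensor in the model's state dict as a NumPy array.
    return [val.cpu().numpy() for _, val in net.state_dict().items()]

def set_weights(net, parameters):
    # Pair the incoming arrays with the state-dict keys (order is preserved)
    # and load them back into the model.
    params_dict = zip(net.state_dict().keys(), parameters)
    state_dict = OrderedDict({k: torch.tensor(v) for k, v in params_dict})
    net.load_state_dict(state_dict, strict=True)
```

Because Python dicts preserve insertion order, zipping the state-dict keys against the received arrays restores each tensor to the right parameter.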

Server Strategy

The server uses the FedAvgWithModelSaving strategy, a FedAvg variant that saves aggregated model weights to save_path after each round:
strategy = FedAvgWithModelSaving(
    save_path=save_path,
    fraction_fit=1.0,
    fraction_evaluate=1.0,
    min_available_clients=1,
    min_fit_clients=1,
    min_evaluate_clients=1,
    initial_parameters=params,
    evaluate_metrics_aggregation_fn=weighted_average,
)
Aggregation Function:
def weighted_average(metrics):
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]
    return {"accuracy": sum(accuracies) / sum(examples)}
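To make the weighting concrete, here is a small worked example (restating the function so the snippet is self-contained):

```python
def weighted_average(metrics):
    # Weight each client's accuracy by its number of examples.
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]
    return {"accuracy": sum(accuracies) / sum(examples)}

# Two clients: 100 examples at 0.80 accuracy, 300 examples at 0.60.
result = weighted_average([(100, {"accuracy": 0.80}), (300, {"accuracy": 0.60})])
# (100*0.80 + 300*0.60) / 400 → 0.65, not the naive mean of 0.70
```

The larger client pulls the aggregate toward its own accuracy, which is why weighting by example count matters on unevenly sized partitions.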

Fault Tolerance

The system handles client failures during federated learning:

Default Configuration (50% failure tolerance)

  • Total Clients: 2
  • Minimum Required: 1
  • Failure Tolerance: Can continue with 1 out of 2 clients

Configuration Parameters

[tool.flwr.app.config]
min-available-clients = 1   # Start with at least 1 client
min-fit-clients = 1          # Train with at least 1 client
min-evaluate-clients = 1     # Evaluate with at least 1 client
fraction-fit = 0.5           # Sample 50% of clients per round
fraction-evaluate = 0.5      # Sample 50% of clients for evaluation
Using fraction-fit < 1.0 ensures the server doesn’t get stuck waiting for failed clients that were already sampled in a round.
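The interaction between the fraction and minimum settings can be sketched as a pure function (this mirrors FedAvg's sampling rule as I understand it, not code from this project):

```python
def num_fit_clients(num_available, fraction_fit=0.5, min_fit_clients=1):
    # Sample a fraction of the available clients, but never fewer
    # than min_fit_clients.
    return max(int(num_available * fraction_fit), min_fit_clients)
```

With the defaults above, two available clients yield one sampled client per round, so a single surviving client keeps training moving.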

Training Details

  • Optimizer: Adam (lr=0.001, weight_decay=0.0005)
  • Loss Function: Binary Cross-Entropy (BCELoss)
  • Batch Size: 10 (training), full dataset (testing)
  • Local Epochs: 1 per round (configurable)
  • Device Support: CUDA, MPS (Apple Silicon), XPU, or CPU
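The actual training routine lives in task.py; the following is a minimal sketch consistent with the hyperparameters listed above (Adam at lr=0.001 with weight_decay=0.0005, BCELoss, one local epoch), with the device argument left to the caller:

```python
import torch
import torch.nn as nn

def train(net, trainloader, epochs=1, device="cpu"):
    """Run local training for the given number of epochs."""
    net.to(device)
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=0.001, weight_decay=0.0005)
    net.train()
    for _ in range(epochs):
        for features, labels in trainloader:
            features, labels = features.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = net(features)
            # BCELoss expects float targets shaped like the sigmoid output.
            loss = criterion(outputs, labels.float().view(-1, 1))
            loss.backward()
            optimizer.step()
```

Because the final layer already applies Sigmoid, plain BCELoss is used here rather than BCEWithLogitsLoss.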

Project Structure

fl-diabetes-prediction/
├── fl_diabetes_prediction/
│   ├── __init__.py
│   ├── task.py           # Model, data loading, training logic
│   ├── client_app.py     # Flower client implementation
│   ├── server_app.py     # Flower server implementation
│   └── main.py           # SyftBox entry point
├── local/                # Local simulation notebooks
│   ├── do1.ipynb
│   ├── do2.ipynb
│   └── ds.ipynb
├── distributed/          # Distributed deployment notebooks
├── distributed-gdrive/   # Google Colab notebooks
├── pyproject.toml        # Project configuration
├── weights/              # Saved model checkpoints
└── README.md

Deployment Options

Local Simulation

Run everything on your local machine for development and testing.

Google Colab

Zero-setup federated learning using only Google Colab notebooks.

SyftBox Network

Deploy across real distributed nodes using the SyftBox client.

Next Steps

Try Federated Analytics

Learn how to compute statistics on distributed data.

Explore FedRAG

Build privacy-preserving question answering systems.
