Overview
This quickstart shows you how to run a complete federated learning workflow directly from Google Colab—no local setup required. You’ll train a diabetes prediction model across two data owners using the PIMA Indians Diabetes dataset, all while keeping private data secure.
You can complete this tutorial with three Google accounts (one for each party), or invite two friends for a real collaborative experience.
What you’ll build
A federated learning system with three participants:
Data Owner 1 (DO1) - Holds partition 0 of the diabetes dataset
Data Owner 2 (DO2) - Holds partition 1 of the diabetes dataset
Data Scientist (DS) - Coordinates training and aggregates model updates
Raw data never leaves each data owner’s Colab environment—only model updates are shared.
Prerequisites
Three Google accounts (or two friends with Google accounts)
Access to Google Colab
15-20 minutes
That’s it! No Python installation, no complex setup.
Step 1: Set up data owners
Data Owner 1 setup
Open a new Colab notebook and run:
# Install syft-flwr
!uv pip install -q "git+https://github.com/OpenMined/syft-flwr.git@main"
# Login as Data Owner 1
import syft_client as sc
import syft_flwr

do_email = input("Enter Data Owner 1's email: ")
do_client = sc.login_do(email=do_email)
Register the dataset
from pathlib import Path
from huggingface_hub import snapshot_download

# Download dataset from HuggingFace
DATASET_DIR = Path("./dataset/").expanduser().absolute()
if not DATASET_DIR.exists():
    snapshot_download(
        repo_id="khoaguin/pima-indians-diabetes-database-partitions",
        repo_type="dataset",
        local_dir=DATASET_DIR,
    )

# Create Syft dataset with mock and private paths
partition_number = 0  # DO1 uses partition 0
DATASET_PATH = DATASET_DIR / f"pima-indians-diabetes-database-{partition_number}"
do_client.create_dataset(
    name="pima-indians-diabetes-database",
    mock_path=DATASET_PATH / "mock",
    private_path=DATASET_PATH / "private",
    summary="PIMA Indians Diabetes dataset - Partition 0",
    readme_path=DATASET_PATH / "README.md",
    tags=["healthcare", "diabetes"],
    sync=True,
)

# Verify dataset creation
do_client.datasets.get_all()
The mock_path contains synthetic/sample data for code development. The private_path contains real data that never leaves this environment.
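To make the mock/private distinction concrete, here is a minimal sketch (not part of syft_client, and the values are invented for illustration) of how a mock partition mirrors the private one: same columns as the well-known PIMA schema, but entirely synthetic values, so a data scientist can develop code against it without seeing real records.

```python
import numpy as np
import pandas as pd

# Hypothetical "private" partition using the PIMA column names;
# the values here are random stand-ins, not real patient data.
columns = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]
rng = np.random.default_rng(0)
private_df = pd.DataFrame(
    rng.integers(0, 100, size=(20, len(columns))), columns=columns
)

# A mock partition keeps the same schema but replaces every value
# with synthetic data drawn from each column's observed range.
mock_df = pd.DataFrame(
    {col: rng.integers(private_df[col].min(), private_df[col].max() + 1, size=5)
     for col in columns}
)

assert list(mock_df.columns) == list(private_df.columns)
print(mock_df.shape)  # → (5, 9)
```

Code written against `mock_df` runs unchanged against the private partition, which is what makes the mock path useful for development.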
Data Owner 2 setup
Repeat the same steps in a new Colab notebook, but change the partition number:
partition_number = 1  # DO2 uses partition 1
Everything else stays identical. Now you have two data owners, each holding a different slice of the dataset.
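The two partitions are simply disjoint row slices of the full dataset. The HuggingFace repo above ships them pre-made, but the split is conceptually just this (illustrative sketch):

```python
import numpy as np

# Illustrative only: split 10 sample row indices into two disjoint
# partitions, one per data owner, mirroring the pre-made layout.
rows = np.arange(10)  # stand-in for the full dataset's row indices
partition_0, partition_1 = np.array_split(rows, 2)

# DO1 holds partition 0, DO2 holds partition 1 - no row appears in both.
assert set(partition_0).isdisjoint(partition_1)
print(len(partition_0), len(partition_1))  # → 5 5
```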
Step 2: Set up data scientist
In a third Colab notebook, set up the Data Scientist role.
Login and add peers
!uv pip install -q "git+https://github.com/OpenMined/syft-flwr.git@main"
import syft_client as sc
import syft_flwr

ds_email = input("Enter Data Scientist's email: ")
ds_client = sc.login_ds(email=ds_email)

# Add both data owners as peers
do1_email = input("Enter Data Owner 1's email: ")
ds_client.add_peer(do1_email)
do2_email = input("Enter Data Owner 2's email: ")
ds_client.add_peer(do2_email)

# Verify peers were added
ds_client.peers
Explore available datasets
# Check DO1's datasets
do1_datasets = ds_client.datasets.get_all(datasite=do1_email)
do1_datasets[0].describe()

# Check DO2's datasets
do2_datasets = ds_client.datasets.get_all(datasite=do2_email)
do2_datasets[0].describe()

# Get mock dataset URLs for testing
mock_dataset_urls = [do1_datasets[0].mock_url, do2_datasets[0].mock_url]
mock_dataset_urls
Step 3: Prepare the FL project
Clone the Flower project
The FL project is built with Flower, which defines the model architecture, training logic, and communication. Syft-Flwr handles job submission and governance on top.
from pathlib import Path

!mkdir -p /content/fl-diabetes-prediction
!curl -sL https://github.com/khoaguin/fl-diabetes-prediction/archive/refs/heads/main.tar.gz | tar -xz --strip-components=1 -C /content/fl-diabetes-prediction

SYFT_FLWR_PROJECT_PATH = Path("/content/fl-diabetes-prediction")
print(f"Project at: {SYFT_FLWR_PROJECT_PATH}")
Bootstrap the project
Configure the project with participating datasites and generate the main.py entry point:
import syft_flwr

!rm -rf {SYFT_FLWR_PROJECT_PATH / "main.py"}

do_emails = [peer.email for peer in ds_client.peers]
syft_flwr.bootstrap(
    SYFT_FLWR_PROJECT_PATH,
    aggregator=ds_email,
    datasites=do_emails,
)
print("Bootstrapped project successfully!")
bootstrap() auto-detects the transport layer: in Colab it uses P2P sync over Google Drive; locally it uses SyftBox.
Submit jobs to data owners
!rm -rf {SYFT_FLWR_PROJECT_PATH / "fl_diabetes_prediction" / "__pycache__"}

job_name = "fl-diabetes-training"

# Submit to DO1
ds_client.submit_python_job(
    user=do1_email,
    code_path=str(SYFT_FLWR_PROJECT_PATH),
    job_name=job_name,
)

# Submit to DO2
ds_client.submit_python_job(
    user=do2_email,
    code_path=str(SYFT_FLWR_PROJECT_PATH),
    job_name=job_name,
)

# Verify job submission
ds_client.jobs
Step 4: Data owners approve and run jobs
Back in each Data Owner’s notebook (DO1 and DO2):
# Check for incoming jobs
do_client.jobs

# Review and approve the job
do_client.jobs[0].approve()

# Process approved jobs (runs client-side training)
do_client.process_approved_jobs()

# Check job status
do_client.jobs
Data owners can inspect the submitted code before approving. This is a critical governance feature.
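What "inspecting" looks like in practice depends on syft_client's job API, but conceptually it is just walking the submitted code directory (for example via Colab's file browser) before calling approve(). A generic sketch, with a throwaway directory standing in for a real job folder:

```python
import tempfile
from pathlib import Path

def list_submitted_code(job_dir: Path) -> list[str]:
    """Return every file under a job's code directory as sorted relative paths."""
    return sorted(
        str(p.relative_to(job_dir)) for p in job_dir.rglob("*") if p.is_file()
    )

# Demo on a temporary directory mimicking a submitted project's layout
# (the file names mirror this tutorial's project, but the path is hypothetical):
demo = Path(tempfile.mkdtemp())
(demo / "main.py").write_text("print('hello')")
(demo / "fl_diabetes_prediction").mkdir()
(demo / "fl_diabetes_prediction" / "client_app.py").write_text("# client code")

print(list_submitted_code(demo))
# → ['fl_diabetes_prediction/client_app.py', 'main.py']
```

Reading every file before approval is the point: nothing runs on your data until you have seen exactly what was submitted.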
Step 5: Run federated training
Back in the Data Scientist notebook, install dependencies and run the aggregator:
# Install Flower and ML dependencies
!uv pip install \
    "flwr-datasets>=0.5.0" \
    "imblearn>=0.0" \
    "loguru>=0.7.3" \
    "pandas>=2.3.0" \
    "ipywidgets>=8.1.7" \
    "scikit-learn==1.7.1" \
    "torch>=2.8.0" \
    "ray==2.31.0"

# Start the aggregation server
ds_email = ds_client.email
syftbox_folder = f"/content/SyftBox_{ds_email}"
!SYFTBOX_EMAIL="{ds_email}" SYFTBOX_FOLDER="{syftbox_folder}" \
    uv run {str(SYFT_FLWR_PROJECT_PATH / "main.py")}

# Check final job status
ds_client.jobs
The aggregator coordinates training rounds, receiving model updates from each data owner and combining them using Federated Averaging (FedAvg).
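FedAvg itself is simple: each client's parameters are weighted by the number of examples it trained on, then summed. A minimal NumPy sketch of the averaging step (Flower's real implementation lives in flwr.server.strategy.FedAvg; this is just the arithmetic):

```python
import numpy as np

def fedavg(client_weights, num_examples):
    """Average each layer's parameters, weighted by client dataset size."""
    total = sum(num_examples)
    return [
        sum(w[layer] * n for w, n in zip(client_weights, num_examples)) / total
        for layer in range(len(client_weights[0]))
    ]

# Two clients, one parameter tensor each, holding 100 and 300 examples:
do1 = [np.array([1.0, 1.0])]
do2 = [np.array([5.0, 5.0])]
avg = fedavg([do1, do2], num_examples=[100, 300])
print(avg[0])  # → [4. 4.]  (DO2 contributes 300/400 of the weight)
```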
Step 6: Clean up
When you’re done, clean up resources in each notebook:
In the Data Scientist notebook:
ds_client.delete_syftbox()
In each Data Owner notebook (DO1 and DO2):
do_client.delete_syftbox()
What just happened?
You successfully trained a diabetes prediction model using federated learning:
Two data owners each held a private partition of the dataset
A data scientist coordinated training without seeing raw data
Model updates were aggregated using the Flower framework
Privacy was preserved—raw data never left the data owners' environments
This is the core promise of federated learning: collaborative machine learning without sharing sensitive data.
Understanding the code
The Flower project follows a standard structure:
Client app
src/syft_flwr/fl_diabetes_prediction/client_app.py
from flwr.client import ClientApp, NumPyClient
from flwr.common import Context

class FlowerClient(NumPyClient):
    def __init__(self, net, trainloader, testloader):
        self.net = net
        self.trainloader = trainloader
        self.testloader = testloader

    def fit(self, parameters, config):
        set_weights(self.net, parameters)
        train(self.net, self.trainloader)
        return get_weights(self.net), len(self.trainloader), {}

    def evaluate(self, parameters, config):
        set_weights(self.net, parameters)
        loss, accuracy = evaluate(self.net, self.testloader)
        return loss, len(self.testloader), {"accuracy": accuracy}

app = ClientApp(client_fn=client_fn)
Server app
src/syft_flwr/fl_diabetes_prediction/server_app.py
from flwr.server import ServerApp, ServerConfig
from syft_flwr.strategy import FedAvgWithModelSaving

def server_fn(context: Context):
    strategy = FedAvgWithModelSaving(
        save_path=output_dir / "weights",
        fraction_fit=1.0,
        min_available_clients=1,
        initial_parameters=params,
        evaluate_metrics_aggregation_fn=weighted_average,
    )
    config = ServerConfig(num_rounds=5)
    return ServerAppComponents(config=config, strategy=strategy)

app = ServerApp(server_fn=server_fn)
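The weighted_average helper passed to evaluate_metrics_aggregation_fn is not shown in the excerpt above; in Flower projects it is typically a small function like the following (a sketch of the common pattern, not necessarily this project's exact code):

```python
from typing import List, Tuple

def weighted_average(metrics: List[Tuple[int, dict]]) -> dict:
    """Aggregate per-client accuracy, weighted by each client's example count."""
    total_examples = sum(n for n, _ in metrics)
    accuracy = sum(n * m["accuracy"] for n, m in metrics) / total_examples
    return {"accuracy": accuracy}

# DO1 evaluated on 100 examples at 80% accuracy, DO2 on 300 at 90%:
print(weighted_average([(100, {"accuracy": 0.8}), (300, {"accuracy": 0.9})]))
# → {'accuracy': 0.875}
```

Weighting by example count means a client with more evaluation data has proportionally more influence on the reported global metric, matching how FedAvg weights the parameters themselves.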
Syft-Flwr automatically handles:
Loading datasets from private paths via load_syftbox_dataset()
Routing model updates through file sync
Managing job approval workflows
Next steps
Installation: Set up Syft-Flwr for local development
Development guide: Learn how to build custom FL projects
API reference: Explore the complete API
Examples: Explore more FL examples
Get help