Run real federated learning across distributed parties without installing anything locally. Each participant uses a Google Colab notebook—making it perfect for demos, education, and rapid prototyping.
Overview
Google Colab deployment enables true distributed federated learning with:
Zero Installation: No local Python, Docker, or package management needed
Multi-Party Collaboration: Each participant runs their own Colab notebook
Real Distribution: Notebooks run on separate Google Cloud instances
Privacy Preservation: Data stays in each participant’s Colab runtime
Easy Sharing: Share notebooks via Google Drive links
The Parties
In this federated learning flow, there are three key roles:
Data Owner 1 (DO1): Holds private data partition 0
Data Owner 2 (DO2): Holds private data partition 1
Data Scientist (DS): Coordinates training, submits jobs, aggregates results
Each party runs in a separate Google Colab notebook. You can:
Use three different Google accounts (your own setup)
Invite two friends to join (collaborative demo)
Use the same account but run notebooks separately (educational demo)
Raw data never leaves the data owner’s Colab environment—only model updates are shared.
Prerequisites
Three Google accounts (one for each party), OR
Two friends willing to collaborate, OR
One account running three notebooks separately
Access to Google Colab (free tier works fine)
Quick Start
Get the Colab notebooks:
git clone https://github.com/OpenMined/syft-flwr.git
cd syft-flwr/notebooks/fl-diabetes-prediction/distributed-gdrive
The directory contains:
do1.ipynb - Data Owner 1 notebook
do2.ipynb - Data Owner 2 notebook
ds.ipynb - Data Scientist notebook
README.md - Detailed instructions
Step-by-Step Workflow
Step 1: Set Up Data Owner 1
Person 1 opens do1.ipynb in Google Colab.
Install Dependencies
# Cell 1: Install syft-flwr
!uv pip install -q "git+https://github.com/OpenMined/syft-flwr.git@main"
Login as Data Owner
# Cell 2: Initialize data owner client
import syft_client as sc
import syft_flwr
print(f"{sc.__version__ = }")
print(f"{syft_flwr.__version__ = }")
do_email = input("Enter the Data Owner's email: ")  # e.g., [email protected]
do_client = sc.login_do(email=do_email)
Download and Register Dataset
# Cell 3: Download PIMA diabetes dataset
from pathlib import Path
from huggingface_hub import snapshot_download
DATASET_DIR = Path("./dataset/").expanduser().absolute()
if not DATASET_DIR.exists():
    snapshot_download(
        repo_id="khoaguin/pima-indians-diabetes-database-partitions",
        repo_type="dataset",
        local_dir=DATASET_DIR,
    )
# Cell 4: Register dataset with Syft
partition_number = 0  # DO1 uses partition 0
DATASET_PATH = DATASET_DIR / f"pima-indians-diabetes-database-{partition_number}"
do_client.create_dataset(
    name="pima-indians-diabetes-database",
    mock_path=DATASET_PATH / "mock",  # Public sample data
    private_path=DATASET_PATH / "private",  # Real private data
    summary="PIMA Indians Diabetes dataset - Partition 0",
    readme_path=DATASET_PATH / "README.md",
    tags=["healthcare", "diabetes"],
    sync=True,
)
do_client.datasets.get_all()
Key Concept : The mock_path contains synthetic/sample data that data scientists can explore. The private_path contains the real data that never leaves this environment.
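To make that layout concrete, here is a minimal sketch that builds the directory structure `create_dataset` expects. The file name `data.csv` and the sample rows are illustrative assumptions; the real partition files come from the Hugging Face snapshot downloaded above:

```python
from pathlib import Path

# Sketch of the on-disk layout create_dataset expects (mock/ vs private/).
# "data.csv" and the sample rows are illustrative assumptions -- the real
# partition ships in the Hugging Face snapshot.
root = Path("./dataset/pima-indians-diabetes-database-0")
header = ("Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
          "DiabetesPedigreeFunction,Age,Outcome\n")
for split, row in [("mock", "5,116,74,0,0,25.6,0.201,30,0"),
                   ("private", "6,148,72,35,0,33.6,0.627,50,1")]:
    split_dir = root / split
    split_dir.mkdir(parents=True, exist_ok=True)  # dataset/<name>/mock and /private
    (split_dir / "data.csv").write_text(header + row + "\n")
(root / "README.md").write_text("# PIMA partition 0 (illustrative sketch)\n")
```

Only the `mock` folder is ever exposed to data scientists; the `private` folder stays in the owner's runtime.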
Keep this Colab notebook running. Data Owner 1 will need to approve jobs later in the workflow.
Step 2: Set Up Data Owner 2
Person 2 opens do2.ipynb in Google Colab.
Repeat the same steps as DO1, but change the partition number:
# Cell 4: Use partition 1 instead of 0
partition_number = 1  # DO2 uses partition 1
Everything else stays the same. Now you have two data owners, each holding a different slice of the diabetes dataset.
Step 3: Set Up Data Scientist
Person 3 opens ds.ipynb in Google Colab.
Install and Login
# Cell 1: Install dependencies
!uv pip install -q "git+https://github.com/OpenMined/syft-flwr.git@main"
# Cell 2: Login as data scientist
import syft_client as sc
import syft_flwr
ds_email = input("Enter the Data Scientist's email: ")  # e.g., [email protected]
ds_client = sc.login_ds(email=ds_email)
Add Peers (Data Owners)
# Cell 3: Connect to data owners
do1_email = input("Enter the First Data Owner's email: ")  # Get from Person 1
ds_client.add_peer(do1_email)
do2_email = input("Enter the Second Data Owner's email: ")  # Get from Person 2
ds_client.add_peer(do2_email)
# Verify peers are added
ds_client.peers
Explore Available Datasets
# Cell 4: Check DO1's datasets
do1_datasets = ds_client.datasets.get_all(datasite=do1_email)
do1_datasets[0].describe()
# Cell 5: Check DO2's datasets
do2_datasets = ds_client.datasets.get_all(datasite=do2_email)
do2_datasets[0].describe()
# Cell 6: Get mock dataset URLs for local testing
mock_dataset_urls = [do1_datasets[0].mock_url, do2_datasets[0].mock_url]
print(mock_dataset_urls)
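The mock URLs let the data scientist smoke-test preprocessing locally before submitting any job. A minimal sketch (the parsing works on any CSV text; the fetch step is shown as a comment, and the column names are a subset of the PIMA schema used for illustration):

```python
import csv
import io

def load_csv_text(text):
    """Parse CSV text into a list of row dicts -- enough for a quick sanity check."""
    return list(csv.DictReader(io.StringIO(text)))

# With a real mock URL you would fetch the text first, e.g.:
#   import urllib.request
#   text = urllib.request.urlopen(mock_dataset_urls[0]).read().decode()
sample = "Glucose,BMI,Outcome\n148,33.6,1\n85,26.6,0\n"
rows = load_csv_text(sample)
```

Running the same preprocessing against mock data first catches schema mismatches before a job ever touches private data.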
Step 4: Propose the FL Project
Clone FL Project
The FL project uses Flower for federated learning orchestration:
# Cell 7: Download FL project code
from pathlib import Path
!mkdir -p /content/fl-diabetes-prediction
!curl -sL https://github.com/khoaguin/fl-diabetes-prediction/archive/refs/heads/main.tar.gz | tar -xz --strip-components=1 -C /content/fl-diabetes-prediction
SYFT_FLWR_PROJECT_PATH = Path("/content/fl-diabetes-prediction")
print(f"Project at: {SYFT_FLWR_PROJECT_PATH}")
Bootstrap the Project
Configure the project with participant emails:
# Cell 8: Bootstrap with aggregator and datasites
import syft_flwr
!rm -rf {SYFT_FLWR_PROJECT_PATH / "main.py"}
do_emails = [peer.email for peer in ds_client.peers]
syft_flwr.bootstrap(
    SYFT_FLWR_PROJECT_PATH,
    aggregator=ds_email,
    datasites=do_emails,
)
print("Bootstrapped project successfully!")
Submit Jobs to Data Owners
Send the training code to each data owner for approval:
# Cell 9: Submit jobs
!rm -rf {SYFT_FLWR_PROJECT_PATH / "fl_diabetes_prediction" / "__pycache__"}
job_name = "fl-diabetes-training"
# Submit to DO1
ds_client.submit_python_job(
    user=do1_email,
    code_path=str(SYFT_FLWR_PROJECT_PATH),
    job_name=job_name,
)
# Submit to DO2
ds_client.submit_python_job(
    user=do2_email,
    code_path=str(SYFT_FLWR_PROJECT_PATH),
    job_name=job_name,
)
ds_client.jobs
The job contains the training code. Data owners can inspect it before approving execution on their private data.
Step 5: Data Owners Approve Jobs
Person 1 (in do1.ipynb) and Person 2 (in do2.ipynb):
# Cell 5: Check incoming jobs
do_client.jobs
# Cell 6: Review and approve the job
do_client.jobs[0].approve()
do_client.jobs
# Cell 7: Process approved jobs (runs training on private data)
do_client.process_approved_jobs()
# Cell 8: Check job status
do_client.jobs
Both data owners must approve and process jobs before proceeding.
Step 6: Run Federated Training
Person 3 (in ds.ipynb) runs the aggregation:
# Cell 10: Install training dependencies
!uv pip install \
"flwr-datasets>=0.5.0" \
"imblearn>=0.0" \
"loguru>=0.7.3" \
"pandas>=2.3.0" \
"ipywidgets>=8.1.7" \
"scikit-learn==1.7.1" \
"torch>=2.8.0" \
"ray==2.31.0"
# Cell 11: Run aggregation server
ds_email = ds_client.email
syftbox_folder = f"/content/SyftBox_{ds_email}"
!SYFTBOX_EMAIL="{ds_email}" SYFTBOX_FOLDER="{syftbox_folder}" \
    uv run {str(SYFT_FLWR_PROJECT_PATH / "main.py")}
# Cell 12: Check final job status
ds_client.jobs
Step 7: Clean Up
When done, clean up SyftBox resources:
# In DS notebook
ds_client.delete_syftbox()
# In DO1 and DO2 notebooks
do_client.delete_syftbox()
What Just Happened?
You successfully trained a diabetes prediction model using federated learning:
Two data owners each held a private partition in their Colab runtime
A data scientist coordinated training without seeing raw data
Model updates were aggregated using Flower framework
Privacy was preserved: raw data never left the data owners’ Colab environments
Real distribution: notebooks ran on separate Google Cloud instances
Architecture
┌─────────────────────┐
│ Data Scientist │
│ (Colab: ds.ipynb) │
│ │
│ - Submits jobs │
│ - Runs aggregator │
│ - Collects results │
└──────────┬──────────┘
│
│ SyftBox Network
│ (Google Drive sync)
│
┌──────┴──────┐
│ │
┌───▼──────────┐ │ ┌──────────────┐
│ Data Owner 1 │ └──│ Data Owner 2 │
│ (Colab) │ │ (Colab) │
│ │ │ │
│ - Partition 0│ │ - Partition 1│
│ - Approves │ │ - Approves │
│ - Trains │ │ - Trains │
└──────────────┘ └──────────────┘
Communication Flow
Job Submission: DS → (via SyftBox) → DO1, DO2
Job Approval: DO1, DO2 → (via SyftBox) → DS
Training Rounds:
DS sends global model → DO1, DO2
DO1, DO2 train locally on private data
DO1, DO2 send model updates → DS
DS aggregates updates
Results: DS receives final model
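The training rounds above follow the FedAvg pattern that Flower orchestrates: updates are averaged, weighted by how many examples each owner trained on. A minimal sketch in plain Python (toy flat weight lists and example counts stand in for real model parameters; this is not the Flower API itself):

```python
def fedavg(updates, num_examples):
    """Weighted average of client updates (FedAvg).

    updates: one flat weight list per data owner
    num_examples: training-set size of each owner, used as the weight
    """
    total = sum(num_examples)
    return [
        sum(w[i] * n for w, n in zip(updates, num_examples)) / total
        for i in range(len(updates[0]))
    ]

# Toy 2-parameter "models": DO1 trained on 500 rows, DO2 on 268 rows
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [500, 268])
```

Note that only these numeric updates cross the SyftBox network; the rows they were trained on never do.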
Advantages Over Local Simulation
| Feature | Local Simulation | Google Colab |
| --- | --- | --- |
| Distribution | Simulated (same machine) | Real (separate cloud instances) |
| Network | Loopback | Internet |
| Installation | Local Python setup | Zero installation |
| Collaboration | Solo | Multi-party |
| Realism | Testing | Production-like |
| Privacy | Simulated isolation | Real isolation |
Use Cases
1. Educational Demos
Teach federated learning concepts:
Students use personal Google accounts
No complex setup or IT requirements
Visual demonstration of distributed training
2. Hackathons
Rapid prototyping for FL projects:
Form teams with different roles (DOs and DS)
Iterate quickly without infrastructure
Share notebooks via Google Drive
3. Research Collaboration
Multi-institution research:
Each institution uses their Google Workspace
Real privacy boundaries
Reproducible experiments via shared notebooks
4. Client Demos
Showcase federated learning to stakeholders:
Non-technical audience can participate
Visual proof of privacy preservation
No local installation for clients
Limitations
Google Colab has usage limits:
Free tier: Limited GPU hours, may disconnect after inactivity
Compute: Lower performance than dedicated hardware
Storage: Temporary runtime storage (lost when disconnected)
Network: Subject to Google Cloud network policies
Mitigations
Colab Pro: Upgrade for longer runtimes and better GPUs
Save Checkpoints: Regularly save model weights to Google Drive
Keep Notebooks Active: Prevent disconnection by running periodic cells
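The checkpoint mitigation can be as simple as writing the aggregated weights out after each round. A sketch (the Drive mount path and `round_XXX.json` naming are illustrative assumptions, not part of syft-flwr):

```python
import json
import time
from pathlib import Path

# In Colab you would first mount Drive and point CKPT_DIR at it, e.g.:
#   from google.colab import drive; drive.mount("/content/drive")
#   CKPT_DIR = Path("/content/drive/MyDrive/fl-checkpoints")
CKPT_DIR = Path("./checkpoints")

def save_checkpoint(weights, round_num, ckpt_dir=CKPT_DIR):
    """Persist aggregated weights so a disconnect loses at most one round."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"round_{round_num:03d}.json"
    path.write_text(json.dumps(
        {"round": round_num, "weights": weights, "saved_at": time.time()}
    ))
    return path

path = save_checkpoint([0.12, -0.5, 1.3], round_num=3)
```

Because the files land on Drive rather than in the runtime's temporary storage, they survive a Colab disconnect and the run can resume from the latest round.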
Next Steps
Try Another Example: Run federated analytics on Google Colab.
Local Simulation: Test on your local machine first.
Production Deployment: Move to the SyftBox network for real production use.
API Reference: Explore the Syft-Flwr API.
Resources