Zero-Setup Federated Learning

Run real federated learning across distributed parties without installing anything locally. Each participant uses a Google Colab notebook, making it ideal for demos, education, and rapid prototyping.

Overview

Google Colab deployment enables true distributed federated learning with:
  • Zero Installation: No local Python, Docker, or package management needed
  • Multi-Party Collaboration: Each participant runs their own Colab notebook
  • Real Distribution: Notebooks run on separate Google Cloud instances
  • Privacy Preservation: Data stays in each participant’s Colab runtime
  • Easy Sharing: Share notebooks via Google Drive links

The Parties

In this federated learning flow, there are three key roles:
  1. Data Owner 1 (DO1): Holds private data partition 0
  2. Data Owner 2 (DO2): Holds private data partition 1
  3. Data Scientist (DS): Coordinates training, submits jobs, aggregates results
Each party runs in a separate Google Colab notebook. You can:
  • Use three different Google accounts (your own setup)
  • Invite two friends to join (collaborative demo)
  • Use the same account but run notebooks separately (educational demo)
Raw data never leaves the data owner’s Colab environment—only model updates are shared.

Prerequisites

  • Three Google accounts (one for each party), OR
  • Two friends willing to collaborate, OR
  • One account running all three notebooks separately
  • Access to Google Colab in every case (the free tier works fine)

Quick Start

Get the Colab notebooks:
git clone https://github.com/OpenMined/syft-flwr.git
cd syft-flwr/notebooks/fl-diabetes-prediction/distributed-gdrive
The directory contains:
  • do1.ipynb - Data Owner 1 notebook
  • do2.ipynb - Data Owner 2 notebook
  • ds.ipynb - Data Scientist notebook
  • README.md - Detailed instructions

Step-by-Step Workflow

Step 1: Set Up Data Owner 1

Person 1 opens do1.ipynb in Google Colab.

Install Dependencies

# Cell 1: Install syft-flwr
!uv pip install -q "git+https://github.com/OpenMined/syft-flwr.git@main"

Login as Data Owner

# Cell 2: Initialize data owner client
import syft_client as sc
import syft_flwr

print(f"{sc.__version__ = }")
print(f"{syft_flwr.__version__ = }")

do_email = input("Enter the Data Owner's email: ")  # e.g., [email protected]
do_client = sc.login_do(email=do_email)

Download and Register Dataset

# Cell 3: Download PIMA diabetes dataset
from pathlib import Path
from huggingface_hub import snapshot_download

DATASET_DIR = Path("./dataset/").expanduser().absolute()

if not DATASET_DIR.exists():
    snapshot_download(
        repo_id="khoaguin/pima-indians-diabetes-database-partitions",
        repo_type="dataset",
        local_dir=DATASET_DIR,
    )

# Cell 4: Register dataset with Syft
partition_number = 0  # DO1 uses partition 0
DATASET_PATH = DATASET_DIR / f"pima-indians-diabetes-database-{partition_number}"

do_client.create_dataset(
    name="pima-indians-diabetes-database",
    mock_path=DATASET_PATH / "mock",        # Public sample data
    private_path=DATASET_PATH / "private",  # Real private data
    summary="PIMA Indians Diabetes dataset - Partition 0",
    readme_path=DATASET_PATH / "README.md",
    tags=["healthcare", "diabetes"],
    sync=True,
)

do_client.datasets.get_all()
Key Concept: The mock_path contains synthetic/sample data that data scientists can explore. The private_path contains the real data that never leaves this environment.
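One way to picture the mock/private split: a data owner can derive shareable mock data from private rows by sampling each column independently, so the schema and value ranges look realistic but no real record survives. This is an illustrative sketch, not part of the syft_client API; `make_mock` and the sample columns below are hypothetical (column names follow the PIMA schema).

```python
import numpy as np
import pandas as pd

def make_mock(private_df: pd.DataFrame, n_rows: int = 50, seed: int = 0) -> pd.DataFrame:
    """Sample each column independently, breaking row-level linkage
    so no complete real record appears in the mock."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        col: rng.choice(private_df[col].to_numpy(), size=n_rows)
        for col in private_df.columns
    })

# Illustrative private rows (hypothetical values)
private = pd.DataFrame({
    "Glucose": [148, 85, 183, 89],
    "BMI": [33.6, 26.6, 23.3, 28.1],
    "Outcome": [1, 0, 1, 0],
})
mock = make_mock(private)
print(list(mock.columns), len(mock))  # ['Glucose', 'BMI', 'Outcome'] 50
```

Data scientists can then explore `mock` freely; only the data owner's runtime ever touches `private`.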
Keep this Colab notebook running. Data Owner 1 will need to approve jobs later in the workflow.

Step 2: Set Up Data Owner 2

Person 2 opens do2.ipynb in Google Colab. Repeat the same steps as DO1, but change the partition number:
# Cell 4: Use partition 1 instead of 0
partition_number = 1  # DO2 uses partition 1
Everything else stays the same. Now you have two data owners, each holding a different slice of the diabetes dataset.
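The partitioning itself can be sketched generically. The PIMA partitions in this tutorial are pre-built in the Hugging Face repo, so `split_partitions` below is only a hypothetical illustration of how a dataset gets divided into disjoint horizontal slices, one per data owner:

```python
import pandas as pd

def split_partitions(df: pd.DataFrame, n_parts: int = 2, seed: int = 42):
    """Shuffle rows, then deal them round-robin so each data owner
    receives a disjoint, similar-sized horizontal slice."""
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    return [shuffled.iloc[i::n_parts].reset_index(drop=True) for i in range(n_parts)]

# Toy stand-in for the full dataset
df = pd.DataFrame({"Glucose": range(10), "Outcome": [0, 1] * 5})
p0, p1 = split_partitions(df)  # partition 0 -> DO1, partition 1 -> DO2
print(len(p0), len(p1))  # 5 5
```

Every row lands in exactly one partition, which is what makes the two data owners' holdings genuinely disjoint.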

Step 3: Set Up Data Scientist

Person 3 opens ds.ipynb in Google Colab.

Install and Login

# Cell 1: Install dependencies
!uv pip install -q "git+https://github.com/OpenMined/syft-flwr.git@main"

# Cell 2: Login as data scientist
import syft_client as sc
import syft_flwr

ds_email = input("Enter the Data Scientist's email: ")  # e.g., [email protected]
ds_client = sc.login_ds(email=ds_email)

Add Peers (Data Owners)

# Cell 3: Connect to data owners
do1_email = input("Enter the First Data Owner's email: ")  # Get from Person 1
ds_client.add_peer(do1_email)

do2_email = input("Enter the Second Data Owner's email: ")  # Get from Person 2
ds_client.add_peer(do2_email)

# Verify peers are added
ds_client.peers

Explore Available Datasets

# Cell 4: Check DO1's datasets
do1_datasets = ds_client.datasets.get_all(datasite=do1_email)
do1_datasets[0].describe()

# Cell 5: Check DO2's datasets
do2_datasets = ds_client.datasets.get_all(datasite=do2_email)
do2_datasets[0].describe()

# Cell 6: Get mock dataset URLs for local testing
mock_dataset_urls = [do1_datasets[0].mock_url, do2_datasets[0].mock_url]
print(mock_dataset_urls)
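Since pandas reads local paths, file-like objects, and http(s) URLs alike, the mock partitions can be pulled into the notebook for a quick local dry run before submitting any job. The exact file layout behind each `mock_url` is an assumption here, so this sketch uses in-memory stand-ins:

```python
from io import StringIO

import pandas as pd

def load_mock_partitions(sources):
    """Read each mock partition (path, URL, or file-like) into a DataFrame."""
    return [pd.read_csv(src) for src in sources]

# Local stand-ins for the two mock URLs (hypothetical contents)
csv_do1 = StringIO("Glucose,Outcome\n148,1\n85,0\n")
csv_do2 = StringIO("Glucose,Outcome\n183,1\n")

frames = load_mock_partitions([csv_do1, csv_do2])
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)  # (3, 2)
```

A dry run on the mocks catches schema or preprocessing bugs cheaply, before the code ever reaches private data.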

Step 4: Propose the FL Project

Clone FL Project

The FL project uses Flower for federated learning orchestration:
# Cell 7: Download FL project code
from pathlib import Path

!mkdir -p /content/fl-diabetes-prediction
!curl -sL https://github.com/khoaguin/fl-diabetes-prediction/archive/refs/heads/main.tar.gz | tar -xz --strip-components=1 -C /content/fl-diabetes-prediction

SYFT_FLWR_PROJECT_PATH = Path("/content/fl-diabetes-prediction")
print(f"Project at: {SYFT_FLWR_PROJECT_PATH}")

Bootstrap the Project

Configure the project with participant emails:
# Cell 8: Bootstrap with aggregator and datasites
import syft_flwr

!rm -rf {SYFT_FLWR_PROJECT_PATH / "main.py"}

do_emails = [peer.email for peer in ds_client.peers]
syft_flwr.bootstrap(
    SYFT_FLWR_PROJECT_PATH,
    aggregator=ds_email,
    datasites=do_emails
)
print("Bootstrapped project successfully!")

Submit Jobs to Data Owners

Send the training code to each data owner for approval:
# Cell 9: Submit jobs
!rm -rf {SYFT_FLWR_PROJECT_PATH / "fl_diabetes_prediction" / "__pycache__"}

job_name = "fl-diabetes-training"

# Submit to DO1
ds_client.submit_python_job(
    user=do1_email,
    code_path=str(SYFT_FLWR_PROJECT_PATH),
    job_name=job_name,
)

# Submit to DO2
ds_client.submit_python_job(
    user=do2_email,
    code_path=str(SYFT_FLWR_PROJECT_PATH),
    job_name=job_name,
)

ds_client.jobs
The job contains the training code. Data owners can inspect it before approving execution on their private data.

Step 5: Data Owners Approve Jobs

Person 1 (in do1.ipynb) and Person 2 (in do2.ipynb):
# Cell 5: Check incoming jobs
do_client.jobs

# Cell 6: Review and approve the job
do_client.jobs[0].approve()
do_client.jobs

# Cell 7: Process approved jobs (runs training on private data)
do_client.process_approved_jobs()

# Cell 8: Check job status
do_client.jobs
Both data owners must approve and process jobs before proceeding.

Step 6: Run Federated Training

Person 3 (in ds.ipynb) runs the aggregation:
# Cell 10: Install training dependencies
!uv pip install \
    "flwr-datasets>=0.5.0" \
    "imblearn>=0.0" \
    "loguru>=0.7.3" \
    "pandas>=2.3.0" \
    "ipywidgets>=8.1.7" \
    "scikit-learn==1.7.1" \
    "torch>=2.8.0" \
    "ray==2.31.0"

# Cell 11: Run aggregation server
ds_email = ds_client.email
syftbox_folder = f"/content/SyftBox_{ds_email}"

!SYFTBOX_EMAIL="{ds_email}" SYFTBOX_FOLDER="{syftbox_folder}" \
    uv run {str(SYFT_FLWR_PROJECT_PATH / "main.py")}

# Cell 12: Check final job status
ds_client.jobs

Step 7: Clean Up

When done, clean up SyftBox resources:
# In DS notebook
ds_client.delete_syftbox()

# In DO1 and DO2 notebooks
do_client.delete_syftbox()

What Just Happened?

You successfully trained a diabetes prediction model using federated learning:
  1. Two data owners each held a private partition in their Colab runtime
  2. A data scientist coordinated training without seeing raw data
  3. Model updates were aggregated using the Flower framework
  4. Privacy was preserved—raw data never left the data owner’s Colab environment
  5. Real distribution—notebooks ran on separate Google Cloud instances

Architecture

┌─────────────────────┐
│  Data Scientist     │
│  (Colab: ds.ipynb)  │
│                     │
│  - Submits jobs     │
│  - Runs aggregator  │
│  - Collects results │
└──────────┬──────────┘

           │ SyftBox Network
           │ (Google Drive sync)

    ┌──────┴───────────┐
    │                  │
┌───▼──────────┐  ┌────▼─────────┐
│ Data Owner 1 │  │ Data Owner 2 │
│ (Colab)      │  │ (Colab)      │
│              │  │              │
│ - Partition 0│  │ - Partition 1│
│ - Approves   │  │ - Approves   │
│ - Trains     │  │ - Trains     │
└──────────────┘  └──────────────┘

Communication Flow

  1. Job Submission: DS → (via SyftBox) → DO1, DO2
  2. Job Approval: DO1, DO2 → (via SyftBox) → DS
  3. Training Rounds:
    • DS sends global model → DO1, DO2
    • DO1, DO2 train locally on private data
    • DO1, DO2 send model updates → DS
    • DS aggregates updates
  4. Results: DS receives final model
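The aggregation in step 3 is classic FedAvg: a weighted average of client parameters, where each client's weight is its sample count. A minimal sketch in plain Python (not the Flower internals; the parameter vectors and counts are made up):

```python
def fedavg(client_updates):
    """FedAvg: average client parameter vectors, weighted by
    each client's number of training samples."""
    total = sum(n for _, n in client_updates)
    num_params = len(client_updates[0][0])
    return [
        sum(params[i] * n for params, n in client_updates) / total
        for i in range(num_params)
    ]

# Hypothetical round: (flat parameter vector, sample count) per data owner
do1_update = ([0.2, 0.4], 100)  # e.g. partition 0 with 100 rows
do2_update = ([0.6, 0.8], 300)  # e.g. partition 1 with 300 rows

global_params = fedavg([do1_update, do2_update])
print([round(p, 6) for p in global_params])  # [0.5, 0.7]
```

Because DO2 holds three times as many samples, its parameters pull the average three times as hard; only these aggregated vectors ever cross the network.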

Advantages Over Local Simulation

| Feature       | Local Simulation         | Google Colab                    |
|---------------|--------------------------|---------------------------------|
| Distribution  | Simulated (same machine) | Real (separate cloud instances) |
| Network       | Loopback                 | Internet                        |
| Installation  | Local Python setup       | Zero installation               |
| Collaboration | Solo                     | Multi-party                     |
| Realism       | Testing                  | Production-like                 |
| Privacy       | Simulated isolation      | Real isolation                  |

Use Cases

1. Educational Demos

Teach federated learning concepts:
  • Students use personal Google accounts
  • No complex setup or IT requirements
  • Visual demonstration of distributed training

2. Hackathons

Rapid prototyping for FL projects:
  • Form teams with different roles (DOs and DS)
  • Iterate quickly without infrastructure
  • Share notebooks via Google Drive

3. Research Collaboration

Multi-institution research:
  • Each institution uses their Google Workspace
  • Real privacy boundaries
  • Reproducible experiments via shared notebooks

4. Client Demos

Showcase federated learning to stakeholders:
  • Non-technical audience can participate
  • Visual proof of privacy preservation
  • No local installation for clients

Limitations

Google Colab has usage limits:
  • Free tier: Limited GPU hours, may disconnect after inactivity
  • Compute: Lower performance than dedicated hardware
  • Storage: Temporary runtime storage (lost when disconnected)
  • Network: Subject to Google Cloud network policies

Mitigations

  • Colab Pro: Upgrade for longer runtimes and better GPUs
  • Save Checkpoints: Regularly save model weights to Google Drive
  • Keep Notebooks Active: Prevent disconnection by running periodic cells
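The checkpoint mitigation can be as simple as pickling the current parameters after every round. A sketch under stated assumptions: `CKPT_DIR`, `save_checkpoint`, and `load_latest` are hypothetical helpers, and in Colab you would point `CKPT_DIR` at a mounted Google Drive folder so checkpoints outlive the runtime.

```python
import pickle
from pathlib import Path

# In Colab: mount Drive first, then use e.g. Path("/content/drive/MyDrive/fl-ckpts")
CKPT_DIR = Path("./checkpoints")

def save_checkpoint(params, round_num, ckpt_dir=CKPT_DIR):
    """Persist the current model parameters so a runtime
    disconnect does not lose training progress."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"round_{round_num:03d}.pkl"
    with open(path, "wb") as f:
        pickle.dump(params, f)
    return path

def load_latest(ckpt_dir=CKPT_DIR):
    """Resume from the most recent round, or None if nothing was saved."""
    paths = sorted(Path(ckpt_dir).glob("round_*.pkl"))
    if not paths:
        return None
    with open(paths[-1], "rb") as f:
        return pickle.load(f)

save_checkpoint([0.5, 0.7], round_num=1)
save_checkpoint([0.45, 0.72], round_num=2)
print(load_latest())  # [0.45, 0.72]
```

Zero-padding the round number keeps lexicographic sorting correct, so `load_latest` always resumes from the newest round.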

Next Steps

Try Another Example

Run federated analytics on Google Colab.

Local Simulation

Test on your local machine first.

Production Deployment

Move to SyftBox network for real production use.

API Reference

Explore the Syft-Flwr API.
