Syft Client - Peer-to-peer data science

Secure, peer-to-peer data science

Syft Client enables secure peer-to-peer data science collaboration through Google Workspace, Microsoft 365, and other trusted communication channels. Built for privacy-preserving machine learning and federated analytics, it lets data scientists collaborate without centralizing sensitive data.

Google Drive Sync

Automatically sync datasets, jobs, and results through Google Drive, with no additional infrastructure needed.

Job Execution

Submit Python and bash jobs to remote peers. Run computations on private data without gaining direct access to it.

Dataset Sharing

Create and share datasets with fine-grained permissions. Keep mock and private data separate to protect privacy.

Peer Management

Approve peer requests, manage permissions, and collaborate with version compatibility checks.

Jupyter & Colab

First-class support for Jupyter notebooks and Google Colab, well suited to data science workflows.

Permission System

Declarative permission system for controlling file access, sharing, and collaboration.
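The declarative idea can be illustrated with a small sketch: access is derived from declared rules rather than checked ad hoc, and anything no rule grants is denied. The rule format below (a list of path patterns, each with the users allowed to read matching files) and the can_read helper are hypothetical and are not Syft Client's actual permission file format.

```python
from fnmatch import fnmatch

# Hypothetical rule format, NOT Syft Client's real permission files:
# each rule maps a path pattern to the users allowed to read matching files.
RULES = [
    {"pattern": "datasets/*/mock/*", "read": ["*"]},                     # anyone
    {"pattern": "datasets/*/private/*", "read": ["owner@example.com"]},  # owner only
]


def can_read(user: str, path: str, rules=RULES) -> bool:
    """Return True if any rule matching `path` grants read access to `user`."""
    for rule in rules:
        if fnmatch(path, rule["pattern"]):
            if "*" in rule["read"] or user in rule["read"]:
                return True
    return False  # deny by default when no rule grants access


print(can_read("ds@example.com", "datasets/medical_data/mock/mock_data.csv"))        # True
print(can_read("ds@example.com", "datasets/medical_data/private/private_data.csv"))  # False
```

Deny-by-default keeps the rules file short: only grants need to be written down.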

How it works

Syft Client uses a datasite architecture in which each user has their own “datasite”, a local folder synced through Google Drive:
1. Log in as Data Owner (DO) or Data Scientist (DS)

Data Owners host datasets and approve jobs. Data Scientists discover datasets and submit jobs for execution.

2. Sync via Google Drive

All changes are synced automatically through Google Drive. No central server is required.

3. Submit and Execute Jobs

Data Scientists submit jobs that run on the Data Owner’s machine, and the results are shared back automatically.
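The steps above amount to a small job lifecycle. This sketch models it as a state machine under assumed status names (submitted, approved, rejected, running, completed, failed) with hypothetical Job and run_if_approved helpers; Syft Client's own job states and classes may differ. The property it illustrates is that the owner-side runner refuses to execute anything that has not been approved.

```python
from dataclasses import dataclass, field

# Hypothetical status names; Syft Client's real job states may differ.
TRANSITIONS = {
    "submitted": {"approved", "rejected"},
    "approved": {"running"},
    "running": {"completed", "failed"},
}


@dataclass
class Job:
    name: str
    submitter: str
    status: str = "submitted"
    history: list = field(default_factory=list)

    def advance(self, new_status: str) -> None:
        """Move to `new_status`, rejecting transitions the lifecycle does not allow."""
        if new_status not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"cannot move from {self.status!r} to {new_status!r}")
        self.history.append(self.status)
        self.status = new_status


def run_if_approved(job: Job, compute):
    """Owner side: execute only approved jobs and return the result to share back."""
    if job.status != "approved":
        raise PermissionError(f"job {job.name!r} is {job.status!r}, not approved")
    job.advance("running")
    try:
        result = compute()
    except Exception:
        job.advance("failed")
        raise
    job.advance("completed")
    return result


job = Job("Model Training", "ds@example.com")
job.advance("approved")                 # the Data Owner approves
print(run_if_approved(job, lambda: 42))  # 42
```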

Key features

Two roles for different workflows

import syft_client as sc

# Log in as a Data Owner
client = sc.login_do(
    email="[email protected]",
    token_path="/path/to/token.json"
)

# Create and share a dataset
client.create_dataset(
    name="medical_data",
    users=["[email protected]"],
    mock_files=["mock_data.csv"],
    private_files=["private_data.csv"]
)

# Approve peer requests
client.approve_peer_request("[email protected]")

# Process approved jobs
client.process_approved_jobs()
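The mock/private split in create_dataset can be pictured as a folder layout inside the owner's local datasite. This sketch assumes a hypothetical layout (datasets/<name>/mock and datasets/<name>/private) and a hypothetical create_dataset_folders helper; it only copies files into place and does not sync anything or set permissions the way Syft Client does.

```python
import shutil
import tempfile
from pathlib import Path


def create_dataset_folders(datasite, name, mock_files, private_files):
    """Copy mock and private files into separate folders of a local datasite.

    Hypothetical layout: datasets/<name>/mock holds data meant for peers,
    datasets/<name>/private holds data that stays on the owner's machine.
    """
    base = Path(datasite) / "datasets" / name
    layout = {"mock": base / "mock", "private": base / "private"}
    for kind, files in (("mock", mock_files), ("private", private_files)):
        layout[kind].mkdir(parents=True, exist_ok=True)
        for src in files:
            shutil.copy2(src, layout[kind] / Path(src).name)
    return layout


# Demo in a throwaway directory with dummy files, so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "mock_data.csv").write_text("col\n1\n")
    (tmp / "private_data.csv").write_text("col\n2\n")
    layout = create_dataset_folders(
        tmp / "datasite", "medical_data",
        mock_files=[tmp / "mock_data.csv"],
        private_files=[tmp / "private_data.csv"],
    )
    print(sorted(p.name for p in layout["mock"].iterdir()))  # ['mock_data.csv']
```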

Automatic synchronization

All file changes are automatically detected and synced through Google Drive:
# Sync explicitly when needed
client.sync()

# Access auto-syncs before returning data
peers = client.peers        # Auto-syncs before showing peers
jobs = client.jobs          # Auto-syncs before showing jobs
datasets = client.datasets  # Auto-syncs before showing datasets

# Disable auto-sync if needed
import os
os.environ["PRE_SYNC"] = "false"
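The auto-sync pattern can be sketched as properties that call sync() before returning data unless PRE_SYNC is set to "false". AutoSyncClient is a hypothetical stand-in whose sync() only counts calls; treating an unset PRE_SYNC as enabled, and reading the flag on every access, are assumptions rather than documented Syft Client behavior.

```python
import os


class AutoSyncClient:
    """Hypothetical stand-in: properties sync before returning data."""

    def __init__(self):
        self.sync_count = 0
        self._jobs = ["Model Training"]
        self._datasets = ["medical_data"]

    def sync(self) -> None:
        # A real client would pull and push Google Drive changes here.
        self.sync_count += 1

    def _maybe_sync(self) -> None:
        # Assumption: unset means enabled; the flag is re-read on every access.
        if os.environ.get("PRE_SYNC", "true").strip().lower() != "false":
            self.sync()

    @property
    def jobs(self):
        self._maybe_sync()
        return list(self._jobs)

    @property
    def datasets(self):
        self._maybe_sync()
        return list(self._datasets)
```

Re-reading the variable on each access means toggling PRE_SYNC takes effect immediately; a real client might instead read it once at import or login.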

Job submission and execution

Submit single Python files, entire Python project folders, or bash scripts:
# Submit a single Python file
client.submit_python_job(
    user="[email protected]",
    code_path="train_model.py",
    job_name="Model Training",
    dependencies=["scikit-learn==1.3.0", "pandas"]
)

# Submit a Python project folder
client.submit_python_job(
    user="[email protected]",
    code_path="./my_project",
    entrypoint="main.py",  # Auto-detected if not provided
    dependencies=["torch", "transformers"]
)

# Submit a bash script
client.submit_bash_job(
    user="[email protected]",
    script="#!/bin/bash\necho 'Hello from remote execution'",
    job_name="Test Job"
)
Each job runs in an isolated virtual environment, with uv handling fast dependency installation.
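Entrypoint auto-detection and per-job environments can be sketched as follows. The detection rule (prefer main.py, otherwise accept a single top-level .py file) and both helpers are hypothetical, not Syft Client's implementation; uv_commands only builds command lines for the real uv CLI (uv venv and uv pip install --python) and assumes a POSIX virtual-environment layout.

```python
from pathlib import Path


def detect_entrypoint(project: Path) -> str:
    """Hypothetical rule: prefer main.py, else the only top-level .py file."""
    if (project / "main.py").is_file():
        return "main.py"
    candidates = sorted(p.name for p in project.glob("*.py"))
    if len(candidates) == 1:
        return candidates[0]
    raise ValueError(f"cannot pick an entrypoint from {candidates}; pass one explicitly")


def uv_commands(job_dir: Path, entrypoint: str, dependencies: list) -> list:
    """Build (but do not run) commands for an isolated per-job environment."""
    venv = job_dir / ".venv"
    python = venv / "bin" / "python"  # POSIX layout; Windows uses Scripts\python.exe
    cmds = [["uv", "venv", str(venv)]]
    if dependencies:
        cmds.append(["uv", "pip", "install", "--python", str(python), *dependencies])
    cmds.append([str(python), entrypoint])
    return cmds
```

With an absolute job_dir, each command could then be run in order with subprocess.run(cmd, cwd=job_dir, check=True).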

What’s next?

Quickstart

Get up and running in 5 minutes with our quickstart guide

Installation

Detailed installation instructions for pip, uv, and optional dependencies

API Reference

Complete API documentation for all classes and methods

User Guides

Learn how to use Syft Client as a Data Scientist or Data Owner
