Data Owner Guide
This guide covers everything you need to know as a data owner using Syft Client to securely share datasets and manage computational jobs.

Overview
As a data owner, you’ll use Syft Client to:
- Host private datasets with mock data previews
- Approve or reject peer connection requests
- Review and approve computational jobs
- Execute jobs on your private data
- Share results with data scientists
Getting Started
Login
Use login_do() to authenticate as a data owner:

The login_do() function configures your client as a data owner with additional permissions for dataset management and job execution.

Login Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| email | str \| None | None | Your email address. Auto-detected in Colab. |
| sync | bool | True | Sync with Google Drive on login |
| load_peers | bool | True | Load peer connections on login |
| token_path | str \| Path \| None | None | Path to OAuth token file (Jupyter only) |
Source: syft_client/sync/login.py:56-89
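A minimal sketch of a data-owner login using the parameters above. The stub below only mirrors the documented signature and defaults so they are easy to see; it is not the real client, which performs OAuth and Google Drive sync.

```python
# Hedged stand-in mirroring the documented login_do() signature and defaults.
# The real syft_client.login_do authenticates and syncs; this stub just
# echoes its arguments so the defaults are visible.
def login_do(email=None, sync=True, load_peers=True, token_path=None):
    return {"email": email, "sync": sync,
            "load_peers": load_peers, "token_path": token_path}

# In a notebook you would call the real login_do the same way:
client = login_do(email="owner@example.com", token_path="~/.syft/token.json")
print(client["sync"])        # sync defaults to True
print(client["load_peers"])  # load_peers defaults to True
```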
Managing Peer Connections
View Peer Requests
Approve Peer Requests
Source: syft_client/sync/syftbox_manager.py:765-808
When you approve a peer, they automatically gain access to:
- Datasets you’ve shared with “any” permission
- The ability to submit jobs to your datasite
Creating and Managing Datasets
Create a Dataset
Dataset Permission Models
Share with Specific Users
Share with All Approved Peers
Source: syft_client/sync/syftbox_manager.py:920-1005
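The two permission models above can be sketched as a toy resolver. The real client stores permissions in Google Drive; this illustration only shows the distinction between an explicit user list and the "any" (all approved peers) setting, and the function name is hypothetical.

```python
# Hedged sketch of the two dataset permission models described above:
# an explicit list of users, or "any", meaning every approved peer.
def can_read(dataset_perms, requester, approved_peers):
    """Return True if `requester` may read a dataset (illustrative only)."""
    if dataset_perms == "any":
        # Shared with all approved peers
        return requester in approved_peers
    # Shared with a specific list of users
    return requester in dataset_perms

approved = {"alice@example.com", "bob@example.com"}
print(can_read("any", "alice@example.com", approved))                # True
print(can_read(["bob@example.com"], "alice@example.com", approved))  # False
```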
Upload Private Data Separately
For sensitive datasets, you can upload private data to a separate owner-only collection.

Source: tests/unit/test_dataset_upload_private.py
Share Existing Datasets
Update permissions on existing datasets.

Source: syft_client/sync/syftbox_manager.py:1047-1099
Delete Datasets
List Your Datasets
Managing Jobs
View Submitted Jobs
Review and Approve Jobs
Execute Approved Jobs
Execution Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| stream_output | bool | True | Stream output in real time vs. capture at end |
| timeout | int \| None | 300 | Timeout in seconds per job |
| force_execution | bool | False | Execute jobs even from incompatible client versions |
| share_outputs_with_submitter | bool | False | Grant submitter read access to outputs |
| share_logs_with_submitter | bool | False | Grant submitter read access to logs |
Source: syft_client/sync/syftbox_manager.py:823-890
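A sketch of how the execution parameters above combine, assuming the documented defaults. The stub only shows the parameter handling (including the 300-second fallback timeout); it does not execute anything, and the return value is illustrative.

```python
# Hedged stand-in mirroring the documented execution parameters. The real
# process_approved_jobs() runs queued jobs; this stub only resolves defaults.
def process_approved_jobs(stream_output=True, timeout=None,
                          force_execution=False,
                          share_outputs_with_submitter=False,
                          share_logs_with_submitter=False):
    # Per the table, a 300s default timeout applies when none is given.
    effective_timeout = 300 if timeout is None else timeout
    return {"timeout": effective_timeout, "force": force_execution,
            "share_outputs": share_outputs_with_submitter}

print(process_approved_jobs())             # timeout falls back to 300
print(process_approved_jobs(timeout=600))  # explicit timeout wins
```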
Version Compatibility
By default, jobs from data scientists with incompatible Syft Client versions are skipped.

Advanced Job Management
Job Execution Flow
When you run process_approved_jobs(), the following happens:
1. Sync - Pull latest job submissions from Google Drive
2. Version Check - Verify the submitter’s client version is compatible
3. Environment Setup - Create an isolated virtual environment
4. Dependency Installation - Install required packages
5. Code Execution - Run the job script
6. Result Storage - Save outputs to the outputs/ folder
7. Sync Results - Upload results to Google Drive (if sharing is enabled)
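The execution flow above can be sketched as a stubbed pipeline. All names below are illustrative, not the real client internals: each real step (Drive sync, venv creation, package installs) is reduced to a log entry so the ordering and the version gate are visible.

```python
# Hedged sketch of the job execution flow as a pipeline of stubbed steps.
def run_job(job, log):
    log.append("sync")                                   # 1. pull submissions
    if not job.get("version_compatible", True):
        log.append("skipped: incompatible version")      # 2. version check
        return None
    log.append("env setup")                              # 3. virtual environment
    log.append("install deps")                           # 4. dependencies
    result = job["script"]()                             # 5. run the job script
    log.append("saved to outputs/")                      # 6. result storage
    if job.get("share_outputs"):
        log.append("synced results")                     # 7. upload if sharing enabled
    return result

log = []
out = run_job({"script": lambda: 42, "share_outputs": True}, log)
print(out, log[-1])  # 42 synced results
```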
Share Job Results Manually
You can also share results after execution.

Source: tests/unit/test_sync_manager.py:545-558
Job Directory Structure
Each executed job creates its own directory structure for outputs and logs.

Performance Optimization
Checkpoints
Checkpoints speed up sync operations by creating snapshots of your datasite state. Checkpoints are created automatically every 50 events during sync() operations, which significantly reduces initial sync time for new peers.

Source: syft_client/sync/syftbox_manager.py:1246-1294
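The every-50-events rule can be sketched as a simple counter. This is an illustration of the checkpointing idea only; the real snapshot format and trigger logic live in the source file referenced above.

```python
# Hedged sketch of event-count checkpointing: take a snapshot every 50
# events, as the guide describes for sync().
CHECKPOINT_INTERVAL = 50

def apply_events(events, snapshots):
    state = []
    for i, event in enumerate(events, start=1):
        state.append(event)
        if i % CHECKPOINT_INTERVAL == 0:
            snapshots.append(len(state))  # record a snapshot point

snapshots = []
apply_events(range(120), snapshots)
print(snapshots)  # [50, 100] -> two checkpoints for 120 events
```

A new peer can then start from the latest snapshot instead of replaying every event from the beginning, which is why initial sync gets faster.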
Disable Auto-Sync
For better performance when making multiple API calls, you can disable automatic syncing (see the PRE_SYNC environment variable below).

Best Practices
Dataset Security
- Always use mock data - Never include real private data in mock files
- Review job code carefully - Ensure jobs only access authorized datasets
- Use specific permissions - Prefer explicit user lists over “any”
- Enable private upload - Use upload_private=True for highly sensitive data
- Monitor access - Regularly review approved peers and revoke access when needed
Job Review Checklist
Before approving a job, verify:
- Code only accesses datasets the submitter has permission for
- No attempts to access network resources (if prohibited)
- No attempts to access system files outside the job directory
- Dependencies are from trusted sources
- Output volume is reasonable (won’t fill disk)
- Execution time is acceptable (set appropriate timeout)
Example Job Review Code
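A hedged example of automating part of the checklist above: scan the submitted script for obvious red flags before approving. The pattern list is illustrative and deliberately small; a real review should still read the submitted code by hand.

```python
# Hedged example: flag obvious checklist violations in a submitted script.
# The patterns below are a starting point, not an exhaustive security check.
import re

RED_FLAGS = {
    "network access": r"\b(requests|urllib|socket)\b",
    "system file access": r"(/etc/|os\.environ|subprocess)",
}

def review_script(source):
    """Return the names of red flags found in the job's source code."""
    return [name for name, pattern in RED_FLAGS.items()
            if re.search(pattern, source)]

job_code = "import requests\nrequests.get('http://example.com')"
print(review_script(job_code))      # ['network access']
print(review_script("print(1)"))    # [] -> no automated red flags
```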
Automated Workflows
Auto-Approve Trusted Peers
For trusted collaborators, you can set up auto-approval.

Source: scripts/auto_approve_peers_and_share.py
Cleanup and Maintenance
Delete Your Syftbox
To completely remove all Syft data:

Source: syft_client/sync/syftbox_manager.py:1170-1240
Environment Variables
| Variable | Default | Description |
|---|---|---|
| PRE_SYNC | "true" | Auto-sync before accessing datasets/jobs/peers |
| SYFTCLIENT_TOKEN_PATH | None | Default token path for authentication |
| SYFTCLIENT_DEV_MODE | False | Enable development mode features |
| SYFT_DEFAULT_JOB_TIMEOUT_SECONDS | 300 | Default job execution timeout |
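The variables above can be read with the standard library. This sketch assumes the defaults listed in the table; the helper function name is hypothetical and only shows how unset variables fall back.

```python
# Hedged sketch of reading the documented environment variables, using only
# the standard library and the defaults from the table above.
import os

def get_settings(env=os.environ):
    return {
        "pre_sync": env.get("PRE_SYNC", "true").lower() == "true",
        "token_path": env.get("SYFTCLIENT_TOKEN_PATH"),  # None if unset
        "dev_mode": env.get("SYFTCLIENT_DEV_MODE", "False") == "True",
        "job_timeout": int(env.get("SYFT_DEFAULT_JOB_TIMEOUT_SECONDS", "300")),
    }

# With nothing set: pre_sync=True, dev_mode=False, job_timeout=300
print(get_settings({}))
```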
Common Issues
Peer approval not visible to data scientist
Ensure both parties sync.

Job execution fails with missing dependencies

Check that all dependencies are specified in the job submission.

“Version unknown” warnings when processing jobs

The data scientist’s client version couldn’t be determined. Either:
- Have them upgrade to a newer version of syft-client
- Use force_execution=True to bypass version checks (use with caution)
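The version gate described above can be sketched as follows. The actual compatibility rule is not documented here, so major-version equality is an assumption used purely for illustration; only the force_execution behavior comes from the guide.

```python
# Hedged sketch of the version-compatibility gate: unknown or mismatched
# submitter versions are skipped unless force_execution is set. Major-version
# equality is an assumed stand-in for the real compatibility rule.
def should_execute(submitter_version, own_version, force_execution=False):
    if force_execution:
        return True                     # bypass the check (use with caution)
    if submitter_version is None:
        return False                    # the "Version unknown" case
    return submitter_version.split(".")[0] == own_version.split(".")[0]

print(should_execute(None, "0.3.1"))                         # False -> skipped
print(should_execute(None, "0.3.1", force_execution=True))   # True
print(should_execute("0.2.0", "0.3.1"))                      # True (same major)
```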
Next Steps
Authentication Setup
Set up OAuth tokens for Jupyter environments
Data Scientist Guide
Understand the data scientist workflow
Notebooks Guide
Learn notebook-specific workflows
API Reference
Explore the full API documentation