Syft uses a file-first permission system where access control is defined by syft.pub.yaml files placed throughout the datasite folder hierarchy.
Core Permission Principles
File-Permission-First (Principle 2): Access control is first and foremost described by file permissions. Job policies are secondary.
File First (Principle 1): State is files. Permissions are also files (syft.pub.yaml) synced between peers.
Single Gateway (Principle 17): Only one job queue per datasite. Data owners see everything entering/leaving.
Manual Review First (Principle 7): All jobs are manually reviewed unless a policy auto-approves them.
Following Principle 2, there are only two permission systems: file permissions and job policies. Nothing else. No hidden access control.
Permission Files
Permissions are defined in syft.pub.yaml files:
# File: [email protected]/public/syft.pub.yaml
rules:
  - pattern: "**/*"                  # Match all files in this folder
    access:
      read: ["*"]                    # Everyone can read
      write: ["[email protected]"]    # Only owner can write
terminal: false                      # Continue checking parent folders
Permission File Schema
syft_permissions/spec/ruleset.py
from pathlib import Path

import yaml
from pydantic import BaseModel


# Access and Rule are defined first so the annotations below resolve.
class Access(BaseModel):
    """Access control lists"""

    admin: list[str] = []  # Full control
    write: list[str] = []  # Can create/modify files
    read: list[str] = []  # Can read files


class Rule(BaseModel):
    """A single permission rule"""

    pattern: str  # Glob pattern (e.g., "**/*.csv")
    access: Access  # Access levels


class RuleSet(BaseModel):
    """A collection of permission rules in a folder"""

    rules: list[Rule] = []  # List of permission rules
    terminal: bool = False  # Stop checking parent folders?
    path: str = ""  # Path to this ruleset (runtime)

    @classmethod
    def load(cls, filepath: Path) -> "RuleSet":
        """Load from syft.pub.yaml file"""
        with open(filepath) as f:
            data = yaml.safe_load(f) or {}
        rs = cls.model_validate(data)
        rs.path = str(filepath.parent)
        return rs
Pattern Matching
Patterns use glob syntax:
**/* - All files recursively
rules:
  - pattern: "**/*"
    access:
      read: ["*"]
Matches all files in the folder and its subfolders.
*.csv - CSV files in this folder
Matches only .csv files in the current folder (not subfolders).
data/** - Subfolder recursively
rules:
  - pattern: "data/**"
    access:
      read: ["*@research-team.com"]
Matches all files under the data/ subfolder.
results_*.json - Pattern matching
Matches files like results_2024.json, results_final.json, etc.
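The four patterns above can be checked with a small matcher. The following is a minimal sketch, assuming that `*` stays within one path segment and `**` may cross folder boundaries, as the descriptions above suggest. The helper names `glob_to_regex` and `matches` are hypothetical and are not part of Syft; the real matcher may treat edge cases differently.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob pattern into a compiled regex (illustrative only)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything inside a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one character, never "/"
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(out))


def matches(pattern: str, path: str) -> bool:
    """Return True if the whole relative path matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None
```

Under these assumptions, `matches("*.csv", "sub/data.csv")` is false while `matches("**/*", "sub/data.csv")` is true, which mirrors the "current folder" versus "recursive" distinction described above.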
Access Levels
Read
Can view file contents. Required to:
Download files from peer’s datasite
List file existence
See file in job outputs
Write
Can create/modify files. Required to:
Submit jobs to peer’s job queue
Upload results to peer’s datasite
Modify existing files
Write permission implicitly grants read permission.
Admin
Full control. Can:
Do anything read/write can do
Modify permission files
Delete files
Admin permission implicitly grants read and write.
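The implication chain (admin grants write, write grants read) can be expressed as an ordered list of levels. This is a minimal sketch with exact user matching only; the function `has_access` and the dictionary layout are illustrative assumptions, not the syft-perm API.

```python
LEVELS = ("read", "write", "admin")  # ordered from weakest to strongest


def has_access(access: dict[str, list[str]], user: str, level: str) -> bool:
    """Return True if `user` holds `level` directly or through a stronger level."""
    start = LEVELS.index(level)
    # A user listed under `level` or any stronger level qualifies.
    return any(user in access.get(name, []) for name in LEVELS[start:])
```

For example, a user listed only under `write` passes a `read` check, but a user listed only under `read` fails a `write` check.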
User Patterns
Access lists support several patterns:
Pattern               Meaning                Example
"*"                   Everyone (any user)    read: ["*"]
"[email protected]"    Specific user          read: ["[email protected]"]
"*@domain.com"        Domain wildcard        read: ["*@company.com"]
"USER"                Placeholder (owner)    admin: ["USER"]
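A sketch of how one access-list entry could be compared against a user follows. It assumes case-sensitive comparison and assumes that "USER" resolves to the datasite owner; the function `user_matches` and the example addresses are hypothetical and not taken from Syft.

```python
def user_matches(entry: str, user: str, owner: str) -> bool:
    """Check one access-list entry against a user email (illustrative only)."""
    if entry == "*":
        return True  # everyone
    if entry == "USER":
        return user == owner  # assumption: placeholder stands for the owner
    if entry.startswith("*@"):
        return user.endswith(entry[1:])  # domain wildcard, compares "@domain"
    return entry == user  # specific user


def any_entry_matches(entries: list[str], user: str, owner: str) -> bool:
    """Return True if any entry in an access list matches the user."""
    return any(user_matches(entry, user, owner) for entry in entries)
```

Comparing the suffix including the "@" keeps `*@company.com` from matching an address at a domain that merely ends in the same letters.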
Permission Hierarchy
Permissions are inherited through the folder hierarchy:
[email protected]/
├── syft.pub.yaml          # Root: admin = ["[email protected]"]
├── public/
│   ├── syft.pub.yaml      # Override: read = ["*"], terminal = false
│   ├── data.csv           # ✅ Everyone can read (from public/syft.pub.yaml)
│   └── restricted/
│       ├── syft.pub.yaml  # Override: read = ["[email protected]"], terminal = true
│       └── secret.csv     # ✅ Only Bob can read (terminal stops parent check)
└── private/
    ├── syft.pub.yaml      # Override: read = ["[email protected]"], terminal = true
    └── sensitive.csv      # ✅ Only Alice can read
Terminal Flag
The terminal flag controls whether to check parent folders:
terminal: false (default)
Continue checking parent folder permissions.
# public/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["*"]
terminal: false  # Also check ../syft.pub.yaml
Effective permissions are the union of this folder and parents.
terminal: true
Stop checking parent folders. Only use rules in this file.
# private/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["[email protected]"]
terminal: true  # Don't check parent folders
Effective permissions are only from this file.
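The walk up the folder tree can be sketched as follows. This is a simplified model, not the permission engine itself: it ignores glob patterns and the write/admin levels, keeps rulesets in an in-memory dictionary keyed by folder (with "." as the root), and uses placeholder user names. The function `effective_readers` and the data layout are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical rulesets mirroring the hierarchy example; "." is the datasite root.
RULESETS = {
    ".": {"read": ["alice"], "terminal": False},
    "public": {"read": ["*"], "terminal": False},
    "public/restricted": {"read": ["bob"], "terminal": True},
    "private": {"read": ["alice"], "terminal": True},
}


def effective_readers(file_path: str, rulesets: dict) -> set[str]:
    """Union the read lists from the file's folder upward, stopping at a terminal ruleset."""
    readers: set[str] = set()
    folder = PurePosixPath(file_path).parent
    while True:
        ruleset = rulesets.get(str(folder))
        if ruleset is not None:
            readers |= set(ruleset["read"])
            if ruleset["terminal"]:
                break  # terminal: true, so parents are not consulted
        if folder == folder.parent:
            break  # reached the root
        folder = folder.parent
    return readers
```

With this data, `public/data.csv` collects readers from both `public` and the root, while `public/restricted/secret.csv` stops at the terminal ruleset and yields only `bob`.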
Permission Engine
The permission engine evaluates access at runtime:
Using Permission Context
from pathlib import Path

from syft_perm import SyftPermContext

# Initialize context for a datasite
datasite = Path("/syftbox/[email protected]")
ctx = SyftPermContext(datasite=datasite)

# Check permissions on a file
file_perm = ctx.open("public/data.csv")

# Check if user has access
if file_perm.has_read_access("[email protected]"):
    print("Bob can read this file")
if file_perm.has_write_access("[email protected]"):
    print("Bob can write to this file")
Permission Checking in Sync
The DatasiteOwnerSyncer checks permissions on all incoming changes:
syft_client/sync/sync/datasite_owner_syncer.py (line 637)
def handle_proposed_filechange_events_message(
    self, sender_email: str, proposed_events_message: ProposedFileChangesMessage
):
    """Process incoming file changes from a peer"""
    # Filter to only changes sender has permission to make
    allowed_changes = [
        change
        for change in proposed_events_message.proposed_file_changes
        if self.check_write_permission(sender_email, str(change.path_in_datasite))
    ]
    if not allowed_changes:
        return  # Reject all changes silently

    # Process only allowed changes
    filtered_message = ProposedFileChangesMessage(
        sender_email=proposed_events_message.sender_email,
        proposed_file_changes=allowed_changes,
    )
    accepted_events_message = self.event_cache.process_proposed_events_message(
        filtered_message
    )
Unauthorized changes are silently rejected: they are filtered out before processing and no error is returned to the sender. This prevents information leakage about file structure.
Read Permission Enforcement
When sharing file changes with peers, only readers get the updates:
syft_client/sync/sync/datasite_owner_syncer.py (line 552)
def _route_data_events(
    self,
    data_events: list[FileChangeEvent],
    recipients: list[str],
    events_by_recipient: dict[str, list[FileChangeEvent]],
    data_event_sent_to: dict[str, set[str]],
):
    """Route data events to recipients who have read access."""
    for event in data_events:
        path_str = str(event.path_in_datasite)
        # Get all recipients with read permission
        readers = self._get_readers(path_str, recipients)
        # Send event only to authorized readers
        for reader in readers:
            events_by_recipient[reader].append(event)

def _get_readers(self, path: str, recipients: list[str]) -> frozenset[str]:
    """Return recipients that have read access to the given path."""
    return frozenset(r for r in recipients if self.check_read_permissions(r, path))
Job Permissions
Jobs are submitted to the data owner’s job queue. Submitting a job requires write permission to that queue.
Setting Up Job Folder for Data Scientist
Data owners grant job submission access by giving the data scientist write permission to the job queue.
Once access is granted, Bob can submit jobs.
Job Policies
Following Principle 2, job policies are the second permission system (after file permissions):
from syft_client.job_auto_approval import (
    create_approval_policy,
    job_matches_criteria,
)
from syft_job import get_client

client = get_client("/path/to/syftbox", "[email protected]")

# Define auto-approval criteria
EXPECTED_SCRIPT = """
#!/bin/bash
set -e
python count.py
"""

policy = create_approval_policy(
    required_scripts={"run.sh": EXPECTED_SCRIPT},
    required_filenames=["count.py"],
    allowed_users=["[email protected]", "[email protected]"],
    peers_only=True,  # Only approved peers
    auto_approve=True,
)

# Check jobs against policy
for job in client.jobs:
    if job.status == "inbox" and job_matches_criteria(
        job,
        required_scripts=policy["required_scripts"],
        required_filenames=policy["required_filenames"],
        allowed_users=policy["allowed_users"],
        peers_only=policy["peers_only"],
    ):
        job.approve()  # Auto-approve matching jobs
Job policies can check:
Exact script match: Script content must match exactly
Required files: Job must contain specific files
Allowed users: Only jobs from specific users
Peers only: Only jobs from approved peer list
See syft_client/job_auto_approval.py for implementation.
Job policies cannot override file permissions. Even with a policy, the submitter must still have write permission to the job queue.
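The four checks listed above can be pictured with a standalone sketch. It is not the code in syft_client/job_auto_approval.py: the `JobSketch` class, the `matches_criteria` function, the explicit `peers` argument, and the example addresses are all hypothetical, and the real `job_matches_criteria` may behave differently.

```python
from dataclasses import dataclass, field


@dataclass
class JobSketch:
    """Hypothetical stand-in for a submitted job; not the syft_job class."""

    submitted_by: str
    files: dict[str, str] = field(default_factory=dict)  # filename -> content


def matches_criteria(
    job: JobSketch,
    required_scripts: dict[str, str],
    required_filenames: list[str],
    allowed_users: list[str],
    peers_only: bool,
    peers: set[str],
) -> bool:
    """Return True only if every policy check passes."""
    if job.submitted_by not in allowed_users:
        return False  # allowed users
    if peers_only and job.submitted_by not in peers:
        return False  # peers only
    if any(name not in job.files for name in required_filenames):
        return False  # required files
    # Exact script match: content must be identical, byte for byte.
    return all(
        job.files.get(name) == content for name, content in required_scripts.items()
    )
```

Any job that fails one of these checks would simply stay in the inbox for manual review, consistent with the Manual Review First principle.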
Permission Syncing
Permission files are synced between peers like any other file:
Data Owner Updates Permissions
Alice modifies public/syft.pub.yaml to grant Bob access.
Permission File Synced
The permission file itself is synced to Bob as a FileChangeEvent.
Data Owner Re-evaluates Access
When permission files change, affected data files are re-sent to newly authorized users:
# From datasite_owner_syncer.py:569
def _route_perm_events(
    self, perm_events, recipients, events_by_recipient, data_event_sent_to
):
    for event in perm_events:
        # Find files affected by this permission change
        affected_paths = self._get_paths_under_perm_file(str(event.path_in_datasite))
        for affected_path in affected_paths:
            new_readers = self._get_readers(affected_path, recipients)
            old_readers = self._read_perm_cache.get(affected_path, frozenset())
            # Find users who gained access
            newly_permitted = new_readers - old_readers
            if newly_permitted:
                # Resend file to newly authorized users
                resend_event = self._create_resend_event(affected_path)
                for reader in newly_permitted:
                    events_by_recipient[reader].append(resend_event)
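The core of this step is a set difference between the new and the cached reader sets. The sketch below isolates that logic with plain dictionaries; the function `readers_to_resend` is hypothetical, and refreshing the cache inside it is an assumption, since the excerpt above does not show where the cache is updated.

```python
def readers_to_resend(
    read_cache: dict[str, frozenset[str]],
    new_readers_by_path: dict[str, frozenset[str]],
) -> dict[str, frozenset[str]]:
    """Return, per path, the users who gained read access since the last evaluation."""
    resend: dict[str, frozenset[str]] = {}
    for path, new_readers in new_readers_by_path.items():
        gained = new_readers - read_cache.get(path, frozenset())
        if gained:
            resend[path] = gained
        # Assumption: the cache is refreshed after each evaluation.
        read_cache[path] = new_readers
    return resend
```

Users who already had access are absent from the result, so a permission change does not cause files to be re-sent to existing readers.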
Common Permission Patterns
# public/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["*"]
      write: ["[email protected]"]
terminal: false
# org_data/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["*@company.com"]   # All company users
      write: ["*@company.com"]
terminal: false
Security Considerations
Defense in Depth: Even if the transport layer is compromised, file permissions are still enforced on the data owner’s side.
Least Privilege: Grant the minimum necessary permissions. Use specific users instead of "*" when possible.
Terminal Folders: Use terminal: true for sensitive folders to prevent inheritance from less-restrictive parents.
Audit Trail: All permission changes are logged as FileChangeEvents, creating an audit trail.
Permission files are synced to peers. Anyone with read access to a folder can see its syft.pub.yaml. Don’t put secrets in permission files.