Syft uses a file-first permission system where access control is defined by syft.pub.yaml files placed throughout the datasite folder hierarchy.

Core Permission Principles

File-Permission-First

Principle 2: Access control is first and foremost described by file permissions. Job policies are secondary.

File First

Principle 1: State is files. Permissions are also files (syft.pub.yaml) synced between peers.

Single Gateway

Principle 17: Only one job queue per datasite. Data owners see everything entering/leaving.

Manual Review First

Principle 7: All jobs are manually reviewed unless a policy auto-approves them.
Following Principle 2, there are only two permission systems: file permissions and job policies. Nothing else. No hidden access control.

Permission Files

Permissions are defined in syft.pub.yaml files:
# File: [email protected]/public/syft.pub.yaml
rules:
  - pattern: "**/*"              # Match all files in this folder
    access:
      read: ["*"]                # Everyone can read
      write: ["[email protected]"]  # Only owner can write

terminal: false                  # Continue checking parent folders

Permission File Schema

from pathlib import Path

import yaml
from pydantic import BaseModel


# Access and Rule are defined before RuleSet so the type annotations resolve.
class Access(BaseModel):
    """Access control lists"""

    admin: list[str] = []        # Full control
    write: list[str] = []        # Can create/modify files
    read: list[str] = []         # Can read files


class Rule(BaseModel):
    """A single permission rule"""

    pattern: str                 # Glob pattern (e.g., "**/*.csv")
    access: Access               # Access levels


class RuleSet(BaseModel):
    """A collection of permission rules in a folder"""

    rules: list[Rule] = []       # List of permission rules
    terminal: bool = False       # Stop checking parent folders?
    path: str = ""               # Path to this ruleset (set at runtime)

    @classmethod
    def load(cls, filepath: Path) -> "RuleSet":
        """Load from syft.pub.yaml file"""
        with open(filepath) as f:
            data = yaml.safe_load(f) or {}   # An empty file yields no rules
        rs = cls.model_validate(data)
        rs.path = str(filepath.parent)       # Folder that holds this file
        return rs

Pattern Matching

Patterns use glob syntax:
rules:
  - pattern: "**/*"
    access:
      read: ["*"]
Matches all files in folder and subfolders.
rules:
  - pattern: "*.csv"
    access:
      read: ["[email protected]"]
Matches only .csv files in the current folder (not subfolders).
rules:
  - pattern: "data/**"
    access:
      read: ["*@research-team.com"]
Matches all files under data/ subfolder.
rules:
  - pattern: "results_*.json"
    access:
      read: ["[email protected]", "[email protected]"]
Matches files like results_2024.json, results_final.json, etc.
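The matching rules above can be sketched with a small glob-to-regex translator. This is an illustrative sketch, not the matcher Syft uses: the function names glob_to_regex and matches are hypothetical, and it assumes that a single * stays within one path segment while ** may cross folder boundaries, as the descriptions above suggest. The standard-library fnmatch module is not used because its * also matches "/".

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex where '*' stays within one path segment."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")      # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")            # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")         # anything except '/'
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern: str, relative_path: str) -> bool:
    """True if the path (relative to the folder holding syft.pub.yaml) matches."""
    return glob_to_regex(pattern).fullmatch(relative_path) is not None


print(matches("**/*", "sub/dir/data.csv"))          # True
print(matches("*.csv", "sub/data.csv"))             # False: '*' does not cross '/'
print(matches("data/**", "data/2024/a.csv"))        # True
print(matches("results_*.json", "results_2024.json"))  # True
```

Paths are matched relative to the folder that contains the permission file, which is why "*.csv" does not reach into subfolders in this sketch.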

Access Levels

Read access lets a user view file contents. It is required to:
  • Download files from a peer’s datasite
  • See that a file exists
  • See the file in job outputs
access:
  read: ["[email protected]", "*@company.com"]

User Patterns

Access lists support several patterns:
| Pattern | Meaning | Example |
| --- | --- | --- |
| "*" | Everyone (any user) | read: ["*"] |
| "[email protected]" | Specific user | read: ["[email protected]"] |
| "*@domain.com" | Domain wildcard | read: ["*@company.com"] |
| "USER" | Placeholder (owner) | admin: ["USER"] |
# Everyone can read
access:
  read: ["*"]
  write: ["[email protected]"]
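A minimal sketch of how an access-list entry might be compared against a user follows. The function user_matches is hypothetical and the addresses are placeholders. It assumes that "USER" resolves to the datasite owner, based on the "Placeholder (owner)" description above; the real engine may resolve it differently.

```python
def user_matches(entry: str, user: str, owner: str) -> bool:
    """Check one access-list entry against a user email (illustrative only)."""
    if entry == "*":
        return True                      # everyone
    if entry == "USER":
        return user == owner             # assumption: placeholder means the owner
    if entry.startswith("*@"):
        return user.endswith(entry[1:])  # domain wildcard, compares "@domain"
    return entry == user                 # specific user


def in_access_list(entries: list[str], user: str, owner: str) -> bool:
    """True if any entry in the list grants access to the user."""
    return any(user_matches(entry, user, owner) for entry in entries)


owner = "alice@example.org"
print(in_access_list(["*@company.com"], "bob@company.com", owner))      # True
print(in_access_list(["*@company.com"], "bob@evilcompany.com", owner))  # False
print(in_access_list(["USER"], "alice@example.org", owner))             # True
```

Comparing against the suffix including the "@" keeps a wildcard for company.com from matching look-alike domains such as evilcompany.com.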

Permission Hierarchy

Permissions are inherited through the folder hierarchy:
[email protected]/
├── syft.pub.yaml           # Root: admin = ["[email protected]"]
├── public/
│   ├── syft.pub.yaml       # Override: read = ["*"], terminal = false
│   ├── data.csv            # ✅ Everyone can read (from public/syft.pub.yaml)
│   └── restricted/
│       ├── syft.pub.yaml   # Override: read = ["[email protected]"], terminal = true
│       └── secret.csv      # ✅ Only Bob can read (terminal stops parent check)
└── private/
    ├── syft.pub.yaml       # Override: read = ["[email protected]"], terminal = true
    └── sensitive.csv       # ✅ Only Alice can read

Terminal Flag

The terminal flag controls whether parent folders are also checked.
With terminal: false, the engine continues checking parent folder permissions:
# public/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["*"]
terminal: false  # Also check ../syft.pub.yaml
Effective permissions are the union of this folder and parents.
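The walk up the folder hierarchy can be sketched as follows. This is a simplified illustration, not the syft-perm implementation: it ignores glob patterns, tracks only read lists, and uses a hypothetical in-memory RULESETS dictionary (keyed by folder path, with "" as the datasite root) in place of syft.pub.yaml files. The addresses are placeholders.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the syft.pub.yaml files in a datasite.
RULESETS = {
    "": {"read": {"alice@example.org"}, "terminal": False},
    "public": {"read": {"*"}, "terminal": False},
    "public/restricted": {"read": {"bob@example.org"}, "terminal": True},
}


def effective_readers(file_path: str, rulesets: dict) -> set[str]:
    """Union read lists from the file's folder upward, stopping at a terminal ruleset."""
    readers: set[str] = set()
    folder = PurePosixPath(file_path).parent
    while True:
        key = "" if str(folder) == "." else str(folder)
        ruleset = rulesets.get(key)
        if ruleset is not None:
            readers |= ruleset["read"]
            if ruleset["terminal"]:
                break                    # terminal: true, ignore parent folders
        if key == "":
            break                        # reached the datasite root
        folder = folder.parent
    return readers


print(effective_readers("public/data.csv", RULESETS))
print(effective_readers("public/restricted/secret.csv", RULESETS))
```

In this sketch public/data.csv picks up both the public and root read lists, while public/restricted/secret.csv sees only its own folder's list because that ruleset is terminal.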

Permission Engine

The permission engine evaluates access at runtime:
from syft_perm import SyftPermContext
from pathlib import Path

# Initialize context for a datasite
datasite = Path("/syftbox/[email protected]")
ctx = SyftPermContext(datasite=datasite)

# Check permissions on a file
file_perm = ctx.open("public/data.csv")

# Check if user has access
if file_perm.has_read_access("[email protected]"):
    print("Bob can read this file")

if file_perm.has_write_access("[email protected]"):
    print("Bob can write to this file")

Permission Checking in Sync

The DatasiteOwnerSyncer checks permissions on all incoming changes:
def handle_proposed_filechange_events_message(
    self, sender_email: str, proposed_events_message: ProposedFileChangesMessage
):
    """Process incoming file changes from a peer"""
    
    # Filter to only changes sender has permission to make
    allowed_changes = [
        change
        for change in proposed_events_message.proposed_file_changes
        if self.check_write_permission(sender_email, str(change.path_in_datasite))
    ]
    
    if not allowed_changes:
        return  # Reject all changes silently
    
    # Process only allowed changes
    filtered_message = ProposedFileChangesMessage(
        sender_email=proposed_events_message.sender_email,
        proposed_file_changes=allowed_changes,
    )
    
    accepted_events_message = self.event_cache.process_proposed_events_message(
        filtered_message
    )
Unauthorized changes are silently rejected. No error is returned to the sender. This prevents information leakage about file structure.
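The filtering step can be tried in isolation with a standalone sketch. The ProposedChange class, the WRITE_ACL dictionary, and the helper names are hypothetical stand-ins for the syncer's message types and permission check, and the addresses are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedChange:
    path: str  # path of the file inside the datasite


# Hypothetical write ACL: path -> users allowed to write.
WRITE_ACL = {"public/notes.txt": {"bob@example.org"}}


def can_write(sender: str, path: str) -> bool:
    """Stand-in for the syncer's write-permission check."""
    return sender in WRITE_ACL.get(path, set())


def filter_allowed(sender: str, changes: list[ProposedChange]) -> list[ProposedChange]:
    """Keep only the changes the sender may make; the rest are dropped silently."""
    return [change for change in changes if can_write(sender, change.path)]


proposed = [ProposedChange("public/notes.txt"), ProposedChange("private/keys.txt")]
print(filter_allowed("bob@example.org", proposed))    # only public/notes.txt survives
print(filter_allowed("carol@example.org", proposed))  # empty list, nothing is processed
```

A sender receives the same silent outcome whether a path is forbidden or does not exist, which is what keeps the folder structure from leaking.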

Read Permission Enforcement

When sharing file changes with peers, only readers get the updates:
def _route_data_events(
    self,
    data_events: list[FileChangeEvent],
    recipients: list[str],
    events_by_recipient: dict[str, list[FileChangeEvent]],
    data_event_sent_to: dict[str, set[str]],
):
    """Route data events to recipients who have read access."""
    for event in data_events:
        path_str = str(event.path_in_datasite)
        
        # Get all recipients with read permission
        readers = self._get_readers(path_str, recipients)
        
        # Send event only to authorized readers
        for reader in readers:
            events_by_recipient[reader].append(event)

def _get_readers(self, path: str, recipients: list[str]) -> frozenset[str]:
    """Return recipients that have read access to the given path."""
    return frozenset(r for r in recipients if self.check_read_permissions(r, path))
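A standalone sketch of the same routing idea is shown below. It assumes events_by_recipient behaves like a defaultdict(list), since the excerpt appends without creating keys first; READ_ACL, get_readers, and route are hypothetical names, events are reduced to plain path strings, and the addresses are placeholders.

```python
from collections import defaultdict

# Hypothetical read ACL: path -> users allowed to read.
READ_ACL = {
    "public/data.csv": {"bob@example.org", "carol@example.org"},
    "private/notes.txt": {"carol@example.org"},
}


def get_readers(path: str, recipients: list[str]) -> frozenset[str]:
    """Return the recipients that may read the given path."""
    allowed = READ_ACL.get(path, set())
    return frozenset(r for r in recipients if r in allowed)


def route(paths: list[str], recipients: list[str]) -> dict[str, list[str]]:
    """Group changed paths by the recipients who are allowed to read them."""
    events_by_recipient: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        for reader in get_readers(path, recipients):
            events_by_recipient[reader].append(path)
    return dict(events_by_recipient)


routed = route(
    ["public/data.csv", "private/notes.txt"],
    ["bob@example.org", "carol@example.org"],
)
print(routed)
```

Recipients without read access to any changed file simply never appear in the result, so no message is built for them.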

Job Permissions

Jobs are submitted to the data owner’s job queue. Write permission to the job queue is required:
# [email protected]/jobs/[email protected]/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      write: ["[email protected]"]  # Bob can submit jobs
      read: ["[email protected]", "[email protected]"]  # Both can read
      admin: ["[email protected]"]  # Alice controls her queue

Setting Up Job Folder for Data Scientist

Data owners can grant job submission access:
from syft_job import get_client

client = get_client("/path/to/syftbox", "[email protected]")

# Create job folder for Bob with write permissions
ds_job_dir = client.setup_ds_job_folder_as_do("[email protected]")

# This creates:
# [email protected]/jobs/[email protected]/
# with write permissions for [email protected]
Now Bob can submit jobs:
client = get_client("/path/to/syftbox", "[email protected]")

client.submit_python_job(
    user="[email protected]",
    code_path="analysis.py",
    job_name="My Analysis"
)
# Job written to [email protected]/jobs/[email protected]/My Analysis/

Job Policies

Following Principle 2, job policies are the second permission system (after file permissions):
from syft_client.job_auto_approval import (
    create_approval_policy,
    job_matches_criteria
)
from syft_job import get_client

client = get_client("/path/to/syftbox", "[email protected]")

# Define auto-approval criteria
EXPECTED_SCRIPT = """
#!/bin/bash
set -e
python count.py
"""

policy = create_approval_policy(
    required_scripts={"run.sh": EXPECTED_SCRIPT},
    required_filenames=["count.py"],
    allowed_users=["[email protected]", "[email protected]"],
    peers_only=True,  # Only approved peers
    auto_approve=True
)

# Check jobs against policy
for job in client.jobs:
    if job.status == "inbox" and job_matches_criteria(
        job,
        required_scripts=policy["required_scripts"],
        required_filenames=policy["required_filenames"],
        allowed_users=policy["allowed_users"],
        peers_only=policy["peers_only"],
    ):
        job.approve()  # Auto-approve matching jobs
Job policies can check:
  • Exact script match: Script content must match exactly
  • Required files: Job must contain specific files
  • Allowed users: Only jobs from specific users
  • Peers only: Only jobs from approved peer list
See syft_client/job_auto_approval.py for implementation.
Job policies cannot override file permissions. Even when a policy would auto-approve a job, the submitter must still have write permission to the job queue.
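The four checks listed above can be illustrated with a self-contained sketch. It is not the code in syft_client/job_auto_approval.py: SketchJob, job_passes_policy, and the approved_peers argument are hypothetical, and script comparison is assumed to be a plain string equality.

```python
from dataclasses import dataclass, field


@dataclass
class SketchJob:
    submitted_by: str
    files: dict[str, str] = field(default_factory=dict)  # filename -> content


def job_passes_policy(
    job: SketchJob,
    required_scripts: dict[str, str],
    required_filenames: list[str],
    allowed_users: list[str],
    peers_only: bool,
    approved_peers: set[str],
) -> bool:
    """Return True only if every policy check passes (illustrative only)."""
    if job.submitted_by not in allowed_users:
        return False                                  # allowed users
    if peers_only and job.submitted_by not in approved_peers:
        return False                                  # peers only
    for name, expected in required_scripts.items():
        if job.files.get(name) != expected:
            return False                              # exact script match
    return all(name in job.files for name in required_filenames)  # required files


script = "#!/bin/bash\nset -e\npython count.py\n"
job = SketchJob("bob@example.org", {"run.sh": script, "count.py": "print(1)"})
ok = job_passes_policy(
    job,
    required_scripts={"run.sh": script},
    required_filenames=["count.py"],
    allowed_users=["bob@example.org"],
    peers_only=True,
    approved_peers={"bob@example.org"},
)
print(ok)  # True
```

Because the script comparison is exact, even a whitespace change in run.sh makes the job fall back to manual review in this sketch.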

Permission Syncing

Permission files are synced between peers like any other file:
1. Data Owner Updates Permissions

Alice modifies public/syft.pub.yaml to grant Bob access:
rules:
  - pattern: "**/*"
    access:
      read: ["*", "[email protected]"]  # Add Bob
2. Permission File Synced

The permission file itself is synced to Bob as a FileChangeEvent.
3. Data Owner Re-evaluates Access

When permission files change, affected data files are re-sent to newly authorized users:
# From datasite_owner_syncer.py:569
def _route_perm_events(
    self, perm_events, recipients, events_by_recipient, data_event_sent_to
):
    for event in perm_events:
        # Find files affected by this permission change
        affected_paths = self._get_paths_under_perm_file(str(event.path_in_datasite))
        
        for affected_path in affected_paths:
            new_readers = self._get_readers(affected_path, recipients)
            old_readers = self._read_perm_cache.get(affected_path, frozenset())
            
            # Find users who gained access
            newly_permitted = new_readers - old_readers
            
            if newly_permitted:
                # Resend file to newly authorized users
                resend_event = self._create_resend_event(affected_path)
                for reader in newly_permitted:
                    events_by_recipient[reader].append(resend_event)
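The core of this step is a set difference between the new and cached reader sets. The sketch below isolates it; find_newly_permitted is a hypothetical helper, and refreshing the cache after each evaluation is an assumption, since the excerpt above does not show where the cache is updated.

```python
def find_newly_permitted(
    read_cache: dict[str, frozenset[str]],
    path: str,
    new_readers: frozenset[str],
) -> frozenset[str]:
    """Return users who gained read access to the path since the last evaluation."""
    old_readers = read_cache.get(path, frozenset())
    gained = new_readers - old_readers
    read_cache[path] = new_readers  # assumption: cache is refreshed on every evaluation
    return gained


cache = {"public/data.csv": frozenset({"carol@example.org"})}
readers_now = frozenset({"carol@example.org", "bob@example.org"})

first = find_newly_permitted(cache, "public/data.csv", readers_now)
second = find_newly_permitted(cache, "public/data.csv", readers_now)
print(first)   # frozenset({'bob@example.org'})
print(second)  # frozenset()
```

Only users in the difference receive a resend, so readers who already had access are not sent the same file again.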

Common Permission Patterns

Public folder (everyone can read, restricted write):
# public/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["*"]
      write: ["[email protected]"]
terminal: false

Private folder (terminal, no inheritance from parents):
# private/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["[email protected]"]
      write: ["[email protected]"]
      admin: ["[email protected]"]
terminal: true  # Don't check parent folders

Shared folder (specific users):
# shared/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["[email protected]", "[email protected]"]
      write: ["[email protected]"]
terminal: false

Organization-wide folder (domain wildcard):
# org_data/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["*@company.com"]  # All company users
      write: ["*@company.com"]
terminal: false

Job queue folder for a data scientist:
# jobs/[email protected]/syft.pub.yaml
rules:
  - pattern: "**/*"
    access:
      read: ["[email protected]", "[email protected]"]
      write: ["[email protected]"]  # Bob can submit jobs
      admin: ["[email protected]"]  # Alice controls her queue

Security Considerations

Defense in Depth

Even if the transport layer is compromised, file permissions are still enforced on the data owner’s side.

Least Privilege

Grant minimum necessary permissions. Use specific users instead of "*" when possible.

Terminal Folders

Use terminal: true for sensitive folders to prevent inheritance from less-restrictive parents.

Audit Trail

All permission changes are logged as FileChangeEvents, creating an audit trail.
Permission files are synced to peers. Anyone with read access to a folder can see its syft.pub.yaml. Don’t put secrets in permission files.

Next Steps

Architecture

Understand the overall system design

Datasites

Learn about data owner and data scientist roles

Job Policies

Set up automatic job approval policies

Permission API

Explore the syft-perm API
