Overview
Multi-client setup enables you to deploy federated learning across multiple datasites (data owners) with a central aggregator (data scientist). This guide covers configuration, deployment, and management of distributed FL systems.
Architecture
Components
Aggregator
Data scientist running the FL server
Datasites
Data owners with private datasets
Transport
Communication layer (SyftBox or P2P)
Communication Flow
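The flow between the components above can be sketched as one request/reply round trip per datasite. This is a minimal in-memory illustration, not the SyftBox or P2P wire protocol: the `Message` type, the `run_round` function, and the example addresses are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    """Hypothetical envelope; real transports define their own format."""
    sender: str
    recipient: str
    kind: str  # "fit_request" or "fit_reply"
    payload: List[float]


def run_round(
    aggregator: str,
    handlers: Dict[str, Callable[[List[float]], List[float]]],
    global_weights: List[float],
) -> List[Message]:
    """Send the global weights to every datasite and collect the replies."""
    log: List[Message] = []
    for datasite, handler in handlers.items():
        request = Message(aggregator, datasite, "fit_request", global_weights)
        log.append(request)
        # The datasite trains locally; only weights travel back, never raw data.
        updated = handler(request.payload)
        log.append(Message(datasite, aggregator, "fit_reply", updated))
    return log


# Two stand-in datasites that simply shift the weights by a constant.
handlers = {
    "hospital_a@example.org": lambda w: [x + 1.0 for x in w],
    "hospital_b@example.org": lambda w: [x - 1.0 for x in w],
}
flow = run_round("scientist@example.org", handlers, [0.0, 0.0])
for msg in flow:
    print(f"{msg.sender} -> {msg.recipient}: {msg.kind}")
```

Each datasite appears twice in the log: once as the recipient of the global model and once as the sender of its update.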
Setup Process
Step 1: Bootstrap Project
Configure your FL project with the aggregator and the datasites. The configuration is written to pyproject.toml.
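As a rough illustration of the information this configuration has to carry, the sketch below models one aggregator plus a list of datasites and checks it for obvious mistakes. The `ProjectParticipants` class, its field names, and the example addresses are assumptions made for illustration; the actual keys stored in pyproject.toml may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProjectParticipants:
    """Hypothetical view of the participant settings for an FL project."""
    aggregator: str
    datasites: List[str]

    def validate(self) -> None:
        """Raise ValueError if the participant list is unusable."""
        if not self.aggregator:
            raise ValueError("an aggregator is required")
        if not self.datasites:
            raise ValueError("at least one datasite is required")
        if len(set(self.datasites)) != len(self.datasites):
            raise ValueError("datasites must be unique")


participants = ProjectParticipants(
    aggregator="scientist@example.org",
    datasites=["hospital_a@example.org", "hospital_b@example.org"],
)
participants.validate()  # passes silently for a well-formed configuration
```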
Step 2: Distribute Project
Each participant needs a copy of the project code.
Step 3: Install Dependencies
Each participant installs the project.
Running the Aggregator
Start the Server
Start the server on the aggregator machine.
Server Implementation
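A minimal sketch of the aggregation step a server performs each round, assuming federated averaging (FedAvg) weighted by the number of local examples. The source does not name the aggregation strategy, so `fedavg` and its input format are illustrative rather than the project's actual server code.

```python
from typing import List, Tuple


def fedavg(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighted by each client's example count."""
    if not updates:
        raise ValueError("no client updates to aggregate")
    total_examples = sum(n for _, n in updates)
    if total_examples <= 0:
        raise ValueError("total example count must be positive")
    size = len(updates[0][0])
    aggregated = [0.0] * size
    for weights, n in updates:
        if len(weights) != size:
            raise ValueError("all clients must return vectors of the same length")
        for i, w in enumerate(weights):
            aggregated[i] += w * n / total_examples
    return aggregated


# Client 1 trained on 100 examples, client 2 on 300, so client 2 counts 3x.
new_global = fedavg([([1.0, 2.0], 100), ([3.0, 4.0], 300)])
print(new_global)
```

Weighting by example count keeps a small datasite from pulling the global model as strongly as a large one.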
Server Logs
Running Clients
Start Clients
Start a client on each datasite.
Client Implementation
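A minimal sketch of what a client does with the weights it receives: train on local data, then return the updated weights together with its example count. A one-parameter linear model `y = w * x` and plain gradient descent are assumed purely to keep the example self-contained; `local_fit` is a hypothetical name, not the project's client API.

```python
from typing import List, Tuple


def local_fit(
    weights: List[float],
    data: List[Tuple[float, float]],
    lr: float = 0.1,
    epochs: int = 50,
) -> Tuple[List[float], int]:
    """Fit y = w * x on local (x, y) pairs, starting from the global weight."""
    w = weights[0]
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return [w], len(data)


private_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # stays on the datasite
updated, num_examples = local_fit([0.0], private_data)
print(updated, num_examples)
```

Only `updated` and `num_examples` leave the datasite; `private_data` is never sent to the aggregator.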
Client Logs
Data Management
Client Data Structure
Each client should organize its data under a single data directory.
Loading Client Data
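One way to resolve and validate the data location before training, assuming the path comes from the `DATA_DIR` environment variable with an optional fallback. The `resolve_data_dir` helper is a hypothetical name, not part of the project's API.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_data_dir(default: Optional[str] = None) -> Path:
    """Return the client's data directory, failing early if it is missing."""
    raw = os.environ.get("DATA_DIR", default)
    if raw is None:
        raise FileNotFoundError("DATA_DIR is not set and no default was given")
    path = Path(raw).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Path {path} does not exist")
    return path


if __name__ == "__main__":
    # Falls back to the current directory when DATA_DIR is unset.
    print(resolve_data_dir(default="."))
```

Failing before training starts makes a misconfigured path visible in the client logs right away instead of partway through a round.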
Setting Data Path
- Environment Variable
- Systemd Service
- Docker
Encryption and Security
Key Bootstrap
For SyftBox transport, encryption keys are generated automatically.
Verifying Encryption
Check that encryption is enabled.
DID Document Access
Participants must be able to read each other’s DID documents.
Monitoring and Management
Checking Connected Clients
Health Checks
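The source does not show how health is reported, so the following is only a sketch of one common approach: each client records a heartbeat timestamp, and the server treats any client that has been silent for longer than a timeout as unhealthy. `HeartbeatTracker` and its methods are hypothetical.

```python
import time
from typing import Dict, List, Optional


class HeartbeatTracker:
    """Track the last time each client was heard from."""

    def __init__(self, timeout_s: float = 60.0) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, client_id: str, now: Optional[float] = None) -> None:
        """Record a heartbeat; `now` can be injected to make tests deterministic."""
        self._last_seen[client_id] = time.monotonic() if now is None else now

    def healthy(self, now: Optional[float] = None) -> List[str]:
        """Return the clients whose last heartbeat is within the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            cid for cid, seen in self._last_seen.items()
            if now - seen <= self.timeout_s
        )


tracker = HeartbeatTracker(timeout_s=30.0)
tracker.beat("hospital_a@example.org", now=0.0)
tracker.beat("hospital_b@example.org", now=25.0)
print(tracker.healthy(now=40.0))  # only hospital_b is within the 30 s window
```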
Graceful Shutdown
Stop all clients from the server.
Scaling to Many Clients
Sampling Strategies
For large-scale deployments, sample a subset of clients per round.
Configuration
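A minimal sketch of per-round client sampling, assuming two settings: the fraction of connected clients to sample and a lower bound on the sample size. The names `fraction` and `min_clients` are illustrative; check your strategy's documentation for the parameters it actually accepts.

```python
import math
import random
from typing import List, Optional


def sample_clients(
    clients: List[str],
    fraction: float,
    min_clients: int = 1,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick a random subset of connected clients for one training round."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(min_clients, math.ceil(fraction * len(clients)))
    if k > len(clients):
        raise ValueError(f"need {k} clients but only {len(clients)} are connected")
    # A seeded generator makes the selection reproducible for debugging.
    return random.Random(seed).sample(clients, k)


all_clients = [f"site_{i}@example.org" for i in range(100)]
selected = sample_clients(all_clients, fraction=0.1, min_clients=5, seed=42)
print(len(selected))
```

The lower bound matters when few clients are connected: with 20 clients and the same settings, the fraction alone would select 2, but `min_clients` raises that to 5.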
Example: Three-Hospital Deployment
Project Configuration
Deployment Steps
Troubleshooting
Clients Not Connecting
Symptom: Server shows “Waiting for nodes to connect…”
Solutions:
- Verify that the clients are running.
- Check the client logs for errors.
- Verify that SyftBox is syncing.
Messages Not Encrypting
Symptom: Logs show “PLAINTEXT” instead of “ENCRYPTED”
Solutions:
- Check the encryption setting.
- Verify that the DID documents exist.
- Re-bootstrap encryption.
Client Can’t Load Data
Symptom: “FileNotFoundError: Path .data/ does not exist”
Solutions:
- Set DATA_DIR.
- Verify that the data exists.
Performance Issues
Symptom: Slow communication or timeouts
Solutions:
- Reduce message size (use model compression)
- Increase timeouts in config
- Sample fewer clients per round
- Check network connectivity between participants
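As one illustration of reducing message size, the sketch below packs weights as 32-bit floats and applies zlib compression before sending. This is a generic technique rather than a feature of the transport; `pack_weights` and `unpack_weights` are hypothetical helpers, and casting to 32-bit floats loses precision for values originally stored in 64 bits.

```python
import struct
import zlib
from typing import List


def pack_weights(weights: List[float]) -> bytes:
    """Serialize weights as little-endian float32 and compress them."""
    raw = struct.pack(f"<{len(weights)}f", *weights)
    return zlib.compress(raw, level=6)


def unpack_weights(blob: bytes) -> List[float]:
    """Reverse pack_weights: decompress, then decode float32 values."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


weights = [0.0] * 10_000  # repetitive or sparse updates compress well
blob = pack_weights(weights)
restored = unpack_weights(blob)
print(f"{len(weights) * 4} bytes raw, {len(blob)} bytes compressed")
```

Dense, high-entropy weights compress far less than this all-zero example, so measure on real updates before relying on the savings.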
Production Deployment
Systemd Service Template
Next Steps
Run Simulations
Test multi-client setup locally
Offline Training
Handle intermittent client availability
Transport Configuration
Optimize communication layer