Datasource Types
Zipline supports two datasource types:

- local - Store files on the local filesystem
- s3 - Store files in S3-compatible object storage (AWS S3, MinIO, DigitalOcean Spaces, etc.)

DATASOURCE_TYPE selects the storage backend to use.
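For example, in your environment file (DATASOURCE_TYPE also appears later on this page; local is just one of the two options):

```shell
# Select the storage backend: "local" or "s3"
DATASOURCE_TYPE=local
```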
Local Datasource
The local datasource stores uploaded files directly on the server’s filesystem. This is the simplest option and works well for single-server deployments.

Configuration
DATASOURCE_LOCAL_DIRECTORY - Absolute or relative path to the directory where files will be stored.
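A minimal local-storage configuration, assuming an uploads path of ./uploads:

```shell
# Store uploads on the local filesystem
DATASOURCE_TYPE=local
DATASOURCE_LOCAL_DIRECTORY=./uploads
```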
Docker Configuration
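A sketch of running Zipline in Docker with the uploads directory mounted; the image name, port, and paths are illustrative, not taken from this page:

```shell
# Mount a host directory so uploads survive container restarts
docker run -d \
  -p 3000:3000 \
  -e DATASOURCE_TYPE=local \
  -e DATASOURCE_LOCAL_DIRECTORY=/zipline/uploads \
  -v "$(pwd)/uploads:/zipline/uploads" \
  ghcr.io/diced/zipline
```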
When using Docker, you must mount the uploads directory as a volume to persist files. The path specified in
DATASOURCE_LOCAL_DIRECTORY should match the mount point inside the container.

Permissions
Ensure the Zipline process has read and write permissions to the uploads directory.

Path Security
Zipline validates all file paths to prevent directory traversal attacks. Attempted access to files outside the configured directory will be rejected.

S3 Datasource
The S3 datasource stores files in S3-compatible object storage. This is ideal for:

- Distributed deployments
- Cloud hosting
- Large-scale file storage
- CDN integration
Required Configuration
- AWS access key ID or equivalent for S3-compatible services.
- AWS secret access key or equivalent for S3-compatible services.
- AWS region or equivalent for S3-compatible services.
- S3 bucket name where files will be stored.
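A sketch of the required settings; the exact DATASOURCE_S3_* variable names below are assumptions following the prefix used elsewhere on this page — confirm them against the environment variable reference. Values are placeholders:

```shell
# Required S3 settings (variable names assumed; values are placeholders)
DATASOURCE_TYPE=s3
DATASOURCE_S3_ACCESS_KEY_ID=AKIAEXAMPLE
DATASOURCE_S3_SECRET_ACCESS_KEY=examplesecretkey
DATASOURCE_S3_REGION=us-east-1
DATASOURCE_S3_BUCKET=my-zipline-bucket
```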
Optional Configuration
- Custom endpoint URL for S3-compatible services (not needed for AWS S3).
- DATASOURCE_S3_FORCE_PATH_STYLE - Use path-style URLs (https://endpoint/bucket/key) instead of virtual-hosted-style (https://bucket.endpoint/key). Required for MinIO and some other S3-compatible services.
- Store all files within a subdirectory (prefix) in the bucket.
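For example, pointing Zipline at a custom endpoint with path-style URLs. The endpoint variable name is an assumption based on the DATASOURCE_S3_ prefix used on this page; the URL is a placeholder:

```shell
# Optional S3 settings (endpoint variable name assumed; URL is a placeholder)
DATASOURCE_S3_ENDPOINT=https://s3.example.com
DATASOURCE_S3_FORCE_PATH_STYLE=true
```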
S3 Provider Examples
AWS S3
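A sketch for AWS S3; the DATASOURCE_S3_* variable names are assumptions following the prefix used elsewhere on this page, and the values are placeholders:

```shell
# AWS S3 (no custom endpoint needed)
DATASOURCE_TYPE=s3
DATASOURCE_S3_ACCESS_KEY_ID=AKIAEXAMPLE
DATASOURCE_S3_SECRET_ACCESS_KEY=examplesecretkey
DATASOURCE_S3_REGION=us-east-1
DATASOURCE_S3_BUCKET=my-zipline-bucket
```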
MinIO
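A MinIO sketch; variable names are assumptions following the DATASOURCE_S3_ prefix used on this page, localhost:9000 is MinIO's default port, and the credentials are placeholders:

```shell
# MinIO needs a custom endpoint and path-style URLs
DATASOURCE_TYPE=s3
DATASOURCE_S3_ENDPOINT=http://localhost:9000
DATASOURCE_S3_FORCE_PATH_STYLE=true
DATASOURCE_S3_REGION=us-east-1
DATASOURCE_S3_BUCKET=zipline
DATASOURCE_S3_ACCESS_KEY_ID=minioadmin
DATASOURCE_S3_SECRET_ACCESS_KEY=minioadmin
```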
MinIO requires DATASOURCE_S3_FORCE_PATH_STYLE=true and a custom endpoint.
DigitalOcean Spaces
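A DigitalOcean Spaces sketch; variable names are assumptions following the DATASOURCE_S3_ prefix used on this page, and nyc3 is just one example region:

```shell
# DigitalOcean Spaces (endpoint follows the region; adjust to yours)
DATASOURCE_TYPE=s3
DATASOURCE_S3_ENDPOINT=https://nyc3.digitaloceanspaces.com
DATASOURCE_S3_REGION=nyc3
DATASOURCE_S3_BUCKET=my-space
DATASOURCE_S3_ACCESS_KEY_ID=DO_ACCESS_KEY
DATASOURCE_S3_SECRET_ACCESS_KEY=DO_SECRET_KEY
```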
Backblaze B2
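A Backblaze B2 sketch using its S3-compatible API; variable names are assumptions following the DATASOURCE_S3_ prefix used on this page, and the endpoint encodes the bucket's region:

```shell
# Backblaze B2 via the S3-compatible API (endpoint depends on your bucket's region)
DATASOURCE_TYPE=s3
DATASOURCE_S3_ENDPOINT=https://s3.us-west-002.backblazeb2.com
DATASOURCE_S3_REGION=us-west-002
DATASOURCE_S3_BUCKET=my-bucket
DATASOURCE_S3_ACCESS_KEY_ID=B2_KEY_ID
DATASOURCE_S3_SECRET_ACCESS_KEY=B2_APPLICATION_KEY
```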
Cloudflare R2
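A Cloudflare R2 sketch; variable names are assumptions following the DATASOURCE_S3_ prefix used on this page, ACCOUNT_ID stands in for your Cloudflare account ID, and R2 commonly uses the region value auto:

```shell
# Cloudflare R2 (endpoint uses your account ID)
DATASOURCE_TYPE=s3
DATASOURCE_S3_ENDPOINT=https://ACCOUNT_ID.r2.cloudflarestorage.com
DATASOURCE_S3_REGION=auto
DATASOURCE_S3_BUCKET=my-bucket
DATASOURCE_S3_ACCESS_KEY_ID=R2_ACCESS_KEY
DATASOURCE_S3_SECRET_ACCESS_KEY=R2_SECRET_KEY
```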
Connection Settings
Zipline uses the following connection settings for S3:

- Connection timeout: 10 seconds
- Socket timeout: 120 seconds (2 minutes)
- Max sockets: 1000
- Keep-alive: Enabled
Access Verification
When Zipline starts with the S3 datasource, it performs an access test:

- Creates a temporary test file in the bucket
- Reads the test file back
- Deletes the test file
Large File Handling
For files larger than 5GB, Zipline automatically uses multipart operations:

- Multipart uploads: Split large files into 25MB chunks (configurable via CHUNKS_SIZE)
- Multipart copy: For rename operations on files >5GB
- Part size: 5MB for multipart operations
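As a quick illustration of the chunk arithmetic above (assuming the 25MB chunk size; this is plain shell arithmetic, not Zipline code):

```shell
# How many 25MB chunks does a 6GB file need? (illustrative arithmetic)
CHUNK=$((25 * 1024 * 1024))              # 25MB, cf. CHUNKS_SIZE
SIZE=$((6 * 1024 * 1024 * 1024))         # a 6GB file
PARTS=$(( (SIZE + CHUNK - 1) / CHUNK ))  # ceiling division
echo "$PARTS parts"
```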
Subdirectory Usage
Using a subdirectory is helpful when:

- Sharing a bucket with other applications
- Organizing files by environment (production/staging)
- Implementing bucket-level lifecycle policies
Switching Datasources
To switch from local to S3:

- Configure S3 environment variables
- Upload existing files from the local directory to the S3 bucket
- Update DATASOURCE_TYPE=s3
- Restart Zipline

To switch from S3 to local:

- Download all files from the S3 bucket to the local directory
- Update DATASOURCE_TYPE=local and DATASOURCE_LOCAL_DIRECTORY
- Restart Zipline
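One way to move the files in either direction is the AWS CLI's sync command; the bucket name and local path below are placeholders, and any S3-capable tool works:

```shell
# Local -> S3 (before switching DATASOURCE_TYPE to s3)
aws s3 sync /path/to/uploads s3://my-zipline-bucket/

# S3 -> local (before switching DATASOURCE_TYPE back to local)
aws s3 sync s3://my-zipline-bucket/ /path/to/uploads
```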
Performance Considerations
Local Datasource
Pros:

- Faster for small files (no network overhead)
- Simpler configuration
- Lower operating costs

Cons:

- Limited by disk space
- Not suitable for distributed deployments
- Requires volume mounts in containers
S3 Datasource
Pros:

- Virtually unlimited storage
- High availability and durability
- Works with distributed deployments
- Can integrate with CDNs
- Automatic backups (if configured)

Cons:

- Network latency
- Storage and bandwidth costs
- Requires internet connectivity
- More complex configuration
Troubleshooting
Local Datasource Issues
Error: “Invalid path provided”

- Check directory permissions
- Verify DATASOURCE_LOCAL_DIRECTORY is correct
- Ensure the directory exists
- Fix directory permissions: chmod 755 /path/to/uploads
- Ensure the Zipline process user owns the directory
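Fixing ownership and permissions together might look like this; the zipline user name is an assumption, so substitute the account your process actually runs as:

```shell
# Give the service account ownership, then set owner rwx / others r-x
chown -R zipline:zipline /path/to/uploads   # "zipline" user is an assumption
chmod -R 755 /path/to/uploads
```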
S3 Datasource Issues
Error: “Access Denied”

- Verify IAM permissions include all required actions
- Check that the bucket policy doesn’t deny access
- Ensure credentials are correct

Other common causes:

- Credentials are incorrect or expired
- For MinIO/custom endpoints, verify the access key format
- The bucket doesn’t exist in the specified region
- The bucket name is incorrect
- The region is incorrect
- Check network connectivity
- Verify the endpoint URL is correct
- Check firewall rules
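To rule Zipline out, the credentials and bucket can be probed directly with the AWS CLI; the bucket, region, and endpoint below are placeholders:

```shell
# Should list bucket contents if credentials, region, and bucket are right
aws s3 ls s3://my-zipline-bucket --region us-east-1

# For MinIO/custom endpoints, pass the endpoint explicitly
aws s3 ls s3://my-zipline-bucket --endpoint-url https://s3.example.com
```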
Next Steps
- Environment Variables - Complete environment variable reference