The S3 class provides a high-level interface for interacting with AWS S3 storage in Metaflow. It handles downloads, uploads, and listings of S3 objects with automatic retry logic and parallel operations.
Overview
The S3 client manages connections to S3 and temporary directories for downloaded objects. It supports three initialization modes:
- Run-based: Use S3(run=self) to automatically prefix paths with the current run ID
- Explicit prefix: Use S3(s3root='s3://mybucket/path') to set a custom S3 prefix
- Full URLs: Use S3() with complete S3 URLs for each operation
Usage
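The recommended way to use the S3 client is as a context manager, so that temporary files are cleaned up when the block exits. Alternatively, call .close() explicitly when you are done with the client.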
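A minimal sketch of both patterns. The helper names and the key are hypothetical, and `s3` stands for a client created in any of the modes described in the overview. The helpers take an existing client so they can be tried with a stand-in object.

```python
def read_text(s3, key):
    """Context-manager form: temporary files are removed when the block exits."""
    with s3:
        # Read the content inside the block, while the local copy still exists.
        return s3.get(key).text


def read_text_explicit(s3, key):
    """Explicit form: call close() yourself, even if the download fails."""
    try:
        return s3.get(key).text
    finally:
        s3.close()


# Typical use (needs Metaflow and AWS credentials; bucket and key are made up):
#   from metaflow import S3
#   print(read_text(S3(s3root="s3://mybucket/path"), "hello.txt"))
```

In both forms the content is read before cleanup, since the local copy of a downloaded object is a temporary file.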
Constructor
S3
Parameters
- tmproot: Directory for storing temporary files during downloads.
- bucket: Override the bucket from DATATOOLS_S3ROOT when run is specified.
- prefix: Override the path from DATATOOLS_S3ROOT when run is specified.
- run: Derive path prefix from the current or a past run ID. Use S3(run=self) inside a flow.
- s3root: S3 prefix to use if run is not specified. Must start with s3://.
- encryption: Server-side encryption to use when uploading objects to S3.
Methods
get
Download a single object from S3.
- key: Object to download (S3 URL, path suffix, or S3GetObject for range downloads)
- return_missing: If True, return an S3Object with .exists == False instead of raising an exception
- return_info: If True, fetch content-type and user metadata
Returns: S3Object with downloaded content
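As a sketch of how return_missing can replace exception handling, the hypothetical helper below falls back to a default value when the key is absent. It accepts any client that exposes get() as described above.

```python
def get_text_or_default(s3, key, default=""):
    """Return the object's text, or `default` when the key does not exist."""
    # return_missing=True yields an S3Object with exists == False
    # instead of raising an exception for a missing key.
    obj = s3.get(key, return_missing=True)
    return obj.text if obj.exists else default


# Typical use (needs Metaflow and AWS credentials; names are made up):
#   from metaflow import S3
#   with S3(s3root="s3://mybucket/path") as s3:
#       print(get_text_or_default(s3, "notes.txt", default="(no notes)"))
```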
get_many
Download many objects from S3 in parallel.
- keys: Objects to download (S3 URLs, path suffixes, or S3GetObject instances)
- return_missing: If True, include missing objects with .exists == False
- return_info: If True, fetch metadata for each object
Returns: S3Object instances
get_recursive
Download all objects under given prefixes recursively in parallel.
- keys: Prefixes to download recursively
- return_info: If True, fetch metadata for each object
Returns: S3Object instances for all objects under the prefixes
get_all
Download all objects under the prefix set in the constructor. This requires that the S3 client was initialized with either run or s3root.
put
Upload a single object to S3.
- key: Object path (S3 URL or path suffix)
- obj: String, bytes, or file-like object to upload
- overwrite: If False, skip upload if key already exists
- content_type: MIME type for the object
- metadata: JSON-encodable dictionary of metadata
put_many
Upload many objects to S3 in parallel.
- key_objs: List of (key, obj) tuples or S3PutObject instances
- overwrite: If False, skip uploads for existing keys
Returns: (key, url) pairs for uploaded objects
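A small sketch of a parallel upload, assuming the input is a plain dictionary of key to text. The helper name is hypothetical, and it relies only on the (key, obj) input and (key, url) output shapes described above.

```python
def upload_texts(s3, texts, overwrite=True):
    """Upload a {key: text} mapping in parallel and return {key: url}."""
    # put_many takes (key, obj) tuples and returns (key, url) pairs
    # for the objects it uploaded.
    pairs = s3.put_many(list(texts.items()), overwrite=overwrite)
    return dict(pairs)


# Typical use (needs Metaflow and AWS credentials; names are made up):
#   from metaflow import S3
#   with S3(s3root="s3://mybucket/path") as s3:
#       urls = upload_texts(s3, {"a.txt": "first", "b.txt": "second"})
```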
put_files
Upload many local files to S3 in parallel.
- key_paths: List of (key, local_path) tuples or S3PutObject instances
- overwrite: If False, skip uploads for existing keys
Returns: (key, url) pairs for uploaded files
info
Get metadata about a single object without downloading it.
info_many
Get metadata about many objects in parallel without downloading them.
list_paths
List the next level of paths in S3 (non-recursive). A returned S3Object has .exists == False if its path is a prefix rather than an existing object.
list_recursive
List all objects recursively under given prefixes. Only existing objects are returned (each has .exists == True).
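The two listing calls can be contrasted with a short sketch. It assumes that list_paths entries with exists == False are prefixes and that list_recursive yields only actual objects. The helper names are hypothetical.

```python
def split_listing(s3, prefixes):
    """Separate one level of listing into sub-prefixes and actual objects."""
    entries = s3.list_paths(prefixes)
    dirs = [e.url for e in entries if not e.exists]  # prefixes, not objects
    files = [e.url for e in entries if e.exists]     # existing S3 objects
    return dirs, files


def all_object_urls(s3, prefixes):
    """Collect the S3 URL of every object found under the prefixes."""
    return [obj.url for obj in s3.list_recursive(prefixes)]
```

Neither helper downloads anything. Listing only returns locations, so the urls can later be passed to get_many.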
close
Delete all temporary files downloaded in this context.
S3Object
An S3Object represents a path or object in S3. It is returned by S3 client methods and provides access to both the S3 location and downloaded content.
Properties
exists (bool): True if the key corresponds to an existing S3 object
downloaded (bool): True if the object has been downloaded
url (str): S3 location of the object
key (str): Key used in the request that produced this object
path (Optional[str]): Local path to downloaded file (None if not downloaded)
blob (Optional[bytes]): Contents as bytes (None if not downloaded)
text (Optional[str]): Contents as UTF-8 string (None if not downloaded)
size (Optional[int]): Size in bytes (None if object doesn’t exist)
content_type (Optional[str]): MIME type of the object
metadata (Optional[Dict]): User-defined metadata dictionary
encryption (Optional[str]): Server-side encryption type
range_info (Optional[RangeInfo]): Information about partial downloads
last_modified (Optional[int]): Unix timestamp of last modification
Helper Classes
S3GetObject
Specifies a range download request:
- key: S3 path
- offset: Starting byte offset
- length: Number of bytes to download (negative for “from offset to end”)
S3PutObject
Specifies an upload with metadata.
Examples
Download files in a flow
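A minimal sketch of reading a run-scoped object from inside a step. The function name, the key, and the `s3_cls` parameter are hypothetical. `s3_cls` exists only so the helper can be tried with a stand-in class, and it defaults to Metaflow's S3.

```python
def load_run_text(flow, key, s3_cls=None):
    """Read a text object stored under the S3 prefix of the given run."""
    if s3_cls is None:
        # Imported lazily so the helper can be tried without Metaflow installed.
        from metaflow import S3 as s3_cls
    # run=flow prefixes `key` with the run ID, as S3(run=self) does in a step.
    with s3_cls(run=flow) as s3:
        return s3.get(key).text


# Inside a flow step (hypothetical key):
#   @step
#   def start(self):
#       self.greeting = load_run_text(self, "greeting.txt")
#       self.next(self.end)
```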
Download multiple files
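A sketch of a parallel download with get_many, assuming the caller wants a mapping from each requested key to its text. The helper name is hypothetical. Missing keys are kept in the result as None rather than raising.

```python
def fetch_texts(s3, keys):
    """Download several objects in parallel and map each key to its text."""
    # return_missing=True keeps missing keys in the result with exists == False.
    objs = s3.get_many(keys, return_missing=True)
    return {obj.key: (obj.text if obj.exists else None) for obj in objs}


# Typical use (needs Metaflow and AWS credentials; names are made up):
#   from metaflow import S3
#   with S3(s3root="s3://mybucket/path") as s3:
#       texts = fetch_texts(s3, ["a.txt", "b.txt", "c.txt"])
```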
Partial downloads
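A sketch of a range request built from the key, offset, and length fields of S3GetObject. The helper name is hypothetical. The S3GetObject class is passed in as `get_object_cls` because its import location is not stated here and may depend on the Metaflow version.

```python
def read_range(s3, get_object_cls, key, offset, length):
    """Download `length` bytes of `key` starting at byte `offset`."""
    # A negative length requests everything from `offset` to the end.
    request = get_object_cls(key=key, offset=offset, length=length)
    return s3.get(request).blob


# Typical use, with `s3` an open client and S3GetObject already imported:
#   header = read_range(s3, S3GetObject, "data.bin", 0, 1024)
#   rest = read_range(s3, S3GetObject, "data.bin", 1024, -1)
```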
Upload with metadata
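A sketch that uploads a JSON document with a MIME type and user metadata through put, then reads the stored attributes back with info. The helper name, key, and metadata values are hypothetical.

```python
import json


def put_json(s3, key, payload, metadata):
    """Upload `payload` as JSON with a content type and user metadata."""
    s3.put(
        key,
        json.dumps(payload),
        content_type="application/json",
        metadata=metadata,  # must be a JSON-encodable dictionary
    )
    # info() fetches the stored attributes without downloading the object.
    return s3.info(key)


# Typical use (needs Metaflow and AWS credentials; names are made up):
#   from metaflow import S3
#   with S3(s3root="s3://mybucket/path") as s3:
#       info = put_json(s3, "report.json", {"rows": 3}, {"owner": "alice"})
#       print(info.content_type, info.metadata)
```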
Related
- IncludeFile - Include local files as flow parameters
- Datastore - Low-level artifact storage
