IncludeFile is a special parameter type that allows you to include local file contents as a parameter for your flow. The file is automatically uploaded to cloud storage and made available as a read-only artifact in all steps.

Overview

Unlike regular parameters that take values from the command line, IncludeFile takes a file path and automatically reads and uploads the file contents. This is useful for:
  • Configuration files (JSON, YAML, etc.)
  • Small datasets
  • Model files
  • Any static file needed by your flow
The file is stored as an artifact, versioned with your run, and available throughout the flow’s execution.

Usage

from metaflow import FlowSpec, IncludeFile, step

class ConfigFlow(FlowSpec):
    config = IncludeFile(
        'config',
        default='config.json',
        help='Configuration file'
    )
    
    @step
    def start(self):
        print(f"Config contents: {self.config}")
        self.next(self.end)
    
    @step
    def end(self):
        # Config is available in all steps
        print(f"Config is still available: {len(self.config)} characters")
Run the flow:
python flow.py run --config path/to/config.json

Constructor

IncludeFile(
    name: str,
    required: Optional[bool] = None,
    is_text: Optional[bool] = None,
    encoding: Optional[str] = None,
    help: Optional[str] = None,
    parser: Optional[Union[str, Callable[[str], Any]]] = None,
    **kwargs
)

Parameters

name (str, required)
    User-visible parameter name.

default (str | Callable)
    Default path to a local file. Can be a string path or a function for deploy-time parameters.

required (bool, default: False)
    If True, the user must specify a value; when True, the default is ignored.

is_text (bool, default: True)
    If True, decode the file contents to a string using the given encoding. If False, store the raw bytes.

encoding (str, default: "utf-8")
    Character encoding to use when is_text=True.

help (str)
    Help text displayed in run --help.

show_default (bool, default: True)
    If True, show the default value in help text.

parser (str | Callable)
    Function used to parse the file contents. Can be a callable or a string reference to a function (e.g., "json.loads" or "my_module.parser_func"). Names starting with "." are resolved relative to the metaflow package.

File Processing

Text vs Binary

By default, IncludeFile treats files as text and decodes them using UTF-8:
# Text file (default)
config = IncludeFile('config', default='config.txt')
# Access as string
print(self.config)  # "file contents as string"

# Binary file
model = IncludeFile('model', is_text=False, default='model.bin')
# Access as bytes
print(type(self.model))  # <class 'bytes'>

Custom Encoding

Specify a different character encoding:
config = IncludeFile(
    'config',
    encoding='latin-1',
    default='legacy_config.txt'
)

Parsing File Contents

Use the parser parameter to automatically parse file contents:
import json
import yaml

class MyFlow(FlowSpec):
    # Parse JSON automatically
    json_config = IncludeFile(
        'json_config',
        parser=json.loads,
        default='config.json'
    )
    
    # Parse YAML automatically
    yaml_config = IncludeFile(
        'yaml_config',
        parser=yaml.safe_load,
        default='config.yaml'
    )
    
    @step
    def start(self):
        # Already parsed as dict
        print(self.json_config['key'])
        print(self.yaml_config['key'])
        self.next(self.end)
You can also reference a parser function by name:
json_config = IncludeFile(
    'config',
    parser='json.loads',
    default='config.json'
)

Examples

Configuration File

import json
from metaflow import FlowSpec, IncludeFile, step

class ConfigurableFlow(FlowSpec):
    config = IncludeFile(
        'config',
        help='JSON configuration file',
        parser=json.loads,
        default='default_config.json'
    )
    
    @step
    def start(self):
        print(f"Using config: {self.config}")
        self.model_name = self.config['model']
        self.batch_size = self.config['batch_size']
        self.next(self.train)
    
    @step
    def train(self):
        print(f"Training {self.model_name} with batch_size={self.batch_size}")
        self.next(self.end)
    
    @step
    def end(self):
        pass

Multiple File Types

from metaflow import FlowSpec, IncludeFile, step
import pickle

class MultiFileFlow(FlowSpec):
    # Text file with custom encoding
    readme = IncludeFile(
        'readme',
        default='README.txt',
        encoding='utf-8'
    )
    
    # Binary model file
    model = IncludeFile(
        'model',
        is_text=False,
        default='model.pkl'
    )
    
    # CSV data with custom parser
    data = IncludeFile(
        'data',
        parser=lambda content: [line.split(',') for line in content.splitlines()],
        default='data.csv'
    )
    
    @step
    def start(self):
        print(f"README: {self.readme}")
        model_obj = pickle.loads(self.model)
        print(f"Loaded model: {model_obj}")
        print(f"Data rows: {len(self.data)}")
        self.next(self.end)
    
    @step
    def end(self):
        pass

Deploy-Time Parameters

For deployed workflows (like AWS Step Functions), use a callable default:
class ProductionFlow(FlowSpec):
    config = IncludeFile(
        'config',
        default=lambda ctx: f'/etc/configs/{ctx.flow_name}.json',
        parser='json.loads'
    )
    
    @step
    def start(self):
        # Config is loaded from the deploy-time path
        print(self.config)
        self.next(self.end)
    
    @step
    def end(self):
        pass

Under the Hood

When you use IncludeFile:
  1. The file is read from the local filesystem
  2. Contents are compressed with gzip
  3. The file is uploaded to the datastore (S3, Azure, etc.)
  4. A descriptor is stored as the parameter value
  5. In each step, the file is downloaded and decompressed automatically
The file is stored once per flow and shared across all tasks, making it efficient for distributed execution.
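The five steps above can be sketched roughly as follows. This is an illustrative mock, not Metaflow's actual implementation: the datastore class and the descriptor format are hypothetical stand-ins for the real datastore backends.

```python
import gzip
import json

class InMemoryDatastore:
    """Hypothetical stand-in for a real datastore backend (S3, Azure, etc.)."""
    def __init__(self):
        self._blobs = {}

    def put(self, blob):
        key = f'blob-{len(self._blobs)}'
        self._blobs[key] = blob
        return key

    def get(self, key):
        return self._blobs[key]

def store_include_file(path, datastore):
    # Step 1: read the file from the local filesystem
    with open(path, 'rb') as f:
        raw = f.read()
    # Step 2: compress the contents with gzip
    compressed = gzip.compress(raw)
    # Step 3: upload to the datastore
    key = datastore.put(compressed)
    # Step 4: the parameter value is a small descriptor, not the file itself
    return json.dumps({'type': 'included-file', 'key': key, 'size': len(raw)})

def load_include_file(descriptor, datastore):
    # Step 5: in each task, download and decompress automatically
    meta = json.loads(descriptor)
    return gzip.decompress(datastore.get(meta['key']))
```

Because only the small descriptor travels with the parameter, every task can fetch the same stored blob rather than re-uploading the file.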

Size Considerations

IncludeFile is designed for relatively small files (up to a few hundred MB). For large datasets:
  • Use the S3 client to download data in specific steps
  • Store data externally and download it as needed
  • Consider splitting large files into smaller chunks

Comparison with Regular Parameters

Feature       Parameter            IncludeFile
Input         Command-line value   File path
Storage       String/number        File contents
Size          Small values         Small to medium files
Access        Direct value         File contents as string/bytes
Versioning    By value             By content hash
See Also

  • Parameter - Regular command-line parameters
  • S3 - Direct S3 access for larger files
  • Datastore - Artifact storage system
