The DataFrameWriter interface provides methods to save DataFrames to external storage systems, including local files, S3, and catalog tables.

Access

Access the writer through a DataFrame’s write property:
df = session.create_dataframe({"id": [1, 2, 3]})
df.write.csv("output.csv")

Supported Storage Schemes

Amazon S3

Format: s3://{bucket_name}/{path_to_file}
  • Uses boto3 to acquire AWS credentials
df.write.csv("s3://my-bucket/data.csv")
df.write.parquet("s3://my-bucket/data.parquet")

Local Files

Format: file://{absolute_or_relative_path}, or a plain path with no scheme
  • Paths without a scheme are treated as local files
df.write.csv("./output.csv")
df.write.parquet("file:///home/user/data.parquet")
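
The scheme rules above can be sketched with the standard library. This is only an illustration of how a path string maps to a storage kind, not the writer's actual resolution logic; `classify_path` is a hypothetical helper, and rejecting any other scheme is an assumption.

```python
from urllib.parse import urlparse


def classify_path(path: str) -> tuple[str, str]:
    """Return (storage_kind, location) for a write path.

    Hypothetical helper for illustration; not part of the DataFrameWriter API.
    """
    parsed = urlparse(path)
    if parsed.scheme == "s3":
        # s3://{bucket_name}/{path_to_file}: netloc is the bucket, path is the key
        return "s3", f"{parsed.netloc}/{parsed.path.lstrip('/')}"
    if parsed.scheme == "file":
        # file:// URLs carry the local path after the scheme
        return "local", parsed.netloc + parsed.path
    if parsed.scheme == "":
        # No scheme: treated as a local file path
        return "local", path
    # Assumption: anything else is rejected
    raise ValueError(f"Unsupported storage scheme: {parsed.scheme!r}")


print(classify_path("s3://my-bucket/data.csv"))
print(classify_path("./output.csv"))
```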

Methods

save_as_table()

Saves the content of the DataFrame as a table in the catalog.
Parameters:
table_name (str, required): Name of the table to save to.
mode ('error' | 'append' | 'overwrite' | 'ignore', default: 'error'): Write mode:
  • error: Raises an error if table exists
  • append: Appends data to table if it exists
  • overwrite: Overwrites existing table
  • ignore: Silently ignores operation if table exists
Returns: QueryMetrics - The query execution metrics

Examples

# Raises error if table exists
metrics = df.write.save_as_table("my_table")
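
To make the four table write modes concrete, here is a minimal sketch that applies them to an in-memory dict standing in for the catalog. `apply_table_mode` is a hypothetical function, not the library's implementation. Two behaviors are assumptions: `append` creates the table when it does not exist, and an existing table raises `ValueError` in `error` mode.

```python
def apply_table_mode(
    catalog: dict[str, list],
    table_name: str,
    rows: list,
    mode: str = "error",
) -> bool:
    """Apply one write mode to a stand-in catalog. Returns True if rows were written."""
    exists = table_name in catalog
    if mode == "error":
        if exists:
            # Assumption: the real error type may differ
            raise ValueError(f"Table {table_name!r} already exists")
        catalog[table_name] = list(rows)
    elif mode == "append":
        # Assumption: a missing table is created before appending
        catalog.setdefault(table_name, []).extend(rows)
    elif mode == "overwrite":
        catalog[table_name] = list(rows)
    elif mode == "ignore":
        if exists:
            return False  # nothing written, no error raised
        catalog[table_name] = list(rows)
    else:
        raise ValueError(f"Unknown mode: {mode!r}")
    return True


catalog: dict[str, list] = {}
apply_table_mode(catalog, "my_table", [1, 2])
apply_table_mode(catalog, "my_table", [3], mode="append")
print(catalog["my_table"])
```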

save_as_view()

Saves the content of the DataFrame as a view in the catalog.
Parameters:
view_name (str, required): Name of the view to save to.
description (str, default: None): Optional human-readable view description to store in the catalog.
Returns: None

Example

df.write.save_as_view("my_view", description="Customer summary view")

csv()

Saves the content of the DataFrame as a single CSV file with comma as the delimiter and headers in the first row.
Parameters:
file_path (str | Path, required): Path to save the CSV file to. Must have a .csv extension.
mode ('error' | 'overwrite' | 'ignore', default: 'overwrite'): Write mode:
  • error: Raises an error if file exists
  • overwrite: Overwrites the file if it exists
  • ignore: Silently ignores operation if file exists
Returns: QueryMetrics - The query execution metrics

Examples

# Overwrites if exists
metrics = df.write.csv("output.csv")
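
The file write modes can be illustrated with the standard `csv` module. The sketch below mimics the documented behavior for a local file (comma delimiter, header in the first row, `overwrite` by default). `write_csv_with_mode` is a hypothetical stand-in, and raising `FileExistsError` in `error` mode is an assumption about the error type.

```python
import csv
from pathlib import Path


def write_csv_with_mode(file_path, header, rows, mode="overwrite") -> bool:
    """Write rows to a local CSV file. Returns True if the file was written."""
    if mode not in {"error", "overwrite", "ignore"}:
        raise ValueError(f"Unknown mode: {mode!r}")
    path = Path(file_path)
    if path.exists():
        if mode == "error":
            # Assumption: the real writer may raise a different exception
            raise FileExistsError(f"{path} already exists")
        if mode == "ignore":
            return False  # leave the existing file untouched
    # newline="" lets the csv module control line endings
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # headers in the first row
        writer.writerows(rows)
    return True
```

Calling it twice with the default mode replaces the first file, while `mode="ignore"` on the second call would keep the first file's contents.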

parquet()

Saves the content of the DataFrame as a single Parquet file.
Parameters:
file_path (str | Path, required): Path to save the Parquet file to. Must have a .parquet extension.
mode ('error' | 'overwrite' | 'ignore', default: 'overwrite'): Write mode:
  • error: Raises an error if file exists
  • overwrite: Overwrites the file if it exists
  • ignore: Silently ignores operation if file exists
Returns: QueryMetrics - The query execution metrics

Examples

# Overwrites if exists
metrics = df.write.parquet("output.parquet")
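
Because both file writers require a specific extension, it can help to check a path before calling them. The helper below is a hypothetical pre-check, not part of the writer. It assumes the comparison is case-sensitive and looks only at the final suffix, which may not match how the library validates paths.

```python
from pathlib import Path, PurePosixPath
from typing import Union


def has_extension(file_path: Union[str, Path], expected: str) -> bool:
    """Return True if the path's final suffix equals `expected` (e.g. ".parquet")."""
    # PurePosixPath only parses the string, so s3:// URLs work as well
    return PurePosixPath(str(file_path)).suffix == expected


for candidate in ["output.parquet", "s3://my-bucket/data.parquet", "output.csv"]:
    print(candidate, has_extension(candidate, ".parquet"))
```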

Working with Query Metrics

All write methods (except save_as_view) return a QueryMetrics object that provides information about the execution:
metrics = df.write.csv("output.csv")
print(metrics.get_summary())
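
Since save_as_view returns None while the other write methods return a QueryMetrics object, code that logs results from several writes may want to handle both cases. The sketch below shows one way to do that. `describe_write` and the `SupportsSummary` protocol are hypothetical names, and it assumes `get_summary()` returns a string.

```python
from typing import Optional, Protocol


class SupportsSummary(Protocol):
    """Anything with a get_summary() method, such as a QueryMetrics object."""

    def get_summary(self) -> str: ...


def describe_write(label: str, metrics: Optional[SupportsSummary]) -> str:
    """Build a log line for a write, tolerating the None from save_as_view."""
    if metrics is None:
        return f"{label}: no metrics returned"
    return f"{label}: {metrics.get_summary()}"
```

For example, `describe_write("csv", df.write.csv("output.csv"))` would include the summary, while passing the result of `save_as_view` would produce the fallback line.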
