
Filesystem Abstraction

Arrow provides an abstract filesystem interface that works with local files, HDFS, S3, and other storage systems.

FileSystem

Abstract filesystem API.
class FileSystem

Properties

type_name
std::string
Returns the filesystem type name (e.g., “local”, “s3”, “hdfs”)
io_context
const io::IOContext&
Returns the IOContext associated with this filesystem
Equals
bool
other
const FileSystem&
required
Filesystem to compare with
Returns whether two filesystems are equal

Path Operations

NormalizePath
Result<std::string>
path
std::string
required
Path to normalize
Normalizes a path for this filesystem. The default implementation is a no-op, but subclasses may normalize irregular path forms (such as Windows local paths)
PathFromUri
Result<std::string>
uri_string
const std::string&
required
URI or absolute path
Checks that a URI (or absolute path) is compatible with this filesystem and returns its path component. Returns an error if the URI scheme is not one this filesystem handles
MakeUri
Result<std::string>
path
std::string
required
Absolute path
Makes a URI from which FileSystemFromUri produces an equivalent filesystem

File Information

GetFileInfo
Result<FileInfo>
path
const std::string&
required
Path to query
Gets info for the given target. Symlinks are automatically dereferenced. A nonexistent file is not an error: the call succeeds and the returned FileInfo has type FileType::NotFound
GetFileInfo
Result<FileInfoVector>
paths
const std::vector<std::string>&
required
Paths to query
Gets info for many targets at once
GetFileInfo
Result<FileInfoVector>
select
const FileSelector&
required
File selector
Gets info according to a selector. The selector’s base directory is not included in results
GetFileInfoAsync
Future<FileInfoVector>
paths
const std::vector<std::string>&
required
Paths to query
Async version of GetFileInfo
GetFileInfoGenerator
FileInfoGenerator
select
const FileSelector&
required
File selector
Streaming async version of GetFileInfo. The generator is not async-reentrant
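The "not found is not an error" behavior described for GetFileInfo can be illustrated without Arrow. The sketch below is a standalone stand-in built on std::filesystem, not the library's implementation; SketchFileType and ClassifyPath are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical stand-in for Arrow's FileType; not the library's definition.
enum class SketchFileType { NotFound, Unknown, File, Directory };

// Classifies a local path. A missing path is reported as NotFound rather than
// as a failure. std::filesystem::status follows symlinks, which matches the
// "symlinks are dereferenced" wording above.
SketchFileType ClassifyPath(const std::string& path) {
  std::error_code ec;
  const std::filesystem::file_status st = std::filesystem::status(path, ec);
  // Check the type first: some standard libraries also set ec for a missing file.
  if (st.type() == std::filesystem::file_type::not_found) {
    return SketchFileType::NotFound;
  }
  if (ec) return SketchFileType::Unknown;
  if (std::filesystem::is_regular_file(st)) return SketchFileType::File;
  if (std::filesystem::is_directory(st)) return SketchFileType::Directory;
  return SketchFileType::Unknown;
}
```

A caller can therefore branch on the returned type instead of handling an error for every missing file.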

Directory Operations

CreateDir
Status
path
const std::string&
required
Directory path to create
recursive
bool
Whether to create parent directories (default: true)
Creates a directory and, if recursive is true, any missing parent directories. Succeeds if the directory already exists
DeleteDir
Status
path
const std::string&
required
Directory path to delete
Deletes a directory and its contents, recursively
DeleteDirContents
Status
path
const std::string&
required
Directory path
missing_dir_ok
bool
required
Whether to allow missing directory
Deletes a directory’s contents recursively, but not the directory itself. Empty path is disallowed
DeleteDirContentsAsync
Future<>
path
const std::string&
required
Directory path
missing_dir_ok
bool
required
Whether to allow missing directory
Async version of DeleteDirContents
DeleteRootDirContents
Status
Deletes the root directory’s contents recursively. Implementations may return an error if they consider the operation too dangerous
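The DeleteDirContents rules above (keep the directory, reject the empty path, optionally tolerate a missing directory) can be sketched on the local filesystem. This is an illustration using std::filesystem under those stated rules, not Arrow's code; DeleteDirContentsSketch is a hypothetical helper that returns a bool where Arrow returns a Status.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns true on success, false on error.
bool DeleteDirContentsSketch(const std::string& path, bool missing_dir_ok) {
  // An empty path (or "/") would address the root directory; refuse it here.
  if (path.empty() || path == "/") return false;

  std::error_code ec;
  if (!fs::exists(path, ec)) return missing_dir_ok;
  if (!fs::is_directory(path, ec)) return false;

  // Collect the children first so the directory is not modified mid-iteration.
  std::vector<fs::path> children;
  for (fs::directory_iterator it(path, ec), end; !ec && it != end; it.increment(ec)) {
    children.push_back(it->path());
  }
  if (ec) return false;

  // Remove each child recursively; the directory itself is left in place.
  for (const fs::path& child : children) {
    fs::remove_all(child, ec);
    if (ec) return false;
  }
  return true;
}
```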

File Operations

DeleteFile
Status
path
const std::string&
required
File path to delete
Deletes a file
DeleteFiles
Status
paths
const std::vector<std::string>&
required
File paths to delete
Deletes many files. Default implementation issues individual operations in sequence
Move
Status
src
const std::string&
required
Source path
dest
const std::string&
required
Destination path
Moves or renames a file or directory. Returns an error if the destination exists and is a non-empty directory
CopyFile
Status
src
const std::string&
required
Source path
dest
const std::string&
required
Destination path
Copies a file. If destination exists and is a directory, returns error. Otherwise replaces it

Stream Operations

OpenInputStream
Result<std::shared_ptr<io::InputStream>>
path
const std::string&
required
File path to open
Opens an input stream for sequential reading
OpenInputFile
Result<std::shared_ptr<io::RandomAccessFile>>
path
const std::string&
required
File path to open
Opens an input file for random access reading
OpenInputStreamAsync
Future<std::shared_ptr<io::InputStream>>
path
const std::string&
required
File path to open
Async version of OpenInputStream
OpenInputFileAsync
Future<std::shared_ptr<io::RandomAccessFile>>
path
const std::string&
required
File path to open
Async version of OpenInputFile
OpenOutputStream
Result<std::shared_ptr<io::OutputStream>>
path
const std::string&
required
File path to open
metadata
const std::shared_ptr<const KeyValueMetadata>&
Optional metadata to attach
Opens an output stream for sequential writing. If target exists, existing data is truncated
OpenAppendStream
Result<std::shared_ptr<io::OutputStream>>
path
const std::string&
required
File path to open
metadata
const std::shared_ptr<const KeyValueMetadata>&
Optional metadata to attach
Opens an output stream for appending. Creates new empty file if it doesn’t exist. May return NotImplemented on some filesystems
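The difference between OpenOutputStream (truncate) and OpenAppendStream (append, creating the file if needed) corresponds to the two open modes of a standard file stream. The sketch below uses std::ofstream on a local file to show the contrast; WriteTruncating, WriteAppending and ReadAll are hypothetical helper names, and no Arrow call is involved.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Opens the file so that existing data is discarded, then writes data.
bool WriteTruncating(const std::string& path, const std::string& data) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  out << data;
  out.flush();
  return out.good();
}

// Opens the file so that writes land after existing data. The file is created
// empty if it does not exist yet.
bool WriteAppending(const std::string& path, const std::string& data) {
  std::ofstream out(path, std::ios::binary | std::ios::app);
  out << data;
  out.flush();
  return out.good();
}

// Reads the whole file back into a string.
std::string ReadAll(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::ostringstream buffer;
  buffer << in.rdbuf();
  return buffer.str();
}
```

Writing "first" with the truncating helper and then "+more" with the appending helper leaves "first+more" in the file; a second truncating write replaces all of it.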

FileInfo

FileSystem entry information.
struct FileInfo
FileInfo
path
std::string
required
Full file path
type
FileType
File type (default: Unknown)
Constructs a FileInfo
type
FileType
Returns the file type (NotFound, Unknown, File, or Directory)
path
const std::string&
Returns the full file path in the filesystem
base_name
std::string
Returns the file base name (component after the last directory separator)
dir_name
std::string
Returns the directory name (the part of the path before the file base name)
size
int64_t
Returns the size in bytes, if available. Only regular files are guaranteed to have a size
mtime
TimePoint
Returns the time of last modification, if available
extension
std::string
Returns the file extension (excluding the dot)
IsFile
bool
Returns true if type is File
IsDirectory
bool
Returns true if type is Directory
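To make the relationship between path, base_name, dir_name and extension concrete, here is a minimal sketch that splits a '/'-separated path the way the descriptions above read. The function names are hypothetical and this is not Arrow's implementation; edge cases such as trailing separators, root-level paths or names starting with a dot may be handled differently by the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Component after the last '/' (the whole string if there is no separator).
std::string BaseNameSketch(const std::string& path) {
  const std::size_t sep = path.find_last_of('/');
  return sep == std::string::npos ? path : path.substr(sep + 1);
}

// Everything before the last '/' (empty if there is no separator).
std::string DirNameSketch(const std::string& path) {
  const std::size_t sep = path.find_last_of('/');
  return sep == std::string::npos ? std::string() : path.substr(0, sep);
}

// Text after the last '.' of the base name, without the dot itself.
std::string ExtensionSketch(const std::string& path) {
  const std::string base = BaseNameSketch(path);
  const std::size_t dot = base.find_last_of('.');
  return dot == std::string::npos ? std::string() : base.substr(dot + 1);
}
```

For the path data/2024/part-0.parquet these helpers give the base name part-0.parquet, the directory name data/2024 and the extension parquet.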

FileSelector

File selector for filesystem APIs.
struct FileSelector {
  std::string base_dir;
  bool allow_not_found;
  bool recursive;
  int32_t max_recursion;
};
base_dir
std::string
required
Directory in which to select files. Error if path exists but isn’t a directory
allow_not_found
bool
required
Behavior if base_dir isn’t found. If false, returns error. If true, returns empty selection
recursive
bool
required
Whether to recurse into subdirectories
max_recursion
int32_t
required
Maximum number of subdirectories to recurse into (default: INT32_MAX)
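How the four selector fields interact can be sketched against the local filesystem. The code below assumes that max_recursion counts nesting levels below base_dir (0 means "list direct children only"), which is one reading of the description above; SelectorSketch and SelectSketch are hypothetical names, and the traversal uses std::filesystem rather than any Arrow call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <limits>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical mirror of the FileSelector fields, with the documented defaults.
struct SelectorSketch {
  std::string base_dir;
  bool allow_not_found = false;
  bool recursive = false;
  std::int32_t max_recursion = std::numeric_limits<std::int32_t>::max();
};

// Returns std::nullopt on error, otherwise the sorted paths found under
// base_dir, relative to it. base_dir itself is never part of the result.
std::optional<std::vector<std::string>> SelectSketch(const SelectorSketch& sel) {
  std::error_code ec;
  const fs::file_status st = fs::status(sel.base_dir, ec);
  if (st.type() == fs::file_type::not_found) {
    if (sel.allow_not_found) return std::vector<std::string>{};
    return std::nullopt;
  }
  // A path that exists but is not a directory is an error.
  if (!fs::is_directory(st)) return std::nullopt;

  std::vector<std::string> found;
  fs::recursive_directory_iterator it(sel.base_dir, ec), end;
  while (!ec && it != end) {
    found.push_back(it->path().lexically_relative(sel.base_dir).generic_string());
    // depth() is 0 for direct children of base_dir.
    const bool descend = sel.recursive && it.depth() < sel.max_recursion;
    if (!descend) it.disable_recursion_pending();
    it.increment(ec);
  }
  if (ec) return std::nullopt;
  std::sort(found.begin(), found.end());
  return found;
}
```

With a tree a/b/c under base_dir, the default selector yields only a, a recursive selector yields a, a/b and a/b/c, and a recursive selector with max_recursion set to 1 stops after a/b.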

Factory Functions

FileSystemFromUri

Creates a new FileSystem by URI.
Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
    const std::string& uri,
    std::string* out_path = nullptr)

Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
    const std::string& uri,
    const io::IOContext& io_context,
    std::string* out_path = nullptr)
uri
const std::string&
required
URI-based path (e.g., “file:///some/local/path”, “s3://bucket/key”)
io_context
const io::IOContext&
IOContext for the filesystem
out_path
std::string*
Optional output parameter for path inside the filesystem
Recognized schemes: “file”, “mock”, “hdfs”, “viewfs”, “s3”, “gs”, “gcs”. Custom schemes can be registered with RegisterFileSystemFactory.

FileSystemFromUriOrPath

Creates a new FileSystem by URI or local path.
Result<std::shared_ptr<FileSystem>> FileSystemFromUriOrPath(
    const std::string& uri,
    std::string* out_path = nullptr)

Result<std::shared_ptr<FileSystem>> FileSystemFromUriOrPath(
    const std::string& uri,
    const io::IOContext& io_context,
    std::string* out_path = nullptr)
Same as FileSystemFromUri, but also recognizes non-URIs and treats them as local filesystem paths. Only absolute local paths are allowed.
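The distinction between a URI, an absolute local path, and a rejected relative path can be sketched with plain string handling. The helper below is a simplified illustration that only understands "scheme://" prefixes and POSIX-style absolute paths; SchemeForLocationSketch is a hypothetical name, and Arrow's actual detection (including Windows paths) is more involved.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Returns the scheme for "scheme://..." input, "file" for an absolute
// POSIX-style path, and std::nullopt for anything else (e.g. relative paths).
std::optional<std::string> SchemeForLocationSketch(const std::string& location) {
  const std::size_t pos = location.find("://");
  if (pos != std::string::npos && pos > 0 &&
      std::isalpha(static_cast<unsigned char>(location[0]))) {
    // A scheme is a letter followed by letters, digits, '+', '-' or '.'.
    bool valid = true;
    for (std::size_t i = 1; i < pos; ++i) {
      const unsigned char c = static_cast<unsigned char>(location[i]);
      if (!std::isalnum(c) && c != '+' && c != '-' && c != '.') valid = false;
    }
    if (valid) return location.substr(0, pos);
  }
  // Not a URI: accept only absolute paths and treat them as local files.
  if (!location.empty() && location.front() == '/') return std::string("file");
  return std::nullopt;
}
```

A real dispatcher would then look the returned scheme up in its table of registered filesystem factories.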

Specialized Filesystems

SubTreeFileSystem

A FileSystem that delegates to another implementation after prepending a fixed base path.
class SubTreeFileSystem : public FileSystem
SubTreeFileSystem
base_path
const std::string&
required
Base path to prepend
base_fs
std::shared_ptr<FileSystem>
required
Underlying filesystem
Constructs a SubTreeFileSystem, which exposes a logical view of the subtree rooted at base_path in base_fs
base_path
std::string
Returns the base path
base_fs
std::shared_ptr<FileSystem>
Returns the underlying filesystem
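The "prepend a fixed base path" idea can be shown as two small string transformations: one applied to paths going into the underlying filesystem, and one applied to paths coming back out. This is a sketch of the concept only; PrependBaseSketch and StripBaseSketch are hypothetical names and the library's own path handling may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Drops trailing '/' characters so joins produce exactly one separator.
std::string NormalizeBaseSketch(std::string base) {
  while (!base.empty() && base.back() == '/') base.pop_back();
  return base;
}

// Maps a path inside the subtree to a path in the underlying filesystem.
std::string PrependBaseSketch(const std::string& base_path, const std::string& path) {
  const std::string base = NormalizeBaseSketch(base_path);
  std::size_t start = 0;
  while (start < path.size() && path[start] == '/') ++start;
  const std::string rest = path.substr(start);
  return rest.empty() ? base : base + "/" + rest;
}

// Maps an underlying path back into the subtree view. Returns std::nullopt
// when the path lies outside the subtree.
std::optional<std::string> StripBaseSketch(const std::string& base_path,
                                           const std::string& full_path) {
  const std::string base = NormalizeBaseSketch(base_path);
  if (full_path == base) return std::string();
  const std::string prefix = base + "/";
  if (full_path.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
  return full_path.substr(prefix.size());
}
```

Matching on base plus a separator, rather than on base alone, keeps a sibling such as /data/projectile from being mistaken for part of /data/project.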

SlowFileSystem

A FileSystem that delegates to another implementation but inserts latencies.
class SlowFileSystem : public FileSystem
SlowFileSystem
base_fs
std::shared_ptr<FileSystem>
required
Underlying filesystem
average_latency
double
required
Average latency in seconds
seed
int32_t
Random seed for latency generation
Constructs a SlowFileSystem for testing

Utility Functions

CopyFiles

Copies files, including between different filesystems.
Status CopyFiles(
    const std::vector<FileLocator>& sources,
    const std::vector<FileLocator>& destinations,
    const io::IOContext& io_context = io::default_io_context(),
    int64_t chunk_size = 1024 * 1024,
    bool use_threads = true)

Status CopyFiles(
    const std::shared_ptr<FileSystem>& source_fs,
    const FileSelector& source_sel,
    const std::shared_ptr<FileSystem>& destination_fs,
    const std::string& destination_base_dir,
    const io::IOContext& io_context = io::default_io_context(),
    int64_t chunk_size = 1024 * 1024,
    bool use_threads = true)
If source and destination are in the same filesystem, uses FileSystem::CopyFile. Otherwise opens streams and copies chunks. The second overload creates directories as needed.
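The "opens streams and copies chunks" path can be pictured as a fixed-size buffer loop. The sketch below works on standard C++ streams instead of Arrow's InputStream and OutputStream, and CopyInChunksSketch is a hypothetical name; it only illustrates the role of chunk_size, not threading or the same-filesystem shortcut.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Copies everything from src to dst, at most chunk_size bytes at a time.
// Returns the number of bytes copied, or -1 if a write fails.
std::int64_t CopyInChunksSketch(std::istream& src, std::ostream& dst,
                                std::int64_t chunk_size = 1024 * 1024) {
  assert(chunk_size > 0);
  std::vector<char> buffer(static_cast<std::size_t>(chunk_size));
  std::int64_t total = 0;
  while (src) {
    src.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    const std::streamsize got = src.gcount();  // may be short on the last chunk
    if (got <= 0) break;
    dst.write(buffer.data(), got);
    if (!dst) return -1;
    total += got;
  }
  return total;
}
```

A larger chunk_size means fewer read and write calls at the cost of a larger buffer per copy.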
