Filesystem Abstraction
Arrow provides an abstract filesystem interface that works with local files, HDFS, S3, and other storage systems.
FileSystem
Abstract filesystem API.
Properties
type_name
std::string
Returns the filesystem type name (e.g., “local”, “s3”, “hdfs”).
io_context
const io::IOContext&
Returns the IOContext associated with this filesystem.
Equals
bool
other
const FileSystem&
required
Filesystem to compare with
Returns whether two filesystems are equal.
Path Operations
NormalizePath
Result<std::string>
Normalizes a path for the given filesystem. The default implementation is a no-op, but subclasses may normalize irregular path forms.
PathFromUri
Result<std::string>
uri_string
const std::string&
required
URI or absolute path
Ensures a URI is compatible with the filesystem and returns the path component. The URI scheme is checked for validity.
MakeUri
Result<std::string>
Makes a URI from which FileSystemFromUri produces an equivalent filesystem.
GetFileInfo
Result<FileInfo>
path
const std::string&
required
Path to query
Gets info for the given target. Symlinks are automatically dereferenced. A nonexistent file is not an error: the returned FileInfo has type FileType::NotFound.
GetFileInfo
Result<std::vector<FileInfo>>
paths
const std::vector<std::string>&
required
Paths to query
Gets info for many targets at once.
GetFileInfo
Result<std::vector<FileInfo>>
select
const FileSelector&
required
File selector
Gets info according to a selector. The selector’s base directory is not included in the results.
GetFileInfoAsync
Future<std::vector<FileInfo>>
paths
const std::vector<std::string>&
required
Paths to query
Async version of GetFileInfo.
GetFileInfoGenerator
FileInfoGenerator
select
const FileSelector&
required
File selector
Streaming async version of GetFileInfo. The returned generator is not async-reentrant.
Directory Operations
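CreateDir
Status
path
const std::string&
required
Directory path to create
recursive
bool
Whether to create parent directories (default: true)
Creates a directory and, if recursive is true, any missing parent directories. Succeeds if the directory already exists.
DeleteDir
Status
path
const std::string&
required
Directory path to delete
Deletes a directory and its contents, recursively.
DeleteDirContents
Status
path
const std::string&
required
Directory path
missing_dir_ok
bool
Whether a missing directory is accepted without error (default: false)
Deletes a directory’s contents recursively, but not the directory itself. An empty path is disallowed; use DeleteRootDirContents for the root directory.
DeleteDirContentsAsync
Future<>
path
const std::string&
required
Directory path
missing_dir_ok
bool
Whether a missing directory is accepted without error (default: false)
Async version of DeleteDirContents.
DeleteRootDirContents
Status
Deletes the root directory’s contents recursively. Implementations may return an error if they consider the operation too dangerous.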
File Operations
DeleteFile
Status
path
const std::string&
required
File path to delete
Deletes a file.
DeleteFiles
Status
paths
const std::vector<std::string>&
required
File paths to delete
Deletes many files. The default implementation issues individual delete operations in sequence.
Move
Status
src
const std::string&
required
Source path
dest
const std::string&
required
Destination path
Moves or renames a file or directory. Returns an error if the destination is a non-empty directory.
CopyFile
Status
src
const std::string&
required
Source path
dest
const std::string&
required
Destination path
Copies a file. Returns an error if the destination exists and is a directory; otherwise an existing destination is replaced.
Stream Operations
OpenInputStream
Result<std::shared_ptr<io::InputStream>>
path
const std::string&
required
File path to open
Opens an input stream for sequential reading
OpenInputFile
Result<std::shared_ptr<io::RandomAccessFile>>
path
const std::string&
required
File path to open
Opens an input file for random access reading
OpenInputStreamAsync
Future<std::shared_ptr<io::InputStream>>
path
const std::string&
required
File path to open
Async version of OpenInputStream
OpenInputFileAsync
Future<std::shared_ptr<io::RandomAccessFile>>
path
const std::string&
required
File path to open
Async version of OpenInputFile
OpenOutputStream
Result<std::shared_ptr<io::OutputStream>>
path
const std::string&
required
File path to open
metadata
const std::shared_ptr<const KeyValueMetadata>&
required
Metadata to attach. May be a null pointer; an overload taking only path is also available
Opens an output stream for sequential writing. If the target already exists, its existing data is truncated.
OpenAppendStream
Result<std::shared_ptr<io::OutputStream>>
path
const std::string&
required
File path to open
metadata
const std::shared_ptr<const KeyValueMetadata>&
required
Metadata to attach. May be a null pointer; an overload taking only path is also available
Opens an output stream for appending. Creates a new empty file if the target doesn’t exist. May return NotImplemented on some filesystems.
FileInfo
FileSystem entry information.
type
FileType
File type (default: Unknown)
Constructs a FileInfo.
type
FileType
Returns the file type (NotFound, Unknown, File, or Directory).
path
const std::string&
Returns the full file path in the filesystem.
base_name
std::string
Returns the file base name (component after the last directory separator).
dir_name
std::string
Returns the directory base name (component before the file base name).
size
int64_t
Returns the size in bytes, if available. Only regular files are guaranteed to have a size.
mtime
TimePoint
Returns the time of last modification, if available.
extension
std::string
Returns the file extension (excluding the dot).
IsFile
bool
Returns true if type is File.
IsDirectory
bool
Returns true if type is Directory.
FileSelector
File selector for filesystem APIs.
struct FileSelector {
  std::string base_dir;
  bool allow_not_found;   // defaults to false
  bool recursive;         // defaults to false
  int32_t max_recursion;  // defaults to INT32_MAX
};
base_dir
std::string
Directory in which to select files. It is an error if the path exists but isn’t a directory.
allow_not_found
bool
Behavior if base_dir isn’t found. If false, an error is returned. If true, an empty selection is returned.
recursive
bool
Whether to recurse into subdirectories.
max_recursion
int32_t
Maximum number of subdirectories to recurse into (default: INT32_MAX).
Factory Functions
FileSystemFromUri
Creates a new FileSystem by URI.
Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
    const std::string& uri,
    std::string* out_path = nullptr);
Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
    const std::string& uri,
    const io::IOContext& io_context,
    std::string* out_path = nullptr);
uri
const std::string&
required
URI-based path (e.g., “file:///some/local/path”, “s3://bucket/key”)
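io_context
const io::IOContext&
IOContext for the filesystem (second overload only)
out_path
std::string*
Optional output parameter for the path inside the filesystem (default: nullptr)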
Recognized schemes: “file”, “mock”, “hdfs”, “viewfs”, “s3”, “gs”, “gcs”. Custom schemes can be registered with RegisterFileSystemFactory.
FileSystemFromUriOrPath
Creates a new FileSystem by URI or local path.
Result<std::shared_ptr<FileSystem>> FileSystemFromUriOrPath(
    const std::string& uri,
    std::string* out_path = nullptr);
Result<std::shared_ptr<FileSystem>> FileSystemFromUriOrPath(
    const std::string& uri,
    const io::IOContext& io_context,
    std::string* out_path = nullptr);
Same as FileSystemFromUri, but also recognizes non-URIs and treats them as local filesystem paths. Only absolute paths are allowed.
Specialized Filesystems
SubTreeFileSystem
A FileSystem that delegates to another implementation after prepending a fixed base path.
class SubTreeFileSystem : public FileSystem
base_path
const std::string&
required
Base path to prepend
base_fs
std::shared_ptr<FileSystem>
required
Underlying filesystem
Constructs a SubTreeFileSystem. Exposes a logical view of a subtree
base_fs
std::shared_ptr<FileSystem>
Returns the underlying filesystem
SlowFileSystem
A FileSystem that delegates to another implementation but inserts latencies.
class SlowFileSystem : public FileSystem
base_fs
std::shared_ptr<FileSystem>
required
Underlying filesystem
average_latency
double
Average latency in seconds
seed
int32_t
Random seed for latency generation
Constructs a SlowFileSystem for testing
Utility Functions
CopyFiles
Copies files, including between different filesystems.
Status CopyFiles(
    const std::vector<FileLocator>& sources,
    const std::vector<FileLocator>& destinations,
    const io::IOContext& io_context = io::default_io_context(),
    int64_t chunk_size = 1024 * 1024,
    bool use_threads = true);
Status CopyFiles(
    const std::shared_ptr<FileSystem>& source_fs,
    const FileSelector& source_sel,
    const std::shared_ptr<FileSystem>& destination_fs,
    const std::string& destination_base_dir,
    const io::IOContext& io_context = io::default_io_context(),
    int64_t chunk_size = 1024 * 1024,
    bool use_threads = true);
If source and destination are in the same filesystem, uses FileSystem::CopyFile. Otherwise opens streams and copies chunks. The second overload creates directories as needed.