
I/O Interfaces

Arrow’s I/O layer defines abstract interfaces for reading data from, and writing data to, a variety of sources and sinks.

Core Interfaces

FileInterface

Base interface for all file-like objects.
class FileInterface
Close
Status
Closes the stream cleanly. For writable streams, attempts to flush pending data before releasing resources. After Close(), closed() returns true
CloseAsync
Future<>
Closes the stream asynchronously. By default, submits a synchronous Close() call to the I/O thread pool
Abort
Status
Closes the stream abruptly, without guaranteeing that pending data is flushed. It merely releases the underlying resources
Tell
Result<int64_t>
Returns the current position in the stream
closed
bool
Returns whether the stream is closed
mode
FileMode::type
Returns the file mode (READ, WRITE, or READWRITE)
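
The difference between Close() and Abort() can be illustrated with a small sketch. FakeWriter and Finish below are hypothetical stand-ins written for this example, not Arrow classes, and they use plain return types instead of Status. The stand-in's Abort() discards pending data, which is one behavior the description above permits; a real implementation may behave differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a writable stream; NOT an Arrow class.
class FakeWriter {
 public:
  void Write(const std::string& s) {
    assert(!closed_);  // writing to a closed stream is a usage error here
    pending_ += s;
  }

  // Clean shutdown: flush pending data, then release resources.
  void Close() {
    if (closed_) return;
    committed_ += pending_;
    pending_.clear();
    closed_ = true;
  }

  // Abrupt shutdown: release resources without flushing (assumption of this sketch).
  void Abort() {
    if (closed_) return;
    pending_.clear();
    closed_ = true;
  }

  bool closed() const { return closed_; }
  const std::string& committed() const { return committed_; }

 private:
  std::string pending_;
  std::string committed_;
  bool closed_ = false;
};

// Hypothetical helper: close cleanly on success, abort on failure.
void Finish(FakeWriter& writer, bool succeeded) {
  if (succeeded) {
    writer.Close();
  } else {
    writer.Abort();
  }
}
```

In both cases closed() reports true afterwards; only the clean path guarantees that the written bytes reach their destination.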

Input Streams

InputStream

Interface for sequential reading.
class InputStream : virtual public FileInterface, virtual public Readable
Read
Result<int64_t>
nbytes
int64_t
required
Maximum number of bytes to read
out
void*
required
Buffer to read into
Reads at most nbytes from the current position into out. Returns the number of bytes actually read
Read
Result<std::shared_ptr<Buffer>>
nbytes
int64_t
required
Maximum number of bytes to read
Reads at most nbytes from the current position. May avoid a memory copy in some cases. Returns a Buffer containing the data
Advance
Status
nbytes
int64_t
required
Number of bytes to skip
Advances or skips the stream by the indicated number of bytes
Peek
Result<std::string_view>
nbytes
int64_t
required
Maximum number of bytes to peek
Returns a zero-copy string_view of the upcoming bytes without modifying the stream position. The view becomes invalid after any subsequent operation on the stream. May return NotImplemented
supports_zero_copy
bool
Returns true if InputStream is capable of zero-copy Buffer reads
ReadMetadata
Result<std::shared_ptr<const KeyValueMetadata>>
Reads and returns stream metadata. If not supported, returns empty metadata or nullptr
ReadMetadataAsync
Future<std::shared_ptr<const KeyValueMetadata>>
io_context
const IOContext&
I/O context for async operations
Reads stream metadata asynchronously
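
Because Read() returns at most nbytes, a caller that needs an exact byte count may want to loop until the count is reached or the stream ends. The following is a minimal sketch of that pattern. ChunkedSource and ReadFully are hypothetical names introduced for this example, not Arrow APIs; the stand-in returns a plain int64_t where the real interface returns Result<int64_t>. Whether a given implementation ever returns a short read before end of stream is not stated in this document, so the loop is purely defensive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for an input stream; NOT an Arrow class.
// Read() copies at most `nbytes` (and never more than `max_chunk`) into `out`
// and returns the number of bytes copied, or 0 at end of stream.
class ChunkedSource {
 public:
  ChunkedSource(std::string data, int64_t max_chunk)
      : data_(std::move(data)), max_chunk_(max_chunk) {}

  int64_t Read(int64_t nbytes, void* out) {
    assert(nbytes >= 0);
    const int64_t remaining = static_cast<int64_t>(data_.size()) - pos_;
    const int64_t n = std::min({nbytes, max_chunk_, remaining});
    std::memcpy(out, data_.data() + pos_, static_cast<size_t>(n));
    pos_ += n;
    return n;
  }

 private:
  std::string data_;
  int64_t max_chunk_;
  int64_t pos_ = 0;
};

// Keeps calling Read() until `nbytes` have arrived or the stream ends.
// Returns the total number of bytes placed in `out`.
template <typename Stream>
int64_t ReadFully(Stream& stream, int64_t nbytes, void* out) {
  auto* dst = static_cast<uint8_t*>(out);
  int64_t total = 0;
  while (total < nbytes) {
    const int64_t n = stream.Read(nbytes - total, dst + total);
    if (n == 0) break;  // end of stream: a short total is legitimate
    total += n;
  }
  return total;
}
```

With a real InputStream, each Read() call would additionally need its Result checked before the byte count is used.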

RandomAccessFile

Interface for random access reading.
class RandomAccessFile : public InputStream, public Seekable
GetSize
Result<int64_t>
Returns the total file size in bytes. Does not read or move the current position, so it is safe to call concurrently
Seek
Status
position
int64_t
required
Position to seek to
Seeks to the specified position in the file
ReadAt
Result<int64_t>
position
int64_t
required
Position to read from
nbytes
int64_t
required
Maximum number of bytes to read
out
void*
required
Buffer to read into
Reads data from the given position. Thread-safe and does not affect current file position. Returns the number of bytes read
ReadAt
Result<std::shared_ptr<Buffer>>
position
int64_t
required
Position to read from
nbytes
int64_t
required
Maximum number of bytes to read
Reads data from the given position. Returns a Buffer containing the data
ReadAsync
Future<std::shared_ptr<Buffer>>
io_context
const IOContext&
required
I/O context for async operations
position
int64_t
required
Position to read from
nbytes
int64_t
required
Number of bytes to read
Reads data asynchronously from the given position
ReadManyAsync
std::vector<Future<std::shared_ptr<Buffer>>>
io_context
const IOContext&
required
I/O context for async operations
ranges
const std::vector<ReadRange>&
required
Ranges to read
Requests multiple reads at once. The filesystem may optimize by coalescing or parallelizing reads. Returns one future per input range
WillNeed
Status
ranges
const std::vector<ReadRange>&
required
Ranges that will be read soon
Hints that the given ranges may be read soon. Some implementations might prefetch data. No guarantee is made
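
ReadManyAsync notes that a filesystem may coalesce reads. The sketch below shows one simple way nearby ranges could be merged into fewer, larger reads. CoalesceRanges and its max_gap parameter are hypothetical and chosen for illustration; this is not Arrow's internal algorithm. The ReadRange struct mirrors the one described in this document.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the ReadRange struct described in this document.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical helper: sorts ranges by offset, then merges any range that
// starts within `max_gap` bytes of the end of the previous merged range.
std::vector<ReadRange> CoalesceRanges(std::vector<ReadRange> ranges,
                                      int64_t max_gap) {
  assert(max_gap >= 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) {
              return a.offset < b.offset;
            });
  std::vector<ReadRange> merged;
  for (const ReadRange& r : ranges) {
    if (r.length <= 0) continue;  // nothing to read for empty ranges
    if (!merged.empty()) {
      ReadRange& last = merged.back();
      const int64_t last_end = last.offset + last.length;
      if (r.offset <= last_end + max_gap) {
        // Extend the previous range to cover this one (and the gap between).
        last.length = std::max(last_end, r.offset + r.length) - last.offset;
        continue;
      }
    }
    merged.push_back(r);
  }
  return merged;
}
```

Since ReadManyAsync returns one future per input range, an implementation that coalesces in this way would still have to slice each merged read back into the originally requested ranges.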

Output Streams

OutputStream

Interface for sequential writing.
class OutputStream : virtual public FileInterface, public Writable
Write
Status
data
const void*
required
Data to write
nbytes
int64_t
required
Number of bytes to write
Writes the given data to the stream. The call always consumes all nbytes; depending on the stream, the data may be written immediately, buffered, or written asynchronously
Write
Status
data
const std::shared_ptr<Buffer>&
required
Buffer containing data to write
Writes the given data to the stream. Because the Buffer owns its memory, the stream can avoid a copy if buffering is required
Flush
Status
Flushes buffered bytes, if any
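
The relationship between Write() and Flush() can be sketched with a buffering stand-in. BufferedSink is a hypothetical class written for this example, not an Arrow class, and it omits the Status return values of the real interface. It accepts every byte passed to Write(), but the bytes only become visible in the committed output once the buffer fills or Flush() is called.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a buffering output stream; NOT an Arrow class.
class BufferedSink {
 public:
  explicit BufferedSink(int64_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Accepts all `nbytes`. The data is held in `pending_` and only moved to
  // `committed_` when the buffer reaches capacity or Flush() is called.
  void Write(const void* data, int64_t nbytes) {
    assert(nbytes >= 0);
    pending_.append(static_cast<const char*>(data),
                    static_cast<size_t>(nbytes));
    if (static_cast<int64_t>(pending_.size()) >= capacity_) Flush();
  }

  // Pushes any buffered bytes to the committed output.
  void Flush() {
    committed_ += pending_;
    pending_.clear();
  }

  const std::string& committed() const { return committed_; }
  int64_t pending_bytes() const { return static_cast<int64_t>(pending_.size()); }

 private:
  int64_t capacity_;
  std::string pending_;
  std::string committed_;
};
```

A caller that needs written data to be visible at a particular point should therefore call Flush() explicitly instead of relying on buffer thresholds.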

WritableFile

Interface for writable files with seeking.
class WritableFile : public OutputStream, public Seekable
WriteAt
Status
position
int64_t
required
Position to write at
data
const void*
required
Data to write
nbytes
int64_t
required
Number of bytes to write
Writes data at the specified position

I/O Context

IOContext

Provides context for I/O operations including executor and memory pool.
struct IOContext
IOContext
pool
MemoryPool*
Memory pool for allocations
executor
Executor*
Executor for async operations
stop_token
StopToken
Token for cancellation
external_id
int64_t
Application-specific ID
Constructs an IOContext
pool
MemoryPool*
Returns the memory pool
executor
Executor*
Returns the executor for async operations
external_id
int64_t
Returns the application-specific ID forwarded to executor task submissions
stop_token
StopToken
Returns the cancellation token

ReadRange

Specifies a range of bytes to read.
struct ReadRange {
  int64_t offset;
  int64_t length;
};
offset
int64_t
required
Starting byte offset
length
int64_t
required
Number of bytes to read
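
As an illustration of how ReadRange values might be built, the sketch below splits a file of known size into consecutive chunk-sized ranges, the kind of list that could then be passed to ReadManyAsync or WillNeed. MakeChunkRanges is a hypothetical helper, not part of Arrow, and the ReadRange struct is redeclared locally so the example stands alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the ReadRange struct described in this document.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical helper: covers [0, file_size) with consecutive ranges of
// `chunk_size` bytes; the final range is shorter if the size is not a multiple.
std::vector<ReadRange> MakeChunkRanges(int64_t file_size, int64_t chunk_size) {
  assert(file_size >= 0 && chunk_size > 0);
  std::vector<ReadRange> ranges;
  for (int64_t offset = 0; offset < file_size; offset += chunk_size) {
    ranges.push_back({offset, std::min(chunk_size, file_size - offset)});
  }
  return ranges;
}
```

In practice the file size would typically come from GetSize().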

Convenience Functions

MakeInputStreamIterator

Creates an iterator over fixed-size blocks from an input stream.
Result<Iterator<std::shared_ptr<Buffer>>> MakeInputStreamIterator(
    std::shared_ptr<InputStream> stream,
    int64_t block_size)
stream
std::shared_ptr<InputStream>
required
Input stream to iterate over
block_size
int64_t
required
Size of each block
Each Next() call on the iterator yields a block of block_size bytes, except for the last block, which may be smaller. Once the end of the stream is reached, Next() yields nullptr.
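
The block semantics described above can be modeled with a small stand-in. BlockIterator and CollectBlocks are hypothetical names for this sketch; they are not Arrow's Iterator or Buffer types, and Next() returns a plain shared_ptr where the real iterator wraps the value in a Result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block iterator over in-memory data; NOT Arrow's Iterator.
class BlockIterator {
 public:
  BlockIterator(std::string data, int64_t block_size)
      : data_(std::move(data)), block_size_(block_size) {
    assert(block_size > 0);
  }

  // Yields the next block of up to `block_size_` bytes, or nullptr at the end.
  std::shared_ptr<std::string> Next() {
    const int64_t remaining = static_cast<int64_t>(data_.size()) - pos_;
    if (remaining <= 0) return nullptr;
    const int64_t n = std::min(block_size_, remaining);
    auto block = std::make_shared<std::string>(
        data_, static_cast<size_t>(pos_), static_cast<size_t>(n));
    pos_ += n;
    return block;
  }

 private:
  std::string data_;
  int64_t block_size_;
  int64_t pos_ = 0;
};

// Drains the iterator, stopping at the nullptr end marker.
std::vector<std::string> CollectBlocks(BlockIterator& it) {
  std::vector<std::string> blocks;
  while (std::shared_ptr<std::string> block = it.Next()) {
    blocks.push_back(*block);
  }
  return blocks;
}
```

Consumers should treat nullptr as the end marker and should not assume that every block has exactly block_size bytes.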
