Table Classes

Table classes represent tabular data as a collection of chunked columnar arrays.

Table

Logical table as a sequence of chunked arrays.
class Table

Construction

Make
std::shared_ptr<Table>
schema
std::shared_ptr<Schema>
required
The table schema (column types)
columns
std::vector<std::shared_ptr<ChunkedArray>>
required
The table’s columns as chunked arrays
num_rows
int64_t
Number of rows in the table; -1 (the default) infers it from the columns
Constructs a Table from schema and columns. If columns is zero-length, the table’s number of rows is zero
MakeEmpty
Result<std::shared_ptr<Table>>
schema
std::shared_ptr<Schema>
required
The schema of the empty table
pool
MemoryPool*
Memory pool for allocations
Creates an empty Table with a single empty chunk per column
FromRecordBatches
Result<std::shared_ptr<Table>>
schema
std::shared_ptr<Schema>
required
Schema for each batch
batches
const std::vector<std::shared_ptr<RecordBatch>>&
required
Vector of record batches
Constructs a Table from a vector of RecordBatches using the supplied schema. The vector may be empty
FromChunkedStructArray
Result<std::shared_ptr<Table>>
array
const std::shared_ptr<ChunkedArray>&
required
A chunked StructArray
Constructs a Table from a chunked StructArray. One column will be produced for each field

Properties

schema
std::shared_ptr<Schema>
Returns the table schema
num_rows
int64_t
Returns the number of rows (equal to each column’s logical length)
num_columns
int
Returns the number of columns in the table
column
std::shared_ptr<ChunkedArray>
i
int
required
Column index
Returns a column by index
columns
const std::vector<std::shared_ptr<ChunkedArray>>&
Returns a vector of all the table’s columns
field
std::shared_ptr<Field>
i
int
required
Field index
Returns a column’s field by index
GetColumnByName
std::shared_ptr<ChunkedArray>
name
const std::string&
required
Field name
Returns a column by name, or nullptr if not found

Operations

Slice
std::shared_ptr<Table>
offset
int64_t
required
Index of the first row in the slice
length
int64_t
Number of rows in the slice; if omitted, the slice runs to the end of the table
Constructs a zero-copy slice of the table with the indicated offset and length
AddColumn
Result<std::shared_ptr<Table>>
i
int
required
Position to insert column
field
std::shared_ptr<Field>
required
Field metadata for the column
column
std::shared_ptr<ChunkedArray>
required
Column data
Adds a column to the table at position i, producing a new Table
RemoveColumn
Result<std::shared_ptr<Table>>
i
int
required
Column index to remove
Removes a column from the table, producing a new Table
SetColumn
Result<std::shared_ptr<Table>>
i
int
required
Column index to replace
field
std::shared_ptr<Field>
required
Field metadata for the column
column
std::shared_ptr<ChunkedArray>
required
Column data
Replaces a column in the table, producing a new Table
RenameColumns
Result<std::shared_ptr<Table>>
names
const std::vector<std::string>&
required
New column names
Renames the columns using the provided names, producing a new Table
SelectColumns
Result<std::shared_ptr<Table>>
indices
const std::vector<int>&
required
Column indices to select
Returns a new table with specified columns
Flatten
Result<std::shared_ptr<Table>>
pool
MemoryPool*
Memory pool for allocations
Flattens the table, producing a new Table. Any column with a struct type will be flattened into multiple columns
CombineChunks
Result<std::shared_ptr<Table>>
pool
MemoryPool*
Memory pool for allocations
Makes a new table by combining the chunks. All chunks in each ChunkedArray are concatenated into zero or one chunk. Binary columns may have multiple chunks to avoid buffer overflow
CombineChunksToBatch
Result<std::shared_ptr<RecordBatch>>
pool
MemoryPool*
Memory pool for allocations
Makes a new record batch by combining the chunks. All chunks in each column are concatenated into a single chunk

Validation

Validate
Status
Performs cheap validation checks to determine obvious inconsistencies within the table’s schema and internal data. O(k*m) where k is the total number of field descendants and m is the number of chunks
ValidateFull
Status
Performs extensive validation checks. O(k*n) where k is the total number of field descendants and n is the number of rows
Equals
bool
other
const Table&
required
Table to compare with
check_metadata
bool
If true, schema metadata will be compared (default false)
Determines if two tables are equal

ConcatenateTables

Constructs a new table from multiple input tables.
Result<std::shared_ptr<Table>> ConcatenateTables(
    const std::vector<std::shared_ptr<Table>>& tables,
    ConcatenateTablesOptions options = ConcatenateTablesOptions::Defaults(),
    MemoryPool* memory_pool = default_memory_pool())
tables
const std::vector<std::shared_ptr<Table>>&
required
Vector of tables to concatenate
options
ConcatenateTablesOptions
Options for unifying schemas
memory_pool
MemoryPool*
Memory pool for allocations
Tables are concatenated in order and row order within tables is preserved. The new table is assembled from existing column chunks without copying if schemas are identical.

ConcatenateTablesOptions

struct ConcatenateTablesOptions {
  bool unify_schemas = false;
  Field::MergeOptions field_merge_options = Field::MergeOptions::Defaults();
};
unify_schemas
bool
If true, schemas will be unified with fields of the same name being merged. Each table will be promoted to the unified schema before concatenation
field_merge_options
Field::MergeOptions
Options to control how fields are merged when unifying schemas. Ignored if unify_schemas is false

TableBatchReader

Computes a stream of record batches from a (possibly chunked) Table.
class TableBatchReader : public RecordBatchReader
TableBatchReader
table
const Table&
required
Table to read from
Constructs a TableBatchReader for the given table. The conversion is zero-copy: each record batch is a view over a slice of the table’s columns
set_chunksize
void
chunksize
int64_t
required
Maximum number of rows per batch
Sets the desired maximum number of rows for record batches. Actual batches may be smaller depending on chunking characteristics
ReadNext
Status
out
std::shared_ptr<RecordBatch>*
required
Output parameter for the next batch
Reads the next record batch from the table
