Table Classes
Table classes represent tabular data as a collection of chunked columnar arrays.
Table
Logical table as a sequence of chunked arrays.
Construction
Make
std::shared_ptr<Table>
schema
std::shared_ptr<Schema>
required
The table schema (column types)
columns
std::vector<std::shared_ptr<ChunkedArray>>
required
The table’s columns as chunked arrays
num_rows
int64_t
Number of rows in the table; -1 (the default) infers the count from the columns
Constructs a Table from a schema and columns. If columns is zero-length, the table’s number of rows is zero
MakeEmpty
Result<std::shared_ptr<Table>>
schema
std::shared_ptr<Schema>
required
The schema of the empty table
pool
MemoryPool*
Memory pool for allocations
Creates an empty Table with a single empty chunk per column
FromRecordBatches
Result<std::shared_ptr<Table>>
schema
std::shared_ptr<Schema>
required
Schema for each batch
batches
const std::vector<std::shared_ptr<RecordBatch>>&
required
Vector of record batches
Constructs a Table from record batches using the supplied schema. The vector of batches may be empty
FromChunkedStructArray
Result<std::shared_ptr<Table>>
array
const std::shared_ptr<ChunkedArray>&
required
A chunked StructArray
Constructs a Table from a chunked StructArray. One column will be produced for each field
Properties
num_rows
int64_t
Returns the number of rows (equal to each column’s logical length)
num_columns
int
Returns the number of columns in the table
column
std::shared_ptr<ChunkedArray>
Returns a column by index
columns
const std::vector<std::shared_ptr<ChunkedArray>>&
Returns a vector of all columns in the table
field
std::shared_ptr<Field>
Returns a column’s field by index
GetColumnByName
std::shared_ptr<ChunkedArray>
name
const std::string&
required
Field name
Returns a column by name, or nullptr if not found
Operations
Slice
std::shared_ptr<Table>
offset
int64_t
Index of the first row in the slice
length
int64_t
Number of rows in the slice
Constructs a zero-copy slice of the table with the indicated offset and length
AddColumn
Result<std::shared_ptr<Table>>
i
int
Position at which to insert the column
field
std::shared_ptr<Field>
required
Field metadata for the column
column
std::shared_ptr<ChunkedArray>
required
Column data
Adds a column to the table at position i, producing a new Table
RemoveColumn
Result<std::shared_ptr<Table>>
i
int
Index of the column to remove
Removes the column at index i from the table, producing a new Table
SetColumn
Result<std::shared_ptr<Table>>
i
int
Index of the column to replace
field
std::shared_ptr<Field>
required
Field metadata for the column
column
std::shared_ptr<ChunkedArray>
required
Column data
Replaces a column in the table, producing a new Table
RenameColumns
Result<std::shared_ptr<Table>>
names
const std::vector<std::string>&
required
New column names
Renames the columns using the provided names, producing a new Table. One name must be supplied per column
SelectColumns
Result<std::shared_ptr<Table>>
indices
const std::vector<int>&
required
Column indices to select
Returns a new table containing only the specified columns, in the order given
Flatten
Result<std::shared_ptr<Table>>
pool
MemoryPool*
Memory pool for allocations
Flattens the table, producing a new Table. Any column with a struct type will be flattened into multiple columns
CombineChunks
Result<std::shared_ptr<Table>>
pool
MemoryPool*
Memory pool for allocations
Makes a new table by combining the chunks of each column. All chunks in each ChunkedArray are concatenated into zero or one chunk, except that binary columns may keep multiple chunks to avoid buffer overflow
CombineChunksToBatch
Result<std::shared_ptr<RecordBatch>>
pool
MemoryPool*
Memory pool for allocations
Makes a new record batch by combining the chunks. All chunks of each column are concatenated into a single chunk
Validation
Validate
Status
Performs cheap validation checks to determine obvious inconsistencies within the table’s schema and internal data. O(k*m) where k is the total number of field descendants and m is the number of chunks
ValidateFull
Status
Performs extensive validation checks. O(k*n) where k is the total number of field descendants and n is the number of rows
Equals
bool
other
const Table&
The table to compare against
check_metadata
bool
If true, schema metadata will be compared
Determines if two tables are equal
ConcatenateTables
Constructs a new table from multiple input tables.
Result<std::shared_ptr<Table>> ConcatenateTables(
    const std::vector<std::shared_ptr<Table>>& tables,
    ConcatenateTablesOptions options = ConcatenateTablesOptions::Defaults(),
    MemoryPool* memory_pool = default_memory_pool());
tables
const std::vector<std::shared_ptr<Table>>&
required
Vector of tables to concatenate
options
ConcatenateTablesOptions
Options for unifying schemas
memory_pool
MemoryPool*
Memory pool for allocations
Tables are concatenated in order and row order within tables is preserved. The new table is assembled from existing column chunks without copying if schemas are identical.
ConcatenateTablesOptions
struct ConcatenateTablesOptions {
  bool unify_schemas = false;
  Field::MergeOptions field_merge_options = Field::MergeOptions::Defaults();
};
unify_schemas
bool
If true, schemas will be unified with fields of the same name being merged. Each table will be promoted to the unified schema before concatenation
field_merge_options
Field::MergeOptions
Options to control how fields are merged when unifying schemas. Ignored if unify_schemas is false
TableBatchReader
Computes a stream of record batches from a (possibly chunked) Table.
class TableBatchReader : public RecordBatchReader
Constructs a TableBatchReader for the given table. The conversion is zero-copy: each record batch is a view over a slice of the table’s columns
set_chunksize
void
chunksize
int64_t
Maximum number of rows per batch
Sets the desired maximum number of rows for record batches. Actual batches may be smaller depending on chunking characteristics
ReadNext
Status
out
std::shared_ptr<RecordBatch>*
required
Output parameter for the next batch; set to nullptr once the table is exhausted
Reads the next record batch from the table