Table
A Table is a two-dimensional dataset with chunked arrays for columns.
table()
Create a pyarrow.Table from Python data.
data: Dictionary of column names to arrays or lists, list of arrays, or pandas DataFrame.
schema: If provided, dictates the schema of the table; otherwise the schema is inferred from the data.
metadata: Optional metadata for the schema.
Returns: A Table constructed from the input data.
Properties
schema
The table's schema.
num_rows
Number of rows in the table.
num_columns
Number of columns in the table.
column_names
List of column names.
columns
List of all columns as ChunkedArray objects.
Methods
column()
Select a column by name or index.
i: Column index (int) or name (str).
Returns: The selected column as a ChunkedArray.
select()
Select columns by name.
columns: List of column names to select.
Returns: A new table with only the selected columns.
slice()
Compute a zero-copy slice of this table.
offset: Offset from the start of the table to slice.
length: Length of the slice (defaults to the end of the table).
Returns: A zero-copy slice of the table.
filter()
Filter rows of the table using a boolean selection filter.
mask: Boolean array or expression used to filter rows.
null_selection_behavior: How to handle null values in the mask. Options: 'drop', 'emit_null'.
Returns: A filtered table.
take()
Select rows by indices.
indices: Indices of the rows to select.
Returns: A table with the selected rows.
sort_by()
Sort the table by one or more columns.
sorting: Column name, or list of (name, order) tuples where order is 'ascending' or 'descending'.
Returns: A sorted table.
group_by()
Group the table by one or more columns.
keys: Column name(s) to group by.
Returns: A grouped table object that can be aggregated.
to_pydict()
Convert the table to a Python dictionary.
Returns: A dictionary with column names as keys and Python lists as values.
to_pandas()
Convert the table to a pandas DataFrame.
self_destruct: If True, destroy the source table to save memory.
Returns: A pandas DataFrame with the table data.
add_column()
Add a column to the table.
i: Index at which to insert the column.
field_: Column name or Field object.
column: Column data.
Returns: A new table with the added column.
remove_column()
Remove a column from the table.
i: Index of the column to remove.
Returns: A new table without the specified column.
rename_columns()
Rename columns in the table.
names: New column names.
Returns: A new table with renamed columns.
RecordBatch
A RecordBatch is a collection of equal-length arrays.
record_batch()
Create a RecordBatch from arrays or a dictionary.
data: List of arrays or dictionary of column names to arrays.
schema: Schema defining the batch structure.
metadata: Optional metadata.
Returns: A RecordBatch with the specified data.
Properties
schema
The schema of the record batch.
num_rows
Number of rows in the batch.
num_columns
Number of columns in the batch.
Methods
to_pydict()
Convert to a Python dictionary.
Returns: Dictionary with column names as keys.
to_pandas()
Convert to a pandas DataFrame.
Returns: A pandas DataFrame.
Utility Functions
concat_tables()
Concatenate multiple tables into one.
tables: Tables to concatenate.
promote: If True, promote types to the widest type.
memory_pool: Memory pool for allocation.
Returns: Concatenated table.
concat_batches()
Concatenate multiple record batches.
recordbatches: Record batches to concatenate.
memory_pool: Memory pool for allocation.
Returns: Concatenated record batch.
RecordBatchReader
Streaming reader for record batches.
from_batches()
Create a reader from a list of record batches.
schema: Schema of the batches.
batches: Batches to read.
Returns: A batch reader.
read_all()
Read all batches and return them as a Table.
Returns: All batches combined into a table.
read_next_batch()
Read the next batch.
Returns: The next batch. Raises StopIteration when the reader is exhausted.