The DataFrame module provides pandas-like data structures for working with structured, tabular data in TypeScript. It offers intuitive APIs for data cleaning, transformation, aggregation, and analysis.

Overview

DataFrame is a two-dimensional labeled data structure, similar to a spreadsheet or SQL table. It provides:
  • Labeled Indexing: Access data by row and column labels
  • Heterogeneous Data: Store different data types in different columns
  • Flexible Reshaping: Pivot, melt, merge, and group operations
  • Data Cleaning: Handle missing values, filter, and transform data
  • Aggregation: Group by columns and compute statistics

Key Features

Intuitive API

Familiar pandas-like interface for TypeScript developers.

Type Safety

Full TypeScript support with type inference for columns.

Efficient Storage

Columnar storage for fast column operations.

Rich Operations

Filter, sort, group, merge, and aggregate with ease.
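
As a rough picture of what columnar storage means, the sketch below keeps each column in its own array, so a column statistic reads one array rather than every row object. This is an illustration only: `ColumnStore` and `columnMean` are hypothetical names, and deepbox's actual internals may differ.

```typescript
// Hypothetical illustration of a columnar layout; not deepbox's internals.
type Column = Array<number | string>;
type ColumnStore = Record<string, Column>;

// Each column lives in its own array.
const store: ColumnStore = {
  name: ['Alice', 'Bob', 'Charlie'],
  age: [25, 30, 35],
};

// A column-wise statistic touches only the one array it needs.
function columnMean(data: ColumnStore, column: string): number {
  const values = data[column];
  if (values === undefined) throw new Error(`Unknown column: ${column}`);
  let total = 0;
  for (const v of values) {
    if (typeof v !== 'number') throw new Error(`Column ${column} is not numeric`);
    total += v;
  }
  return total / values.length;
}

const meanAge = columnMean(store, 'age'); // 30
```

In a row-oriented layout the same mean would have to visit every row object and pick out one field from each.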

Basic Usage

Creating DataFrames

import { DataFrame } from 'deepbox/dataframe';

// From object with arrays
const df = new DataFrame({
  name: ['Alice', 'Bob', 'Charlie', 'Diana'],
  age: [25, 30, 35, 28],
  score: [85.5, 92.0, 78.5, 88.0],
  city: ['NYC', 'LA', 'Chicago', 'NYC']
});

console.log(df.shape);    // [4, 4]
console.log(df.columns);  // ['name', 'age', 'score', 'city']

Custom Index

const df = new DataFrame(
  {
    temperature: [72, 75, 68, 70],
    humidity: [65, 70, 60, 68]
  },
  {
    index: ['Mon', 'Tue', 'Wed', 'Thu']
  }
);
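
One way to picture a custom index is as a map from each label to a row position. The sketch below is plain TypeScript, not deepbox code; `positionOf` and `temperatureAt` are hypothetical helpers that reuse the labels and values from the example above.

```typescript
// Hypothetical sketch of label-based lookup; not deepbox's implementation.
const labels = ['Mon', 'Tue', 'Wed', 'Thu'];
const temperature = [72, 75, 68, 70];

// Map each index label to its row position.
const positionOf = new Map<string, number>();
labels.forEach((label, i) => positionOf.set(label, i));

function temperatureAt(label: string): number {
  const position = positionOf.get(label);
  if (position === undefined) throw new Error(`Unknown label: ${label}`);
  return temperature[position];
}

const wednesday = temperatureAt('Wed'); // 68
```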

Core Operations

Selecting Data

import { DataFrame, Series } from 'deepbox/dataframe';

const df = new DataFrame({
  name: ['Alice', 'Bob', 'Charlie'],
  age: [25, 30, 35],
  score: [85, 92, 78]
});

// Select single column (returns Series)
const ages: Series = df.get('age');

// Select multiple columns
const subset = df.select(['name', 'score']);

// Select rows by position
const firstRow = df.iloc(0);

// Select rows by label (requires a DataFrame created with a custom index)
const labeled = df.loc('row_label');

// Select specific cell
const value = df.at(1, 'name');  // 'Bob'

Filtering Rows

// Filter by condition
const adults = df.filter((row) => row.age >= 30);

// Multiple conditions
const filtered = df.filter((row) => 
  row.age > 25 && row.score >= 80
);

// Drop rows with missing values
const clean = df.dropna();

Sorting

// Sort by single column
const sorted1 = df.sort('age');

// Sort descending
const sorted2 = df.sort('score', false);

// Sort by multiple columns
const sorted3 = df.sortBy(['city', 'age']);
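
Sorting by several columns usually means comparing on the first column and using later columns only to break ties. Here is a minimal sketch of that ordering with a plain comparator. Whether `sortBy` follows exactly these tie-breaking rules is an assumption, and `Person` and `byCityThenAge` are hypothetical names.

```typescript
type Person = { name: string; age: number; city: string };

const people: Person[] = [
  { name: 'Alice', age: 25, city: 'NYC' },
  { name: 'Bob', age: 30, city: 'LA' },
  { name: 'Charlie', age: 35, city: 'Chicago' },
  { name: 'Diana', age: 28, city: 'NYC' },
];

// Compare by city first; fall back to age only when the cities match.
function byCityThenAge(a: Person, b: Person): number {
  if (a.city !== b.city) return a.city < b.city ? -1 : 1;
  return a.age - b.age;
}

// Copy before sorting so the original array is left untouched.
const ordered = [...people].sort(byCityThenAge);
const orderedNames = ordered.map((p) => p.name);
// ['Charlie', 'Bob', 'Alice', 'Diana']
```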

Adding and Modifying Data

// Add new column
df.set('grade', ['A', 'A', 'B']);

// Modify existing column
df.set('age', [26, 31, 36]);

// Add computed column
const scores = df.get('score').values;
df.set('normalized_score', scores.map(s => s / 100));

// Drop column
df.drop('grade');

Aggregation and Grouping

GroupBy Operations

const df = new DataFrame({
  category: ['A', 'B', 'A', 'B', 'A'],
  value: [10, 20, 30, 40, 50],
  count: [1, 2, 3, 4, 5]
});

// Group by category and aggregate
const grouped = df.groupBy('category');
const result = grouped.agg('sum');

// Multiple aggregations
const stats = grouped.agg((values) => ({
  sum: values.reduce((a, b) => a + b, 0),
  mean: values.reduce((a, b) => a + b, 0) / values.length,
  count: values.length
}));
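
To show what a grouped sum computes, the sketch below works through the `category` and `value` columns from the example by hand. It uses plain arrays and a hypothetical `groupSum` helper; it is not how deepbox implements `groupBy`, and the shape of the result that `agg('sum')` returns is not specified here.

```typescript
// Hypothetical helper showing what a grouped sum computes.
function groupSum(keys: string[], values: number[]): Map<string, number> {
  if (keys.length !== values.length) {
    throw new Error('keys and values must have the same length');
  }
  const totals = new Map<string, number>();
  keys.forEach((key, i) => {
    // Start each group at 0, then accumulate its values.
    totals.set(key, (totals.get(key) ?? 0) + values[i]);
  });
  return totals;
}

const category = ['A', 'B', 'A', 'B', 'A'];
const value = [10, 20, 30, 40, 50];
const sumByCategory = groupSum(category, value);
// sumByCategory.get('A') is 90 (10 + 30 + 50)
// sumByCategory.get('B') is 60 (20 + 40)
```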

Built-in Aggregations

// Compute statistics
const meanAge = df.mean('age');
const maxScore = df.max('score');
const minAge = df.min('age');

// Summary statistics for the entire DataFrame
const allStats = df.describe();
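
For orientation, the kind of per-column summary that a `describe()`-style method typically reports can be computed by hand. The sketch below is an assumption about the general idea, not deepbox's output format. The `summarize` helper is hypothetical, and it uses the sample standard deviation (dividing by n - 1), which is pandas' default but is not confirmed for deepbox.

```typescript
type Summary = { count: number; mean: number; std: number; min: number; max: number };

// Hypothetical helper: summary statistics for one numeric column.
function summarize(values: number[]): Summary {
  const count = values.length;
  if (count < 2) throw new Error('need at least two values for a sample std');
  const mean = values.reduce((a, b) => a + b, 0) / count;
  const squaredDiffs = values.reduce((acc, v) => acc + (v - mean) ** 2, 0);
  return {
    count,
    mean,
    std: Math.sqrt(squaredDiffs / (count - 1)), // sample std (n - 1)
    min: Math.min(...values),
    max: Math.max(...values),
  };
}

const ageSummary = summarize([25, 30, 35]);
// { count: 3, mean: 30, std: 5, min: 25, max: 35 }
```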

Series Operations

A Series is a one-dimensional labeled array that represents a single column:

import { Series } from 'deepbox/dataframe';

// Create series
const s = new Series([1, 2, 3, 4], { name: 'values' });

// Access by index
console.log(s.iloc(0));  // 1

// Get underlying data
console.log(s.values);   // [1, 2, 3, 4]

// Series from DataFrame column
const ages = df.get('age');
console.log(ages.mean());
console.log(ages.std());

Advanced Operations

Merging DataFrames

const df1 = new DataFrame({
  id: [1, 2, 3],
  name: ['Alice', 'Bob', 'Charlie']
});

const df2 = new DataFrame({
  id: [1, 2, 4],
  score: [85, 92, 78]
});

// Inner join
const merged = df1.merge(df2, { on: 'id', how: 'inner' });

// Left join
const leftJoin = df1.merge(df2, { on: 'id', how: 'left' });
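
The difference between the two join types is easiest to see on the rows themselves. The sketch below reproduces the example data as plain objects and joins them by hand. `joinOnId` is a hypothetical helper that assumes ids are unique on the right side, and representing a missing score as `null` is also an assumption rather than documented deepbox behavior.

```typescript
type Named = { id: number; name: string };
type Scored = { id: number; score: number };
type Joined = { id: number; name: string; score: number | null };

// Hypothetical join helper; assumes each id appears at most once in `right`.
function joinOnId(left: Named[], right: Scored[], how: 'inner' | 'left'): Joined[] {
  const scoreById = new Map<number, number>();
  for (const r of right) scoreById.set(r.id, r.score);

  const out: Joined[] = [];
  for (const row of left) {
    const score = scoreById.get(row.id);
    if (score !== undefined) {
      out.push({ ...row, score }); // matched in both tables
    } else if (how === 'left') {
      out.push({ ...row, score: null }); // kept from the left, no match
    }
  }
  return out;
}

const named: Named[] = [
  { id: 1, name: 'Alice' },
  { id: 2, name: 'Bob' },
  { id: 3, name: 'Charlie' },
];
const scored: Scored[] = [
  { id: 1, score: 85 },
  { id: 2, score: 92 },
  { id: 4, score: 78 },
];

const innerRows = joinOnId(named, scored, 'inner'); // ids 1 and 2
const leftRows = joinOnId(named, scored, 'left');   // ids 1, 2, and 3 (score null)
```

In both cases id 4 is absent from the result because it never appears on the left side.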

Reshaping Data

// Pivot table
const pivoted = df.pivot({
  index: 'date',
  columns: 'category',
  values: 'sales'
});

// Melt (unpivot)
const melted = df.melt({
  idVars: ['id', 'name'],
  valueVars: ['score1', 'score2']
});
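
Melting turns one wide row into several long rows, one per value column. The following sketch shows that transformation on plain objects. The sample scores are invented for illustration, `meltScores` is a hypothetical helper, and the output field names `variable` and `value` follow pandas' defaults, which deepbox may or may not share.

```typescript
type Wide = { id: number; name: string; score1: number; score2: number };
type Long = { id: number; name: string; variable: string; value: number };

// Hypothetical helper: unpivot score1/score2 into (variable, value) pairs.
function meltScores(rows: Wide[]): Long[] {
  const valueVars = ['score1', 'score2'] as const;
  const out: Long[] = [];
  // Emit all rows for score1 first, then all rows for score2.
  for (const variable of valueVars) {
    for (const row of rows) {
      out.push({ id: row.id, name: row.name, variable, value: row[variable] });
    }
  }
  return out;
}

// Invented sample data for illustration only.
const wide: Wide[] = [
  { id: 1, name: 'Alice', score1: 85, score2: 90 },
  { id: 2, name: 'Bob', score1: 92, score2: 88 },
];

const long = meltScores(wide);
// 4 rows: (Alice, score1, 85), (Bob, score1, 92), (Alice, score2, 90), (Bob, score2, 88)
```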

Data Cleaning

// Fill missing values
const filled = df.fillna(0);

// Drop rows with any missing values
const dropMissing = df.dropna();

// Drop duplicate rows
const unique = df.dropDuplicates();

// Replace values
const replaced = df.replace('NYC', 'New York');
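
The contrast between filling and dropping missing values can be sketched on a single column. This example treats `null` and `NaN` as missing, which is an assumption; the source does not say which values deepbox counts as missing. `Cell`, `isPresent`, and the sample `measurements` are hypothetical.

```typescript
type Cell = number | null;

// Assumption: null and NaN both count as "missing".
function isPresent(v: Cell): v is number {
  return v !== null && !Number.isNaN(v);
}

// Invented sample column for illustration only.
const measurements: Cell[] = [1.5, null, 3, NaN];

// Filling keeps the column length and substitutes a default.
const filledValues = measurements.map((v) => (isPresent(v) ? v : 0));
// [1.5, 0, 3, 0]

// Dropping shortens the column to the present values only.
const keptValues = measurements.filter(isPresent);
// [1.5, 3]
```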

Use Cases

Explore and analyze structured datasets:

import { DataFrame } from 'deepbox/dataframe';

const df = new DataFrame({
  date: ['2024-01', '2024-02', '2024-03'],
  revenue: [10000, 12000, 15000],
  expenses: [8000, 9000, 10000]
});

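// Calculate profit, reading each column once instead of on every iteration
const revenue = df.get('revenue').values;
const expenses = df.get('expenses').values;
df.set('profit', revenue.map((r, i) => r - expenses[i]));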

console.log(df.describe());

Extract, transform, and load data:

// Load data (`rawData` is assumed to be an object of column arrays defined elsewhere)
const raw = new DataFrame(rawData);

// Transform
const cleaned = raw
  .dropna()
  .filter(row => row.value > 0)
  .sort('timestamp');

// Aggregate
const summary = cleaned
  .groupBy('category')
  .agg('sum');

Prepare data for ML models:

import { DataFrame } from 'deepbox/dataframe';
import { tensor } from 'deepbox/ndarray';

// `data` is assumed to be an object of column arrays defined elsewhere
const df = new DataFrame(data);

// Select features
const features = df.select(['feature1', 'feature2', 'feature3']);
const target = df.get('label');

// Convert to tensors
const X = tensor(features.values);
const y = tensor(target.values);
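
Model inputs are commonly laid out as one row per sample and one column per feature. The source does not say which layout `features.values` returns, so the following is only a sketch of turning per-column arrays into that row-major shape. `columnsToRows` is a hypothetical helper, not part of deepbox, and the sample numbers are invented.

```typescript
// Hypothetical helper: convert column arrays into a row-major matrix.
function columnsToRows(columns: number[][]): number[][] {
  if (columns.length === 0) return [];
  const nRows = columns[0].length;
  if (columns.some((col) => col.length !== nRows)) {
    throw new Error('all columns must have the same length');
  }
  // Row r collects the r-th entry of every column.
  return Array.from({ length: nRows }, (_, r) => columns.map((col) => col[r]));
}

// Invented sample features for illustration only.
const feature1 = [1, 2, 3];
const feature2 = [10, 20, 30];

const matrix = columnsToRows([feature1, feature2]);
// [[1, 10], [2, 20], [3, 30]]
```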

API Reference

DataFrame Methods

Selection
  • get(column) - Get column as Series
  • select(columns) - Select multiple columns
  • iloc(index) - Select by position
  • loc(label) - Select by label
  • at(row, col) - Get single value
Modification
  • set(column, values) - Add/update column
  • drop(column) - Remove column
  • rename(mapping) - Rename columns
Filtering & Sorting
  • filter(predicate) - Filter rows
  • sort(column, ascending) - Sort by column
  • sortBy(columns) - Sort by multiple columns
  • head(n) - First n rows
  • tail(n) - Last n rows
Aggregation
  • groupBy(column) - Group by column
  • agg(function) - Aggregate groups
  • mean(column), sum(column), max(column), min(column)
  • describe() - Summary statistics
Reshaping
  • merge(other, options) - Join DataFrames
  • pivot(options) - Pivot table
  • melt(options) - Unpivot
Cleaning
  • dropna() - Drop rows with missing values
  • fillna(value) - Fill missing values
  • dropDuplicates() - Drop duplicate rows
  • replace(old, new) - Replace values

Series Methods

  • iloc(index) - Access by position
  • loc(label) - Access by label
  • values - Get underlying array
  • mean(), sum(), std(), min(), max()

Performance Tips

  • Prefer column-wise operations to row-by-row iteration. Built-in operations such as groupBy and agg are optimized.
  • Filter early in the pipeline so that later steps process fewer rows.
  • Chain operations where possible to avoid creating unnecessary intermediate DataFrames.
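
The advice to filter early can be checked on plain arrays: filtering before a stable sort yields the same rows as sorting first, while the sort handles fewer elements. This sketch uses ordinary array methods rather than deepbox, and the `Reading` type and sample data are invented for illustration.

```typescript
type Reading = { timestamp: number; value: number };

// Invented sample data for illustration only.
const readings: Reading[] = [
  { timestamp: 3, value: 5 },
  { timestamp: 1, value: -2 },
  { timestamp: 2, value: 7 },
  { timestamp: 4, value: 0 },
];

const byTimestamp = (a: Reading, b: Reading): number => a.timestamp - b.timestamp;
const isPositive = (r: Reading): boolean => r.value > 0;

// Sort everything, then discard rows: the sort sees all 4 rows.
const sortThenFilter = [...readings].sort(byTimestamp).filter(isPositive);

// Discard rows first: the sort only sees the 2 rows that survive.
const filterThenSort = readings.filter(isPositive).sort(byTimestamp);
```

Both pipelines return the readings with timestamps 2 and 3, in that order; only the amount of work differs.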

NDArray

Convert DataFrames to tensors

Statistics

Statistical analysis functions

Preprocessing

Data preparation for ML

Learn More

API Reference

Complete API documentation

Tutorial

Learn DataFrame operations
