Encoders - Deepbox

Encoders transform categorical features (strings, labels) into numeric representations suitable for machine learning.

LabelEncoder

Encode target labels with values between 0 and n_classes-1.

Time Complexity:

fit: O(n) where n is number of samples
transform: O(n) with O(1) lookup per sample

Space Complexity: O(k) where k is number of unique classes

import { LabelEncoder } from 'deepbox/preprocess';
import { tensor } from 'deepbox/ndarray';

const y = tensor(['cat', 'dog', 'cat', 'bird']);
const encoder = new LabelEncoder();
encoder.fit(y);
const yEncoded = encoder.transform(y);  // [1, 2, 1, 0]
const yDecoded = encoder.inverseTransform(yEncoded); // ['cat', 'dog', 'cat', 'bird']

Methods

fit

(y: Tensor | Array) => this

Learn unique classes from labels.

Parameters:

y - Target labels (1D tensor or array of strings/numbers)

Returns: self for method chaining

transform

(y: Tensor | Array) => Tensor

Transform labels to normalized encoding [0, n_classes-1].

Returns: Integer tensor with encoded labels

Throws: InvalidParameterError if a label was not seen during fit

fitTransform

(y: Tensor | Array) => Tensor

Fit to data and transform in one step.

inverseTransform

(y: Tensor | Array) => Tensor

Transform integer labels back to the original labels.

Attributes

After fitting:

classes_ - Unique classes in sorted order
classToIndex_ - Map from class to integer index
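The fit/transform behaviour described above can be illustrated without the library. The following is a minimal standalone sketch, not Deepbox's implementation: the helper names `fitLabels` and `encodeLabels` are hypothetical, labels are restricted to strings for simplicity, and a plain `Error` stands in for `InvalidParameterError`.

```typescript
// Hypothetical standalone sketch; not Deepbox's actual implementation.
function fitLabels(y: readonly string[]): { classes: string[]; classToIndex: Map<string, number> } {
  // Deduplicate, then sort so the index assignment is deterministic.
  const classes = Array.from(new Set(y)).sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  const classToIndex = new Map<string, number>();
  classes.forEach((c, i) => classToIndex.set(c, i));
  return { classes, classToIndex };
}

function encodeLabels(y: readonly string[], classToIndex: Map<string, number>): number[] {
  return y.map((label) => {
    const index = classToIndex.get(label); // O(1) lookup per sample
    if (index === undefined) {
      throw new Error(`Unseen label: ${label}`);
    }
    return index;
  });
}

const fitted = fitLabels(['cat', 'dog', 'cat', 'bird']);
const codes = encodeLabels(['cat', 'dog', 'cat', 'bird'], fitted.classToIndex);
const decoded = codes.map((i) => fitted.classes[i]);

console.log(fitted.classes); // ['bird', 'cat', 'dog']
console.log(codes);          // [1, 2, 1, 0]
console.log(decoded);        // ['cat', 'dog', 'cat', 'bird']
```

Decoding is just an array lookup into the sorted class list, which is why the round trip is lossless for labels seen during fit.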

OneHotEncoder

Encode categorical features as one-hot numeric array.

Time Complexity:

fit: O(n*m) where n=samples, m=features
transform: O(n*m*k) where k=avg categories per feature

Space Complexity:

Dense: O(n * Σk_i) where k_i is unique categories for feature i
Sparse: O(nnz) number of non-zero elements

import { OneHotEncoder } from 'deepbox/preprocess';
import { tensor } from 'deepbox/ndarray';

const X = tensor([['red', 'S'], ['blue', 'M'], ['red', 'L']]);
const encoder = new OneHotEncoder({ sparse: false });
encoder.fit(X);
const encoded = encoder.transform(X);
// Result shape: [3, 5] (2 color columns + 3 size columns)
// With columns ordered red, blue, S, M, L: [[1,0,1,0,0], [0,1,0,1,0], [1,0,0,0,1]]

Constructor

new OneHotEncoder(options?: {
  sparse?: boolean;                     // Return CSRMatrix if true (default: false)
  sparseOutput?: boolean;               // Alias for sparse
  handleUnknown?: 'error' | 'ignore';   // How to handle unknown categories (default: 'error')
  drop?: 'first' | 'if_binary' | null;  // Drop policy to avoid collinearity
  categories?: 'auto' | Category[][];   // Explicit categories per feature
})

Methods

fit

(X: Tensor | Array[][]) => this

Learn unique categories for each feature.

Parameters:

X - Training data (2D tensor or array)

Returns: self for method chaining

transform

(X: Tensor | Array[][]) => Tensor | CSRMatrix

Transform categorical features to one-hot encoding.

Returns: Binary matrix (dense Tensor or sparse CSRMatrix)

sparse=false: Returns dense Tensor
sparse=true: Returns CSRMatrix for memory efficiency
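To see why a sparse result saves memory, it helps to look at the generic compressed sparse row (CSR) layout, which stores only the non-zero entries. The sketch below is standalone: the `CsrSketch` interface and its field names are illustrative assumptions and may not match the fields of Deepbox's `CSRMatrix`.

```typescript
// Generic CSR layout; field names are illustrative, not Deepbox's CSRMatrix API.
interface CsrSketch {
  data: number[];    // non-zero values, row by row
  indices: number[]; // column index of each non-zero value
  indptr: number[];  // row i occupies data[indptr[i]] up to (not including) data[indptr[i + 1]]
  shape: [number, number];
}

function denseToCsr(dense: readonly (readonly number[])[]): CsrSketch {
  const data: number[] = [];
  const indices: number[] = [];
  const indptr: number[] = [0];
  for (const row of dense) {
    row.forEach((value, col) => {
      if (value !== 0) {
        data.push(value);
        indices.push(col);
      }
    });
    indptr.push(data.length); // marks where the next row starts
  }
  const cols = dense.length > 0 ? dense[0].length : 0;
  return { data, indices, indptr, shape: [dense.length, cols] };
}

// A one-hot matrix has exactly one non-zero per row and feature.
const oneHotDense = [
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
];
const csr = denseToCsr(oneHotDense);

console.log(csr.data);    // [1, 1, 1]
console.log(csr.indices); // [0, 1, 2]
console.log(csr.indptr);  // [0, 1, 2, 3]
```

For n samples and a single feature with k categories, the dense form holds n*k numbers while this layout holds roughly 3n, which is the gap the O(nnz) bound refers to.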

fitTransform

(X: Tensor | Array[][]) => Tensor | CSRMatrix

Fit and transform in one step.

inverseTransform

(X: Tensor | CSRMatrix) => Tensor

Transform one-hot encoding back to original categories.

Attributes

After fitting:

categories_ - Unique categories for each feature
dropIndices_ - Index of dropped category per feature (if drop is set)
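As a rough illustration of how per-feature categories turn into output columns, here is a standalone sketch. It is not Deepbox's implementation: `fitCategoriesSketch` and `oneHotSketch` are hypothetical helpers, and the sketch assumes categories are kept in the order they first appear, which reproduces the result in the OneHotEncoder example above. The library's actual column order is given by `categories_` and may differ.

```typescript
// Standalone sketch; not Deepbox's implementation. Category order is an assumption.
function fitCategoriesSketch(X: readonly (readonly string[])[]): string[][] {
  const nFeatures = X.length > 0 ? X[0].length : 0;
  const categories: string[][] = [];
  for (let j = 0; j < nFeatures; j++) {
    const seen: string[] = [];
    for (const row of X) {
      if (seen.indexOf(row[j]) === -1) seen.push(row[j]); // keep first-seen order
    }
    categories.push(seen);
  }
  return categories;
}

function oneHotSketch(
  X: readonly (readonly string[])[],
  categories: readonly (readonly string[])[]
): number[][] {
  return X.map((row) => {
    const out: number[] = [];
    row.forEach((value, j) => {
      const k = categories[j].indexOf(value);
      if (k === -1) throw new Error(`Unknown category: ${value}`);
      // Each feature contributes one block of columns containing a single 1.
      categories[j].forEach((_, col) => out.push(col === k ? 1 : 0));
    });
    return out;
  });
}

const colorsAndSizes = [['red', 'S'], ['blue', 'M'], ['red', 'L']];
const learned = fitCategoriesSketch(colorsAndSizes);
const denseRows = oneHotSketch(colorsAndSizes, learned);

console.log(learned);   // [['red', 'blue'], ['S', 'M', 'L']]
console.log(denseRows); // [[1,0,1,0,0], [0,1,0,1,0], [1,0,0,0,1]]
```

The output width is the sum of the category counts per feature (here 2 + 3 = 5), not the number of distinct values in the whole table.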

OrdinalEncoder

Encode categorical features as integer array. Time Complexity:

fit: O(n*m*log(k)) where n=samples, m=features, k=avg categories
transform: O(n*m) with O(1) map lookup

Space Complexity: O(m*k) where m=features, k=avg categories per feature

import { OrdinalEncoder } from 'deepbox/preprocess';
import { tensor } from 'deepbox/ndarray';

const X = tensor([['low', 'red'], ['high', 'blue'], ['medium', 'red']]);
const encoder = new OrdinalEncoder();
encoder.fit(X);
const encoded = encoder.transform(X);
// Result: [[1, 1], [0, 0], [2, 1]] (alphabetically sorted)

Constructor

new OrdinalEncoder(options?: {
  handleUnknown?: 'error' | 'useEncodedValue';  // How to handle unknown categories
  unknownValue?: number;                        // Value for unknown (default: -1)
  categories?: 'auto' | Category[][];           // Explicit categories per feature
})

Methods

fit

(X: Tensor | Array[][]) => this

Learn unique categories and their ordering for each feature.

transform

(X: Tensor | Array[][]) => Tensor

Transform categorical features to ordinal integers [0, n_categories-1].

fitTransform

(X: Tensor | Array[][]) => Tensor

Fit and transform in one step.

inverseTransform

(X: Tensor | Array[][]) => Tensor

Transform ordinal integers back to original categories.

Attributes

After fitting:

categories_ - Sorted unique categories for each feature
categoryToIndex_ - Map from category to index for each feature
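The per-feature lookup described by `categories_` and `categoryToIndex_` can be sketched in a few lines. This is a standalone illustration under stated assumptions, not Deepbox's code: the helper names are hypothetical, categories are strings, and an omitted `unknownValue` stands in for `handleUnknown: 'error'`.

```typescript
// Standalone sketch; not Deepbox's implementation.
function fitOrdinalSketch(X: readonly (readonly string[])[]): Map<string, number>[] {
  const nFeatures = X.length > 0 ? X[0].length : 0;
  const maps: Map<string, number>[] = [];
  for (let j = 0; j < nFeatures; j++) {
    // Sorted unique categories of column j; the code is the position in that order.
    const sorted = Array.from(new Set(X.map((row) => row[j]))).sort();
    const lookup = new Map<string, number>();
    sorted.forEach((category, index) => lookup.set(category, index));
    maps.push(lookup);
  }
  return maps;
}

function transformOrdinalSketch(
  X: readonly (readonly string[])[],
  maps: readonly Map<string, number>[],
  unknownValue?: number // when omitted, unknown categories throw
): number[][] {
  return X.map((row) =>
    row.map((value, j) => {
      const index = maps[j].get(value);
      if (index !== undefined) return index;
      if (unknownValue === undefined) throw new Error(`Unknown category: ${value}`);
      return unknownValue;
    })
  );
}

const train = [['low', 'red'], ['high', 'blue'], ['medium', 'red']];
const lookups = fitOrdinalSketch(train);
const encodedTrain = transformOrdinalSketch(train, lookups);
const encodedNew = transformOrdinalSketch([['tiny', 'red']], lookups, -1);

console.log(encodedTrain); // [[1, 1], [0, 0], [2, 1]]
console.log(encodedNew);   // [[-1, 1]]
```

Note that the sorted order puts 'high' before 'low', so the resulting codes do not reflect the natural low < medium < high ranking.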

LabelBinarizer

Binarize labels in a one-vs-all fashion.

Time Complexity:

fit: O(n) where n is number of samples
transform: O(n*k) where k is number of classes

Space Complexity: O(n*k) for the output matrix

import { LabelBinarizer } from 'deepbox/preprocess';
import { tensor } from 'deepbox/ndarray';

const y = tensor([0, 1, 2, 0, 1]);
const binarizer = new LabelBinarizer();
const yBin = binarizer.fitTransform(y);
// Result shape: [5, 3] with one-hot encoding

Constructor

new LabelBinarizer(options?: {
  posLabel?: number;       // Value for positive class (default: 1)
  negLabel?: number;       // Value for negative class (default: 0)
  sparse?: boolean;        // Return CSRMatrix if true (default: false)
  sparseOutput?: boolean;  // Alias for sparse
})

Methods

fit

(y: Tensor | Array) => this

Learn unique classes from labels.

transform

(y: Tensor | Array) => Tensor | CSRMatrix

Transform labels to binary matrix.

Each label is converted to a binary vector with:

posLabel (default 1) at the class position
negLabel (default 0) elsewhere
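The effect of `posLabel` and `negLabel` is easiest to see on a small input. The following standalone sketch uses a hypothetical helper, `binarizeSketch`, with numeric labels; it illustrates the one-vs-all layout described above and is not Deepbox's implementation.

```typescript
// Standalone sketch; not Deepbox's implementation.
function binarizeSketch(
  y: readonly number[],
  posLabel = 1,
  negLabel = 0
): { classes: number[]; matrix: number[][] } {
  // Sorted unique classes define the column order.
  const classes = Array.from(new Set(y)).sort((a, b) => a - b);
  // One row per sample: posLabel in the column of its class, negLabel elsewhere.
  const matrix = y.map((label) => classes.map((c) => (c === label ? posLabel : negLabel)));
  return { classes, matrix };
}

const binary = binarizeSketch([0, 1, 2, 0, 1]);
const signed = binarizeSketch([0, 1, 2], 1, -1);

console.log(binary.matrix); // [[1,0,0], [0,1,0], [0,0,1], [1,0,0], [0,1,0]]
console.log(signed.matrix); // [[1,-1,-1], [-1,1,-1], [-1,-1,1]]
```

A negLabel of -1 is a common choice for margin-based losses that expect targets in {-1, +1}.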

fitTransform

(y: Tensor | Array) => Tensor | CSRMatrix

Fit and transform in one step.

inverseTransform

(Y: Tensor | CSRMatrix) => Tensor

Transform binary matrix back to labels.

Finds the column with maximum value for each row.
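Picking the column with the maximum value is a row-wise argmax. Here is a minimal standalone sketch, assuming ties resolve to the earliest column (how Deepbox breaks ties is not stated in this page); `inverseBinarizeSketch` is a hypothetical helper.

```typescript
// Standalone sketch; not Deepbox's implementation. Tie-breaking is an assumption.
function inverseBinarizeSketch(
  Y: readonly (readonly number[])[],
  classes: readonly string[]
): string[] {
  return Y.map((row) => {
    let best = 0;
    for (let col = 1; col < row.length; col++) {
      if (row[col] > row[best]) best = col; // strict ">" keeps the earliest column on ties
    }
    return classes[best];
  });
}

const classNames = ['bird', 'cat', 'dog'];
const recovered = inverseBinarizeSketch(
  [
    [0, 1, 0],
    [1, 0, 0],
    [0.2, 0.1, 0.7], // soft scores work too: the largest entry wins
  ],
  classNames
);

console.log(recovered); // ['cat', 'bird', 'dog']
```

Because only the relative order within a row matters, the same approach works for any posLabel greater than negLabel.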

Attributes

After fitting:

classes_ - Unique classes in sorted order

MultiLabelBinarizer

Transform multi-label classification data to binary format. Handles cases where each sample can belong to multiple classes simultaneously.

Time Complexity:

fit: O(n*k) where n=samples, k=avg labels per sample
transform: O(n*k*c) where c=total unique classes

Space Complexity: O(n*c) for the output matrix

import { MultiLabelBinarizer } from 'deepbox/preprocess';

const y = [['sci-fi', 'action'], ['comedy'], ['action', 'drama']];
const binarizer = new MultiLabelBinarizer();
const yBin = binarizer.fitTransform(y);
// Each row can have multiple 1s

Constructor

new MultiLabelBinarizer(options?: {
  sparse?: boolean;             // Return CSRMatrix if true (default: false)
  sparseOutput?: boolean;       // Alias for sparse
  classes?: Category[];         // Explicit class ordering
})

Methods

fit

(y: Category[][]) => this

Learn all unique classes across all samples.

Parameters:

y - Array of label sets (each element is an array of labels)

transform

(y: Category[][]) => Tensor | CSRMatrix

Transform label sets to binary matrix.

Each row can have multiple 1s (one per active label).

fitTransform

(y: Category[][]) => Tensor | CSRMatrix

Fit and transform in one step.

inverseTransform

(Y: Tensor | CSRMatrix) => Category[][]

Transform binary matrix back to label sets.

Finds all active (1) columns for each row.

Returns: Array of label sets (one per sample)

Attributes

After fitting:

classes_ - All unique classes in sorted order
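The full round trip (fit, transform, inverse transform) can be sketched as follows. This is a standalone illustration, not Deepbox's implementation; the three helper functions are hypothetical and labels are restricted to strings.

```typescript
// Standalone sketch; not Deepbox's implementation.
function fitMultiLabelSketch(y: readonly (readonly string[])[]): string[] {
  const all = new Set<string>();
  y.forEach((labels) => labels.forEach((label) => all.add(label)));
  return Array.from(all).sort(); // sorted unique classes define the columns
}

function transformMultiLabelSketch(
  y: readonly (readonly string[])[],
  classes: readonly string[]
): number[][] {
  // A column is 1 when the sample's label set contains that class.
  return y.map((labels) => classes.map((c) => (labels.indexOf(c) !== -1 ? 1 : 0)));
}

function inverseMultiLabelSketch(
  Y: readonly (readonly number[])[],
  classes: readonly string[]
): string[][] {
  // Collect the class of every active column in each row.
  return Y.map((row) => classes.filter((_, col) => row[col] === 1));
}

const genres = [['sci-fi', 'action'], ['comedy'], ['action', 'drama']];
const genreClasses = fitMultiLabelSketch(genres);
const genreMatrix = transformMultiLabelSketch(genres, genreClasses);
const restored = inverseMultiLabelSketch(genreMatrix, genreClasses);

console.log(genreClasses); // ['action', 'comedy', 'drama', 'sci-fi']
console.log(genreMatrix);  // [[1,0,0,1], [0,1,0,0], [1,0,1,0]]
console.log(restored);     // [['action', 'sci-fi'], ['comedy'], ['action', 'drama']]
```

In this sketch the restored label sets come back in class order rather than input order: the first sample was given as ['sci-fi', 'action'] and returns as ['action', 'sci-fi'].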

Type Definitions

// Category values can be strings, numbers, or bigints
type Category = string | number | bigint;

// Encoder input types
type EncoderInput1D = Tensor | readonly (string | number | bigint | boolean)[];
type EncoderInput2D = Tensor | readonly (readonly (string | number | bigint)[])[];

Examples

Text Label Encoding

import { LabelEncoder } from 'deepbox/preprocess';
import { tensor } from 'deepbox/ndarray';

const labels = tensor(['positive', 'negative', 'neutral', 'positive', 'negative']);
const encoder = new LabelEncoder();
const encoded = encoder.fitTransform(labels);
// [2, 0, 1, 2, 0] (alphabetically sorted)

const decoded = encoder.inverseTransform(encoded);
// ['positive', 'negative', 'neutral', 'positive', 'negative']

Multi-Feature One-Hot Encoding

import { OneHotEncoder } from 'deepbox/preprocess';
import { tensor } from 'deepbox/ndarray';

const features = tensor([
  ['red', 'small'],
  ['blue', 'large'],
  ['red', 'medium'],
  ['green', 'small']
]);

const encoder = new OneHotEncoder({ sparse: false });
const encoded = encoder.fitTransform(features);
// Shape: [4, 6] - 3 color columns + 3 size columns (one column per unique category of each feature)

const original = encoder.inverseTransform(encoded);
// Returns original categorical data

Multi-Label Classification

import { MultiLabelBinarizer } from 'deepbox/preprocess';

const movieGenres = [
  ['action', 'sci-fi'],
  ['comedy', 'romance'],
  ['action', 'thriller'],
  ['sci-fi']
];

const binarizer = new MultiLabelBinarizer();
const encoded = binarizer.fitTransform(movieGenres);
// Shape: [4, 5] - columns for: action, comedy, romance, sci-fi, thriller
// Each row can have multiple 1s

const decoded = binarizer.inverseTransform(encoded);
// Returns original label sets

Ordinal Encoding with Unknown Handling

import { OrdinalEncoder } from 'deepbox/preprocess';
import { tensor } from 'deepbox/ndarray';

const sizes = tensor([['S'], ['M'], ['L'], ['XL']]);
const encoder = new OrdinalEncoder({
  handleUnknown: 'useEncodedValue',
  unknownValue: -1
});
encoder.fit(sizes);

const testSizes = tensor([['M'], ['XXL'], ['S']]);
const encoded = encoder.transform(testSizes);
// [[1], [-1], [2]] - categories are sorted alphabetically (L=0, M=1, S=2, XL=3);
// 'XXL' is encoded as -1 (unknown)
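Alphabetical sorting does not preserve the natural size order (S < M < L < XL). The standalone sketch below compares codes from a sorted category list with codes from an explicitly ordered one. `encodeWithOrder` is a hypothetical helper. The constructor's `categories` option accepts explicit categories per feature, but whether Deepbox assigns codes in the given order is an assumption here.

```typescript
// Standalone sketch; not Deepbox's implementation.
function encodeWithOrder(
  values: readonly string[],
  order: readonly string[],
  unknownValue: number
): number[] {
  return values.map((v) => {
    const index = order.indexOf(v); // the code is the position in the category list
    return index === -1 ? unknownValue : index;
  });
}

const seenSizes = ['S', 'M', 'L', 'XL'];       // natural size order
const alphabetical = seenSizes.slice().sort(); // ['L', 'M', 'S', 'XL']
const testValues = ['M', 'XXL', 'S'];

const sortedCodes = encodeWithOrder(testValues, alphabetical, -1);
const explicitCodes = encodeWithOrder(testValues, seenSizes, -1);

console.log(sortedCodes);   // [1, -1, 2]
console.log(explicitCodes); // [1, -1, 0]
```

When the integer codes are meant to carry rank information, an explicit category list is the safer choice, since sorted order only matches the intended ranking by coincidence.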

Sparse Encoding for Memory Efficiency

import { OneHotEncoder } from 'deepbox/preprocess';
import { tensor } from 'deepbox/ndarray';

// High cardinality categorical data
const userIds = tensor([["user_1"], ["user_2"], ["user_3"]]);

const encoder = new OneHotEncoder({ sparse: true });
const encoded = encoder.fitTransform(userIds);
// Returns CSRMatrix instead of dense tensor
// Much more memory efficient for high cardinality features

When to Use Each Encoder

LabelEncoder

Use for: Target labels in classification. Creates simple integer mapping [0, n_classes-1].

OneHotEncoder

Use for: Categorical features with no ordinal relationship. Creates binary columns (can return sparse matrices).

OrdinalEncoder

Use for: Categorical features with ordinal relationship (e.g., low/medium/high). Maintains single column per feature.

LabelBinarizer

Use for: Single-label classification targets. Creates binary matrix representation.

MultiLabelBinarizer

Use for: Multi-label classification (samples can have multiple labels). Each row can have multiple active columns.
