PCA
Principal Component Analysis (PCA). Linear dimensionality reduction that uses Singular Value Decomposition (SVD) to project data into a lower-dimensional space. Algorithm:
- Center the data by subtracting the per-feature mean
- Compute the SVD of the centered data: X = U * Σ * V^T
- The principal components are the columns of V (equivalently, the rows of V^T), ordered by decreasing singular value
- Transform the data by projecting it onto the principal components
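The steps above can be sketched in TypeScript as follows. This is a minimal, self-contained sketch, not the library's implementation. It swaps the explicit SVD for power iteration with deflation on the sample covariance matrix. The eigenvectors of that matrix span the same directions as the columns of V, up to sign. The names fitPca, transformPca and PcaModel are hypothetical.

```typescript
// Minimal PCA sketch; NOT the library's implementation.
// Assumption: instead of an explicit SVD, eigenvectors of the sample covariance
// C = Xc^T Xc / (n - 1) are found by power iteration with deflation. They match
// the columns of V from the SVD of the centered data Xc, up to sign.
type Matrix = number[][];

const dot = (a: number[], b: number[]): number =>
  a.reduce((s, x, i) => s + x * b[i], 0);

const matVec = (A: Matrix, v: number[]): number[] => A.map((row) => dot(row, v));

interface PcaModel {
  mean: number[]; // per-feature mean used for centering
  components: Matrix; // shape (k, n_features)
  explainedVariance: number[]; // eigenvalue of C for each component
}

function fitPca(X: Matrix, k: number, iters = 500): PcaModel {
  const n = X.length;
  const d = X[0].length;
  // Step 1: center the data by subtracting the per-feature mean.
  const mean = Array.from({ length: d }, (_, j) =>
    X.reduce((s, row) => s + row[j], 0) / n);
  const Xc = X.map((row) => row.map((x, j) => x - mean[j]));
  // Sample covariance matrix, shape (d, d).
  const C: Matrix = Array.from({ length: d }, (_, a) =>
    Array.from({ length: d }, (_, b) =>
      Xc.reduce((s, row) => s + row[a] * row[b], 0) / (n - 1)));

  const components: Matrix = [];
  const explainedVariance: number[] = [];
  for (let c = 0; c < k; c++) {
    // Power iteration: repeatedly apply C and renormalize.
    const start = Array.from({ length: d }, (_, i) => 1 + i);
    const startNorm = Math.sqrt(dot(start, start));
    let v = start.map((x) => x / startNorm);
    for (let t = 0; t < iters; t++) {
      const w = matVec(C, v);
      const norm = Math.sqrt(dot(w, w));
      if (norm < 1e-12) break; // remaining variance is (numerically) zero
      v = w.map((x) => x / norm);
    }
    const lambda = dot(v, matVec(C, v)); // Rayleigh quotient = eigenvalue
    components.push(v);
    explainedVariance.push(lambda);
    // Deflation: remove the found direction so the next pass finds the next one.
    for (let a = 0; a < d; a++)
      for (let b = 0; b < d; b++) C[a][b] -= lambda * v[a] * v[b];
  }
  return { mean, components, explainedVariance };
}

// Project centered rows onto the principal components.
function transformPca(model: PcaModel, X: Matrix): Matrix {
  return X.map((row) => {
    const centered = row.map((x, j) => x - model.mean[j]);
    return model.components.map((comp) => dot(centered, comp));
  });
}

// Usage: points on the line y = x have a single direction of variance.
const model = fitPca([[1, 1], [2, 2], [3, 3]], 1);
console.log(model.explainedVariance[0]); // approximately 2
console.log(transformPca(model, [[3, 3]])); // approximately [[1.4142]]
```

Power iteration keeps the sketch short. A production implementation would normally call a numerically robust SVD routine instead, since power iteration converges slowly when leading eigenvalues are close.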
Constructor
Number of components to keep. If undefined, keeps min(n_samples, n_features).
Whether to whiten the data. When true, each transformed component is divided by the square root of its explained variance, so every output dimension has unit variance.
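A minimal sketch of whitening, assuming it rescales the projected scores (shape (n_samples, n_components)) column by column. The helpers whiten and sampleVariance are hypothetical and only illustrate the unit-variance property.

```typescript
// Whitening sketch. Assumption: whitening rescales the projected scores
// (n_samples x n_components), dividing column j by sqrt(explainedVariance[j]).
// Components with zero variance would need special handling (division by zero).
function whiten(scores: number[][], explainedVariance: number[]): number[][] {
  return scores.map((row) => row.map((z, j) => z / Math.sqrt(explainedVariance[j])));
}

// Sample variance with the (n - 1) denominator.
function sampleVariance(xs: number[]): number {
  const mean = xs.reduce((s, x) => s + x, 0) / xs.length;
  return xs.reduce((s, x) => s + (x - mean) ** 2, 0) / (xs.length - 1);
}

// One component whose scores have variance (4 + 0 + 4) / 2 = 4.
const scores = [[-2], [0], [2]];
const white = whiten(scores, [sampleVariance(scores.map((r) => r[0]))]);
console.log(white); // [[-1], [0], [1]]
console.log(sampleVariance(white.map((r) => r[0]))); // 1
```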
Methods
fit
Training data of shape (n_samples, n_features)
transform
Data of shape (n_samples, n_features)
fitTransform
inverseTransform
Transformed data of shape (n_samples, n_components)
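inverseTransform maps scores back to feature space. A minimal sketch of that mapping, assuming orthonormal component rows and no whitening, is X_hat = Z * components + mean. The reconstruction is exact only when all components are kept; otherwise it is the projection onto the kept subspace. The function name here is a hypothetical stand-in, not the library method.

```typescript
// inverseTransform sketch. Assumptions: the components are orthonormal rows of
// shape (n_components, n_features) and no whitening was applied (whitened
// scores would first need to be multiplied back by sqrt(explained variance)).
// X_hat = Z * components + mean
function inverseTransform(Z: number[][], components: number[][], mean: number[]): number[][] {
  return Z.map((z) =>
    mean.map((m, j) => m + z.reduce((s, zk, k) => s + zk * components[k][j], 0)),
  );
}

// One component along y = x, mean (2, 2), and a score of sqrt(2).
const s = Math.SQRT1_2;
console.log(inverseTransform([[Math.SQRT2]], [[s, s]], [2, 2])); // approximately [[3, 3]]
```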
Properties
Principal components of shape (n_components, n_features)
Amount of variance explained by each component
Percentage of variance explained by each component
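The explained-variance ratio divides each component's variance by the total variance of the centered data. A minimal sketch follows. The documentation above does not state whether the library reports a fraction in [0, 1] or a value scaled to 100, so the sketch returns fractions and converts them separately. The function name is hypothetical.

```typescript
// Explained-variance-ratio sketch. Assumption: the denominator is the total
// variance of the centered data, i.e. the trace of the covariance matrix
// (sum of per-feature variances), which equals the sum of ALL eigenvalues,
// not only those of the kept components.
function explainedVarianceRatio(explainedVariance: number[], featureVariances: number[]): number[] {
  const total = featureVariances.reduce((s, v) => s + v, 0);
  return explainedVariance.map((v) => v / total);
}

// Uncorrelated features with variances 3 and 1: the eigenvalues are 3 and 1.
const ratio = explainedVarianceRatio([3, 1], [3, 1]);
console.log(ratio); // [0.75, 0.25]
console.log(ratio.map((r) => r * 100)); // [75, 25] as percentages
```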
Example
TSNE
t-Distributed Stochastic Neighbor Embedding (t-SNE). A nonlinear dimensionality reduction technique that embeds high-dimensional data in a low-dimensional space (typically 2D or 3D) for visualization. Algorithm: exact t-SNE, with an optional sampling-based approximation:
- Compute pairwise affinities in the high-dimensional space using a Gaussian kernel (exact)
- Compute pairwise affinities in the low-dimensional space using a Student-t distribution
- Minimize the KL divergence between the two distributions
For large datasets, use method: "approximate" (sampled neighbors + negative sampling) or reduce the number of samples.
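The first step, Gaussian affinities in the input space, can be sketched as below. This is a minimal exact-mode sketch, not the library's code. Each point gets its own kernel width, found by binary search so that its conditional distribution reaches the target perplexity (the constructor's perplexity parameter). The result is then symmetrized. All function names are hypothetical.

```typescript
// Sketch of exact t-SNE input affinities; NOT the library's implementation.
// For each point i, a Gaussian precision beta_i = 1 / (2 * sigma_i^2) is found by
// binary search so that the conditional distribution p_{j|i} has perplexity
// 2^H(P_i) equal to the target (H is the Shannon entropy in bits).
// The joint affinities are then symmetrized: p_ij = (p_{j|i} + p_{i|j}) / (2n).
function squaredDistances(X: number[][]): number[][] {
  return X.map((a) => X.map((b) => a.reduce((s, x, k) => s + (x - b[k]) ** 2, 0)));
}

function entropyBits(p: number[]): number {
  return -p.reduce((s, x) => (x > 0 ? s + x * Math.log2(x) : s), 0);
}

// Gaussian conditional probabilities p_{j|i} for one row of squared distances.
function conditionalRow(d2: number[], i: number, beta: number): number[] {
  // Shifting by the smallest off-diagonal distance avoids exp() underflow;
  // the shift cancels out in the normalization.
  const dMin = Math.min(...d2.filter((_, j) => j !== i));
  const w = d2.map((d, j) => (j === i ? 0 : Math.exp(-beta * (d - dMin))));
  const sum = w.reduce((s, x) => s + x, 0);
  return w.map((x) => x / sum);
}

// Binary search on beta: larger beta means a narrower kernel and lower entropy.
function calibrateRow(d2: number[], i: number, perplexity: number): number[] {
  const target = Math.log2(perplexity);
  let beta = 1;
  let lo = 0;
  let hi = Infinity;
  let p = conditionalRow(d2, i, beta);
  for (let t = 0; t < 100; t++) {
    const h = entropyBits(p);
    if (Math.abs(h - target) < 1e-5) break;
    if (h > target) {
      lo = beta; // distribution too flat: sharpen it
      beta = hi === Infinity ? beta * 2 : (beta + hi) / 2;
    } else {
      hi = beta; // distribution too peaked: widen it
      beta = (beta + lo) / 2;
    }
    p = conditionalRow(d2, i, beta);
  }
  return p;
}

function computeP(X: number[][], perplexity: number): number[][] {
  const n = X.length;
  const cond = squaredDistances(X).map((row, i) => calibrateRow(row, i, perplexity));
  // cond[i][j] = p_{j|i}; symmetrize into a joint distribution summing to 1.
  return cond.map((row, i) => row.map((pji, j) => (pji + cond[j][i]) / (2 * n)));
}

const X = [[0], [1], [2], [3], [10]];
const row0 = calibrateRow(squaredDistances(X)[0], 0, 2);
console.log(2 ** entropyBits(row0)); // approximately 2
```

The binary search is over precision rather than sigma because entropy decreases monotonically as beta grows, which makes the bracket updates simple.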
Constructor
Number of dimensions in the embedding (typically 2 or 3).
Perplexity, roughly the effective number of nearest neighbors considered for each point. Values between 5 and 50 are recommended; it must be less than n_samples.
Learning rate for gradient descent.
Number of iterations.
Early exaggeration factor. Scales up the high-dimensional affinities during the first iterations, which helps form tight, well-separated clusters.
Number of iterations with early exaggeration.
Random seed for reproducibility.
Minimum gradient norm; optimization stops early once the gradient norm falls below this value.
Method for computing affinities: "exact" (full pairwise) or "approximate" (sampling for large datasets).
Maximum number of samples allowed in exact mode; larger datasets require approximate mode.
Number of neighbors to sample per point in approximate mode. Default: max(5, floor(perplexity * 3)).
Number of negative samples per point in approximate mode. Default: max(10, floor(perplexity * 2)).
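The learning rate, iteration count, early exaggeration and minimum gradient norm all act on the optimization loop. A minimal sketch of that loop, assuming exact mode and plain gradient descent, follows. Practical implementations usually add momentum. The TsneOptions field names are hypothetical stand-ins for the constructor parameters, not the library's option names.

```typescript
// Sketch of exact t-SNE optimization with plain gradient descent; NOT the
// library's implementation (practical versions usually add momentum).
// The option names below are hypothetical stand-ins for the constructor parameters.
interface TsneOptions {
  learningRate: number;
  nIter: number;
  earlyExaggeration: number;
  exaggerationIters: number;
  minGradNorm: number;
}

// Student-t kernel with one degree of freedom: num_ij = (1 + |y_i - y_j|^2)^-1,
// q_ij = num_ij / sum_{k != l} num_kl.
function studentT(Y: number[][]): { num: number[][]; Q: number[][] } {
  const num = Y.map((a, i) =>
    Y.map((b, j) => (i === j ? 0 : 1 / (1 + a.reduce((s, x, k) => s + (x - b[k]) ** 2, 0)))),
  );
  const z = num.reduce((s, row) => s + row.reduce((a, b) => a + b, 0), 0);
  return { num, Q: num.map((row) => row.map((x) => x / z)) };
}

// KL(P || Q), skipping zero entries of P (including the diagonal).
function klDivergence(P: number[][], Q: number[][]): number {
  let kl = 0;
  P.forEach((row, i) =>
    row.forEach((p, j) => {
      if (p > 0) kl += p * Math.log(p / Q[i][j]);
    }),
  );
  return kl;
}

function optimize(P: number[][], Y0: number[][], o: TsneOptions): number[][] {
  const Y = Y0.map((row) => [...row]);
  for (let it = 0; it < o.nIter; it++) {
    // Early exaggeration: scale P up during the first iterations.
    const ex = it < o.exaggerationIters ? o.earlyExaggeration : 1;
    const { num, Q } = studentT(Y);
    // Gradient: dC/dy_i = 4 * sum_j (ex * p_ij - q_ij) * num_ij * (y_i - y_j)
    const grad = Y.map((yi, i) =>
      yi.map((_, k) =>
        4 * Y.reduce((s, yj, j) => s + (ex * P[i][j] - Q[i][j]) * num[i][j] * (yi[k] - yj[k]), 0),
      ),
    );
    const norm = Math.sqrt(grad.reduce((s, g) => s + g.reduce((a, x) => a + x * x, 0), 0));
    if (norm < o.minGradNorm) break; // converged
    Y.forEach((yi, i) => yi.forEach((_, k) => (yi[k] -= o.learningRate * grad[i][k])));
  }
  return Y;
}

// Usage: pairs (0, 1) and (2, 3) are similar in input space; 4 * a + 8 * b = 1.
const a = 0.2;
const b = 0.025;
const P = [[0, a, b, b], [a, 0, b, b], [b, b, 0, a], [b, b, a, 0]];
const Y0 = [[0, 0], [1, 0], [0, 1], [1, 1]];
const Y = optimize(P, Y0, {
  learningRate: 0.1, nIter: 100, earlyExaggeration: 2, exaggerationIters: 10, minGradNorm: 1e-7,
});
console.log(klDivergence(P, studentT(Y0).Q), klDivergence(P, studentT(Y).Q));
```

In this usage the KL divergence drops as the two similar pairs move together and apart from each other.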
Methods
fit
transform
fitTransform
Training data of shape (n_samples, n_features)
Throws InvalidParameterError if perplexity >= n_samples, or if exact mode is used with more samples than the exact-mode maximum.
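The documented parameter rules can be expressed as a small check plus the approximate-mode defaults from the constructor section. This is a sketch only. InvalidParameterError is defined locally as a stand-in for the library's class. validateTsne, approximateDefaults and the maxExactSamples argument are hypothetical names.

```typescript
// Sketch of the documented parameter rules. InvalidParameterError here is a local
// stand-in for the library's error class; the other names are hypothetical.
class InvalidParameterError extends Error {}

type Method = "exact" | "approximate";

function validateTsne(nSamples: number, perplexity: number, method: Method, maxExactSamples: number): void {
  if (perplexity >= nSamples) {
    throw new InvalidParameterError(`perplexity (${perplexity}) must be less than n_samples (${nSamples})`);
  }
  if (method === "exact" && nSamples > maxExactSamples) {
    throw new InvalidParameterError(`exact mode allows at most ${maxExactSamples} samples; use "approximate"`);
  }
}

// Documented defaults for approximate mode.
function approximateDefaults(perplexity: number): { neighborSamples: number; negativeSamples: number } {
  return {
    neighborSamples: Math.max(5, Math.floor(perplexity * 3)),
    negativeSamples: Math.max(10, Math.floor(perplexity * 2)),
  };
}

console.log(approximateDefaults(30)); // { neighborSamples: 90, negativeSamples: 60 }
console.log(approximateDefaults(1.5)); // { neighborSamples: 5, negativeSamples: 10 }
```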
Properties
The fitted embedding after calling fit or fitTransform.