
Overview

This glossary defines key terms and concepts used throughout the Data Science Bootcamp. Terms are organized alphabetically for quick reference.
Use Ctrl+F (or Cmd+F on Mac) to quickly search for specific terms.

A

Activation Function

A mathematical function applied to a neuron’s output in a neural network.
Purpose: Introduces non-linearity, enabling networks to learn complex patterns.
Common Types:
  • ReLU (Rectified Linear Unit): f(x) = max(0, x)
  • Sigmoid: f(x) = 1 / (1 + e^(-x))
  • Tanh: f(x) = tanh(x)
  • Softmax: Used for multi-class classification output
Module: A8 (Deep Learning)
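The ReLU and sigmoid formulas above can be sketched directly in NumPy; the function names here are illustrative, not part of the course materials:

```python
import numpy as np

def relu(x):
    # ReLU: zero out negative inputs, pass positives through
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid: squash any real input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # [0. 0. 3.]
print(sigmoid(0.0))  # 0.5
```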
Algorithm

A step-by-step procedure for solving a problem or performing a computation.
In Data Science: Algorithms process data to learn patterns (machine learning algorithms) or perform calculations (sorting, searching algorithms).
Examples: Linear regression, decision trees, K-means clustering
Modules: A5-A8
Array

A data structure that holds multiple values of the same type in a grid-like structure.
In NumPy: An n-dimensional array (ndarray) is the fundamental data structure.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])  # 1D array
matrix = np.array([[1, 2], [3, 4]])  # 2D array
Properties: Shape, dtype (data type), size
Module: A3 (NumPy)

B

Backpropagation

The algorithm used to calculate gradients in neural networks for updating weights.
Process: Computes the gradient of the loss function with respect to each weight by applying the chain rule, propagating errors backwards through the network.
Purpose: Enables neural networks to learn from mistakes
Module: A8 (Deep Learning)
Batch Size

The number of training samples processed before updating model parameters.
Trade-offs:
  • Large batch: Faster training, more memory, less noise
  • Small batch: Slower training, less memory, more exploration
Common Values: 32, 64, 128, 256
Module: A8 (Deep Learning)
Bias (Statistical)

Systematic error that causes predictions to consistently deviate from true values.
Example: A model trained only on data from one demographic might be biased against others.
Related: Bias-Variance Tradeoff
Modules: A6-A7
Bias (Neural Networks)

An additional parameter in neural network layers that shifts the activation function.
Formula: output = weights * input + bias
Purpose: Allows the model to fit data that doesn’t pass through the origin
Module: A8
Broadcasting

NumPy’s mechanism for performing operations on arrays of different shapes.
Example:
import numpy as np
arr = np.array([1, 2, 3])
result = arr + 10  # Adds 10 to each element
# Result: [11, 12, 13]
Module: A3 (NumPy)

C

Classification

A supervised learning task where the goal is to predict discrete categories or classes.
Types:
  • Binary: Two classes (e.g., spam/not spam)
  • Multiclass: More than two classes (e.g., image classification)
Algorithms: Logistic regression, KNN, decision trees, neural networks
Modules: A6-A8
Clustering

An unsupervised learning technique that groups similar data points together.
Common Algorithms:
  • K-Means: Partitions data into K clusters
  • Hierarchical: Creates a tree of clusters
  • DBSCAN: Density-based clustering
Use Cases: Customer segmentation, anomaly detection, data exploration
Module: A7 (Advanced ML)
Confusion Matrix

A table showing the performance of a classification model.
Structure:
            Predicted
          Pos    Neg
Actual Pos  TP     FN
       Neg  FP     TN
  • TP (True Positive): Correctly predicted positive
  • TN (True Negative): Correctly predicted negative
  • FP (False Positive): Incorrectly predicted positive (Type I error)
  • FN (False Negative): Incorrectly predicted negative (Type II error)
Derived Metrics: Accuracy, precision, recall, F1-score
Module: A6
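A minimal scikit-learn sketch of the matrix above, with made-up labels. Note that scikit-learn sorts class labels in ascending order, so the negative class comes first: the layout is [[TN, FP], [FN, TP]], the reverse row order of the table shown:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[2 1]
           #  [1 2]]
```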
Cross-Validation

A technique for assessing model performance by splitting data into multiple train/test sets.
K-Fold Cross-Validation:
  1. Split data into K equal parts (folds)
  2. Train on K-1 folds, test on the remaining fold
  3. Repeat K times, using each fold as test set once
  4. Average the results
Purpose: Provides more reliable performance estimates, reduces overfitting
Common Value: K=5 or K=10
Module: A6 (Machine Learning)
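The four K-fold steps above are handled in one call by scikit-learn's `cross_val_score`; this sketch uses the bundled iris dataset as an illustrative example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# cv=5 splits the data into 5 folds; each fold serves as the test set once
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # averaged performance estimate
```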

D

DataFrame

Pandas’ primary data structure: a 2-dimensional labeled table with columns of potentially different types.
Key Features:
  • Row and column labels (index)
  • Heterogeneous data types
  • Built-in methods for data manipulation
  • Easy filtering, grouping, and aggregation
Example:
import pandas as pd
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago']
})
Module: A3 (Primary focus), used throughout A4-A8
Data Augmentation

Techniques to artificially increase training data by creating modified versions of existing data.
For Images: Rotation, flipping, cropping, color adjustment
Purpose: Reduces overfitting, improves model generalization
Module: A8 (Deep Learning)
Data Cleaning

The process of detecting and correcting (or removing) corrupt or inaccurate data.
Common Tasks:
  • Handling missing values
  • Removing duplicates
  • Correcting data types
  • Fixing inconsistencies
  • Removing outliers
Module: A3 (Primary), A4-A6
Decision Tree

A tree-like model that makes decisions by asking a series of questions about features.
Structure:
  • Root node: First decision
  • Internal nodes: Subsequent decisions
  • Leaf nodes: Final predictions
Advantages: Interpretable, handles non-linear relationships
Disadvantages: Prone to overfitting
Extensions: Random Forest, Gradient Boosting
Module: A6
Deep Learning

A subset of machine learning using neural networks with multiple layers (“deep” networks).
Key Concepts:
  • Multiple hidden layers
  • Automatic feature learning
  • Requires large datasets
  • GPU acceleration
Applications: Image recognition, NLP, speech recognition
Frameworks: TensorFlow/Keras, PyTorch
Module: A8
Dimensionality Reduction

Techniques to reduce the number of features while preserving important information.
Methods:
  • PCA (Principal Component Analysis): Linear transformation
  • t-SNE: Non-linear, good for visualization
  • UMAP: Non-linear, preserves global structure
Benefits: Faster training, visualization, noise reduction
Module: A7

E

Exploratory Data Analysis (EDA)

The process of analyzing datasets to summarize their main characteristics, often using visualizations.
Goals:
  • Understand data structure
  • Detect patterns and anomalies
  • Test assumptions
  • Identify relationships between variables
Common Techniques:
  • Summary statistics (df.describe())
  • Visualizations (histograms, box plots, scatter plots)
  • Correlation analysis
  • Distribution analysis
Module: A4 (Primary focus)
Ensemble Learning

Techniques that combine multiple models to improve prediction accuracy.
Types:
  • Bagging: Train models independently on random subsets (e.g., Random Forest)
  • Boosting: Train models sequentially, each correcting errors of previous (e.g., Gradient Boosting)
  • Stacking: Combine predictions from multiple models using a meta-model
Principle: “Wisdom of crowds” - multiple weak learners become a strong learner
Module: A6
Epoch

One complete pass through the entire training dataset during model training.
Example: Training for 10 epochs means the model sees every training sample 10 times.
Considerations:
  • Too few: Underfitting
  • Too many: Overfitting
  • Use early stopping to find optimal number
Module: A8 (Deep Learning)

F

Feature

An individual measurable property or characteristic of a phenomenon being observed.
Also called: Variable, predictor, independent variable, attribute
Example: In a house price dataset, features might include square footage, number of bedrooms, location, and age.
Types:
  • Numerical: Continuous (price) or discrete (count)
  • Categorical: Nominal (color) or ordinal (rating)
Modules: All modules A3-A8
Feature Engineering

The process of creating new features or transforming existing ones to improve model performance.
Techniques:
  • Creating interaction terms
  • Binning continuous variables
  • Encoding categorical variables
  • Extracting date/time components
  • Polynomial features
  • Domain-specific transformations
Example:
# Create BMI from height and weight
df['BMI'] = df['weight'] / (df['height'] ** 2)

# Extract month from date
df['month'] = df['date'].dt.month
Module: A6-A7
Feature Scaling

Normalizing or standardizing features to bring them to a similar scale.
Methods:
  • Standardization (Z-score): (x - mean) / std → Mean=0, Std=1
  • Min-Max Normalization: (x - min) / (max - min) → Range [0, 1]
Why Important: Many algorithms (KNN, neural networks) are sensitive to feature scales
Example:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Module: A6
F1-Score

The harmonic mean of precision and recall, providing a single metric that balances both.
Formula: F1 = 2 * (precision * recall) / (precision + recall)
Range: 0 to 1 (higher is better)
When to Use: Imbalanced datasets where both false positives and false negatives are important
Module: A6
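The F1 formula can be worked through with a small helper function; the counts here (8 TP, 2 FP, 4 FN) are made up for illustration:

```python
def f1(tp, fp, fn):
    # Harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# precision = 8/10 = 0.8, recall = 8/12 ~ 0.667
print(f1(tp=8, fp=2, fn=4))  # ~0.727
```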

G

Gradient Descent

An optimization algorithm that iteratively adjusts model parameters to minimize the loss function.
Process:
  1. Calculate gradient (slope) of loss function
  2. Update parameters in opposite direction of gradient
  3. Repeat until convergence
Variants:
  • Batch GD: Uses entire dataset
  • Stochastic GD: Uses one sample at a time
  • Mini-batch GD: Uses small batches (most common)
Learning Rate: Controls step size (critical hyperparameter)
Module: A6-A8
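The three-step loop above can be sketched on a toy one-dimensional loss. This minimizes f(x) = (x - 3)^2, a made-up function chosen so the true minimum (x = 3) is known:

```python
# f(x) = (x - 3)^2 has gradient 2 * (x - 3)
x = 0.0
learning_rate = 0.1

for _ in range(100):
    gradient = 2 * (x - 3)        # slope of the loss at the current x
    x -= learning_rate * gradient  # step in the opposite direction

print(x)  # converges toward 3.0, the minimum
```

Setting `learning_rate` too high makes the updates overshoot and diverge; too low and convergence is very slow, which is why it is called out as a critical hyperparameter.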
Grid Search

An exhaustive search over specified parameter values to find the best hyperparameters.
Process:
  1. Define parameter grid
  2. Train model for each combination
  3. Use cross-validation to evaluate
  4. Return best parameters
Example:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15]
}

grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
Module: A6-A7

H

Hyperparameter

A parameter whose value is set before training and controls the learning process.
Examples:
  • Learning rate
  • Number of trees in random forest
  • K in K-nearest neighbors
  • Number of epochs
  • Batch size
Contrast with: Model parameters (learned during training, like weights)
Tuning Methods: Grid search, random search, Bayesian optimization
Module: A6-A8
Hypothesis Testing

A statistical method to make inferences about population parameters based on sample data.
Steps:
  1. State null hypothesis (H₀) and alternative hypothesis (H₁)
  2. Choose significance level (α, typically 0.05)
  3. Calculate test statistic
  4. Determine p-value
  5. Make decision: reject or fail to reject H₀
Common Tests: t-test, chi-square test, ANOVA
Module: A5 (Statistics)
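The five steps above map onto a two-sample t-test with SciPy; the two groups here are synthetic data generated with a fixed seed, purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=5.0, scale=1.0, size=50)
group_b = rng.normal(loc=5.8, scale=1.0, size=50)

# H0: both groups have the same mean; H1: the means differ
t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05
if p_value < alpha:
    print("Reject H0: the means differ significantly")
else:
    print("Fail to reject H0")
```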

K

K-Fold Cross-Validation

See: Cross-Validation
K-Means Clustering

A clustering algorithm that partitions data into K clusters.
Algorithm:
  1. Initialize K cluster centroids randomly
  2. Assign each point to nearest centroid
  3. Recalculate centroids as mean of assigned points
  4. Repeat steps 2-3 until convergence
Choosing K: Elbow method, silhouette score
Limitations: Assumes spherical clusters, sensitive to initialization
Module: A7
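The four-step algorithm above is what scikit-learn's `KMeans` runs internally; this sketch uses four made-up points forming two obvious groups:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two visually separated groups of points
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)  # first two points share one label, last two the other
```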
K-Nearest Neighbors (KNN)

A simple algorithm that makes predictions based on the K closest training examples.
Classification: Majority vote of K neighbors
Regression: Average of K neighbors
Choosing K:
  • Small K: More sensitive to noise
  • Large K: Smoother boundaries
  • Odd K (for classification): Avoids ties
Requirement: Feature scaling is critical
Module: A6
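A minimal scikit-learn sketch of KNN classification with K=3, using a made-up one-feature dataset with two well-separated groups:

```python
from sklearn.neighbors import KNeighborsClassifier

# Single feature; class 0 near 0-2, class 1 near 10-12
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Each new point takes the majority vote of its 3 nearest neighbors
print(knn.predict([[1.5], [10.5]]))  # [0 1]
```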

L

Learning Rate

A hyperparameter that controls how much to update model parameters during training.
Impact:
  • Too high: Model may overshoot minimum, fail to converge
  • Too low: Very slow training, may get stuck in local minimum
Typical Values: 0.001, 0.01, 0.1
Advanced: Learning rate scheduling, adaptive methods (Adam, RMSprop)
Module: A6-A8
Logistic Regression

A classification algorithm that models the probability of an instance belonging to a class.
Output: Probability between 0 and 1 (using sigmoid function)
Decision Rule: Predict class 1 if probability > 0.5, else class 0
Despite the name: It’s classification, not regression
Use Case: Binary classification
Module: A6
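A minimal scikit-learn sketch of logistic regression on a made-up one-feature dataset, showing both the probability output and the thresholded class prediction:

```python
from sklearn.linear_model import LogisticRegression

X = [[0], [1], [2], [8], [9], [10]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns [P(class 0), P(class 1)] for each sample
print(clf.predict_proba([[1.0]]))
# predict applies the 0.5 threshold for you
print(clf.predict([[1.0], [9.0]]))  # [0 1]
```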
Loss Function

A function that measures how well a model’s predictions match the actual values.
Also called: Cost function, objective function
Common Loss Functions:
  • MSE (Mean Squared Error): For regression
  • Cross-Entropy: For classification
  • MAE (Mean Absolute Error): For regression
Goal: Minimize the loss during training
Module: A6-A8

M

Machine Learning

A field of study that gives computers the ability to learn from data without being explicitly programmed.
Types:
  • Supervised: Learn from labeled data (classification, regression)
  • Unsupervised: Find patterns in unlabeled data (clustering, dimensionality reduction)
  • Reinforcement: Learn through interaction and feedback
Module: A6-A8 (primary focus)
Model

A mathematical representation of a real-world process, learned from data.
In ML: An algorithm trained on data that can make predictions on new data
Lifecycle:
  1. Training: Learn patterns from data
  2. Validation: Tune hyperparameters
  3. Testing: Evaluate final performance
  4. Deployment: Use in production
Module: A6-A8

N

Neural Network

A machine learning model inspired by biological neural networks, consisting of interconnected nodes (neurons) organized in layers.
Structure:
  • Input layer: Receives features
  • Hidden layers: Process information
  • Output layer: Produces predictions
Components:
  • Weights: Connection strengths
  • Biases: Offset values
  • Activation functions: Introduce non-linearity
Types: Feedforward, Convolutional (CNN), Recurrent (RNN)
Module: A8
Normalization

See: Feature Scaling
NumPy

Python library for numerical computing with multi-dimensional arrays.
Key Features:
  • Fast array operations
  • Broadcasting
  • Linear algebra functions
  • Random number generation
Foundation for: Pandas, scikit-learn, TensorFlow, PyTorch
Module: A3 (primary), used throughout A4-A8

O

Outlier

A data point that differs significantly from other observations.
Detection Methods:
  • Box plots (IQR method)
  • Z-score
  • Isolation Forest
Handling:
  • Remove (if error)
  • Transform (log, square root)
  • Cap (winsorization)
  • Keep (if genuine)
Module: A4 (EDA)
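The IQR method listed above can be sketched in NumPy; the data values here are made up, with one deliberately extreme point:

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 95])  # 95 looks suspicious

# IQR rule: flag points beyond 1.5 * IQR from the quartiles
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(outliers)  # [95]
```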
Overfitting

When a model learns training data too well, including noise, resulting in poor generalization to new data.
Signs:
  • High training accuracy, low test accuracy
  • Model is too complex
  • Training loss decreases but validation loss increases
Solutions:
  • Collect more data
  • Regularization
  • Reduce model complexity
  • Cross-validation
  • Early stopping
  • Dropout (neural networks)
Module: A6-A8

P

Pandas

Python library for data manipulation and analysis.
Key Data Structures:
  • DataFrame: 2D table
  • Series: 1D array
Capabilities:
  • Data loading (CSV, Excel, SQL, etc.)
  • Data cleaning and transformation
  • Grouping and aggregation
  • Time series analysis
Module: A3 (primary), used throughout A4-A8
Principal Component Analysis (PCA)

A dimensionality reduction technique that transforms data into a new coordinate system.
Purpose:
  • Reduce number of features
  • Remove correlations
  • Visualize high-dimensional data
  • Speed up training
Principal Components: New features that capture maximum variance
Example:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
Module: A7
Precision

The proportion of positive predictions that are actually correct.
Formula: Precision = TP / (TP + FP)
Interpretation: “Of all instances we predicted as positive, how many were actually positive?”
When Important: When false positives are costly (e.g., spam detection)
Related: Recall, F1-score
Module: A6

R

Random Forest

An ensemble learning method that constructs multiple decision trees and combines their predictions.
Process:
  1. Create multiple decision trees on random subsets of data
  2. Each tree uses random subset of features
  3. Aggregate predictions (majority vote or average)
Advantages:
  • Reduces overfitting compared to single decision tree
  • Handles non-linear relationships
  • Feature importance
Module: A6
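A minimal scikit-learn sketch of a random forest on the bundled iris dataset, also showing the feature-importance output mentioned above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# Importances sum to 1; larger values mean a feature was more useful
print(rf.feature_importances_)
```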
Recall

The proportion of actual positive instances that are correctly identified.
Formula: Recall = TP / (TP + FN)
Also called: Sensitivity, True Positive Rate
Interpretation: “Of all actual positive instances, how many did we find?”
When Important: When false negatives are costly (e.g., disease detection)
Module: A6
Regression

A supervised learning task where the goal is to predict continuous numerical values.
Types:
  • Linear Regression: Models linear relationships
  • Polynomial Regression: Models non-linear relationships
  • Multiple Regression: Multiple features
Examples: Predicting house prices, temperature, stock prices
Metrics: MSE, RMSE, MAE, R²
Module: A6
Regularization

Techniques to prevent overfitting by adding a penalty for model complexity.
Types:
  • L1 (Lasso): Adds sum of absolute weights → Sparse models
  • L2 (Ridge): Adds sum of squared weights → Distributes weights
  • Elastic Net: Combination of L1 and L2
  • Dropout: Randomly deactivate neurons (neural networks)
Effect: Simpler, more generalizable models
Module: A6-A8
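The L1-vs-L2 contrast above (sparse vs distributed weights) can be seen on synthetic data where only one of five features matters; the data and alpha values here are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only feature 0 actually drives the target
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

print(lasso.coef_)  # irrelevant coefficients driven to (near) zero
print(ridge.coef_)  # irrelevant coefficients shrunk, but rarely exactly zero
```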
Root Mean Squared Error (RMSE)

A common regression metric that measures average prediction error.
Formula: RMSE = sqrt(mean((y_pred - y_true)²))
Properties:
  • Same units as target variable
  • Penalizes large errors more than MAE
  • Lower is better
Module: A6
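The RMSE formula translates directly into NumPy; the target and prediction values here are made up:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])

# Square the errors, average them, then take the square root
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
print(rmse)  # ~1.19, in the same units as y
```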

S

Supervised Learning

Machine learning where the model learns from labeled data (input-output pairs).
Tasks:
  • Classification (discrete outputs)
  • Regression (continuous outputs)
Examples: Email spam detection, house price prediction
Requirements: Labeled training data
Module: A6 (primary)
Series

Pandas’ 1-dimensional labeled array, capable of holding any data type.
Like: A single column of a DataFrame, or a NumPy array with labels
Example:
import pandas as pd
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
Module: A3

T

Test Set

A portion of data held out for final model evaluation.
Purpose: Provides an unbiased estimate of model performance on unseen data
Critical Rule: Never use the test set during model development or hyperparameter tuning
Typical Split: 80% train, 20% test (or 70-15-15 train-val-test)
Module: A6-A8
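The 80/20 split mentioned above is one call in scikit-learn; the arrays here are placeholder data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # 50 samples, 2 features
y = np.arange(50)

# Hold out 20% of the samples for final evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(len(X_train), len(X_test))  # 40 10
```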
Training Set

The portion of data used to train the model.
Purpose: Model learns patterns from this data
Size: Typically 60-80% of total data
Module: A6-A8

U

Underfitting

When a model is too simple to capture the underlying patterns in the data.
Signs:
  • Low training accuracy
  • Low test accuracy
  • Model is too simple
Solutions:
  • Increase model complexity
  • Add more features
  • Reduce regularization
  • Train longer
Module: A6-A8
Unsupervised Learning

Machine learning where the model finds patterns in unlabeled data.
Tasks:
  • Clustering (grouping similar items)
  • Dimensionality reduction (simplifying data)
  • Anomaly detection (finding unusual patterns)
Examples: Customer segmentation, data compression
Module: A7

V

Validation Set

A portion of data used to tune hyperparameters and make model selection decisions.
Purpose: Evaluate the model during development without touching the test set
Alternative: Use cross-validation instead of a fixed validation set
Typical Split: 15-20% of data (when using a train-val-test split)
Module: A6-A8
Variance

A measure of how much model predictions change with different training data.
High Variance: Model is sensitive to training data (overfitting)
Low Variance: Model predictions are stable
Bias-Variance Tradeoff: Balance between bias (underfitting) and variance (overfitting)
Module: A6-A7
Vectorization

Performing operations on entire arrays rather than using loops.
Benefits:
  • Much faster execution
  • More readable code
  • Utilizes optimized C/Fortran code under the hood
Example:
import numpy as np
arr = np.array([1, 2, 3])

# Slow (loop)
result = []
for x in arr:
    result.append(x * 2)

# Fast (vectorized)
result = arr * 2
Module: A3 (NumPy)

Additional Terms

Accuracy

Proportion of correct predictions: (TP + TN) / Total

Anomaly Detection

Identifying unusual patterns that don’t conform to expected behavior

API

Application Programming Interface - way programs communicate

Batch Normalization

Technique to normalize layer inputs in neural networks

Categorical Variable

Variable with discrete categories (e.g., color, country)

Confusion Matrix

Table showing model prediction vs actual values

Correlation

Statistical relationship between two variables (-1 to +1)

Data Leakage

When training data contains information about test data

Dropout

Regularization by randomly ignoring neurons during training

Encoding

Converting categorical data to numerical (one-hot, label encoding)

False Negative (FN)

Incorrectly predicted negative (Type II error)

False Positive (FP)

Incorrectly predicted positive (Type I error)

Gradient

Direction and rate of steepest increase of a function

Histogram

Bar chart showing distribution of numerical data

Imputation

Filling in missing values in dataset

Label

The target variable in supervised learning (what we predict)

MAE

Mean Absolute Error - average of absolute differences

Matrix

2D array of numbers

MSE

Mean Squared Error - average of squared differences

One-Hot Encoding

Converting categories to binary columns

Pipeline

Chain of data processing steps

Prediction

Model’s output for a given input

R² Score

Coefficient of determination (0 to 1, higher is better)

Sampling

Selecting subset of data from larger population

Scaling

Transforming features to similar ranges

Silhouette Score

Metric for evaluating clustering quality

Target Variable

What we’re trying to predict (dependent variable, label)

True Negative (TN)

Correctly predicted negative

True Positive (TP)

Correctly predicted positive

Weight

Parameter that scales input to neuron or feature

Quick Reference by Module

Module A3: Array, Broadcasting, DataFrame, Series, Vectorization, Data Cleaning

Next Steps

Jupyter Notebooks

See these concepts in action across 111+ notebooks

Datasets

Practice with real datasets

Tools & Libraries

Learn the tools that implement these concepts

Module Overview

Understand how modules build on these concepts
