Overview
Deep learning uses artificial neural networks with multiple layers to learn hierarchical representations of data. This module covers the fundamentals from Module A8 of the bootcamp. You’ll learn neural network basics and build image classifiers using both the Keras/TensorFlow and PyTorch frameworks.
What are Neural Networks?
Neural networks are computing systems inspired by biological neural networks. They consist of:
Input Layer
Receives raw data (e.g., pixel values from images)
Hidden Layers
Transform inputs through learned weights and activations
Output Layer
Produces predictions (e.g., class probabilities)
The Artificial Neuron
Each neuron performs two operations:
- Linear transformation: Weighted sum of inputs plus bias
- Non-linear activation: Applies activation function
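As a minimal sketch (plain numpy; the names `relu` and `neuron` are illustrative, not from any framework), the two operations look like this:

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z)."""
    return np.maximum(0.0, z)

def neuron(x, w, b):
    z = np.dot(w, x) + b  # linear transformation: weighted sum plus bias
    return relu(z)        # non-linear activation

x = np.array([1.0, 2.0, 3.0])    # inputs
w = np.array([0.5, -0.25, 0.1])  # learned weights
b = 0.2                          # bias
print(neuron(x, w, b))  # 0.5*1 - 0.25*2 + 0.1*3 + 0.2 = 0.5
```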
Activation Functions
Activation functions introduce non-linearity, enabling networks to learn complex patterns.
ReLU (Rectified Linear Unit)
Most popular activation for hidden layers
✓ Fast to compute
✓ Mitigates vanishing gradient
✓ Sparse activation
✗ Can “die” (always output 0)
Sigmoid
Used for binary classification output
✓ Outputs between 0 and 1 (probability)
✓ Smooth gradient
✗ Vanishing gradient for extreme values
✗ Not zero-centered
Softmax
Used for multi-class classification output
✓ Outputs sum to 1 (probability distribution)
✓ Differentiable
✓ Handles multiple classes
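The three activations above can be sketched in a few lines of numpy (illustrative, not the framework implementations):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))           # negative inputs clipped to 0
print(sigmoid(z))        # each value squashed into (0, 1)
print(softmax(z).sum())  # 1.0: a probability distribution over classes
```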
Network Architectures
Dense (Fully Connected) Networks
Every neuron in layer L connects to every neuron in layer L+1.
Advantages:
- Simple to understand and implement
- Universal function approximators
- Work well for structured/tabular data
Disadvantages:
- Many parameters (memory and computation)
- Don’t exploit spatial structure in images
- Prone to overfitting
Convolutional Neural Networks (CNNs)
CNNs use convolutional layers that:
- Apply small filters across the image
- Detect local patterns (edges, textures, shapes)
- Share parameters (fewer weights)
- Build hierarchical representations
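A toy valid-padding, stride-1 convolution in numpy (illustrative) shows how one small filter slides across an image while sharing the same weights everywhere:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation form), stride 1."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.]])
edge = np.array([[-1., 1.]])  # tiny filter that responds to vertical edges
print(conv2d(image, edge))    # strongest response where pixels jump 0 -> 1
```

The same 2-weight filter is applied at every position, which is why convolutional layers need far fewer parameters than dense layers.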
Building Blocks of Deep Networks
Layers
Dense/Linear Layer
Fully connected layer: each neuron connects to all neurons in previous layer.
Convolutional Layer
Applies filters to extract spatial features from images.
Pooling Layer
Downsamples feature maps, reducing dimensionality and computation.
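A minimal 2×2 max-pooling sketch in numpy (illustrative): the strongest activation in each window survives, halving the feature map’s height and width.

```python
import numpy as np

def max_pool2x2(x):
    """Max pooling with a 2x2 window and stride 2."""
    h, w = x.shape
    return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 0., 5., 6.],
                 [1., 2., 7., 8.]])
print(max_pool2x2(fmap))  # [[4. 2.] [2. 8.]]
```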
Dropout Layer
Randomly drops neurons during training to prevent overfitting.
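A sketch of inverted dropout in numpy (illustrative; real framework layers handle this internally): a fraction `p` of activations is zeroed during training and the survivors are rescaled so the expected value is unchanged, while at inference the layer is the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    if not training:
        return x                       # identity at inference time
    mask = rng.random(x.shape) >= p    # keep each activation with prob 1-p
    return x * mask / (1.0 - p)        # rescale survivors (inverted dropout)

a = np.ones(10)
print(dropout(a, p=0.5))           # zeros mixed with survivors scaled to 2.0
print(dropout(a, training=False))  # unchanged at inference
```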
Loss Functions
Binary classification typically uses binary cross-entropy; multi-class classification uses categorical cross-entropy; regression uses mean squared error (MSE).
Optimizers
Adam (Adaptive Moment Estimation) is the most popular:
- Combines momentum and adaptive learning rates
- Works well with default parameters
- Fast convergence
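The cross-entropy losses and a single Adam update step can be sketched in numpy (illustrative; the default `lr`, `b1`, `b2`, and `eps` values follow the common Adam defaults, not anything specific to this module):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(y_onehot, p, eps=1e-12):
    return -np.mean(np.sum(y_onehot * np.log(np.clip(p, eps, 1.0)), axis=1))

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad     # momentum (first moment estimate)
    v = b2 * v + (1 - b2) * grad**2  # adaptive scale (second moment estimate)
    m_hat = m / (1 - b1**t)          # bias correction for step t
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.1])))  # ~0.105
w, m, v = adam_step(np.zeros(2), np.array([1.0, -1.0]),
                    np.zeros(2), np.zeros(2), t=1)
print(w)  # ~[-0.001, 0.001]: step size ~lr regardless of gradient magnitude
```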
Deep Learning Frameworks
Keras / TensorFlow
Pros
• High-level, beginner-friendly API
• Fast prototyping
• Excellent documentation
• TensorFlow production ecosystem
Cons
• Less flexible for custom operations
• Debugging can be challenging
• Abstraction hides details
PyTorch
Pros
• Pythonic, intuitive API
• Flexible and dynamic
• Excellent for research
• Easy debugging
Cons
• More boilerplate code
• Steeper learning curve
• Manual training loop
Training Deep Networks
Forward Propagation
- Pass input through network layers
- Compute predictions
- Calculate loss between predictions and true labels
Backpropagation
- Compute gradient of loss with respect to each weight
- Use chain rule to propagate gradients backward
- Update weights using optimizer
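Putting forward propagation, backpropagation, and the weight update together, here is a complete manual training loop in numpy for a toy problem (XOR; the architecture, seed, and hyperparameters are illustrative choices, not from the module):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # XOR targets

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)  # hidden layer, 8 units
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)  # output layer
lr = 0.5

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for step in range(2000):
    # 1. Forward propagation: pass input through the layers
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # 2. Loss between predictions and true labels (binary cross-entropy)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # 3. Backpropagation: chain rule, gradients of loss w.r.t. each weight
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(0)
    dz1 = (dz2 @ W2.T) * (1 - h**2)   # tanh' = 1 - tanh^2
    dW1, db1 = X.T @ dz1, dz1.sum(0)
    # 4. Update weights (plain gradient descent standing in for the optimizer)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(round(loss, 4))                  # loss after training
print((p > 0.5).astype(int).ravel())   # predicted labels after training
```

Frameworks automate steps 3 and 4 (autograd plus an optimizer object), but the loop structure is exactly this.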
Training Tips
Monitor Overfitting
Use validation set. Stop if validation loss increases while training loss decreases.
Use Dropout
Add Dropout layers (0.3-0.5) to prevent overfitting, especially in dense layers.
Batch Normalization
Normalizes layer inputs, speeds up training and improves stability.
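Training-time batch normalization can be sketched in numpy (illustrative; real layers also track running statistics for use at inference): each feature is normalized over the batch, then scaled and shifted by learnable `gamma` and `beta`.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                  # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta            # learnable scale and shift

x = np.array([[1., 100.], [2., 200.], [3., 300.]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(out.mean(axis=0))  # ~[0, 0]
print(out.std(axis=0))   # ~[1, 1], despite wildly different input scales
```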
Data Augmentation
For images: random flips, rotations, crops. Increases dataset size and regularizes.
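Two of these augmentations sketched in numpy (illustrative; in practice you would use the framework’s own augmentation utilities):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    """Horizontal flip with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_crop(img, size):
    """Crop a size x size patch at a random position."""
    h, w = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

img = np.arange(16).reshape(4, 4).astype(float)
print(random_flip(img).shape)     # (4, 4): flipping keeps the shape
print(random_crop(img, 3).shape)  # (3, 3): a random sub-image each call
```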
Comparing Performance
From the Fashion-MNIST project (Module A8):
| Model Type | Test Accuracy (Keras) | Test Accuracy (PyTorch) |
|---|---|---|
| Dense Network | ~88% | ~88% |
| CNN | ~90%+ | ~90%+ |
Key Finding: CNNs outperform dense networks on image data, achieving ~2-3% higher accuracy with fewer parameters.
Dataset: Fashion-MNIST
The bootcamp projects use Fashion-MNIST, a dataset of 70,000 grayscale images (28×28 pixels) across 10 clothing categories:
- T-shirt/top
- Trouser
- Pullover
- Dress
- Coat
- Sandal
- Shirt
- Sneaker
- Bag
- Ankle boot
Practical Considerations
Choosing a Framework
Use Keras if:
- You’re a beginner
- You need fast prototyping
- You want simple, readable code
- You’re deploying with TensorFlow Serving
Use PyTorch if:
- You need flexibility for custom architectures
- You’re doing research
- You prefer Pythonic code
- You want easier debugging
Hardware Acceleration
Deep learning benefits significantly from GPUs.
Next Steps
Build Neural Networks
Implement complete models with Keras and PyTorch using Fashion-MNIST
Clustering
Review clustering techniques for unsupervised learning