Being able to take your own data and train your own neural network is a powerful and exciting capability. Let’s dive in!
Training Overview
Let’s continue with the handwritten digit recognition example - recognizing an image as a 0 or a 1.

Network Architecture
- Input: Image pixels (X)
- Layer 1: 25 units with sigmoid activation
- Layer 2: 15 units with sigmoid activation
- Output: 1 unit with sigmoid activation
TensorFlow Training Code
Here’s the TensorFlow code to train a neural network, starting with the model definition:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Create the model architecture
model = Sequential([
    Dense(units=25, activation='sigmoid'),  # First hidden layer
    Dense(units=15, activation='sigmoid'),  # Second hidden layer
    Dense(units=1, activation='sigmoid')    # Output layer
])
```
This step is familiar from the inference section - you’re specifying the layers and their configurations.
The key part of compilation is specifying the loss function. Binary crossentropy is the standard loss for binary classification problems.
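A minimal compile step might look like this (a sketch: the Adam optimizer choice and the 400-pixel input size, corresponding to a 20×20 image, are assumptions not stated above):

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

model = Sequential([
    tf.keras.Input(shape=(400,)),           # assumed 20x20 = 400 input pixels
    Dense(units=25, activation='sigmoid'),  # First hidden layer
    Dense(units=15, activation='sigmoid'),  # Second hidden layer
    Dense(units=1, activation='sigmoid'),   # Output layer
])

# Binary crossentropy is the standard loss for binary classification
model.compile(loss=BinaryCrossentropy(), optimizer='adam')
```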
Understanding Training: Comparison with Logistic Regression
To understand neural network training, let’s first recall logistic regression training from the previous course.

Logistic Regression Training Steps
```
# Logistic regression prediction
f(x) = g(w · x + b)

# Where g is the sigmoid function
g(z) = 1 / (1 + e^(-z))
```
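The same prediction can be written out in Python (a minimal illustration; the function names are ours):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    # f(x) = g(w · x + b)
    return sigmoid(np.dot(w, x) + b)
```

When w · x + b = 0, the prediction is exactly 0.5, the decision boundary.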
- Loss function: Measures error on a single training example
- Cost function: Average loss over the entire training set
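The loss/cost distinction can be made concrete with the binary crossentropy loss (a sketch; function names are ours):

```python
import numpy as np

def loss(f, y):
    # Binary crossentropy on a single training example:
    # L = -y*log(f) - (1-y)*log(1-f)
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

def cost(predictions, labels):
    # Cost J = average loss over the entire training set
    return float(np.mean([loss(f, y) for f, y in zip(predictions, labels)]))
```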
Neural Network Training Steps
Training a neural network follows the same three steps:

Step 1: Specify the Model
Define the neural network architecture:
- How many layers
- How many neurons per layer
- What activation functions to use
- How to compute output given input and parameters
Step 2: Specify Loss and Cost
For binary classification, use binary crossentropy.

Why Binary Crossentropy?
Binary crossentropy (also called logistic loss) is ideal for binary classification because:
- It heavily penalizes confident wrong predictions
- It provides smooth gradients for optimization
- It’s theoretically derived from maximum likelihood estimation
Step 3: Train Using Gradient Descent
- Compute gradients of cost with respect to all parameters
- Update parameters: W = W - α * ∂J/∂W
- Repeat for the specified number of epochs
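These steps can be sketched as a plain gradient descent loop (a toy 1-D illustration minimizing J(w) = w², whose gradient is dJ/dw = 2w; the names and values are ours):

```python
def gradient(w):
    # dJ/dw for J(w) = w**2
    return 2.0 * w

w = 5.0          # initial parameter
alpha = 0.1      # learning rate
for epoch in range(100):
    w = w - alpha * gradient(w)   # W = W - α * ∂J/∂W
```

After enough epochs, w converges to the minimum of J at w = 0.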
Backpropagation Algorithm
The key to neural network training is computing gradients efficiently using backpropagation.

What Backpropagation Does

Backpropagation computes the gradient of the loss function with respect to each parameter by applying the chain rule from calculus. TensorFlow handles this automatically!
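You can see TensorFlow’s automatic differentiation directly with tf.GradientTape (a minimal sketch):

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2               # y = x^2

grad = tape.gradient(y, x)   # dy/dx = 2x
```

At x = 3.0 the computed gradient is 6.0. During training, the same mechanism computes ∂J/∂W for every weight in the network.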
Different Loss Functions
Depending on your problem, you might use different loss functions:
- Binary Classification
- Regression
- Multi-class Classification
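These map onto built-in Keras loss classes (a sketch; which one fits depends on how your labels are encoded):

```python
from tensorflow.keras.losses import (
    BinaryCrossentropy,             # binary classification
    MeanSquaredError,               # regression
    SparseCategoricalCrossentropy,  # multi-class classification (integer labels)
)

# Example: mean squared error averages the squared differences
mse = MeanSquaredError()
error = mse([0.0, 0.0], [1.0, 1.0])  # mean of (1-0)^2 and (1-0)^2
```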
Complete Training Example
A complete training run for the digit classification problem involves a few key parameters:

Training Parameters Explained
Epochs
Number of complete passes through the training data
Batch Size
Number of examples processed before updating parameters
Learning Rate
Step size for parameter updates (controlled by optimizer)
Validation Split
Fraction of data reserved for validation during training
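Putting these parameters together, a full training run might look like this (a sketch: random synthetic data stands in for real 20×20 digit images, and the specific values are illustrative):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Synthetic stand-in for 20x20 digit images (400 pixels) with 0/1 labels
X = np.random.rand(200, 400).astype("float32")
y = np.random.randint(0, 2, size=(200, 1)).astype("float32")

model = Sequential([
    tf.keras.Input(shape=(400,)),
    Dense(units=25, activation='sigmoid'),
    Dense(units=15, activation='sigmoid'),
    Dense(units=1, activation='sigmoid'),
])

# The optimizer controls the learning rate (step size) for parameter updates
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=['accuracy'])

history = model.fit(
    X, y,
    epochs=5,              # complete passes through the training data
    batch_size=32,         # examples processed before each parameter update
    validation_split=0.2,  # fraction of data reserved for validation
    verbose=0,
)
```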
Monitoring Training Progress
TensorFlow provides tools to monitor training, such as the History object returned by model.fit (which records loss and metrics per epoch) and per-epoch logging controlled by the verbose argument.

Common Training Issues
Model Not Learning

Symptoms: Loss stays constant or decreases very slowly

Solutions:
- Increase learning rate
- Check data preprocessing
- Verify labels are correct
- Try different initialization
Overfitting

Symptoms: Training accuracy high, validation accuracy low

Solutions:
- Add more training data
- Use dropout or regularization
- Reduce model complexity
- Use early stopping
Underfitting

Symptoms: Both training and validation accuracy are low

Solutions:
- Increase model complexity
- Train for more epochs
- Reduce regularization
- Check for bugs in data pipeline
Optimizers in TensorFlow
TensorFlow offers several optimization algorithms, including SGD, RMSprop, and Adam. Adam is usually the best starting point: it combines the benefits of momentum and adaptive learning rates.
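Choosing an optimizer in Keras is a one-liner (a sketch; the learning rate shown is Adam’s default):

```python
from tensorflow.keras.optimizers import SGD, RMSprop, Adam

# Adam combines momentum with per-parameter adaptive learning rates
optimizer = Adam(learning_rate=0.001)

# Then pass it when compiling, e.g.:
# model.compile(loss='binary_crossentropy', optimizer=optimizer)
```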
Best Practices
Early Stopping Example
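A common way to implement early stopping is Keras’ built-in EarlyStopping callback (a minimal sketch; the patience value is illustrative):

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',          # watch validation loss
    patience=5,                  # stop after 5 epochs with no improvement
    restore_best_weights=True,   # roll back to the best epoch's weights
)

# Pass the callback to fit, e.g.:
# model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop])
```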
Summary
Neural network training involves three key steps:
- Specify the model: Define architecture with layers and activations
- Choose loss function: Binary crossentropy for classification, MSE for regression
- Train with gradient descent: Use backpropagation to compute gradients
Next Steps
Now that you understand neural network training:
- Experiment with different architectures
- Try various loss functions and optimizers
- Practice with real datasets
- Learn about regularization techniques
- Explore advanced architectures (CNNs, RNNs)
The ability to debug and improve your models comes from understanding what’s happening under the hood. Even when using high-level APIs, this knowledge is invaluable.
