Welcome to neural network training! Last week you learned how to carry out inference in a neural network. This week, we’ll cover how to train a neural network on your own data.
Being able to take your own data and train your own neural network is a powerful and exciting capability. Let’s dive in!

Training Overview

Let’s continue with the handwritten digit recognition example: classifying an image as the digit 0 or 1.

Network Architecture

  • Input: Image pixels (X)
  • Layer 1: 25 units with sigmoid activation
  • Layer 2: 15 units with sigmoid activation
  • Output: 1 unit with sigmoid activation
Given a training set of images X with ground truth labels Y, how do you train the parameters of this neural network?

TensorFlow Training Code

Here’s the complete code to train a neural network in TensorFlow:
Step 1: Build the Model
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Create the model architecture
model = Sequential([
    Dense(units=25, activation='sigmoid'),  # First hidden layer
    Dense(units=15, activation='sigmoid'),  # Second hidden layer
    Dense(units=1, activation='sigmoid')    # Output layer
])
This step is familiar from the inference section - you’re specifying the layers and their configurations.
Step 2: Compile the Model
# Specify the loss function
model.compile(
    loss='binary_crossentropy',
    optimizer='adam'
)
The key part of compilation is specifying the loss function. Binary crossentropy is the standard loss for binary classification problems.
Step 3: Train the Model
# Fit the model to the data
model.fit(X, Y, epochs=100)
This tells TensorFlow to fit the model using the specified loss function to your dataset X and Y.

Understanding Training: Comparison with Logistic Regression

To understand neural network training, let’s first recall logistic regression training from the previous course.

Logistic Regression Training Steps

Step 1: Specify the Model

Define how to compute output given input:

# Logistic regression prediction
f(x) = g(w · x + b)

# Where g is the sigmoid function
g(z) = 1 / (1 + e^(-z))

Step 2: Define Loss and Cost Functions

Loss function (single training example):

L(f(x), y) = -y * log(f(x)) - (1 - y) * log(1 - f(x))

Cost function (average over all examples):

J(w, b) = (1/m) * Σ L(f(x^(i)), y^(i))

  • Loss function: Measures error on a single training example
  • Cost function: Average loss over the entire training set

Step 3: Train Using Gradient Descent

Update parameters to minimize cost:

# Gradient descent update
w = w - α * ∂J/∂w
b = b - α * ∂J/∂b
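The three steps above can be sketched end to end in NumPy. This is a minimal illustration on made-up toy data, not a reference implementation; the helper names and hyperparameter values are assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, alpha=0.1, iterations=1000):
    """Gradient descent on the binary crossentropy cost J(w, b)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iterations):
        f = sigmoid(X @ w + b)      # Step 1: model predictions f(x^(i))
        error = f - y               # derivative of the loss w.r.t. w·x + b
        dw = (X.T @ error) / m      # ∂J/∂w, averaged over m examples
        db = error.mean()           # ∂J/∂b
        w -= alpha * dw             # Step 3: w = w - α * ∂J/∂w
        b -= alpha * db             #         b = b - α * ∂J/∂b
    return w, b

# Toy data: label is 1 exactly when the single feature is positive
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train_logistic_regression(X, y)
preds = (sigmoid(X @ w + b) >= 0.5).astype(int)
```

After training, the learned weight is positive, so thresholding the sigmoid output at 0.5 recovers the labels.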

Neural Network Training Steps

Training a neural network follows the same three steps:

Step 1: Specify the Model

Define the neural network architecture:
model = Sequential([
    Dense(25, activation='sigmoid'),
    Dense(15, activation='sigmoid'),
    Dense(1, activation='sigmoid')
])
This specifies:
  • How many layers
  • How many neurons per layer
  • What activation functions to use
  • How to compute output given input and parameters

Step 2: Specify Loss and Cost

For binary classification, use binary crossentropy:
model.compile(loss='binary_crossentropy')
Binary Crossentropy Loss:
L(f(x), y) = -y * log(f(x)) - (1 - y) * log(1 - f(x))
This is the same loss function as logistic regression!
Binary crossentropy (also called logistic loss) is ideal for binary classification because:
  • It heavily penalizes confident wrong predictions
  • It provides smooth gradients for optimization
  • It’s theoretically derived from maximum likelihood estimation
Cost Function:
J(W, B) = (1/m) * Σ L(f(x^(i)), y^(i))
Average the loss over all m training examples.
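To make the loss/cost distinction concrete, here is the binary crossentropy computed by hand on a few toy predictions (the values of f and y below are made up for illustration):

```python
import numpy as np

def binary_crossentropy(f, y):
    # L(f(x), y) = -y * log(f(x)) - (1 - y) * log(1 - f(x)), per example
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

f = np.array([0.9, 0.2, 0.8])   # model outputs f(x^(i)) (toy values)
y = np.array([1.0, 0.0, 1.0])   # ground truth labels

per_example_loss = binary_crossentropy(f, y)   # one loss value per example
cost = per_example_loss.mean()                 # J = (1/m) * Σ L
```

Note the confident correct prediction (0.9 for a label of 1) gets a smaller loss than the less confident one (0.8).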

Step 3: Train Using Gradient Descent

model.fit(X, Y, epochs=100)
This executes gradient descent:
  1. Compute gradients of cost with respect to all parameters
  2. Update parameters: W = W - α * ∂J/∂W
  3. Repeat for specified number of epochs
An epoch is one complete pass through the entire training dataset. Training for 100 epochs means the algorithm sees each example 100 times.

Backpropagation Algorithm

The key to neural network training is computing gradients efficiently using backpropagation.
Backpropagation computes the gradient of the loss function with respect to each parameter by applying the chain rule from calculus. TensorFlow handles this automatically!

What Backpropagation Does

  1. Forward pass: Compute outputs layer by layer from input to output
  2. Compute loss: Calculate the error between prediction and actual label
  3. Backward pass: Propagate the error backward through the network
  4. Compute gradients: Calculate ∂J/∂W and ∂J/∂B for each layer
  5. Update parameters: Adjust weights and biases using gradient descent
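The five stages above can be traced by hand on a tiny one-hidden-layer network. This is a pedagogical sketch with made-up sizes and a single training example, not TensorFlow's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Tiny network: 2 inputs -> 3 hidden sigmoid units -> 1 sigmoid output
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
x, y = np.array([0.5, -1.0]), 1.0

# 1-2. Forward pass and loss
a1 = sigmoid(W1 @ x + b1)
a2 = sigmoid(W2 @ a1 + b2)                 # prediction f(x)
loss = -y * np.log(a2) - (1 - y) * np.log(1 - a2)

# 3-4. Backward pass: apply the chain rule layer by layer
dz2 = a2 - y                               # ∂L/∂z2 for sigmoid + crossentropy
dW2, db2 = np.outer(dz2, a1), dz2
dz1 = (W2.T @ dz2) * a1 * (1 - a1)         # propagate error through layer 1
dW1, db1 = np.outer(dz1, x), dz1

# 5. Gradient descent update
alpha = 0.1
W2 -= alpha * dW2; b2 -= alpha * db2
W1 -= alpha * dW1; b1 -= alpha * db1

# The loss on the same example should decrease after one update
new_a2 = sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)
new_loss = -y * np.log(new_a2) - (1 - y) * np.log(1 - new_a2)
```

In practice you never write this by hand: `model.fit` performs exactly this forward/backward/update cycle over mini-batches of your data.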

Different Loss Functions

Depending on your problem, you might use different loss functions:
# Binary crossentropy for 2 classes
model.compile(loss='binary_crossentropy')
Use when: Output is 0 or 1 (binary classification)
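The other common choice, mean squared error for regression (mentioned again in the summary), differs only in how the per-example error is measured. A quick NumPy comparison of the two, on toy numbers:

```python
import numpy as np

f = np.array([0.9, 0.1])   # predictions (toy values)
y = np.array([1.0, 0.0])   # targets

# Binary crossentropy: for classification targets in {0, 1}
bce = np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

# Mean squared error: for continuous regression targets
mse = np.mean((f - y) ** 2)
```

In Keras the regression counterpart is `model.compile(loss='mean_squared_error')`.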

Complete Training Example

Here’s a complete example for the digit classification problem:
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Prepare training data
X = np.array([...])  # Image pixel values
Y = np.array([...])  # Labels (0 or 1)

# Build model
model = Sequential([
    Dense(25, activation='sigmoid', input_shape=(784,)),
    Dense(15, activation='sigmoid'),
    Dense(1, activation='sigmoid')
])

# Compile model
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# Train model
history = model.fit(
    X, Y,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

# Evaluate on held-out test data (X_test, Y_test prepared separately)
test_loss, test_accuracy = model.evaluate(X_test, Y_test)
print(f"Test accuracy: {test_accuracy}")

# Make predictions on new images (X_new prepared separately)
predictions = model.predict(X_new)

Training Parameters Explained

Epochs

Number of complete passes through the training data

Batch Size

Number of examples processed before updating parameters

Learning Rate

Step size for parameter updates (controlled by optimizer)

Validation Split

Fraction of data reserved for validation during training
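These parameters interact: with m training examples and a given batch size, each epoch performs ceil(m / batch_size) parameter updates. A quick sanity check with toy numbers:

```python
import math

m = 1000          # training examples (toy value)
batch_size = 32
epochs = 100

updates_per_epoch = math.ceil(m / batch_size)   # 31 full batches + 1 partial batch
total_updates = updates_per_epoch * epochs
```

So smaller batches mean more (noisier) updates per epoch, while larger batches mean fewer (smoother) ones.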

Monitoring Training Progress

TensorFlow provides tools to monitor training:
# Training with validation
history = model.fit(
    X_train, Y_train,
    epochs=100,
    validation_data=(X_val, Y_val),
    verbose=1
)

# Plot training history
import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
Monitoring validation loss helps detect overfitting. If training loss decreases but validation loss increases, your model is overfitting.

Common Training Issues

Slow or No Convergence

Symptoms: Loss stays constant or decreases very slowly.
Solutions:
  • Increase the learning rate
  • Check data preprocessing
  • Verify labels are correct
  • Try a different weight initialization

Overfitting

Symptoms: Training accuracy is high, but validation accuracy is low.
Solutions:
  • Add more training data
  • Use dropout or regularization
  • Reduce model complexity
  • Use early stopping

Underfitting

Symptoms: Both training and validation accuracy are low.
Solutions:
  • Increase model complexity
  • Train for more epochs
  • Reduce regularization
  • Check for bugs in the data pipeline

Optimizers in TensorFlow

TensorFlow offers several optimization algorithms:
# Adam optimizer (most common)
model.compile(
    loss='binary_crossentropy',
    optimizer='adam'
)

# SGD with momentum
model.compile(
    loss='binary_crossentropy',
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
)

# RMSprop
model.compile(
    loss='binary_crossentropy',
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001)
)
Adam is usually the best starting point. It combines the benefits of momentum and adaptive learning rates.
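As a rough intuition for what the momentum option does, here is plain SGD-with-momentum minimizing a 1-D quadratic. This is a toy sketch of the update rule, not Keras's implementation:

```python
# Minimize J(w) = w^2, whose gradient is dJ/dw = 2w
learning_rate, momentum = 0.01, 0.9
w, velocity = 5.0, 0.0

for _ in range(500):
    grad = 2 * w
    # Momentum update: velocity accumulates a decaying sum of past gradients,
    # smoothing the trajectory and speeding travel along consistent directions
    velocity = momentum * velocity - learning_rate * grad
    w = w + velocity
```

After 500 steps w has converged to (very near) the minimum at 0.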

Best Practices

  1. Normalize your data: Scale input features to similar ranges (e.g., 0-1 or standardize)
  2. Use appropriate batch sizes: Typical values are 32, 64, 128, or 256
  3. Monitor validation metrics: Always evaluate on a held-out validation set
  4. Save checkpoints: Save model weights periodically during training
  5. Use early stopping: Stop training when validation performance stops improving
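The first practice, normalizing inputs, can be as simple as scaling pixels to [0, 1] or standardizing each feature. A sketch with toy pixel data:

```python
import numpy as np

# Toy "images": two examples of three pixel values in 0..255
X = np.array([[0.0, 128.0, 255.0],
              [64.0, 192.0, 32.0]])

# Option 1: scale to [0, 1] (common for pixel data)
X_scaled = X / 255.0

# Option 2: standardize each feature to mean 0, std 1
mean, std = X.mean(axis=0), X.std(axis=0)
X_standardized = (X - mean) / std
```

Either choice keeps features on similar scales, which helps gradient descent converge; just apply the same transformation (with training-set statistics) to validation and test data.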

Early Stopping Example

from tensorflow.keras.callbacks import EarlyStopping

# Define early stopping
early_stop = EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True
)

# Train with early stopping
model.fit(
    X_train, Y_train,
    epochs=1000,
    validation_data=(X_val, Y_val),
    callbacks=[early_stop]
)
Early stopping automatically stops training when the model stops improving, preventing overfitting and saving time.
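The patience logic can be sketched in plain Python to show what EarlyStopping does under the hood. This is a simplified sketch; the validation losses below are made up:

```python
# Made-up validation losses: improving, then plateauing
val_losses = [0.9, 0.7, 0.5, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51]

patience = 3
best_loss = float("inf")
wait = 0
stopped_epoch = None

for epoch, loss in enumerate(val_losses):
    if loss < best_loss:          # improvement: record it, reset the counter
        best_loss = loss
        wait = 0
    else:                         # no improvement this epoch
        wait += 1
        if wait >= patience:      # out of patience: stop training
            stopped_epoch = epoch
            break
```

With `restore_best_weights=True`, Keras additionally rolls the model back to the weights from the best epoch (here, the one with loss 0.45).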

Summary

Neural network training involves three key steps:
  1. Specify the model: Define architecture with layers and activations
  2. Choose loss function: Binary crossentropy for classification, MSE for regression
  3. Train with gradient descent: Use backpropagation to compute gradients
TensorFlow automates the complex mathematics of backpropagation, letting you focus on model architecture and hyperparameters.

Next Steps

Now that you understand neural network training:
  • Experiment with different architectures
  • Try various loss functions and optimizers
  • Practice with real datasets
  • Learn about regularization techniques
  • Explore advanced architectures (CNNs, RNNs)
The ability to debug and improve your models comes from understanding what’s happening under the hood. Even when using high-level APIs, this knowledge is invaluable.
Congratulations! You now have the complete foundation for building, training, and deploying neural networks!
