Architecture Overview
The model uses a 3-layer convolutional neural network (CNN) with approximately 11.2 million trainable parameters. This architecture is specifically designed to balance model capacity against the limited dataset size (~6K images) while avoiding overfitting.
Layer-by-Layer Breakdown
Input Layer
Input images are in RGB format with dimensions of 224×224 pixels. The 3 channels represent RGB values, even though chest X-rays are inherently grayscale.
Convolutional Block 1
Filters: 32
Kernel size: 3×3
Activation: ReLU
Output shape: (222, 222, 32)
Features learned:
- Edges and lines
- Basic textures
- Intensity gradients
Max pooling:
Pool size: 2×2
Output shape: (111, 111, 32)
Why pooling helps:
- Reduces spatial dimensions by 50%
- Provides translation invariance (small position shifts don’t affect the output)
- Reduces computational load for subsequent layers
- Helps prevent overfitting by abstracting features
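The output shapes above follow from standard convolution arithmetic. A minimal sketch (the function names `conv_out` and `pool_out` are illustrative, not from the project code):

```python
# Output-shape arithmetic for a 'valid' 3x3 convolution followed by 2x2
# max pooling, reproducing the shapes listed for Block 1.
def conv_out(size, kernel=3, stride=1, padding=0):
    # Standard formula: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling; an odd trailing row/column is dropped
    return size // pool

side = conv_out(224)
print(side)            # 222 after the 3x3 convolution
side = pool_out(side)
print(side)            # 111 after 2x2 max pooling
```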
Convolutional Block 2
Filters: 64
Kernel size: 3×3
Activation: ReLU
Output shape: (109, 109, 64)
Features learned:
- Texture combinations
- Opacity patterns
- Regional abnormalities
Max pooling:
Pool size: 2×2
Output shape: (54, 54, 64) (the odd dimension 109 is floored to 54)
Convolutional Block 3
Filters: 128
Kernel size: 3×3
Activation: ReLU
Output shape: (52, 52, 128)
Features learned:
- Consolidations (pneumonia indicators)
- Infiltrates
- Specific disease patterns
Max pooling:
Pool size: 2×2
Output shape: (26, 26, 128)
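As a sketch of what each pooling layer computes, here is 2×2 max pooling implemented with NumPy reshaping (this assumes even spatial dimensions; for odd sizes such as 109, frameworks drop the trailing row and column):

```python
import numpy as np

def max_pool_2x2(x):
    # Split each spatial axis into (blocks, 2) and take the max per block
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))   # [[ 5.  7.] [13. 15.]]
```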
Fully Connected Layers
Flatten Layer
Converts the 3D feature maps (26×26×128) into a 1D vector of 86,528 elements.
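The flatten step is just a reshape; a quick NumPy check of the element count:

```python
import numpy as np

# Flattening the final (26, 26, 128) feature maps yields a vector of
# 26 * 26 * 128 = 86,528 elements.
features = np.zeros((26, 26, 128))
flat = features.reshape(-1)
print(flat.shape)  # (86528,)
```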
Dense Layer
Units: 128
Activation: ReLU
Parameters: ~11.1M
Dropout Layer
Rate: 0.5 (50%)
Purpose: prevents overfitting by randomly deactivating half of the neurons during training
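A minimal NumPy sketch of (inverted) dropout at training time; the scaling factor keeps the expected activation unchanged, and dropout is disabled at inference:

```python
import numpy as np

rate = 0.5
rng = np.random.default_rng(0)

x = np.ones(8)                   # toy activations
mask = rng.random(8) >= rate     # keep each unit with probability 0.5
y = x * mask / (1.0 - rate)      # survivors are scaled by 1 / (1 - rate)
print(y)                         # each entry is either 0.0 or 2.0
```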
Output Layer
Units: 2
Activation: Softmax
Classes: [NORMAL, PNEUMONIA]
Activation Functions
ReLU (Hidden Layers)
Formula: f(x) = max(0, x)
How it works: negative inputs are mapped to 0; positive inputs pass through unchanged.
Advantages:
- No vanishing gradient: gradients don’t diminish during backpropagation
- Computational efficiency: a single max operation per element
- Sparsity: outputs exactly 0 for negative inputs, creating sparse representations
- Industry standard: proven performance in modern CNNs
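In code, ReLU is a one-liner (NumPy sketch):

```python
import numpy as np

def relu(x):
    # Elementwise max(0, x): negatives become 0, positives pass through
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```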
Softmax (Output Layer)
Formula: softmax(z_i) = exp(z_i) / Σ_j exp(z_j)
Output example (illustrative): [0.08, 0.92], i.e. 8% probability of NORMAL and 92% of PNEUMONIA.
Advantages:
- Probabilistic interpretation: outputs sum to 1.0
- Class probabilities: each output represents P(class | image)
- Differentiable: enables gradient-based optimization
- Standard for classification: well suited to multi-class problems
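A numerically stable NumPy sketch of softmax over the two output logits:

```python
import numpy as np

def softmax(z):
    z = z - z.max()        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([1.0, 3.0]))  # toy logits for [NORMAL, PNEUMONIA]
print(p, p.sum())                  # two probabilities summing to 1.0
```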
Parameter Count
Total trainable parameters: ~11.2 million. The majority of parameters (>98%) are in the first fully connected layer, which combines spatial features for classification.
Why 3 Convolutional Layers?
Too Shallow (1-2 layers):
- Insufficient capacity to capture complex pneumonia patterns
- Lower accuracy on validation data
- Cannot learn hierarchical features

3 Layers (this design):
- Captures low-, mid-, and high-level features
- Sufficient for the dataset size (~6K images)
- Balances accuracy and training time
- Prevents overfitting with limited data

Too Deep (4+ layers):
- Risk of overfitting with a small dataset
- Longer training time
- Diminishing returns on accuracy
Comparison to Alternatives
Why Not Transfer Learning?
Pre-trained models such as VGG16 and ResNet50 would offer:
- Pre-training on ImageNet (1.4M images)
- Better feature extraction
- Higher potential accuracy

Compared to our custom CNN, however, they are:
- Much larger (VGG16 has ~138M parameters)
- Harder to explain and interpret
- Overkill for an educational project
- Slower at inference time
Why Not Simpler Models?
Multi-Layer Perceptron (MLP):
- Would require flattening 224×224×3 = 150,528 input features
- Loses the spatial structure of images
- Prohibitively large number of parameters
- Cannot detect translation-invariant features

Recurrent Neural Networks (RNNs):
- Designed for sequential data (text, time series)
- Don’t leverage 2D spatial structure
- Much slower to train
- Unnecessarily complex for image classification
CNNs are the industry standard for medical image analysis because they preserve spatial relationships and learn hierarchical features automatically.
Architecture Diagram
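The original diagram is not reproduced here; as a stand-in, this short script prints the shape flow derived from the layer specs above:

```python
def conv(s):  # 3x3 kernel, stride 1, 'valid' padding
    return s - 2

def pool(s):  # 2x2 max pooling (odd sizes are floored)
    return s // 2

s = 224
print(f"Input   : {s}x{s}x3")
for i, filters in enumerate([32, 64, 128], start=1):
    s = conv(s)
    print(f"Conv{i}   : {s}x{s}x{filters}")
    s = pool(s)
    print(f"Pool{i}   : {s}x{s}x{filters}")
print(f"Flatten : {s * s * 128}")
print("Dense(128, ReLU) -> Dropout(0.5) -> Dense(2, Softmax)")
```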
Technical References
- He et al. (2016): Deep Residual Learning for Image Recognition
- Simonyan & Zisserman (2014): Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)
- Rajpurkar et al. (2017): CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning