Neural Network Type
Convolutional Neural Network (CNN)

We chose a custom CNN architecture with three convolutional blocks designed specifically for medical image classification.

Architecture Justification
Why CNN vs Other Neural Network Types?
CNN vs Multi-Layer Perceptron (MLP)
Problems with MLP:
- Would require flattening the image into a 224×224×3 = 150,528-element input vector
- Loss of spatial information
- Prohibitive number of parameters
- Cannot capture local relationships between pixels
Advantages of CNN:
- Maintains 2D spatial structure
- Shared parameters in filters
- Detects local features (edges, textures)
- Translation invariance
CNN vs Recurrent Neural Networks (RNN/LSTM)
Problems with RNN/LSTM:
- Designed for sequential data (text, time series)
- Don’t leverage spatial structure of images
- Slower to train
- Unnecessarily complex for this problem
Advantages of CNN:
- Optimized for data with spatial structure
- Efficient parallel operations
- Proven architecture for medical images
CNN Architecture
High-Level Design
Detailed Architecture Diagram
Layer-by-Layer Breakdown
Input Layer
- Dimension: 224 × 224 × 3 (RGB)
- Preprocessing: Normalization to [0, 1], resizing to 224×224
Convolutional Block 1
Conv2D Layer: 32 filters, 3×3 kernel, ReLU activation
- Why 32 filters? Balance between capacity and efficiency for initial feature detection
- Why 3×3 kernel? Captures basic local patterns (edges, lines)
MaxPooling2D Layer: 2×2 pool size
- Reduces dimensions to 112×112
- Provides invariance to small translations
- Reduces computational load
Convolutional Block 2
Conv2D Layer: 64 filters, 3×3 kernel, ReLU activation
- Deeper filters capture more complex patterns
- Detects textures and opacity patterns
MaxPooling2D Layer: 2×2 pool size
- Reduces dimensions to 56×56
Convolutional Block 3
Conv2D Layer: 128 filters, 3×3 kernel, ReLU activation
- High-level feature extraction
- Pneumonia-specific patterns (consolidations, infiltrates)
MaxPooling2D Layer: 2×2 pool size
- Final dimension: 28×28
Fully Connected Layers
Flatten Layer: Converts the 3D feature maps into a 1D vector
Dense Layer: ReLU activation
- Combines features for decision making
Dropout (50%): Regularization that prevents overfitting
- Randomly deactivates 50% of neurons during training
- Forces the network to learn robust features
Output Layer: Softmax activation
- 2 classes: [NORMAL, PNEUMONIA]
- Outputs probabilities that sum to 1
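The layer-by-layer breakdown above can be sketched in Keras roughly as follows. This is an illustrative reconstruction, not the project's actual code: the 128-unit width of the dense layer and the use of `padding="same"` (needed for the stated 112/56/28 dimensions) are assumptions not spelled out in the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the three-block CNN described above.
# Assumptions: "same" padding and a 128-unit dense layer.
model = keras.Sequential([
    layers.Input(shape=(224, 224, 3)),            # 224x224 RGB input
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),                  # 224 -> 112
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),                  # 112 -> 56
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),                  # 56 -> 28
    layers.Flatten(),                             # 28*28*128 features
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                          # 50% dropout
    layers.Dense(2, activation="softmax"),        # [NORMAL, PNEUMONIA]
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```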
Activation Functions
ReLU (Rectified Linear Unit)
Used in all hidden layers.
Advantages of ReLU:
- No vanishing gradient problem
- Computationally efficient
- Introduces non-linearity
- Standard in modern CNNs
Softmax
Used in the output layer.
Advantages of Softmax:
- Converts logits to probabilities
- Clear probabilistic interpretation
- Standard for multi-class classification (here, two classes)
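Both activations are simple to state in code. A minimal NumPy sketch (hypothetical helper functions, not from the project):

```python
import numpy as np

def relu(x):
    """ReLU: pass positives through, zero out negatives."""
    return np.maximum(0.0, x)

def softmax(logits):
    """Convert a vector of logits to probabilities summing to 1."""
    z = logits - np.max(logits)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example: two-class logits for [NORMAL, PNEUMONIA]
probs = softmax(np.array([0.4, 2.1]))
```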
Hyperparameters
Training Configuration
Loss Function: Categorical Crossentropy
- Penalizes incorrect predictions
- Standard for classification tasks
Optimizer: Adam
- Adaptive learning rate
- Combines advantages of RMSprop and Momentum
- Fast and stable convergence
- Initial learning rate: 0.001
Adam is chosen over SGD for its adaptive learning rate, which helps navigate the complex loss landscape of deep networks more efficiently.
Evaluation Metrics:
- Accuracy: Percentage of correct predictions
- Precision: Of predicted pneumonia cases, how many are correct
- Recall (Sensitivity): Of actual pneumonia cases, how many we detect
- F1-Score: Harmonic mean of precision and recall
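The four metrics follow directly from the confusion-matrix counts. A small self-contained sketch (the helper name is hypothetical; a real project would typically use `sklearn.metrics`):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels (1 = PNEUMONIA)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# 2 true positives, 1 false positive, 1 false negative, 2 true negatives
m = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

In this screening setting, recall is the metric to watch: a missed pneumonia case (false negative) is costlier than a false alarm.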
Regularization Strategies
1. Dropout (50%)
- Prevents co-adaptation of neurons
- Reduces overfitting
- Simulates ensemble of networks
- Applied only during training
2. Data Augmentation
Applied during training:
- Rotations: ±15 degrees
- Translations: ±10%
- Zoom: ±10%
- Horizontal flip: Yes
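To make the idea concrete, here is a toy NumPy augmentation showing two of the transforms (±10% translation and horizontal flip); in practice rotations and zoom would come from a library pipeline such as Keras's `ImageDataGenerator`. The function name and wrap-around `np.roll` translation are illustrative choices, not the project's code:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img, max_shift=0.1, flip_prob=0.5):
    """Randomly translate by up to ±10% and maybe flip horizontally."""
    h, w = img.shape[:2]
    dy = int(rng.integers(-int(h * max_shift), int(h * max_shift) + 1))
    dx = int(rng.integers(-int(w * max_shift), int(w * max_shift) + 1))
    out = np.roll(img, (dy, dx), axis=(0, 1))  # translation (wrap-around)
    if rng.random() < flip_prob:
        out = out[:, ::-1]                     # horizontal flip
    return out

sample = rng.random((224, 224, 3))   # stand-in for one chest X-ray
augmented = augment(sample)
```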
3. Early Stopping
- Monitors validation loss
- Stops training if no improvement for 5 epochs
- Prevents overfitting to training data
- Restores best weights
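The stopping rule above can be simulated in a few lines. This sketch (hypothetical function, not the training code) returns the epoch at which training would halt and the epoch whose weights would be restored:

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch) under a patience-based rule."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch    # new best: reset patience
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch               # no improvement for 5 epochs
    return len(val_losses) - 1, best_epoch         # trained to the end

# Validation loss bottoms out at epoch 2, then drifts upward
losses = [1.0, 0.8, 0.7, 0.75, 0.76, 0.77, 0.78, 0.90]
stop, best = early_stopping_epoch(losses, patience=5)
```

In Keras the equivalent would be `keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)`.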
Model Parameters
Parameter Count Estimation
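A rough parameter count can be derived by hand from the layer sizes above. The sketch below assumes a 128-unit dense layer (a width the text does not state), so the total is an estimate under that assumption:

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Each filter has kernel*kernel*in_ch weights plus one bias."""
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return (in_units + 1) * out_units

conv1 = conv2d_params(3, 3, 32)      # block 1: 896
conv2 = conv2d_params(3, 32, 64)     # block 2: 18,496
conv3 = conv2d_params(3, 64, 128)    # block 3: 73,856
flat = 28 * 28 * 128                 # 100,352 features after the last pool
hidden = dense_params(flat, 128)     # dominates the count (assumed width 128)
output = dense_params(128, 2)        # 258
total = conv1 + conv2 + conv3 + hidden + output
```

Almost all parameters sit in the first dense layer, which is typical for flatten-then-dense CNN heads; the convolutional blocks themselves stay small thanks to weight sharing.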
Model Size Justification
Why This Architecture?
Not Too Deep
- Avoids overfitting with limited dataset (~6K images)
- Faster training time
Not Too Shallow
- Sufficient capacity to learn complex patterns
- Can capture hierarchical features
Balanced Trade-off
- Between accuracy and training time
- Can be trained on CPU in reasonable time
- Production-ready for deployment
Alternative Approaches Considered
Transfer Learning (VGG16, ResNet50)
Advantages:
- Better potential accuracy
- Less training data needed
- Pre-trained on ImageNet
Disadvantages:
- Greater complexity
- Harder to explain
- Larger model size
Simpler Model (1-2 Conv Layers)
Advantages:
- Faster training
- Fewer parameters
- Lower computational requirements
Disadvantages:
- Insufficient capacity
- Lower performance
- Cannot capture complex patterns
Progressive Feature Learning
The architecture follows a pattern of progressive feature extraction:
Layer 1 (32 filters): Basic features
- Edges and lines
- Simple gradients
- Basic texture elements
Layer 2 (64 filters): Intermediate features
- Texture patterns
- Opacity variations
- Shape components
Layer 3 (128 filters): High-level features
- Consolidation patterns
- Infiltrate signatures
- Disease-specific markers
Technical References
- He et al. (2016) - Deep Residual Learning
- Simonyan & Zisserman (2014) - VGG Networks
- Rajpurkar et al. (2017) - CheXNet: Radiologist-Level Pneumonia Detection