Overview
This project builds intelligent image classifiers for the Fashion-MNIST dataset using deep learning. It implements and compares different neural network architectures in both Keras/TensorFlow and PyTorch, progressing from simple dense networks to convolutional neural networks (CNNs).

Objective: Classify images of clothing items into 10 categories using deep learning, demonstrating the power of neural networks for computer vision tasks.

Dataset: Fashion-MNIST
- 60,000 training images
- 10,000 test images
- 28x28 grayscale images
- 10 classes of clothing items
Project Structure
Fashion-MNIST Dataset
Classes
The dataset contains 10 categories of fashion items:

| Label | Class | Description |
|---|---|---|
| 0 | T-shirt/top | T-shirts and tops |
| 1 | Trouser | Pants and trousers |
| 2 | Pullover | Sweaters and pullovers |
| 3 | Dress | Dresses |
| 4 | Coat | Coats and jackets |
| 5 | Sandal | Sandals |
| 6 | Shirt | Shirts |
| 7 | Sneaker | Sneakers and athletic shoes |
| 8 | Bag | Bags and purses |
| 9 | Ankle boot | Ankle boots |
Load and Explore Data
Data Preprocessing
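As a minimal sketch, the data can be loaded with the built-in Keras loader and the pixel values scaled to [0, 1] (variable names are illustrative):

```python
from tensorflow import keras

# Fashion-MNIST ships with Keras; the first call downloads ~30 MB
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)

# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
```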
Model 1: Dense Neural Network
Architecture
Simple fully-connected network:

Keras Implementation
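A sketch of the dense model, assuming hidden widths of 128 and 64 units; this assumption is consistent with the 109,386-parameter count reported in the comparison table below:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),                        # 28x28 image -> 784-vector
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),                     # regularization; adds no parameters
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```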
PyTorch Implementation
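An equivalent PyTorch sketch under the same assumed layer widths; note the final layer emits raw logits, because `nn.CrossEntropyLoss` applies the softmax internally:

```python
import torch
from torch import nn

class DenseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                                      # 28x28 -> 784
            nn.Linear(28 * 28, 128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 10),  # raw logits; CrossEntropyLoss handles softmax
        )

    def forward(self, x):
        return self.net(x)

model = DenseNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 109386
```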
Model 2: Convolutional Neural Network (CNN)
Architecture
CNN with convolutional and pooling layers:

Keras Implementation
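A sketch of the CNN, assuming two 3×3 convolution blocks (32 and 64 filters) followed by a 128-unit dense layer; this configuration matches the 225,034-parameter count in the comparison table:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, (3, 3), activation="relu"),  # 28x28 -> 26x26x32
    keras.layers.MaxPooling2D((2, 2)),                   # -> 13x13x32
    keras.layers.Conv2D(64, (3, 3), activation="relu"),  # -> 11x11x64
    keras.layers.MaxPooling2D((2, 2)),                   # -> 5x5x64
    keras.layers.Flatten(),                              # -> 1600
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```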
PyTorch Implementation
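The same architecture in PyTorch, as a sketch under the same assumptions:

```python
import torch
from torch import nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 26 -> 13
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),  # 13 -> 11 -> 5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # 5x5x64 -> 1600
            nn.Linear(64 * 5 * 5, 128), nn.ReLU(),
            nn.Linear(128, 10),                # raw logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = CNN()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 225034
```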
Model Comparison
Performance Metrics
| Model | Parameters | Test Accuracy | Training Time |
|---|---|---|---|
| Dense (Keras) | 109,386 | 0.8782 | ~2 min |
| Dense (PyTorch) | 109,386 | 0.8795 | ~2 min |
| CNN (Keras) | 225,034 | 0.9145 | ~4 min |
| CNN (PyTorch) | 225,034 | 0.9158 | ~4 min |
- CNN outperforms dense networks by ~3.6 percentage points
- Keras and PyTorch implementations achieve similar results
- CNNs use parameters more effectively for image tasks (higher accuracy with only ~2x the parameters)
- Spatial features captured by convolutions are crucial for image classification
Training Visualization
Training History (Keras)
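A hedged plotting sketch; the accuracy values below are stand-ins for the `history.history` dict returned by `model.fit(...)`:

```python
import os
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Stand-in values; in the notebook, use history.history from model.fit(...)
history = {"accuracy": [0.78, 0.84, 0.86, 0.87],
           "val_accuracy": [0.81, 0.85, 0.86, 0.87]}

plt.plot(history["accuracy"], label="train")
plt.plot(history["val_accuracy"], label="validation")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.savefig("training_history.png")
```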
Confusion Matrix
- Shirts are most frequently confused (77% precision)
- Trousers, Bags, and Sandals are easiest to classify (>96% accuracy)
- Main confusion: Shirts vs T-shirts/Pullovers (similar appearance)
Prediction Examples
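A minimal prediction sketch in Keras; the untrained stand-in model only illustrates the API, and the notebook would use the trained model instead:

```python
from tensorflow import keras

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Stand-in model; substitute the trained `model` from the notebook
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

(_, _), (x_test, _) = keras.datasets.fashion_mnist.load_data()
x_test = x_test.astype("float32") / 255.0

probs = model.predict(x_test[:5], verbose=0)  # shape (5, 10)
preds = probs.argmax(axis=1)
for i, p in enumerate(preds):
    print(f"Image {i}: {class_names[p]} ({probs[i, p]:.2%} confidence)")
```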
Model Saving and Loading
Keras
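A sketch of saving and reloading the full Keras model (architecture plus weights); the filename is illustrative:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

model.save("fashion_mnist_model.keras")  # single-file format: weights + architecture
restored = keras.models.load_model("fashion_mnist_model.keras")
```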
PyTorch
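In PyTorch, the usual pattern saves the `state_dict` rather than the whole module (filename illustrative); reloading requires re-instantiating the same architecture first:

```python
import torch
from torch import nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
torch.save(model.state_dict(), "fashion_mnist.pt")  # weights only (recommended)

# Rebuild the architecture, then load the weights into it
restored = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
restored.load_state_dict(torch.load("fashion_mnist.pt"))
restored.eval()  # disable dropout/batch-norm training behavior
```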
Key Concepts Demonstrated
1. Deep Learning Fundamentals
- Forward propagation
- Backpropagation and gradient descent
- Activation functions (ReLU, Softmax)
- Loss functions (CrossEntropy)
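The softmax activation and cross-entropy loss mentioned above can be written out in a few lines of NumPy:

```python
import numpy as np

def softmax(z):
    z = z - z.max()        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, label):
    # negative log-probability of the true class
    return -np.log(probs[label])

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)        # probabilities summing to 1
loss = cross_entropy(p, 0)
```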
2. Network Architectures
- Dense (Fully-connected): Simple but less efficient for images
- Convolutional: Exploits spatial structure, better for vision
3. Regularization Techniques
- Dropout: Prevents overfitting by randomly dropping neurons
- Data augmentation: Could be added for further improvement
4. Training Best Practices
- Train/validation split for monitoring
- Early stopping to prevent overfitting
- Batch processing for efficiency
- Learning rate tuning
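Early stopping and validation monitoring can be wired up in Keras with a callback, for example:

```python
from tensorflow import keras

# Stop when validation loss stalls for 3 epochs; keep the best weights seen
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

# Usage in the notebook (sketch):
# model.fit(x_train, y_train, validation_split=0.1,
#           batch_size=64, epochs=50, callbacks=[early_stop])
```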
5. Evaluation Metrics
- Accuracy: Overall correctness
- Precision/Recall: Class-specific performance
- Confusion matrix: Error patterns
- F1-score: Balanced metric
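These metrics can be computed with scikit-learn; the toy labels below stand in for `y_test` and the model's predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Toy labels standing in for y_test and model predictions
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2])

acc = accuracy_score(y_true, y_pred)      # overall correctness
cm = confusion_matrix(y_true, y_pred)     # rows: true class, cols: predicted
print(classification_report(y_true, y_pred))  # per-class precision/recall/F1
```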
Why CNNs Win for Images
Dense Network Limitations
- No spatial awareness: treats pixels as independent features
- Too many parameters: every hidden unit connects to all 784 pixels of a 28x28 image
- No translation invariance: cannot recognize patterns that shift position
CNN Advantages
- Local connectivity: Learns spatial patterns
- Weight sharing: Same filter scans entire image (parameter efficiency)
- Hierarchical features: Low-level edges → high-level objects
- Translation invariance: Recognizes patterns anywhere in image
Parameter comparison (first layer):
- Dense: 784 × 128 weights + 128 biases = 100,480 parameters
- CNN: 3 × 3 × 1 × 32 weights + 32 biases = 320 parameters (each filter scans the entire image)
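The arithmetic above can be checked directly:

```python
# First dense layer: every one of 128 units connects to all 784 pixels
dense_first = 784 * 128 + 128      # weights + biases
# First conv layer: 32 filters of size 3x3 over 1 input channel
conv_first = 3 * 3 * 1 * 32 + 32   # weights + biases
print(dense_first, conv_first)     # 100480 320
```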
Installation and Usage
Prerequisites
Run Keras Notebook
Run PyTorch Notebook
GPU Acceleration (Optional)
Check GPU availability before training. TensorFlow uses a detected GPU automatically; PyTorch requires moving the model and tensors to the device explicitly.

Limitations and Future Work
Current Limitations
- Simple architectures: Modern CNNs (ResNet, EfficientNet) are much deeper
- No data augmentation: Could improve generalization
- No hyperparameter tuning: Learning rate, batch size not optimized
- Single dataset: Focused on Fashion-MNIST only
Future Improvements
1. Advanced Architectures
- Add batch normalization
- Implement residual connections
- Try VGG, ResNet, or MobileNet architectures
2. Data Augmentation
3. Transfer Learning
- Use pre-trained models from ImageNet
- Fine-tune on Fashion-MNIST
4. Hyperparameter Optimization
- Grid search or Bayesian optimization
- Learning rate scheduling
- Adaptive optimizers (AdamW, RAdam)
5. Ensemble Methods
- Combine multiple models
- Test-time augmentation
6. Deployment
- Convert to TensorFlow Lite for mobile
- Deploy as REST API with Flask/FastAPI
- Create web interface with Streamlit
Keras vs PyTorch Comparison
Keras/TensorFlow
Pros:
- Simpler, more beginner-friendly API
- Better for rapid prototyping
- Excellent documentation
- Built-in utilities (callbacks, metrics)
- Easy deployment with TF Serving

Cons:
- Less flexible for custom operations
- Harder to debug complex models
PyTorch
Pros:
- More Pythonic, intuitive design
- Dynamic computational graphs
- Better for research and experimentation
- Easier debugging
- Growing ecosystem

Cons:
- More verbose code
- Slightly steeper learning curve
- Manual training loop required
Recommendations:
- Beginners: Start with Keras
- Researchers: Use PyTorch
- Production: Both are production-ready
Conclusion
This deep learning project demonstrates:
- Image classification with neural networks
- Architecture comparison: Dense vs CNN
- Framework comparison: Keras vs PyTorch
- Best practices: Training, evaluation, visualization
- Real-world application: Fashion item recognition
Key takeaways:
- CNNs are superior for image tasks (91.5% vs 87.8% accuracy)
- Both frameworks (Keras and PyTorch) achieve similar results
- Fashion-MNIST provides a realistic benchmark (harder than MNIST digits)
- Deep learning enables automated visual recognition at scale