UC Intel Final - Malware Classification Platform
An advanced ensemble machine learning platform for classifying malware using deep learning techniques with PyTorch. This project provides a professional, multi-page Streamlit dashboard for building, training, and evaluating malware classification models.
What is Malware Image Classification?
Malware binaries can be visualized as grayscale images, where each byte is mapped to a pixel intensity value. This visual representation allows deep learning models to detect patterns and classify malware families based on their structural characteristics.
Platform Overview
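To make the byte-to-pixel mapping described above concrete, here is a minimal sketch. It is not the platform's actual preprocessing code: the function name `bytes_to_grayscale`, the fixed row width of 256 pixels, and the zero padding of the last row are all assumptions chosen for illustration.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte (0-255) to one pixel intensity and wrap rows at `width`.

    Hypothetical helper: the row width and zero padding are illustrative choices.
    """
    if not data:
        raise ValueError("cannot build an image from an empty binary")
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels  # the tail of the last row stays black
    return padded.reshape(height, width)


# Stand-in for the contents of a binary file: 769 bytes.
sample = bytes(range(256)) * 3 + b"\xff"
image = bytes_to_grayscale(sample)
print(image.shape, image.dtype)
```

The resulting 2-D `uint8` array can be saved as a grayscale image or converted to a tensor for a CNN.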
The UC Intel Final platform provides a complete end-to-end solution for:
Dataset Management
Configure train/validation/test splits, apply preprocessing, and set up data augmentation pipelines
Model Builder
Design custom CNNs, use transfer learning with pre-trained models, or build transformer architectures
Training Pipeline
Train with customizable hyperparameters, live monitoring, and automatic checkpointing
Model Interpretability
Visualize model decisions with Grad-CAM, analyze misclassifications, and explore embeddings
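The Grad-CAM visualization mentioned under Model Interpretability can be sketched roughly as follows. This is an illustrative outline, not the platform's implementation: `TinyCNN`, `grad_cam`, and the choice of the last convolutional block as the target layer are assumptions. The sketch weights each feature map by the spatial mean of its gradient with respect to the chosen class score, sums the weighted maps, applies ReLU, and upsamples the result to the input size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyCNN(nn.Module):
    """Hypothetical stand-in classifier for single-channel malware images."""

    def __init__(self, num_classes: int = 4) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fmap = self.features(x)
        return self.head(fmap.mean(dim=(2, 3)))  # global average pooling


def grad_cam(model: TinyCNN, image: torch.Tensor, target_class=None):
    """Return (heatmap scaled to [0, 1] at input resolution, class index)."""
    captured = {}

    def save_activation(module, inputs, output):
        captured["fmap"] = output  # feature maps of the target layer

    handle = model.features.register_forward_hook(save_activation)
    try:
        model.eval()
        logits = model(image)
    finally:
        handle.remove()

    if target_class is None:
        target_class = int(logits.argmax(dim=1).item())

    fmap = captured["fmap"]  # shape (1, C, h, w)
    grads = torch.autograd.grad(logits[0, target_class], fmap)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)  # one weight per channel
    cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)
    return cam[0, 0].detach(), target_class


torch.manual_seed(0)
image = torch.rand(1, 1, 32, 32)  # one random 32x32 grayscale "malware image"
heatmap, predicted = grad_cam(TinyCNN(), image)
print(heatmap.shape, predicted)
```

Overlaying `heatmap` on the input image shows which byte regions contributed most to the predicted class.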
Key Features
Professional Streamlit Dashboard
- Multi-page architecture with self-contained modules
- Theme customization with color presets and CSS injection
- Session management for saving and resuming work
- Real-time training monitoring with live metrics updates
Flexible Model Architectures
Comprehensive Training Engine
The training pipeline includes:
- Multiple optimizers: Adam, AdamW, SGD with Momentum, RMSprop
- Learning rate schedulers: ReduceLROnPlateau, Cosine Annealing, Step Decay, Exponential
- Class imbalance handling: Auto class weights, Focal Loss
- Early stopping with configurable patience
- Automatic checkpointing for best models
- Real-time metrics: Loss, accuracy, precision, recall, F1-score
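Several of the items above can be combined in a short PyTorch sketch. Everything here is an assumption made for illustration rather than the platform's code: the helper names (`inverse_frequency_weights`, `FocalLoss`, `EarlyStopping`), the inverse-frequency weighting formula, the synthetic data, and the reuse of the training batch as a stand-in validation set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def inverse_frequency_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Weight each class by total / (num_classes * count)."""
    counts = torch.bincount(labels, minlength=num_classes).float().clamp(min=1)
    return counts.sum() / (num_classes * counts)


class FocalLoss(nn.Module):
    """Cross-entropy scaled by (1 - p_t)^gamma to down-weight easy examples."""

    def __init__(self, gamma: float = 2.0, weight=None) -> None:
        super().__init__()
        self.gamma = gamma
        self.register_buffer("weight", weight)

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        loss = -((1.0 - log_pt.exp()) ** self.gamma) * log_pt
        if self.weight is not None:
            loss = loss * self.weight[targets]
        return loss.mean()


class EarlyStopping:
    """Track the best validation loss and count epochs without improvement."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True if it is the new best."""
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience


torch.manual_seed(0)
x = torch.randn(64, 10)  # synthetic features
y = torch.cat([torch.zeros(48), torch.ones(12), torch.full((4,), 2.0)]).long()
class_weights = inverse_frequency_weights(y, num_classes=3)

model = nn.Linear(10, 3)
criterion = FocalLoss(gamma=2.0, weight=class_weights)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2)
stopper = EarlyStopping(patience=5)
best_state = None

for epoch in range(50):
    model.train()
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x), y).item()  # stand-in validation loss
    scheduler.step(val_loss)
    if stopper.step(val_loss):  # keep an in-memory checkpoint of the best model
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    if stopper.should_stop:
        break
```

In a real run the validation loss would come from a held-out split, and the best state would be written to disk with `torch.save` instead of being kept in memory.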
Advanced Data Augmentation
Augmentation Presets
The platform provides three built-in augmentation presets:
Light Augmentation
- Rotation: ±10°
- Horizontal flip: 50%
- Brightness: ±10%

- Rotation: ±20°
- Horizontal flip: 50%
- Vertical flip: 30%
- Brightness: ±20%
- Contrast: ±20%

- Rotation: ±30°
- Horizontal & vertical flip: 50%
- Brightness: ±30%
- Contrast: ±30%
- Gaussian noise: 5%
Who is This For?
Researchers & Students
Ideal for academic projects and experiments in:
- Deep learning for cybersecurity
- Malware analysis and classification
- Computer vision applications
- Model interpretability research
ML Engineers
Provides a production-ready framework for:
- Rapid prototyping of CNN architectures
- Transfer learning experimentation
- Hyperparameter tuning and optimization
- Model performance benchmarking
Security Analysts
Enables security teams to:
- Build custom malware classifiers
- Analyze model predictions with Grad-CAM
- Identify misclassification patterns
- Evaluate model robustness
Architecture Principles
Self-Contained Pages
Each page in the content/ directory is fully self-contained with its own folder structure.
State Management
All session state access goes through abstraction layers in the state/ module (no direct st.session_state access).
Tab-Based Organization
Complex pages split content into multiple tab files for better code organization
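The state-management principle above might look something like the following. The class name `TrainingState`, its methods, and the key names are hypothetical and do not reflect the actual contents of the state/ module. The accessor takes its storage backend as a constructor argument, on the assumption that st.session_state behaves like a mutable mapping and could be passed in by the dashboard, while a plain dict is enough for testing outside Streamlit.

```python
from collections.abc import MutableMapping


class TrainingState:
    """Hypothetical accessor that owns every key under the 'training.' prefix."""

    _EPOCHS = "training.epochs"
    _HISTORY = "training.history"

    def __init__(self, backend: MutableMapping) -> None:
        # In the dashboard the backend would be st.session_state (assumed to
        # behave like a mutable mapping); a plain dict works for tests.
        self._backend = backend

    def get_epochs(self) -> int:
        return int(self._backend.get(self._EPOCHS, 10))

    def set_epochs(self, epochs: int) -> None:
        if epochs < 1:
            raise ValueError("epochs must be at least 1")
        self._backend[self._EPOCHS] = epochs

    def append_metrics(self, epoch: int, loss: float, accuracy: float) -> None:
        history = self._backend.setdefault(self._HISTORY, [])
        history.append({"epoch": epoch, "loss": loss, "accuracy": accuracy})

    def get_history(self) -> list:
        return list(self._backend.get(self._HISTORY, []))


store = {}  # stands in for st.session_state
state = TrainingState(store)
state.set_epochs(25)
state.append_metrics(epoch=1, loss=0.9, accuracy=0.61)
print(state.get_epochs(), state.get_history())
```

Keeping key names and validation in one place means pages never need to know how state is stored, which is the main benefit of banning direct st.session_state access.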
Project Structure
Technology Stack
PyTorch
Deep learning framework for building and training neural networks
Streamlit
Interactive dashboard for the complete ML workflow
torchvision
Pre-trained models and image transformations
scikit-learn
Metrics calculation and evaluation tools
Plotly
Interactive visualizations and charts
UMAP
Dimensionality reduction for embedding visualization
Next Steps
Quick Start
Get up and running in 5 minutes
Installation
Detailed installation instructions
This platform was developed as part of the Sistemas Inteligentes II course at Universidad de Caldas, taught by Professor Jorge Alberto Jaramillo Garzón.