
Overview

The UC Intel Final platform is a comprehensive malware classification system built with a modern, modular architecture that separates concerns between the web interface (Streamlit), machine learning models (PyTorch), and training pipeline.

Streamlit UI

Interactive multi-page dashboard for experiment configuration and monitoring

PyTorch Models

Custom CNN, Transfer Learning, and Vision Transformer architectures

Training Pipeline

Background training engine with real-time monitoring and checkpointing

State Management

File-based persistence with session state abstraction

High-Level Architecture

Application Architecture

Directory Structure

The platform follows a self-contained architecture where each module is isolated and communicates through well-defined interfaces:
app/
├── main.py                      # Entry point + navigation setup

├── content/                     # Self-contained page modules
│   ├── home/                   # Session management
│   ├── dataset/                # Dataset configuration with tabs
│   ├── model/                  # Model architecture builder
│   ├── training/               # Training configuration
│   ├── monitor/                # Live training monitoring
│   ├── results/                # Results & evaluation
│   └── interpret/              # Model interpretability

├── components/                  # Shared UI components (flat structure)
│   ├── header.py               # App header with session info
│   ├── sidebar.py              # Configuration status sidebar
│   ├── theme.py                # Theme customization
│   ├── styling.py              # CSS injection
│   └── utils.py                # GPU detection, utilities

├── state/                       # Session state management
│   ├── workflow.py             # ML workflow state
│   ├── ui.py                   # UI preferences
│   ├── cache.py                # Cached operations
│   ├── session_state.py        # Session utilities
│   └── persistence.py          # File-based persistence

├── models/                      # PyTorch model architectures
│   ├── base.py                 # Abstract base class
│   ├── pytorch/
│   │   ├── cnn_builder.py     # Custom CNN builder
│   │   ├── transfer.py        # Transfer learning models
│   │   └── transformer.py     # Vision Transformer
│   └── manual/                 # Manual implementations

├── training/                    # Training infrastructure
│   ├── engine.py               # Core training loop
│   ├── worker.py               # Background training
│   ├── dataset.py              # PyTorch Dataset & DataLoader
│   ├── transforms.py           # Image transformations
│   ├── optimizers.py           # Optimizer & scheduler factory
│   └── evaluator.py            # Model evaluation

└── utils/                       # Utility functions
    ├── dataset_utils.py        # Dataset scanning
    ├── dataset_viz.py          # Visualizations
    └── checkpoint_manager.py   # Model checkpointing
The architecture uses NO __init__.py files; all imports use absolute paths from the project root (e.g., from models.pytorch.cnn_builder import CustomCNNBuilder).

Architecture Principles

1. Self-Contained Page Modules

Each page in content/ is fully self-contained with:
  • page.py - Entry point that renders header/sidebar and calls view
  • view.py - Main view logic and coordinator
  • tabs/ - Optional subfolder for complex multi-tab pages
Example: Dataset Page Structure
# content/dataset/page.py
from components.header import render_header
from components.sidebar import render_sidebar
from content.dataset import view

render_header()
render_sidebar()
view.render()

2. State Management Abstraction

Critical Design Pattern: All session state access goes through state/ module functions. NEVER access st.session_state directly from page code.
State is divided into three domains:
  • workflow.py - ML workflow state (configs, training status, results)
  • ui.py - UI preferences (theme, past sessions)
  • cache.py - Cached operations (dataset scans, expensive computations)
Example: Proper State Access
# ✅ CORRECT
from state.workflow import get_dataset_config, save_dataset_config
config = get_dataset_config()
save_dataset_config(new_config)

# ❌ WRONG
config = st.session_state.dataset_config
st.session_state.dataset_config = new_config
See app/state/workflow.py:57-365 for complete implementation.

3. Background Training with Thread Safety

Training runs in a background thread to avoid blocking the UI. The worker uses file-based I/O instead of st.session_state since session state is thread-local:
def _run_training(session_id: str, experiment_id: str):
    # Get configs from files (thread-safe)
    experiment = get_experiment_from_file(session_id, experiment_id)
    model_config = get_model_from_file(session_id, experiment['model_id'])
    
    # Build model and train
    model = build_model(model_config)
    engine = TrainingEngine(...)
    results = engine.fit(epochs=epochs)
    
    # Write results to file (thread-safe)
    write_experiment_update(session_id, experiment_id, results)
See app/training/worker.py:45-257 for complete implementation.

4. Model Architecture Pattern

All models inherit from BaseModel abstract class and implement:
  • build() - Constructs and returns the PyTorch nn.Module
  • get_parameters_count() - Returns total and trainable parameter counts
  • validate_config() - Validates model configuration
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

import torch.nn as nn

class BaseModel(ABC):
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.model = None
    
    @abstractmethod
    def build(self) -> nn.Module:
        pass
    
    @abstractmethod
    def get_parameters_count(self) -> Tuple[int, int]:
        pass
See app/models/base.py:11-71 for complete implementation.

Data Flow

Configuration Flow

Training Flow

Real-Time Monitoring

The monitoring page uses Streamlit’s @st.fragment(run_every="1s") to create auto-refreshing components:
@st.fragment(run_every="1s")
def live_training_monitor():
    if not is_training_active():
        return
    
    # Get latest metrics from file
    results = get_results()
    
    # Display live metrics
    col1, col2, col3 = st.columns(3)
    col1.metric("Epoch", results.get("epoch", 0))
    col2.metric("Loss", f"{results.get('loss', 0):.4f}")
    col3.metric("Accuracy", f"{results.get('accuracy', 0):.2%}")
See .md/arch.md:432-464 for complete pattern.

Component Architecture

Shared Components

Components are flat, reusable modules that can be imported by any page.

Integration Points

PyTorch Integration

The platform integrates tightly with PyTorch for all ML operations:
Model Building (app/models/pytorch/)
  • Custom CNN from layer configuration
  • Transfer learning with pre-trained models (VGG, ResNet, EfficientNet)
  • Vision Transformers with patch embeddings
Data Loading (app/training/dataset.py:129-249)
  • MalwareDataset - Custom PyTorch Dataset
  • Automatic train/val/test splitting
  • Weighted sampling for imbalanced classes
  • Data augmentation pipeline
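The weighted-sampling bullet can be sketched with PyTorch's WeightedRandomSampler: each sample is drawn with probability inversely proportional to its class frequency. The class counts and tensor shapes below are illustrative, not taken from MalwareDataset:

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Illustrative imbalanced labels: 90 samples of class 0, 10 of class 1
labels = [0] * 90 + [1] * 10
counts = Counter(labels)

# Per-sample weight = inverse class frequency, so the minority class
# is drawn far more often than its raw share of the dataset
weights = [1.0 / counts[y] for y in labels]
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

dataset = TensorDataset(torch.randn(100, 3, 64, 64), torch.tensor(labels))
loader = DataLoader(dataset, batch_size=16, sampler=sampler)
```

Note that sampler and shuffle are mutually exclusive in DataLoader; the sampler already randomizes draw order.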
Training (app/training/engine.py:13-306)
  • TrainingEngine - Core training loop
  • Callbacks for checkpointing and metrics
  • Early stopping support
  • Learning rate scheduling
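An optimizer-and-scheduler factory like the one in training/optimizers.py might look roughly like this; the config keys (name, lr, weight_decay) are assumptions, not the actual schema:

```python
import torch
import torch.nn as nn

def build_optimizer(model: nn.Module, config: dict) -> torch.optim.Optimizer:
    """Map a config dict to a torch optimizer (assumed keys: name, lr, weight_decay)."""
    optimizers = {
        "adam": torch.optim.Adam,
        "adamw": torch.optim.AdamW,
        "sgd": torch.optim.SGD,
    }
    cls = optimizers[config.get("name", "adam").lower()]
    return cls(model.parameters(),
               lr=config.get("lr", 1e-3),
               weight_decay=config.get("weight_decay", 0.0))

model = nn.Linear(10, 2)
opt = build_optimizer(model, {"name": "sgd", "lr": 0.01})
# Step-decay schedule: halve the learning rate every 10 epochs
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)
```

Keeping the mapping in one place means new optimizers only need a dictionary entry, not changes to the training loop.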

File System Integration

Session Persistence (app/state/persistence.py)
.streamlit_sessions/
└── {session_id}/
    ├── session.json       # Session metadata
    ├── dataset.json       # Dataset configuration
    ├── models.json        # Model configurations
    ├── training.json      # Training configurations
    └── experiments.json   # Experiment results
Model Checkpoints (app/utils/checkpoint_manager.py)
checkpoints/
└── {experiment_id}/
    ├── checkpoint_epoch_10.pth
    ├── checkpoint_epoch_20.pth
    └── best_model.pth     # Best model by validation loss
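The checkpoint layout above could be produced by a helper along these lines; save_checkpoint and its signature are a sketch, not the actual checkpoint_manager.py API:

```python
from pathlib import Path

import torch
import torch.nn as nn

def save_checkpoint(model: nn.Module, experiment_id: str, epoch: int,
                    val_loss: float, is_best: bool,
                    root: str = "checkpoints") -> Path:
    """Persist an epoch checkpoint; mirror it to best_model.pth when it wins."""
    ckpt_dir = Path(root) / experiment_id
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        "epoch": epoch,
        "val_loss": val_loss,
        "model_state_dict": model.state_dict(),
    }
    path = ckpt_dir / f"checkpoint_epoch_{epoch}.pth"
    torch.save(payload, path)
    if is_best:
        torch.save(payload, ckpt_dir / "best_model.pth")
    return path
```

Saving the state_dict rather than the whole module keeps checkpoints loadable even if the model class moves between refactors.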

Performance Considerations

Caching Strategy

The platform uses Streamlit’s caching decorators to optimize performance:
Data Caching - @st.cache_data(ttl=300)
  • Dataset scanning (expensive directory traversal)
  • Image loading and preprocessing
  • Visualization generation
Resource Caching - @st.cache_resource
  • Trained model loading (singleton)
  • Large data structures
  • Database connections
@st.cache_data(ttl=300)
def scan_dataset_directory(base_path: str) -> dict[str, Any]:
    """Expensive dataset scanning - cached for 5 minutes"""
    # ...
    
@st.cache_resource
def load_trained_model(model_path: str):
    """Singleton model loading - loaded once per session"""
    return torch.load(model_path)
See .md/arch.md:405-429 for complete patterns.

GPU Memory Management

  • Automatic device detection (CUDA > MPS > CPU)
  • Batch size configuration based on available memory
  • Model parameter counting before training
  • Memory monitoring in sidebar
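The CUDA > MPS > CPU detection order can be sketched as below; this is a minimal version, and the platform's actual utility in components/utils.py may differ:

```python
import torch

def detect_device() -> torch.device:
    """Pick the best available backend: CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # getattr guards against older torch builds without the mps backend
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

Detecting once and passing the device down to the engine keeps `.to(device)` calls consistent across model, inputs, and metrics.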

Security & Isolation

Session Isolation

Each user session is isolated with:
  • Unique session ID (UUID4)
  • Separate file storage directory
  • Independent session state
  • Isolated experiment tracking
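The isolation scheme above amounts to a UUID4 id plus a per-session storage directory; a minimal sketch (create_session is illustrative, not the actual API):

```python
import uuid
from pathlib import Path

def create_session(root: str = ".streamlit_sessions") -> str:
    """Create a fresh isolated session: a UUID4 id with its own storage directory."""
    session_id = str(uuid.uuid4())
    (Path(root) / session_id).mkdir(parents=True, exist_ok=True)
    return session_id
```

Since UUID4 collisions are practically impossible, two concurrent users can never see or overwrite each other's configs or experiment results.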

Thread Safety

Background training threads are isolated from UI thread:
  • No shared memory access (uses file I/O)
  • Thread registry for pause/stop control
  • Daemon threads (auto-cleanup on exit)
# Global registry for active training engines
_active_engines: dict[str, TrainingEngine] = {}
_training_threads: dict[str, threading.Thread] = {}
See app/training/worker.py:24-26 for implementation.

Extensibility

Adding New Model Architectures

  1. Create new file in app/models/pytorch/
  2. Inherit from BaseModel
  3. Implement build() and get_parameters_count()
  4. Register in worker.py:build_model()
class MyCustomModel(BaseModel):
    def build(self) -> nn.Module:
        # Build and store your model so get_parameters_count() can use it
        self.model = ...  # construct your nn.Module here
        return self.model
    
    def get_parameters_count(self) -> Tuple[int, int]:
        total = sum(p.numel() for p in self.model.parameters())
        trainable = sum(p.numel() for p in self.model.parameters() 
                       if p.requires_grad)
        return total, trainable

Adding New Pages

  1. Create folder in content/ with page.py and view.py
  2. For complex pages, add tabs/ subfolder
  3. Register in main.py navigation
st.Page("content/my_page/page.py", title="My Page", icon="🆕")

Technology Stack

Frontend

  • Streamlit - Web UI framework
  • Plotly - Interactive visualizations
  • Pillow - Image processing

Backend

  • PyTorch - Deep learning framework
  • scikit-learn - Data splitting & metrics
  • NumPy - Numerical operations

Models

  • torchvision - Pre-trained models
  • Custom CNN - Layer stack builder
  • Vision Transformer - From scratch implementation

Infrastructure

  • Threading - Background training
  • JSON - Configuration persistence
  • pathlib - File system operations

References

  • Complete architecture documentation: app/.md/arch.md
  • Model implementations: app/models/
  • Training pipeline: app/training/
  • State management: app/state/
