
Overview

The UC Intel Final platform is a comprehensive malware classification system built with a modern, modular architecture that separates concerns between the web interface (Streamlit), machine learning models (PyTorch), and training pipeline.

Streamlit UI

Interactive multi-page dashboard for experiment configuration and monitoring

PyTorch Models

Custom CNN, Transfer Learning, and Vision Transformer architectures

Training Pipeline

Background training engine with real-time monitoring and checkpointing

State Management

File-based persistence with session state abstraction

High-Level Architecture

Application Architecture

Directory Structure

The platform follows a self-contained architecture where each module is isolated and communicates through well-defined interfaces:
app/
├── main.py                      # Entry point + navigation setup

├── content/                     # Self-contained page modules
│   ├── home/                   # Session management
│   ├── dataset/                # Dataset configuration with tabs
│   ├── model/                  # Model architecture builder
│   ├── training/               # Training configuration
│   ├── monitor/                # Live training monitoring
│   ├── results/                # Results & evaluation
│   └── interpret/              # Model interpretability

├── components/                  # Shared UI components (flat structure)
│   ├── header.py               # App header with session info
│   ├── sidebar.py              # Configuration status sidebar
│   ├── theme.py                # Theme customization
│   ├── styling.py              # CSS injection
│   └── utils.py                # GPU detection, utilities

├── state/                       # Session state management
│   ├── workflow.py             # ML workflow state
│   ├── ui.py                   # UI preferences
│   ├── cache.py                # Cached operations
│   ├── session_state.py        # Session utilities
│   └── persistence.py          # File-based persistence

├── models/                      # PyTorch model architectures
│   ├── base.py                 # Abstract base class
│   ├── pytorch/
│   │   ├── cnn_builder.py     # Custom CNN builder
│   │   ├── transfer.py        # Transfer learning models
│   │   └── transformer.py     # Vision Transformer
│   └── manual/                 # Manual implementations

├── training/                    # Training infrastructure
│   ├── engine.py               # Core training loop
│   ├── worker.py               # Background training
│   ├── dataset.py              # PyTorch Dataset & DataLoader
│   ├── transforms.py           # Image transformations
│   ├── optimizers.py           # Optimizer & scheduler factory
│   └── evaluator.py            # Model evaluation

└── utils/                       # Utility functions
    ├── dataset_utils.py        # Dataset scanning
    ├── dataset_viz.py          # Visualizations
    └── checkpoint_manager.py   # Model checkpointing
The architecture uses NO __init__.py files; all imports use absolute paths from the project root (e.g., from models.pytorch.cnn_builder import CustomCNNBuilder).

Architecture Principles

1. Self-Contained Page Modules

Each page in content/ is fully self-contained with:
  • page.py - Entry point that renders header/sidebar and calls view
  • view.py - Main view logic and coordinator
  • tabs/ - Optional subfolder for complex multi-tab pages
Example: Dataset Page Structure
# content/dataset/page.py
from components.header import render_header
from components.sidebar import render_sidebar
from content.dataset import view

render_header()
render_sidebar()
view.render()

2. State Management Abstraction

Critical Design Pattern: All session state access goes through state/ module functions. NEVER access st.session_state directly from page code.
State is divided into three domains:
  • workflow.py - ML workflow state (configs, training status, results)
  • ui.py - UI preferences (theme, past sessions)
  • cache.py - Cached operations (dataset scans, expensive computations)
Example: Proper State Access
# ✅ CORRECT
from state.workflow import get_dataset_config, save_dataset_config
config = get_dataset_config()
save_dataset_config(new_config)

# ❌ WRONG
config = st.session_state.dataset_config
st.session_state.dataset_config = new_config
See app/state/workflow.py:57-365 for complete implementation.

3. Background Training with Thread Safety

Training runs in a background thread to avoid blocking the UI. The worker uses file-based I/O instead of st.session_state since session state is thread-local:
def _run_training(session_id: str, experiment_id: str):
    # Get configs from files (thread-safe)
    experiment = get_experiment_from_file(session_id, experiment_id)
    model_config = get_model_from_file(session_id, experiment['model_id'])
    
    # Build model and train
    model = build_model(model_config)
    engine = TrainingEngine(...)
    results = engine.fit(epochs=epochs)
    
    # Write results to file (thread-safe)
    write_experiment_update(session_id, experiment_id, results)
See app/training/worker.py:45-257 for complete implementation.

4. Model Architecture Pattern

All models inherit from BaseModel abstract class and implement:
  • build() - Constructs and returns the PyTorch nn.Module
  • get_parameters_count() - Returns total and trainable parameter counts
  • validate_config() - Validates model configuration
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

import torch.nn as nn

class BaseModel(ABC):
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.model = None
    
    @abstractmethod
    def build(self) -> nn.Module:
        pass
    
    @abstractmethod
    def get_parameters_count(self) -> Tuple[int, int]:
        pass
See app/models/base.py:11-71 for complete implementation.

Data Flow

Configuration Flow

Training Flow

Real-Time Monitoring

The monitoring page uses Streamlit’s @st.fragment(run_every="1s") to create auto-refreshing components:
@st.fragment(run_every="1s")
def live_training_monitor():
    if not is_training_active():
        return
    
    # Get latest metrics from file
    results = get_results()
    
    # Display live metrics
    col1, col2, col3 = st.columns(3)
    col1.metric("Epoch", results.get("epoch", 0))
    col2.metric("Loss", f"{results.get('loss', 0):.4f}")
    col3.metric("Accuracy", f"{results.get('accuracy', 0):.2%}")
See .md/arch.md:432-464 for complete pattern.

Component Architecture

Shared Components

Components are flat, reusable modules that can be imported by any page.

Integration Points

PyTorch Integration

The platform integrates tightly with PyTorch for all ML operations:
Model Building (app/models/pytorch/)
  • Custom CNN from layer configuration
  • Transfer learning with pre-trained models (VGG, ResNet, EfficientNet)
  • Vision Transformers with patch embeddings
Data Loading (app/training/dataset.py:129-249)
  • MalwareDataset - Custom PyTorch Dataset
  • Automatic train/val/test splitting
  • Weighted sampling for imbalanced classes
  • Data augmentation pipeline
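The weighted-sampling bullet can be sketched with PyTorch's WeightedRandomSampler: each sample is drawn with probability inversely proportional to its class frequency. The class counts and tensor shapes below are illustrative, not taken from MalwareDataset:

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Illustrative imbalanced labels: 90 samples of class 0, 10 of class 1
labels = [0] * 90 + [1] * 10
counts = Counter(labels)

# Per-sample weight = inverse class frequency, so the minority class
# is drawn far more often than its raw share of the dataset
weights = [1.0 / counts[y] for y in labels]
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

dataset = TensorDataset(torch.randn(100, 3, 64, 64), torch.tensor(labels))
loader = DataLoader(dataset, batch_size=16, sampler=sampler)
```

Note that sampler and shuffle are mutually exclusive in DataLoader; the sampler already randomizes draw order.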
Training (app/training/engine.py:13-306)
  • TrainingEngine - Core training loop
  • Callbacks for checkpointing and metrics
  • Early stopping support
  • Learning rate scheduling
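An optimizer-and-scheduler factory like the one in training/optimizers.py might look roughly like this; the config keys (name, lr, weight_decay) are assumptions, not the actual schema:

```python
import torch
import torch.nn as nn

def build_optimizer(model: nn.Module, config: dict) -> torch.optim.Optimizer:
    """Map a config dict to a torch optimizer (assumed keys: name, lr, weight_decay)."""
    optimizers = {
        "adam": torch.optim.Adam,
        "adamw": torch.optim.AdamW,
        "sgd": torch.optim.SGD,
    }
    cls = optimizers[config.get("name", "adam").lower()]
    return cls(model.parameters(),
               lr=config.get("lr", 1e-3),
               weight_decay=config.get("weight_decay", 0.0))

model = nn.Linear(10, 2)
opt = build_optimizer(model, {"name": "sgd", "lr": 0.01})
# Step-decay schedule: halve the learning rate every 10 epochs
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)
```

Keeping the mapping in one place means new optimizers only need a dictionary entry, not changes to the training loop.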

File System Integration

Session Persistence (app/state/persistence.py)
.streamlit_sessions/
└── {session_id}/
    ├── session.json       # Session metadata
    ├── dataset.json       # Dataset configuration
    ├── models.json        # Model configurations
    ├── training.json      # Training configurations
    └── experiments.json   # Experiment results
Model Checkpoints (app/utils/checkpoint_manager.py)
checkpoints/
└── {experiment_id}/
    ├── checkpoint_epoch_10.pth
    ├── checkpoint_epoch_20.pth
    └── best_model.pth     # Best model by validation loss
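The checkpoint layout above could be produced by a helper along these lines; save_checkpoint and its signature are a sketch, not the actual checkpoint_manager.py API:

```python
from pathlib import Path

import torch
import torch.nn as nn

def save_checkpoint(model: nn.Module, experiment_id: str, epoch: int,
                    val_loss: float, is_best: bool,
                    root: str = "checkpoints") -> Path:
    """Persist an epoch checkpoint; mirror it to best_model.pth when it wins."""
    ckpt_dir = Path(root) / experiment_id
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        "epoch": epoch,
        "val_loss": val_loss,
        "model_state_dict": model.state_dict(),
    }
    path = ckpt_dir / f"checkpoint_epoch_{epoch}.pth"
    torch.save(payload, path)
    if is_best:
        torch.save(payload, ckpt_dir / "best_model.pth")
    return path
```

Saving the state_dict rather than the whole module keeps checkpoints loadable even if the model class moves between refactors.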

Performance Considerations

Caching Strategy

The platform uses Streamlit’s caching decorators to optimize performance:
Data Caching - @st.cache_data(ttl=300)
  • Dataset scanning (expensive directory traversal)
  • Image loading and preprocessing
  • Visualization generation
Resource Caching - @st.cache_resource
  • Trained model loading (singleton)
  • Large data structures
  • Database connections
@st.cache_data(ttl=300)
def scan_dataset_directory(base_path: str) -> dict[str, Any]:
    """Expensive dataset scanning - cached for 5 minutes"""
    # ...
    
@st.cache_resource
def load_trained_model(model_path: str):
    """Singleton model loading - loaded once per session"""
    return torch.load(model_path)
See .md/arch.md:405-429 for complete patterns.

GPU Memory Management

  • Automatic device detection (CUDA > MPS > CPU)
  • Batch size configuration based on available memory
  • Model parameter counting before training
  • Memory monitoring in sidebar
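The CUDA > MPS > CPU detection order can be sketched as below; this is a minimal version, and the platform's actual utility in components/utils.py may differ:

```python
import torch

def detect_device() -> torch.device:
    """Pick the best available backend: CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # getattr guards against older torch builds without the mps backend
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

Detecting once and passing the device down to the engine keeps `.to(device)` calls consistent across model, inputs, and metrics.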

Security & Isolation

Session Isolation

Each user session is isolated with:
  • Unique session ID (UUID4)
  • Separate file storage directory
  • Independent session state
  • Isolated experiment tracking
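The isolation scheme above amounts to a UUID4 id plus a per-session storage directory; a minimal sketch (create_session is illustrative, not the actual API):

```python
import uuid
from pathlib import Path

def create_session(root: str = ".streamlit_sessions") -> str:
    """Create a fresh isolated session: a UUID4 id with its own storage directory."""
    session_id = str(uuid.uuid4())
    (Path(root) / session_id).mkdir(parents=True, exist_ok=True)
    return session_id
```

Since UUID4 collisions are practically impossible, two concurrent users can never see or overwrite each other's configs or experiment results.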

Thread Safety

Background training threads are isolated from UI thread:
  • No shared memory access (uses file I/O)
  • Thread registry for pause/stop control
  • Daemon threads (auto-cleanup on exit)
# Global registry for active training engines
_active_engines: dict[str, TrainingEngine] = {}
_training_threads: dict[str, threading.Thread] = {}
See app/training/worker.py:24-26 for implementation.

Extensibility

Adding New Model Architectures

  1. Create new file in app/models/pytorch/
  2. Inherit from BaseModel
  3. Implement build() and get_parameters_count()
  4. Register in worker.py:build_model()
class MyCustomModel(BaseModel):
    def build(self) -> nn.Module:
        # Build and store your model so get_parameters_count() can use it
        self.model = ...  # construct your nn.Module here
        return self.model
    
    def get_parameters_count(self) -> Tuple[int, int]:
        total = sum(p.numel() for p in self.model.parameters())
        trainable = sum(p.numel() for p in self.model.parameters() 
                       if p.requires_grad)
        return total, trainable

Adding New Pages

  1. Create folder in content/ with page.py and view.py
  2. For complex pages, add tabs/ subfolder
  3. Register in main.py navigation
st.Page("content/my_page/page.py", title="My Page", icon="🆕")

Technology Stack

Frontend

  • Streamlit - Web UI framework
  • Plotly - Interactive visualizations
  • Pillow - Image processing

Backend

  • PyTorch - Deep learning framework
  • scikit-learn - Data splitting & metrics
  • NumPy - Numerical operations

Models

  • torchvision - Pre-trained models
  • Custom CNN - Layer stack builder
  • Vision Transformer - From scratch implementation

Infrastructure

  • Threading - Background training
  • JSON - Configuration persistence
  • pathlib - File system operations

References

  • Complete architecture documentation: app/.md/arch.md
  • Model implementations: app/models/
  • Training pipeline: app/training/
  • State management: app/state/
