Overview
The UC Intel Final platform supports three main categories of models for malware classification:

- Custom CNN: Build convolutional neural networks from scratch
- Transfer Learning: Leverage pre-trained models (ResNet, EfficientNet, Vision Transformer)
- Transformer: Custom vision transformer architectures
Model Building System
All models are built using a builder pattern that converts configuration dictionaries into PyTorch modules. Source: app/training/worker.py:29-42
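The actual builder lives in worker.py; as a rough illustration of the pattern (the function name and config keys below are assumptions, not the platform's real schema), a configuration dictionary can be turned into a module like this:

```python
import torch
import torch.nn as nn

def build_custom_cnn(config: dict) -> nn.Module:
    """Sketch of a config-dict-to-module builder. Keys (conv_filters,
    num_classes, in_channels) are illustrative only."""
    layers = []
    in_ch = config.get("in_channels", 3)
    for out_ch in config["conv_filters"]:
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        ]
        in_ch = out_ch
    layers += [
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(in_ch, config["num_classes"]),
    ]
    return nn.Sequential(*layers)

model = build_custom_cnn({"conv_filters": [32, 64], "num_classes": 10})
```

The builder returns a plain `nn.Module`, so the rest of the training loop does not need to know which category of model the configuration described.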
Custom CNN
When to Use
Use Custom CNN when:
- You have a small dataset (<1000 images per class)
- You want full control over architecture
- You need a lightweight model for deployment
- You want to experiment with novel architectures
- Transfer learning is overkill for your problem
Architecture Components
Custom CNNs are built with configurable convolutional blocks.

Design Guidelines
Increase Filters Gradually
Use a progression such as 32 → 64 → 128 → 256, so that each successive block extracts higher-level features.
Example Architectures
Lightweight CNN (Small Dataset)
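A configuration along these lines (key names are illustrative, not the platform's actual schema) might describe the lightweight variant:

```python
# Hypothetical lightweight CNN config: two conv blocks, small input,
# moderate dropout. Suitable for <1K images per class.
lightweight_cnn = {
    "model_type": "custom_cnn",
    "conv_filters": [32, 64],
    "dense_units": [128],
    "dropout": 0.3,
    "input_size": 128,   # smaller inputs train faster
    "num_classes": 10,
}
```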
Medium CNN (Moderate Dataset)
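For moderate datasets, a sketch with one more conv block and a larger dense layer (again, key names are assumed for illustration):

```python
# Hypothetical medium CNN config: three conv blocks following the
# 32 -> 64 -> 128 filter progression recommended above.
medium_cnn = {
    "model_type": "custom_cnn",
    "conv_filters": [32, 64, 128],
    "dense_units": [256],
    "dropout": 0.4,
    "input_size": 224,
    "num_classes": 10,
}
```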
Deep CNN (Large Dataset)
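And for large datasets, a deeper sketch that completes the filter progression (key names remain illustrative):

```python
# Hypothetical deep CNN config: four conv blocks (32 -> 64 -> 128 -> 256)
# with heavier dropout to offset the larger capacity.
deep_cnn = {
    "model_type": "custom_cnn",
    "conv_filters": [32, 64, 128, 256],
    "dense_units": [512, 256],
    "dropout": 0.5,
    "input_size": 224,
    "num_classes": 10,
}
```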
Transfer Learning
When to Use
Use Transfer Learning when:
- You have limited training data (pre-trained features help at any dataset size)
- You want state-of-the-art performance
- You need faster convergence
- You have access to GPU resources
- You want to leverage features learned from ImageNet
Available Architectures
The platform supports multiple pre-trained backbones:

| Architecture | Parameters | Speed | Accuracy | Best For |
|---|---|---|---|---|
| ResNet50 | 25M | Fast | Good | Balanced performance, general use |
| ResNet101 | 44M | Medium | Better | When you need higher accuracy |
| EfficientNet-B0 | 5M | Fast | Good | Limited GPU memory |
| EfficientNet-B3 | 12M | Medium | Better | Balanced efficiency/accuracy |
| EfficientNet-B7 | 66M | Slow | Best | Maximum accuracy, large GPU |
| Vision Transformer (ViT) | 86M | Slow | Best | Large datasets, cutting-edge |
Configuration
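An illustrative transfer-learning configuration (key names are assumed; check the actual schema in app/training/worker.py):

```python
# Hypothetical transfer-learning config.
transfer_config = {
    "model_type": "transfer",
    "backbone": "resnet50",     # or an EfficientNet / ViT variant
    "pretrained": True,         # start from ImageNet weights
    "freeze_backbone": True,    # feature extraction; False to fine-tune
    "num_classes": 10,
}
```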
Fine-Tuning Strategies
Feature Extraction (Recommended Start)
Set freeze_backbone=True and train only the final classification layer.

Pros: Fast training, works with small datasets, prevents overfitting

Use when: Dataset < 1000 images per class

Partial Fine-Tuning

Freeze early layers and unfreeze later layers, allowing the backbone to adapt to your data.

Pros: Better accuracy, moderate training time

Use when: Dataset 1000-5000 images per class
Architecture Selection Guide
ResNet (Recommended for Most Use Cases)
When to use:
- General-purpose malware classification
- Good balance of speed and accuracy
- Mature, well-tested architecture
- Standard datasets with moderate complexity
- Limited GPU memory (< 8GB): prefer ResNet50
- Complex datasets with many classes: prefer ResNet101
- When accuracy is a priority over speed: prefer ResNet101
EfficientNet (Best Efficiency)
When to use:
- Limited GPU resources
- Need fast inference for deployment
- Want best accuracy-to-parameters ratio
- Deployment on edge devices or very limited GPU memory (< 4GB): prefer EfficientNet-B0
- Maximum accuracy from an efficient architecture, with moderate to large GPU memory available: prefer EfficientNet-B7
Vision Transformer (State-of-the-Art)
When to use:
- Large datasets (5000+ images per class)
- Maximum possible accuracy is required
- You have powerful GPU (8GB+ VRAM)
- Training time is not a constraint
Considerations:
- ViT requires more data than CNNs to perform well
- Training is significantly slower than ResNet/EfficientNet
- Pair with strong data augmentation
Transformer Models
Custom Vision Transformer
Build custom vision transformer architectures with configurable attention mechanisms.

Model Initialization
During training, the model is initialized and moved to the appropriate device. Source: app/training/worker.py:84-103
The platform automatically detects available hardware:
- CUDA: NVIDIA GPUs (preferred)
- MPS: Apple Silicon GPUs (M1/M2/M3)
- CPU: Fallback for systems without GPU
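The detection order above corresponds to standard PyTorch device-selection logic, roughly like the following (a sketch of the behavior, not a verbatim copy of worker.py):

```python
import torch

# Pick the best available device: CUDA first, then Apple MPS, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# The model is then moved onto the selected device before training,
# e.g. model = model.to(device).
```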
Model Comparison
Decision Tree
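The decision flow can be summarized as a small helper; the thresholds come from the guidance and performance table on this page, but the function itself is illustrative, not part of the platform:

```python
def recommend_model(images_per_class: int, gpu_vram_gb: float) -> str:
    """Rough model-selection heuristic based on this page's guidance."""
    if images_per_class < 1000:
        # Small datasets: frozen transfer learning if memory allows,
        # otherwise a lightweight custom CNN.
        if gpu_vram_gb >= 6:
            return "ResNet50 (freeze_backbone=True)"
        return "Custom CNN (lightweight)"
    if gpu_vram_gb < 4:
        return "EfficientNet-B0"
    if images_per_class >= 10000 and gpu_vram_gb >= 16:
        return "Vision Transformer"
    if images_per_class >= 5000 and gpu_vram_gb >= 8:
        return "ResNet50 (fine-tuned) or EfficientNet-B3"
    return "ResNet50 (freeze_backbone=True)"
```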
Performance Comparison
| Model | Dataset Size | Training Time | Memory | Accuracy |
|---|---|---|---|---|
| Custom CNN (Light) | 500-1K/class | 5-10 min/epoch | 2GB | 75-85% |
| Custom CNN (Medium) | 1K-5K/class | 10-20 min/epoch | 4GB | 80-88% |
| ResNet50 (frozen) | Any | 15-30 min/epoch | 6GB | 85-92% |
| ResNet50 (fine-tuned) | 5K+/class | 30-60 min/epoch | 8GB | 90-95% |
| EfficientNet-B3 | 5K+/class | 40-80 min/epoch | 8GB | 91-96% |
| Vision Transformer | 10K+/class | 60-120 min/epoch | 16GB | 92-97% |
Best Practices
Starting Point
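A sensible first run, following the feature-extraction recommendation earlier on this page, is a pretrained ResNet50 with a frozen backbone (config keys below are illustrative, not the platform's actual schema):

```python
# Hypothetical starting-point configuration.
starting_point = {
    "model_type": "transfer",
    "backbone": "resnet50",
    "pretrained": True,
    "freeze_backbone": True,
    "learning_rate": 1e-3,
    "batch_size": 32,
    "epochs": 10,
}
```

Once this baseline converges, move to partial fine-tuning or a larger backbone only if validation accuracy plateaus.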
Regularization
- Dropout: 0.5 is standard for dense layers, 0.25-0.3 for conv layers
- L2 Decay: Use 0.0001-0.001 with AdamW optimizer
- Data Augmentation: Essential for preventing overfitting
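The dropout placement and L2 decay above translate into PyTorch roughly as follows (the layer sizes are placeholders for illustration):

```python
import torch
import torch.nn as nn

# Dropout per the guidelines above: 0.25-0.3 after conv blocks,
# 0.5 before dense layers. L2 decay goes through AdamW's weight_decay.
head = nn.Sequential(
    nn.Dropout(0.3),            # conv-level dropout
    nn.Flatten(),
    nn.Linear(256 * 4 * 4, 128),
    nn.ReLU(),
    nn.Dropout(0.5),            # dense-level dropout
    nn.Linear(128, 10),
)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3, weight_decay=1e-4)
```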
Input Size
- 224×224: Standard size, works with all pre-trained models
- 256×256: Use for high-resolution malware visualizations
- 128×128: Faster training, good for resource-constrained environments
Troubleshooting
Model Not Learning (Loss Plateaus)
Possible causes:
- Learning rate too high or too low → Try 1e-3 to 1e-5
- Backbone frozen but needs fine-tuning → Set freeze_backbone=False
- Model too simple → Try larger architecture
- Data quality issues → Check dataset preprocessing
Overfitting (Train Acc >> Val Acc)
Solutions:
- Increase dropout (try 0.5-0.7)
- Add more data augmentation
- Use smaller model
- Enable L2 regularization
- Reduce number of epochs
- Use early stopping
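Early stopping can be as simple as tracking validation loss and halting when it stops improving; a minimal sketch (not platform code):

```python
def should_stop(val_losses, patience=5):
    """Stop when the last `patience` validation losses have not improved
    on the best loss seen before that window."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return all(loss >= best_before for loss in val_losses[-patience:])
```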
Out of Memory Errors
Solutions:
- Reduce batch size (try 16 or 8)
- Use smaller input size (128×128)
- Switch to lighter model (EfficientNet-B0)
- Enable gradient checkpointing (advanced)
Slow Training
Solutions:
- Reduce input size
- Use fewer data augmentation transforms
- Increase num_workers in DataLoader
- Use mixed precision training (advanced)
- Switch to faster model (ResNet50 instead of ViT)
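Mixed precision on CUDA can roughly halve step time; a sketch of a training step using PyTorch's AMP utilities (model, optimizer, and loss function names are placeholders, and AMP is simply disabled on CPU/MPS):

```python
import torch

def make_train_step(device: torch.device):
    """Build a training-step function with autocast + loss scaling on CUDA."""
    use_amp = device.type == "cuda"
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

    def train_step(model, optimizer, loss_fn, images, labels):
        optimizer.zero_grad()
        with torch.autocast(device_type=device.type, enabled=use_amp):
            loss = loss_fn(model(images), labels)
        scaler.scale(loss).backward()   # scaling is a no-op when AMP is off
        scaler.step(optimizer)
        scaler.update()
        return loss.item()

    return train_step
```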
Next Steps
Hyperparameters
Learn how to tune learning rate, optimizers, and schedulers
Model Evaluation
Understand metrics and evaluate your trained models