
Overview

The UC Intel Final platform supports three main categories of models for malware classification:
  1. Custom CNN: Build convolutional neural networks from scratch
  2. Transfer Learning: Leverage pre-trained models (ResNet, EfficientNet, Vision Transformer)
  3. Transformer: Custom vision transformer architectures
This guide helps you choose the right architecture and configure it properly.

Model Building System

All models are built using a builder pattern that converts configuration dictionaries into PyTorch modules.

Source: app/training/worker.py:29-42
def build_model(model_config: dict[str, Any]) -> torch.nn.Module:
    """Build PyTorch model from config."""
    model_type = model_config.get("model_type")

    if model_type == "Custom CNN":
        builder = CustomCNNBuilder(model_config)
    elif model_type == "Transfer Learning":
        builder = TransferLearningBuilder(model_config)
    elif model_type == "Transformer":
        builder = TransformerBuilder(model_config)
    else:
        raise ValueError(f"Unknown model type: {model_type}")

    return builder.build()

Custom CNN

When to Use

Use Custom CNN when:
  • You have a small dataset (<1000 images per class)
  • You want full control over architecture
  • You need a lightweight model for deployment
  • You want to experiment with novel architectures
  • Transfer learning is overkill for your problem

Architecture Components

Custom CNNs are built with configurable convolutional blocks:
custom_cnn_config = {
    "model_type": "Custom CNN",
    "input_shape": (224, 224, 3),
    "num_classes": 10,
    "blocks": [
        {
            "type": "conv",
            "filters": 32,
            "kernel_size": 3,
            "activation": "relu",
            "pooling": "max",
            "pool_size": 2,
            "dropout": 0.25
        },
        {
            "type": "conv",
            "filters": 64,
            "kernel_size": 3,
            "activation": "relu",
            "pooling": "max",
            "pool_size": 2,
            "dropout": 0.25
        },
        {
            "type": "conv",
            "filters": 128,
            "kernel_size": 3,
            "activation": "relu",
            "pooling": "max",
            "pool_size": 2,
            "dropout": 0.3
        }
    ],
    "dense_layers": [
        {"units": 256, "activation": "relu", "dropout": 0.5},
        {"units": 128, "activation": "relu", "dropout": 0.5}
    ]
}
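The builder itself is not shown here, but each `blocks` entry maps naturally onto a handful of PyTorch layers. A minimal sketch of that mapping (the `conv_block` helper and the `"same"` padding choice are illustrative assumptions, not the platform's actual `CustomCNNBuilder`):

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, cfg: dict) -> nn.Sequential:
    # Illustrative mapping of one `blocks` entry: Conv -> ReLU -> MaxPool -> Dropout
    return nn.Sequential(
        nn.Conv2d(in_ch, cfg["filters"], cfg["kernel_size"], padding="same"),
        nn.ReLU(),
        nn.MaxPool2d(cfg["pool_size"]),
        nn.Dropout2d(cfg["dropout"]),
    )

blocks = [
    {"filters": 32, "kernel_size": 3, "pool_size": 2, "dropout": 0.25},
    {"filters": 64, "kernel_size": 3, "pool_size": 2, "dropout": 0.25},
    {"filters": 128, "kernel_size": 3, "pool_size": 2, "dropout": 0.3},
]

layers, in_ch = [], 3
for cfg in blocks:
    layers.append(conv_block(in_ch, cfg))
    in_ch = cfg["filters"]
features = nn.Sequential(*layers)

out = features(torch.randn(1, 3, 224, 224))  # shape: (1, 128, 28, 28)
```

Each `pool_size=2` block halves the spatial dimensions, so three blocks take 224×224 down to 28×28 before the dense layers.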

Design Guidelines

  1. Start Simple: Begin with 3-4 convolutional blocks. Add more only if needed.
  2. Increase Filters Gradually: Use a progression like 32 → 64 → 128 → 256, so each layer extracts higher-level features.
  3. Use Pooling: Add max pooling after conv blocks to reduce spatial dimensions and computation.
  4. Apply Dropout: Use 0.25-0.3 dropout in conv layers and 0.5 in dense layers to prevent overfitting.
  5. Keep Dense Layers Small: 1-2 dense layers are usually sufficient; more layers increase the risk of overfitting.

Example Architectures

Lightweight

{
    "blocks": [
        {"filters": 32, "kernel_size": 3, "pooling": "max", "dropout": 0.25},
        {"filters": 64, "kernel_size": 3, "pooling": "max", "dropout": 0.25}
    ],
    "dense_layers": [
        {"units": 128, "activation": "relu", "dropout": 0.5}
    ]
}
Parameters: ~100K | Best for: 500-1000 images per class
Medium

{
    "blocks": [
        {"filters": 32, "kernel_size": 3, "pooling": "max", "dropout": 0.25},
        {"filters": 64, "kernel_size": 3, "pooling": "max", "dropout": 0.25},
        {"filters": 128, "kernel_size": 3, "pooling": "max", "dropout": 0.3}
    ],
    "dense_layers": [
        {"units": 256, "activation": "relu", "dropout": 0.5}
    ]
}
Parameters: ~500K | Best for: 1000-5000 images per class
Large

{
    "blocks": [
        {"filters": 64, "kernel_size": 3, "pooling": "max", "dropout": 0.25},
        {"filters": 128, "kernel_size": 3, "pooling": "max", "dropout": 0.25},
        {"filters": 256, "kernel_size": 3, "pooling": "max", "dropout": 0.3},
        {"filters": 512, "kernel_size": 3, "pooling": "max", "dropout": 0.3}
    ],
    "dense_layers": [
        {"units": 512, "activation": "relu", "dropout": 0.5},
        {"units": 256, "activation": "relu", "dropout": 0.5}
    ]
}
Parameters: ~2M | Best for: 5000+ images per class

Transfer Learning

When to Use

Use Transfer Learning when:
  • You have limited training data (pre-trained features help at any dataset size)
  • You want state-of-the-art performance
  • You need faster convergence
  • You have access to GPU resources
  • You want to leverage features learned from ImageNet

Available Architectures

The platform supports multiple pre-trained backbones:
| Architecture | Parameters | Speed | Accuracy | Best For |
| --- | --- | --- | --- | --- |
| ResNet50 | 25M | Fast | Good | Balanced performance, general use |
| ResNet101 | 44M | Medium | Better | When you need higher accuracy |
| EfficientNet-B0 | 5M | Fast | Good | Limited GPU memory |
| EfficientNet-B3 | 12M | Medium | Better | Balanced efficiency/accuracy |
| EfficientNet-B7 | 66M | Slow | Best | Maximum accuracy, large GPU |
| Vision Transformer (ViT) | 86M | Slow | Best | Large datasets, cutting-edge |

Configuration

transfer_learning_config = {
    "model_type": "Transfer Learning",
    "backbone": "ResNet50",
    "pretrained": True,
    "freeze_backbone": True,  # Freeze early layers
    "num_classes": 10,
    "input_shape": (224, 224, 3),
    "dropout": 0.5
}

Fine-Tuning Strategies

  1. Feature Extraction (Recommended Start): Set freeze_backbone=True and train only the final classification layer. Pros: fast training, works with small datasets, prevents overfitting. Use when: dataset < 1000 images per class.
  2. Partial Fine-Tuning: Freeze early layers and unfreeze later layers, allowing the backbone to adapt. Pros: better accuracy, moderate training time. Use when: dataset of 1000-5000 images per class.
  3. Full Fine-Tuning: Set freeze_backbone=False and train all layers with a lower learning rate. Pros: maximum accuracy, full model adaptation. Use when: dataset > 5000 images per class.

Architecture Selection Guide

EfficientNet

When to use:
  • Limited GPU resources
  • Need fast inference for deployment
  • Want best accuracy-to-parameters ratio
Choose EfficientNet-B0 for:
  • Deployment on edge devices
  • Very limited GPU memory (< 4GB)
Choose EfficientNet-B3/B7 for:
  • Maximum accuracy with efficient architecture
  • Moderate to large GPU memory available
Vision Transformer (ViT)

When to use:
  • Large datasets (5000+ images per class)
  • Maximum possible accuracy is required
  • You have powerful GPU (8GB+ VRAM)
  • Training time is not a constraint
Important:
  • ViT requires more data than CNNs to perform well
  • Training is significantly slower than ResNet/EfficientNet
  • Consider using with strong data augmentation

Transformer Models

Custom Vision Transformer

Build custom vision transformer architectures with configurable attention mechanisms.
transformer_config = {
    "model_type": "Transformer",
    "image_size": 224,
    "patch_size": 16,
    "num_classes": 10,
    "dim": 768,
    "depth": 12,
    "heads": 12,
    "mlp_dim": 3072,
    "dropout": 0.1,
    "emb_dropout": 0.1
}
Custom transformers require:
  • Large datasets (10,000+ total images minimum)
  • Powerful GPU with 16GB+ VRAM
  • Extended training time (3-5x longer than CNNs)
  • Strong regularization and augmentation
For most use cases, Transfer Learning with ViT is preferred over custom transformers.

Model Initialization

During training, the model is initialized and moved to the appropriate device.

Source: app/training/worker.py:84-103
# Determine device
device = torch.device(
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"[Training] Using device: {device}")

# Build model
print("[Training] Building model...")
model = build_model(model_config)
model = model.to(device)

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(
    f"[Training] Parameters: {total_params:,} total, {trainable_params:,} trainable"
)
The platform automatically detects available hardware:
  • CUDA: NVIDIA GPUs (preferred)
  • MPS: Apple Silicon GPUs (M1/M2/M3)
  • CPU: Fallback for systems without GPU

Model Comparison

Decision Tree

┌─ Dataset Size < 1000 per class?
│  ├─ Yes → Custom CNN (Lightweight)
│  └─ No ↓

├─ Dataset Size 1000-5000 per class?
│  ├─ Yes → Transfer Learning (ResNet50, freeze_backbone=True)
│  └─ No ↓

├─ Dataset Size > 5000 per class?
│  ├─ GPU Memory < 8GB → EfficientNet-B3
│  ├─ GPU Memory 8-16GB → ResNet101 or EfficientNet-B7
│  └─ GPU Memory > 16GB → Vision Transformer (ViT)

└─ Deployment on Edge Device?
   └─ Yes → EfficientNet-B0 or Custom CNN (Lightweight)

Performance Comparison

| Model | Dataset Size | Training Time | Memory | Accuracy |
| --- | --- | --- | --- | --- |
| Custom CNN (Light) | 500-1K/class | 5-10 min/epoch | 2GB | 75-85% |
| Custom CNN (Medium) | 1K-5K/class | 10-20 min/epoch | 4GB | 80-88% |
| ResNet50 (frozen) | Any | 15-30 min/epoch | 6GB | 85-92% |
| ResNet50 (fine-tuned) | 5K+/class | 30-60 min/epoch | 8GB | 90-95% |
| EfficientNet-B3 | 5K+/class | 40-80 min/epoch | 8GB | 91-96% |
| Vision Transformer | 10K+/class | 60-120 min/epoch | 16GB | 92-97% |
Note: Times are approximate for 10-class dataset on RTX 3080. Accuracy depends on data quality.

Best Practices

Starting Point

  1. Baseline: Start with ResNet50 and freeze_backbone=True as your baseline.
  2. Evaluate: Train for 20-30 epochs and evaluate validation accuracy.
  3. Iterate: If underfitting, try unfreezing the backbone or a larger model. If overfitting, add regularization, use a smaller model, or increase augmentation.

Regularization

  • Dropout: 0.5 is standard for dense layers, 0.25-0.3 for conv layers
  • L2 Decay: Use 0.0001-0.001 with AdamW optimizer
  • Data Augmentation: Essential for preventing overfitting
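The L2 decay recommendation above maps onto AdamW's `weight_decay` argument. A minimal sketch with a stand-in model (the Linear layer is just a placeholder):

```python
import torch

model = torch.nn.Linear(16, 2)  # stand-in for a real model
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4,  # L2 decay in the 0.0001-0.001 range recommended above
)
```

AdamW applies the decay directly to the weights rather than folding it into the gradient, which is why it is the usual pairing for this kind of regularization.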

Input Size

  • 224×224: Standard size, works with all pre-trained models
  • 256×256: Use for high-resolution malware visualizations
  • 128×128: Faster training, good for resource-constrained environments
When using transfer learning, input size should match the pre-training size (usually 224×224) for best results.
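Matching the pre-training size is normally handled by the dataset transforms; as a standalone illustration of the resize itself (shapes only, not the platform's actual pipeline):

```python
import torch
import torch.nn.functional as F

img = torch.randn(1, 3, 256, 256)  # e.g. a high-resolution malware visualization
resized = F.interpolate(img, size=(224, 224), mode="bilinear", align_corners=False)
```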

Troubleshooting

Model not learning / low accuracy

Possible causes:
  • Learning rate too high or too low → Try 1e-3 to 1e-5
  • Backbone frozen but needs fine-tuning → Set freeze_backbone=False
  • Model too simple → Try larger architecture
  • Data quality issues → Check dataset preprocessing
Overfitting (training accuracy far above validation)

Solutions:
  • Increase dropout (try 0.5-0.7)
  • Add more data augmentation
  • Use smaller model
  • Enable L2 regularization
  • Reduce number of epochs
  • Use early stopping
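Early stopping is straightforward to add if your training loop does not already provide it. A minimal sketch (the `EarlyStopping` class and its thresholds are illustrative, not platform API):

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means "stop now"

stopper = EarlyStopping(patience=2)
# improves twice, then stalls for two epochs -> stop on the last step
stops = [stopper.step(loss) for loss in [1.0, 0.9, 0.95, 0.96]]
```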
Out-of-memory errors

Solutions:
  • Reduce batch size (try 16 or 8)
  • Use smaller input size (128×128)
  • Switch to lighter model (EfficientNet-B0)
  • Enable gradient checkpointing (advanced)
Training too slow

Solutions:
  • Reduce input size
  • Use fewer data augmentation transforms
  • Increase num_workers in DataLoader
  • Use mixed precision training (advanced)
  • Switch to faster model (ResNet50 instead of ViT)
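Mixed precision follows the standard torch.amp pattern; a hedged sketch of one training step (the Linear model and SGD optimizer are placeholders, and the scaler is a no-op on CPU):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(8, 2)          # stand-in for a real classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # no-op on CPU

x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
with torch.autocast(device_type="cuda" if use_cuda else "cpu"):
    loss = F.cross_entropy(model(x), y)  # forward runs in reduced precision

scaler.scale(loss).backward()  # scaled backward guards against fp16 underflow
scaler.step(opt)
scaler.update()
```

On CUDA this typically cuts per-epoch time substantially with little or no accuracy loss, which is why it pairs well with the heavier backbones in the table above.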

Next Steps

Hyperparameters

Learn how to tune learning rate, optimizers, and schedulers

Model Evaluation

Understand metrics and evaluate your trained models
