
Overview

The UC Intel Final platform provides three families of neural network architectures, each designed for different use cases and computational constraints:

Custom CNN

Build convolutional neural networks from scratch with configurable layer stacks

Transfer Learning

Fine-tune pre-trained models (VGG, ResNet, EfficientNet) for faster convergence

Vision Transformer

State-of-the-art transformer architecture with self-attention mechanisms

Base Model Interface

All models inherit from the BaseModel abstract class, ensuring consistent interfaces. Location: app/models/base.py:11-71
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple
import torch.nn as nn

class BaseModel(ABC):
    """Abstract base class for model implementations"""
    
    def __init__(self, config: Dict[str, Any]):
        """
        Initialize model with configuration
        
        Args:
            config: Model configuration dictionary
        """
        self.config = config
        self.model = None
    
    @abstractmethod
    def build(self) -> nn.Module:
        """
        Build and return the model
        
        Returns:
            PyTorch model (nn.Module)
        """
        pass
    
    @abstractmethod
    def get_parameters_count(self) -> Tuple[int, int]:
        """
        Get total and trainable parameter counts
        
        Returns:
            Tuple of (total_params, trainable_params)
        """
        pass
    
    def get_model_summary(self) -> Dict[str, Any]:
        """Get model summary statistics"""
        if self.model is None:
            self.model = self.build()
        
        total_params, trainable_params = self.get_parameters_count()
        
        return {
            "total_parameters": total_params,
            "trainable_parameters": trainable_params,
            "model_type": self.config.get("model_type", "Unknown"),
            "architecture": self.config.get("architecture", "Unknown"),
            "num_classes": self.config.get("num_classes", 0)
        }
All models implement the same interface, making it easy to swap architectures during experimentation.
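To implement a new architecture, subclass BaseModel and provide build() and get_parameters_count(). A minimal sketch (the LinearModel subclass here is hypothetical, for illustration only, with the BaseModel definition condensed from above):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

import torch.nn as nn

# BaseModel as defined above (condensed)
class BaseModel(ABC):
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.model = None

    @abstractmethod
    def build(self) -> nn.Module: ...

    @abstractmethod
    def get_parameters_count(self) -> Tuple[int, int]: ...

    def get_model_summary(self) -> Dict[str, Any]:
        if self.model is None:
            self.model = self.build()
        total, trainable = self.get_parameters_count()
        return {
            "total_parameters": total,
            "trainable_parameters": trainable,
            "model_type": self.config.get("model_type", "Unknown"),
            "architecture": self.config.get("architecture", "Unknown"),
            "num_classes": self.config.get("num_classes", 0),
        }

# Hypothetical subclass: a single linear classifier over 16 input features
class LinearModel(BaseModel):
    def build(self) -> nn.Module:
        return nn.Linear(16, self.config.get("num_classes", 2))

    def get_parameters_count(self) -> Tuple[int, int]:
        if self.model is None:
            self.model = self.build()
        total = sum(p.numel() for p in self.model.parameters())
        trainable = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
        return total, trainable

summary = LinearModel({"model_type": "Linear", "num_classes": 4}).get_model_summary()
```

Because every subclass exposes the same get_model_summary(), downstream code (logging, UI, experiment tracking) never needs to know which architecture it is holding.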

Custom CNN

Overview

The Custom CNN builder allows you to construct convolutional neural networks from a layer stack configuration. This provides maximum flexibility for architecture experimentation. Location: app/models/pytorch/cnn_builder.py

Architecture

Supported Layer Types

Conv2D - 2D convolutional layer

Parameters:
  • filters (int): Number of output channels (default: 32)
  • kernel_size (int): Kernel size (default: 3)
  • activation (str): Activation function - "relu", "leaky_relu", "gelu", "swish" (default: "relu")
  • padding (str): "same" or "valid" (default: "same")
Implementation (app/models/pytorch/cnn_builder.py:178-194):
def _build_conv2d(self, in_channels: int, params: dict) -> tuple:
    filters = params.get("filters", 32)
    kernel_size = params.get("kernel_size", 3)
    activation = params.get("activation", "relu")
    padding_mode = params.get("padding", "same")
    
    padding = kernel_size // 2 if padding_mode == "same" else 0
    
    layers = [
        nn.Conv2d(in_channels, filters, 
                 kernel_size=kernel_size, 
                 padding=padding),
        self._get_activation(activation)
    ]
    
    return nn.Sequential(*layers), filters
Output shape: (batch, filters, height, width)

Activation Functions

Location: app/models/pytorch/cnn_builder.py:75-81
ACTIVATION_MAP = {
    "relu": nn.ReLU(inplace=True),
    "leaky_relu": nn.LeakyReLU(0.1, inplace=True),
    "gelu": nn.GELU(),
    "swish": nn.SiLU(inplace=True),
    "none": nn.Identity(),
}

ReLU

Formula: f(x) = max(0, x)

Pros:
  • Fast computation
  • Sparse activation
  • Widely used
Cons:
  • Dying ReLU problem

Leaky ReLU

Formula: f(x) = x if x > 0 else 0.1x

Pros:
  • Fixes dying ReLU
  • Allows negative gradients
Use when: Training deep networks

GELU

Formula: f(x) = x * Φ(x) (Gaussian Error Linear Unit)

Pros:
  • Smooth activation
  • Better for transformers
  • State-of-the-art results
Use when: Using transformer-style architectures

Swish (SiLU)

Formula: f(x) = x * sigmoid(x)

Pros:
  • Self-gated activation
  • Smooth and non-monotonic
  • Often outperforms ReLU
Use when: Need smooth gradients
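The four formulas above can be checked numerically. A small sketch in plain Python, evaluating each activation at x = -1.0, where their behavior differs most visibly:

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.1):
    # Same 0.1 negative slope as ACTIVATION_MAP above
    return x if x > 0 else slope * x

def gelu(x):
    # Exact form using the Gaussian CDF: x * Φ(x)
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x):
    # x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

# At x = -1.0: ReLU kills the signal entirely, the others let a little through
vals = {f.__name__: round(f(-1.0), 4) for f in (relu, leaky_relu, gelu, swish)}
```

ReLU outputs exactly 0 for any negative input (the source of the dying-ReLU problem), while Leaky ReLU, GELU, and Swish all preserve a small negative signal, keeping gradients alive.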

Example Configuration

Simple CNN for MNIST-style data:
config = {
    "model_type": "Custom CNN",
    "num_classes": 9,
    "cnn_config": {
        "layers": [
            # Block 1
            {"type": "Conv2D", "params": {"filters": 32, "kernel_size": 3, "activation": "relu"}},
            {"type": "Conv2D", "params": {"filters": 32, "kernel_size": 3, "activation": "relu"}},
            {"type": "MaxPooling2D", "params": {"pool_size": 2}},
            {"type": "BatchNorm"},
            {"type": "Dropout", "params": {"rate": 0.25}},
            
            # Block 2
            {"type": "Conv2D", "params": {"filters": 64, "kernel_size": 3, "activation": "relu"}},
            {"type": "Conv2D", "params": {"filters": 64, "kernel_size": 3, "activation": "relu"}},
            {"type": "MaxPooling2D", "params": {"pool_size": 2}},
            {"type": "BatchNorm"},
            {"type": "Dropout", "params": {"rate": 0.25}},
            
            # Block 3
            {"type": "Conv2D", "params": {"filters": 128, "kernel_size": 3, "activation": "relu"}},
            {"type": "GlobalAvgPool"},
            
            # Classifier
            {"type": "Dense", "params": {"units": 256, "activation": "relu"}},
            {"type": "Dropout", "params": {"rate": 0.5}},
        ]
    }
}
Parameter Count: ~200K parameters
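The quoted parameter count can be roughly reproduced by hand: convolutions contribute in·out·k² weights plus out biases, dense layers in·out plus out, and BatchNorm two learnables per channel. A back-of-the-envelope sketch for the configuration above, assuming 3-channel input and a final output layer for the 9 classes (the helper names are illustrative, not platform APIs):

```python
def conv_params(c_in, c_out, k=3):
    # weights + bias of an nn.Conv2d layer
    return c_in * c_out * k * k + c_out

def dense_params(f_in, f_out):
    # weights + bias of an nn.Linear layer
    return f_in * f_out + f_out

def bn_params(c):
    # learnable gamma and beta per channel
    return 2 * c

total = (
    conv_params(3, 32) + conv_params(32, 32) + bn_params(32)     # Block 1
    + conv_params(32, 64) + conv_params(64, 64) + bn_params(64)  # Block 2
    + conv_params(64, 128)                                       # Block 3
    + dense_params(128, 256)                                     # Dense head
    + dense_params(256, 9)                                       # output layer (9 classes)
)
# total comes out around 175K, consistent with the ~200K figure above
```

Note that pooling, dropout, and GlobalAvgPool layers add no parameters, which is why the dense head dominates only when flattening is used instead of global pooling.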

Forward Pass

Location: app/models/pytorch/cnn_builder.py:205-232
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    Forward pass
    
    Args:
        x: Input tensor of shape (batch, channels, height, width)
    
    Returns:
        Output logits of shape (batch, num_classes)
    """
    # Apply feature extraction layers
    for layer in self.feature_layers:
        x = layer(x)
    
    # Apply transition (flatten or global pool)
    if self.use_global_pool:
        x = torch.mean(x, dim=[2, 3])
    else:
        x = torch.flatten(x, 1)
    
    # Apply classifier layers
    for layer in self.classifier_layers:
        x = layer(x)
    
    # Output layer
    x = self.output_layer(x)
    
    return x
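The transition step between feature extraction and classification can be tried in isolation. A sketch of both options on a stand-in feature map:

```python
import torch

# Stand-in feature map: batch of 2, 128 channels, 7x7 spatial
features = torch.randn(2, 128, 7, 7)

# Global average pooling, as in the forward pass above
pooled = torch.mean(features, dim=[2, 3])  # (2, 128)

# Flatten alternative
flat = torch.flatten(features, 1)          # (2, 128 * 7 * 7) = (2, 6272)
```

Global pooling keeps the classifier input fixed at the channel count regardless of image size, which is why the example configuration above needs only a 128-unit input to its dense head.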

Transfer Learning

Overview

Transfer learning leverages pre-trained models trained on ImageNet (1.2M images, 1000 classes) to accelerate training and improve performance on smaller datasets. Location: app/models/pytorch/transfer.py

Supported Base Models

VGG16 / VGG19

Architecture: Deep CNNs with small 3x3 filters

Characteristics:
  • 16 or 19 layers
  • Simple, uniform architecture
  • Large number of parameters (~138M for VGG16)
Input size: 224x224
Feature dimensions: 512 (after global pooling)

Use when: You need a simple, well-understood architecture

Implementation (app/models/pytorch/transfer.py:152-154):
"VGG16": lambda: models.vgg16(pretrained=use_pretrained),
"VGG19": lambda: models.vgg19(pretrained=use_pretrained),

Fine-Tuning Strategies

Location: app/models/pytorch/transfer.py:194-217
Strategy: Freeze all base model layers, train only the classifier

Implementation:
# Freeze all base model parameters
for param in self.base_model.parameters():
    param.requires_grad = False
Trainable parameters: ~10K (classifier only)

Use when:
  • Small dataset (<1000 images/class)
  • Limited compute resources
  • Domain similar to ImageNet
Training time: Fastest (1-2 hours)
Expected performance: Good baseline
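Freezing works by clearing requires_grad on the base model's parameters, so gradients flow through it but the optimizer only ever updates the classifier. A sketch with a tiny stand-in base model (a real setup would load e.g. torchvision's ResNet50):

```python
import torch.nn as nn

# Stand-in "base model" and a hypothetical head for 32x32 inputs, 5 classes
base_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten())
classifier = nn.Linear(8 * 30 * 30, 5)

# Freeze the base model: only the classifier remains trainable
for param in base_model.parameters():
    param.requires_grad = False

all_params = list(base_model.parameters()) + list(classifier.parameters())
trainable = sum(p.numel() for p in all_params if p.requires_grad)
frozen = sum(p.numel() for p in all_params if not p.requires_grad)
```

When building the optimizer, pass only the trainable parameters (e.g. filter on p.requires_grad) so frozen weights carry no optimizer state.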

Custom Classifier Head

Location: app/models/pytorch/transfer.py:125-146
# Build custom classifier head
classifier_layers = []

if global_pooling:
    self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
else:
    self.global_pool = None

if add_dense:
    # Two-layer classifier
    classifier_layers.extend([
        nn.Linear(in_features, dense_units),
        nn.ReLU(inplace=True),
        nn.Dropout(dropout),
        nn.Linear(dense_units, num_classes)
    ])
else:
    # Single-layer classifier
    classifier_layers.extend([
        nn.Dropout(dropout),
        nn.Linear(in_features, num_classes)
    ])

self.classifier = nn.Sequential(*classifier_layers)
Options:
  • Global Pooling: Reduces spatial dimensions to 1x1
  • Extra Dense Layer: Adds capacity (useful for complex domains)
  • Dropout: Regularization (default: 0.5)

Forward Pass

Location: app/models/pytorch/transfer.py:219-243
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # Extract features with frozen/unfrozen base model
    features = self.base_model(x)
    
    # Apply global pooling if needed
    if self.global_pool is not None and len(features.shape) == 4:
        features = self.global_pool(features)
        features = torch.flatten(features, 1)
    elif len(features.shape) == 4:
        features = torch.flatten(features, 1)
    
    # Apply custom classifier
    output = self.classifier(features)
    
    return output

Vision Transformer

Overview

Vision Transformer (ViT) applies the transformer architecture (originally designed for NLP) to image classification by treating images as sequences of patches. Location: app/models/pytorch/transformer.py Paper: “An Image is Worth 16x16 Words” (Dosovitskiy et al., 2020)

Architecture

Patch Embedding

Location: app/models/pytorch/transformer.py:72-114

Converts a 2D image into a sequence of patch embeddings:
class PatchEmbedding(nn.Module):
    def __init__(
        self,
        image_size: int = 224,
        patch_size: int = 16,      # 16x16 patches
        in_channels: int = 3,
        embed_dim: int = 768,
    ):
        super().__init__()
        self.image_size = image_size
        self.patch_size = patch_size
        self.num_patches = (image_size // patch_size) ** 2  # 196 for 224x224
        
        # Use convolution to extract and embed patches
        self.proj = nn.Conv2d(
            in_channels, 
            embed_dim, 
            kernel_size=patch_size, 
            stride=patch_size  # Non-overlapping patches
        )
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, embed_dim, H/P, W/P)
        x = self.proj(x)
        
        # (B, embed_dim, H/P, W/P) -> (B, num_patches, embed_dim)
        x = x.flatten(2).transpose(1, 2)
        
        return x
Example:
  • Input: (1, 3, 224, 224)
  • After projection: (1, 768, 14, 14)
  • After flatten: (1, 196, 768)
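The shape arithmetic above can be reproduced with just the strided-convolution projection, without the rest of the module:

```python
import torch
import torch.nn as nn

# Patch embedding via strided convolution, as in PatchEmbedding above:
# kernel_size == stride == patch_size gives non-overlapping 16x16 patches
proj = nn.Conv2d(3, 768, kernel_size=16, stride=16)

x = torch.randn(1, 3, 224, 224)
x = proj(x)                       # (1, 768, 14, 14)
x = x.flatten(2).transpose(1, 2)  # (1, 196, 768): 196 patch tokens of dim 768
```

Each output "pixel" of the strided convolution is one patch embedding, so a 224x224 image yields (224/16)² = 196 tokens.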

Multi-Head Self-Attention

Location: app/models/pytorch/transformer.py:117-164
class MultiHeadAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        assert embed_dim % num_heads == 0
        
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.scale = self.head_dim ** -0.5
        
        # Single linear layer to compute Q, K, V
        self.qkv = nn.Linear(embed_dim, embed_dim * 3)
        self.attn_drop = nn.Dropout(dropout)
        self.proj = nn.Linear(embed_dim, embed_dim)
        self.proj_drop = nn.Dropout(dropout)
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        
        # Generate Q, K, V
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        qkv = qkv.permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]  # Each: (B, num_heads, N, head_dim)
        
        # Scaled dot-product attention
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)
        
        # Apply attention to values
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        
        # Output projection
        x = self.proj(x)
        x = self.proj_drop(x)
        
        return x
Attention Mechanism:
  1. Linear projection to Q, K, V
  2. Split into multiple heads
  3. Compute attention scores: Attention(Q, K, V) = softmax(QK^T / √d_k)V
  4. Concatenate heads
  5. Output projection
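Steps 1-5 can be sanity-checked on random tensors; in particular, every attention row should sum to 1 after the softmax. A sketch of the core scaled dot-product step, using the same tensor layout as the module above:

```python
import torch

B, H, N, d = 1, 2, 5, 8  # batch, heads, tokens, head_dim
q = torch.randn(B, H, N, d)
k = torch.randn(B, H, N, d)
v = torch.randn(B, H, N, d)

# Scaled dot-product attention, as in the forward pass above
attn = (q @ k.transpose(-2, -1)) * d ** -0.5  # (B, H, N, N) scores
attn = attn.softmax(dim=-1)                   # each row: a distribution over tokens
out = attn @ v                                # (B, H, N, d) weighted values
```

The 1/√d_k scaling keeps the dot products in a range where softmax does not saturate, which is essential for stable gradients in deep stacks.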

Transformer Block

Location: app/models/pytorch/transformer.py:195-220
class TransformerBlock(nn.Module):
    def __init__(
        self,
        embed_dim: int,
        num_heads: int,
        mlp_ratio: float = 4.0,
        dropout: float = 0.0,
    ):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = MultiHeadAttention(embed_dim, num_heads, dropout)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = MLP(
            in_features=embed_dim,
            hidden_features=int(embed_dim * mlp_ratio),  # 3072 for 768-dim
            dropout=dropout
        )
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention with residual (pre-norm)
        x = x + self.attn(self.norm1(x))
        
        # MLP with residual (pre-norm)
        x = x + self.mlp(self.norm2(x))
        
        return x
Structure: LayerNorm → Attention → Residual → LayerNorm → MLP → Residual
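The pre-norm residual pattern guarantees that a block's output shape matches its input, which is what allows blocks to be stacked to arbitrary depth. A sketch of the MLP half of the block, with attention omitted for brevity (the nn.Sequential here stands in for the MLP class used above):

```python
import torch
import torch.nn as nn

embed_dim = 64
norm = nn.LayerNorm(embed_dim)
mlp = nn.Sequential(
    nn.Linear(embed_dim, 4 * embed_dim),  # expand by mlp_ratio = 4.0
    nn.GELU(),
    nn.Linear(4 * embed_dim, embed_dim),  # project back
)

x = torch.randn(2, 10, embed_dim)
# Pre-norm residual: normalize, transform, then add back the input
y = x + mlp(norm(x))
```

Because the residual branch is added to the untouched input, gradients have a direct path through the whole stack, which is why pre-norm ViTs train stably even at 24+ blocks.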

Configuration Options

ViT-Base

Configuration:
  • Patch size: 16
  • Embed dim: 768
  • Depth: 12 blocks
  • Heads: 12
  • MLP ratio: 4.0
Parameters: ~86M
Use when: Standard accuracy/speed tradeoff

ViT-Large

Configuration:
  • Patch size: 16
  • Embed dim: 1024
  • Depth: 24 blocks
  • Heads: 16
  • MLP ratio: 4.0
Parameters: ~307M
Use when: Maximum accuracy, large dataset

ViT-Small

Configuration:
  • Patch size: 16
  • Embed dim: 384
  • Depth: 12 blocks
  • Heads: 6
  • MLP ratio: 4.0
Parameters: ~22M
Use when: Limited compute, faster inference

Custom

Configurable parameters:
  • Patch size (8, 16, 32)
  • Embed dimension
  • Number of blocks
  • Number of heads
  • MLP ratio
  • Dropout rate
Use when: Specific requirements
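The parameter counts quoted for the presets can be estimated from the configuration alone: each transformer block contributes roughly 12·d² parameters (QKV, output projection, and the two MLP linears), plus embeddings and the head. A back-of-the-envelope sketch (the helper and the num_classes=1000 head are assumptions for illustration):

```python
def vit_param_estimate(embed_dim, depth, mlp_ratio=4.0, patch_size=16,
                       image_size=224, in_channels=3, num_classes=1000):
    d = embed_dim
    hidden = int(d * mlp_ratio)
    attn = (3 * d * d + 3 * d) + (d * d + d)      # QKV + output projection
    mlp = (d * hidden + hidden) + (hidden * d + d)  # two linears
    norms = 2 * 2 * d                             # two LayerNorms per block
    per_block = attn + mlp + norms                # ~12 * d^2

    num_patches = (image_size // patch_size) ** 2
    patch_embed = in_channels * patch_size ** 2 * d + d  # conv projection
    pos_cls = (num_patches + 1) * d + d                  # position embeddings + CLS token
    head = d * num_classes + num_classes

    return depth * per_block + patch_embed + pos_cls + head

base = vit_param_estimate(768, 12)   # close to the ~86M quoted for ViT-Base
small = vit_param_estimate(384, 12)  # close to the ~22M quoted for ViT-Small
```

Doubling the embed dimension roughly quadruples the block cost, which is why ViT-Large (d=1024, depth 24) lands near 307M rather than 2x ViT-Base.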

Forward Pass

Location: app/models/pytorch/transformer.py:304-338
def forward(self, x: torch.Tensor) -> torch.Tensor:
    B = x.shape[0]
    
    # 1. Patch embedding
    x = self.patch_embed(x)  # (B, num_patches, embed_dim)
    
    # 2. Add CLS token
    cls_tokens = self.cls_token.expand(B, -1, -1)  # (B, 1, embed_dim)
    x = torch.cat((cls_tokens, x), dim=1)  # (B, num_patches + 1, embed_dim)
    
    # 3. Add position embeddings
    x = x + self.pos_embed
    x = self.pos_drop(x)
    
    # 4. Apply transformer blocks
    for block in self.blocks:
        x = block(x)
    
    # 5. Normalize
    x = self.norm(x)
    
    # 6. Extract CLS token and classify
    cls_output = x[:, 0]  # (B, embed_dim)
    x = self.head(cls_output)  # (B, num_classes)
    
    return x
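Steps 2 and 6 (prepending the CLS token and later extracting it for classification) can be tried in isolation:

```python
import torch

B, num_patches, embed_dim = 2, 196, 768
patches = torch.randn(B, num_patches, embed_dim)
cls_token = torch.zeros(1, 1, embed_dim)  # a learnable nn.Parameter in the real model

# Prepend one CLS token per sequence, as in step 2 above
cls_tokens = cls_token.expand(B, -1, -1)      # (B, 1, embed_dim)
x = torch.cat((cls_tokens, patches), dim=1)   # (B, num_patches + 1, embed_dim)

# Step 6: the CLS position aggregates information via attention and feeds the head
cls_output = x[:, 0]                          # (B, embed_dim)
```

expand creates a broadcast view rather than copying the token, so the single learnable CLS parameter is shared across the batch.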

Model Selection Guide

Small (<1000 images/class):
  • ✅ Transfer Learning (Feature Extraction)
  • ✅ Transfer Learning (Partial Fine-tuning)
  • ⚠️ Custom CNN (risk of overfitting)
  • ❌ Vision Transformer (requires large dataset)
Medium (1000-5000 images/class):
  • ✅ Transfer Learning (Partial/Full Fine-tuning)
  • ✅ Custom CNN (with regularization)
  • ⚠️ Vision Transformer (may underperform)
Large (>5000 images/class):
  • ✅ All architectures
  • ✅ Vision Transformer (best performance)
  • ✅ Transfer Learning (Full Fine-tuning)
  • ✅ Custom CNN (deep architectures)

Performance Comparison

Typical Results on Malware Dataset

| Architecture | Parameters | Training Time | Accuracy | GPU Memory |
|---|---|---|---|---|
| Custom CNN (Small) | ~200K | 1-2 hours | 85-88% | 2 GB |
| Custom CNN (Deep) | ~2M | 3-4 hours | 88-91% | 4 GB |
| ResNet50 (Feature Ext.) | ~25M | 1-2 hours | 90-93% | 4 GB |
| ResNet50 (Partial FT) | ~25M | 3-5 hours | 92-95% | 6 GB |
| ResNet50 (Full FT) | ~25M | 6-10 hours | 93-96% | 8 GB |
| EfficientNetB0 | ~5M | 2-4 hours | 91-94% | 3 GB |
| ViT-Small | ~22M | 8-12 hours | 90-93% | 8 GB |
| ViT-Base | ~86M | 12-24 hours | 94-97% | 16 GB |
Results vary based on dataset size, quality, and training configuration. These are representative ranges.

References

  • Custom CNN implementation: app/models/pytorch/cnn_builder.py
  • Transfer learning implementation: app/models/pytorch/transfer.py
  • Vision Transformer implementation: app/models/pytorch/transformer.py
  • Base model interface: app/models/base.py
  • Model building in training worker: app/training/worker.py:29-42
