Architecture Overview
DeepLabv3+ is a state-of-the-art semantic segmentation architecture that combines atrous (dilated) convolutions with an encoder-decoder structure for accurate segmentation.
Key Features
- Backbone: Xception with modified stride
- ASPP: Atrous Spatial Pyramid Pooling for multi-scale context
- Output stride: OS=16 for balance between accuracy and efficiency
- Atrous rates: [6, 12, 18] for multi-scale features
- Pretrained: Pascal VOC dataset (84.56% mIOU)
- Output: 2-class segmentation with softmax
Atrous Spatial Pyramid Pooling (ASPP)
ASPP captures multi-scale context by applying parallel atrous convolutions with different dilation rates:
models/deeplabv3.py (371-413)
ASPP Components
- 5 Parallel Branches
- Receptive Fields
Image-level features (b4):
- Global Average Pooling
- 1×1 Conv → 256 filters
- Bilinear upsample to original size
1×1 convolution:
- Captures point-wise features
- 256 filters
Atrous convolution, rate 6:
- Separable conv with dilation=6
- 256 filters
Atrous convolution, rate 12:
- Separable conv with dilation=12
- 256 filters
Atrous convolution, rate 18:
- Separable conv with dilation=18
- 256 filters
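To make the effect of the dilation rates on receptive field concrete, the sketch below computes the spatial extent covered by a dilated kernel along one axis, using the standard relation k_eff = k + (k - 1)(r - 1). It assumes 3×3 kernels in the atrous branches; `effective_kernel_size` is an illustrative helper, not a function from models/deeplabv3.py.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Extent (in feature-map pixels) spanned by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Dilation rates of the three atrous ASPP branches at OS=16.
ASPP_RATES = (6, 12, 18)

# Assumption: each atrous branch uses a 3x3 kernel.
extents = {rate: effective_kernel_size(3, rate) for rate in ASPP_RATES}

for rate, extent in extents.items():
    print(f"rate={rate:2d}: 3x3 kernel spans {extent}x{extent} positions")
```

Each branch still has only nine taps per channel; the wider extent comes from the spacing between taps, which is why the rates add context without adding parameters.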
Xception Backbone
Modified Xception architecture serves as the encoder:
models/deeplabv3.py (273-314)
Xception Structure
| Flow | Blocks | Block Strides | Atrous Rate | Filters |
|---|---|---|---|---|
| Entry | 3 | 2, 2, 2 (OS=16) or 2, 2, 1 (OS=8) | 1 | 128→256→728 |
| Middle | 16 | 1 | 1 (OS=16) or 2 (OS=8) | 728 |
| Exit | 2 | 1 | 1,2 (OS=16) or 2,4 (OS=8) | 1024→2048 |
Depthwise Separable Convolution
Core building block for efficient atrous convolutions:
models/deeplabv3.py (52-89)
- Parameter reduction: roughly 8-9× fewer parameters than a standard 3×3 convolution at typical channel widths
- Atrous support: Efficient dilation for multi-scale features
- Computational efficiency: Separates spatial and channel-wise convolutions
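The parameter saving can be checked with a quick count. This is a minimal sketch that ignores biases and batch-normalization parameters and assumes a 3×3 kernel with 256 input and 256 output channels; the function names are illustrative only.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv that mixes channels
    return depthwise + pointwise


standard = standard_conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)
ratio = standard / separable

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {ratio:.2f}x")
```

The ratio approaches 9× as the number of output channels grows, since the pointwise term dominates the separable count.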
Decoder Architecture
Decoder refines segmentation with skip connections:
models/deeplabv3.py (416-433)
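The resolution bookkeeping of the decoder can be traced with a shape-only sketch. It assumes the layout described in the DeepLabv3+ paper: the encoder output is upsampled to the resolution of low-level features taken at stride 4, concatenated with them, and then upsampled to the input size. The function name and the stride-4 skip location are assumptions, not details read from models/deeplabv3.py.

```python
def decoder_resolutions(input_size: int = 512, output_stride: int = 16) -> dict:
    """Trace the spatial sizes along the decoder path (no tensors involved)."""
    skip_stride = 4  # assumption: low-level features are taken at stride 4
    encoder_size = input_size // output_stride
    skip_size = input_size // skip_stride
    return {
        "encoder_output": encoder_size,
        "low_level_skip": skip_size,
        "first_upsample_factor": skip_size // encoder_size,
        "final_upsample_factor": input_size // skip_size,
        "logits": input_size,
    }


trace = decoder_resolutions(512, 16)
for name, value in trace.items():
    print(f"{name}: {value}")
```

For a 512×512 input at OS=16 this gives a 32×32 encoder output and two 4× upsampling steps, with the skip connection joining at 128×128.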
Complete Model Function
models/deeplabv3.py (219-220)
Parameters
| Parameter | Options | Default | Description |
|---|---|---|---|
| weights | 'pascal_voc', 'cityscapes', None | 'came' | Pretrained weights |
| input_shape | (H, W, 3) | (512, 512, 3) | Input image size |
| classes | Integer | 21 | Number of output classes |
| backbone | 'xception', 'mobilenetv2' | 'xception' | Encoder architecture |
| OS | 8, 16 | 16 | Output stride |
| activation | 'softmax', 'sigmoid', None | None | Final activation |
Output Stride (OS) Configuration
- OS=16 (Default)
- OS=8
Configuration for OS=16:
- Entry block 3 stride: 2
- Middle block rate: 1
- Exit block rates: (1, 2)
- Atrous rates: (6, 12, 18)
Characteristics of OS=16:
- Faster inference
- Lower memory usage
- Good accuracy/speed tradeoff
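The choice between the two output strides can be summarised as a small lookup. The OS=16 values are the ones listed in this section, and the OS=8 middle and exit rates come from the Xception structure table. The OS=8 entry stride of 1 and atrous rates of (12, 24, 36) are assumptions based on the usual DeepLabv3+ convention of doubling the rates when the stride is halved. `os_config` and `encoder_positions` are hypothetical helpers, not functions in models/deeplabv3.py.

```python
OS_CONFIGS = {
    16: {
        "entry_block3_stride": 2,
        "middle_block_rate": 1,
        "exit_block_rates": (1, 2),
        "atrous_rates": (6, 12, 18),
    },
    8: {
        "entry_block3_stride": 1,        # assumption, see lead-in
        "middle_block_rate": 2,
        "exit_block_rates": (2, 4),
        "atrous_rates": (12, 24, 36),    # assumption, see lead-in
    },
}


def os_config(output_stride: int) -> dict:
    """Return the stride/rate settings for a supported output stride."""
    if output_stride not in OS_CONFIGS:
        raise ValueError(f"OS must be 8 or 16, got {output_stride}")
    return OS_CONFIGS[output_stride]


def encoder_positions(input_size: int, output_stride: int) -> int:
    """Number of spatial positions in the encoder output for a square input."""
    side = input_size // output_stride
    return side * side


ratio = encoder_positions(512, 8) / encoder_positions(512, 16)
print(os_config(16))
print(f"OS=8 encoder map has {ratio:.0f}x the positions of OS=16")
```

The ratio illustrates why OS=16 is the lighter option: every layer after the entry flow processes a quarter as many positions as it would at OS=8.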
Pretrained Weights
Pascal VOC Performance
Original DeepLabv3+ achieves 84.56% mIOU on Pascal VOC validation set.
Available Weights
models/deeplabv3.py (46-49)
DigiPathAI Weights
- digestpath_deeplabv3.h5: Fine-tuned for DigestPath
- paip_deeplabv3.h5: Fine-tuned for PAIP
- camelyon_deeplabv3.h5: Fine-tuned for Camelyon
Input/Output Specifications
Input
- Shape: (batch, height, width, 3)
- Default size: 512×512 pixels
- Preprocessing: TensorFlow mode normalization
- Range: [0, 255] RGB values
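In Keras, "TensorFlow mode" normalization conventionally means scaling pixels from [0, 255] to [-1, 1]. The sketch below assumes that convention applies here; `preprocess_tf_mode` is an illustrative name rather than the project's own preprocessing function.

```python
import numpy as np


def preprocess_tf_mode(image: np.ndarray) -> np.ndarray:
    """Scale RGB values in [0, 255] to float32 values in [-1, 1]."""
    return image.astype(np.float32) / 127.5 - 1.0


# A dummy 512x512 RGB tile: all black except one white pixel.
tile = np.zeros((512, 512, 3), dtype=np.uint8)
tile[0, 0] = 255

# Add the batch axis expected by the model input.
batch = preprocess_tf_mode(tile)[np.newaxis, ...]
print(batch.shape, batch.dtype, batch.min(), batch.max())
```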
Output
- Shape: (batch, height, width, classes)
- Classes: 2 for DigiPathAI (background, tissue)
- Activation: Softmax probabilities
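Softmax probabilities of this shape are typically reduced to a label mask by taking the argmax over the class axis. The following is a minimal sketch on a hand-written array; it assumes class index 0 is background and index 1 is tissue, following the order given above, and `probabilities_to_mask` is a hypothetical helper.

```python
import numpy as np


def probabilities_to_mask(probs: np.ndarray) -> np.ndarray:
    """Convert (batch, height, width, classes) probabilities to a class-index mask."""
    return np.argmax(probs, axis=-1).astype(np.uint8)


# One 1x2 "image": first pixel favours class 0, second pixel favours class 1.
probs = np.array([[[[0.9, 0.1], [0.2, 0.8]]]], dtype=np.float32)

mask = probabilities_to_mask(probs)
print(mask.shape)     # (batch, height, width)
print(mask.tolist())
```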
DeepLabv3+ excels at capturing multi-scale context through ASPP while maintaining precise boundaries through its encoder-decoder structure with skip connections.
Usage Example
Advantages for Pathology
Multi-Scale Context
- ASPP: Captures features at 5 different scales
- Global pooling: Incorporates entire image context
- Atrous convolutions: Large receptive fields without downsampling
Computational Efficiency
- Depthwise separable: Reduces parameters and computation
- OS=16: Balances accuracy and speed
- Shared encoder: Efficient multi-scale processing
Accurate Boundaries
- Decoder refinement: 4× upsampling with skip connections
- Low-level features: Preserves spatial details
- Bilinear upsampling: Smooth boundary reconstruction
Related Models
DenseNet U-Net
Dense connectivity for feature reuse
Inception-ResNet
Multi-scale Inception blocks