Overview
Pooling operations in the GraphNeuralNetwork encoder reduce node-level representations to fixed-size graph-level embeddings. The STGNN architecture supports two pooling strategies:
- TopK Pooling: Hierarchical graph coarsening that selectively retains the most important nodes
- Global Pooling: Direct aggregation of all node features using mean and max operations
Both strategies produce graph-level embeddings of the same dimensionality (output_dim × 2) but differ in computational approach and learned representations.
Global Pooling
Overview
Global pooling aggregates information from all nodes in the graph using permutation-invariant operations.
Implementation
The encoder combines two global pooling operations: mean pooling and max pooling.
Mean Pooling
Averages node features across the entire graph:
- Captures average node properties
- Robust to outliers
- Smooth representations
- Equal weight to all nodes
Max Pooling
Takes the element-wise maximum across node features:
- Captures salient features
- Emphasizes extreme values
- Sparse activation patterns
- Focuses on discriminative nodes
Concatenation Strategy
Combining mean and max pooling provides complementary information:
- Richer representations: Captures both average and extreme properties
- Improved performance: Empirically shown to outperform single pooling methods
- Minimal overhead: No additional parameters, just concatenation
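The original implementation listing is not reproduced here; the following NumPy sketch illustrates the mean/max/concatenation scheme described above (it mirrors the semantics of PyTorch Geometric's `global_mean_pool` and `global_max_pool`, but the function name and shapes are illustrative, not taken from model.py):

```python
import numpy as np

def global_mean_max_pool(x, batch):
    """Aggregate node features into per-graph embeddings.

    x: (num_nodes, feat_dim) node feature matrix
    batch: (num_nodes,) graph index for each node
    Returns: (num_graphs, feat_dim * 2) concatenated mean/max embeddings.
    """
    num_graphs = int(batch.max()) + 1
    pooled = []
    for g in range(num_graphs):
        nodes = x[batch == g]                 # all nodes belonging to graph g
        mean = nodes.mean(axis=0)             # average node properties
        mx = nodes.max(axis=0)                # salient / extreme features
        pooled.append(np.concatenate([mean, mx]))
    return np.stack(pooled)

# Two graphs in one batch: graph 0 has 2 nodes, graph 1 has 3.
x = np.array([[1.0, 4.0], [3.0, 2.0],
              [0.0, 1.0], [2.0, 5.0], [4.0, 0.0]])
batch = np.array([0, 0, 1, 1, 1])
emb = global_mean_max_pool(x, batch)
# emb.shape == (2, 4): each graph embedding is output_dim × 2
```

Because mean and max are both permutation-invariant, shuffling the nodes within a graph leaves its embedding unchanged, which is the property the section above relies on.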
Advantages of Global Pooling
- No additional parameters: Uses all nodes without learned selection
- Computationally efficient: Simple aggregation operations
- Stable training: No potential for empty graphs
- Interpretable: Clear semantic meaning (average and maximum)
- Memory efficient: No intermediate graph storage
Disadvantages
- Fixed receptive field: Cannot focus on important subgraphs
- Noise sensitivity: Includes all nodes, even noisy ones
- Limited hierarchical structure: Flat aggregation, no coarsening
- Uniform weighting: All nodes contribute equally (in mean pooling)
TopK Pooling
Overview
TopK pooling hierarchically coarsens graphs by iteratively selecting the most important nodes.
Implementation
TopK pooling is applied after each graph convolution layer.
How TopK Pooling Works
- Score Computation: Each node receives a score based on its features
- Node Selection: Keep the top k nodes with the highest scores
- Graph Coarsening: Retain the selected nodes and their connecting edges
- Batch Update: Update batch assignments for the remaining nodes
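The four steps above can be sketched in NumPy in the style of the standard TopK pooling operator (as in PyTorch Geometric's `TopKPooling`): scores come from projecting features onto a learnable vector `p`, and surviving features are gated by `tanh(score)` so the projection receives gradients. This is a generic sketch of the technique, not the code from model.py:

```python
import numpy as np

def topk_pool(x, edge_index, p, ratio=0.5):
    """One TopK pooling step.

    x: (N, F) node features; edge_index: (2, E) edges; p: (F,) learnable projection.
    """
    # 1. Score computation: project each node's features onto p.
    scores = x @ p / np.linalg.norm(p)
    # 2. Node selection: keep the top ceil(ratio * N) nodes.
    k = max(int(np.ceil(ratio * x.shape[0])), 1)
    keep = np.argsort(scores)[::-1][:k]
    # 3. Graph coarsening: keep edges whose endpoints both survive.
    keep_set = set(keep.tolist())
    mask = [(s in keep_set) and (t in keep_set) for s, t in edge_index.T]
    # Gate surviving features by tanh(score) so selection stays trainable.
    x_new = x[keep] * np.tanh(scores[keep])[:, None]
    # 4. Batch update (omitted here) would index the batch vector the same
    #    way: batch_new = batch[keep].
    return x_new, edge_index[:, mask], keep

x = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [2.0, 2.0]])
edge_index = np.array([[0, 1, 2], [1, 2, 3]])
p = np.array([1.0, 1.0])
x_new, e_new, keep = topk_pool(x, edge_index, p, ratio=0.5)
# Node 3 has the largest projection and is always retained.
```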
Safe Ratio Enforcement
The implementation includes safeguards to prevent empty graphs:
- Minimum 30% retention ensures at least some nodes survive deep architectures
- Prevents empty graphs that would cause forward pass failures
- Balances selectivity with stability
Layer-wise Pooling
TopK pooling creates a hierarchy of progressively coarsened graphs.
Advantages of TopK Pooling
- Hierarchical structure: Multi-scale graph representations
- Learned selection: Adapts to identify important nodes for the task
- Noise reduction: Filters out less relevant nodes
- Improved expressiveness: Can focus on discriminative subgraphs
- Attention-like mechanism: Weights nodes by importance
Disadvantages
- Additional parameters: Each TopK layer has learnable projection weights
- Computational overhead: Score computation and graph filtering
- Training instability: Risk of empty graphs without safeguards
- Discrete operation: Non-differentiable node selection (straight-through gradients)
- Memory overhead: Stores intermediate graphs
Comparison
Performance
| Metric | Global Pooling | TopK Pooling |
|---|---|---|
| Parameters | 0 additional | num_layers × output_dim |
| Computation | O(N) | O(N log N) per layer |
| Memory | Low | Medium |
| Training stability | High | Medium (with safeguards) |
| Expressiveness | Good | Excellent |
When to Use Global Pooling
- Small graphs: When nodes < 50, selection overhead dominates
- Limited data: Fewer parameters reduce overfitting risk
- Uniform importance: When all nodes contribute equally
- Baseline models: Quick prototyping and experimentation
- Interpretability: Simple, well-understood aggregation
When to Use TopK Pooling
- Large graphs: When nodes > 100, can focus on important regions
- Noisy data: Filter out irrelevant or low-quality nodes
- Hierarchical structure: Exploit multi-scale patterns
- Performance critical: When model capacity is more important than speed
- Sufficient data: Large datasets can support additional parameters
Configuration Examples
Conservative TopK (High Retention)
Aggressive TopK (Low Retention)
Balanced Configuration
Global Pooling (No Coarsening)
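The four configurations named above might look like the following. The parameter names (`use_topk_pooling`, `pooling_ratio`) are assumptions for illustration; consult the actual GraphNeuralNetwork constructor in model.py for the real signature:

```python
# Hypothetical constructor arguments -- names are illustrative, not from model.py.
conservative = dict(use_topk_pooling=True, pooling_ratio=0.8)   # high retention
balanced     = dict(use_topk_pooling=True, pooling_ratio=0.5)
aggressive   = dict(use_topk_pooling=True, pooling_ratio=0.3)   # low retention
global_only  = dict(use_topk_pooling=False)                     # no coarsening

# Note the 30% safeguard means ratios below 0.3 behave like 0.3 in practice.
```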
Implementation Details
Forward Pass with TopK
From model.py lines 136-155:
Forward Pass without TopK
From model.py lines 157-171:
Experimental Recommendations
Hyperparameter Search
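One simple way to search the pooling hyperparameters discussed in this page is a small grid over the pooling strategy and retention ratio. The grid values below are illustrative suggestions, not settings from the repository:

```python
from itertools import product

# Illustrative search grid over the pooling choices discussed above.
grid = {
    "use_topk_pooling": [True, False],
    "pooling_ratio": [0.3, 0.5, 0.8],
}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
# 6 candidate configurations; runs with use_topk_pooling=False simply
# ignore pooling_ratio, so they can be deduplicated before training.
```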
Ablation Study
Visualization
Node Retention Across Layers
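To visualize retention, the expected node count after each TopK layer can be computed directly from the ratio and the 30% safeguard. This is a back-of-the-envelope sketch (it assumes the safeguard acts as a floor on the ratio, as described in Safe Ratio Enforcement):

```python
import math

def retention_schedule(num_nodes, ratio, num_layers, min_ratio=0.3):
    """Node counts surviving after each TopK layer, with the 30% floor."""
    counts, n = [num_nodes], num_nodes
    for _ in range(num_layers):
        n = max(math.ceil(max(ratio, min_ratio) * n), 1)
        counts.append(n)
    return counts

# retention_schedule(100, 0.5, 3) -> [100, 50, 25, 13]
```

Plotting such a schedule for several ratios makes the trade-off between aggressive coarsening and surviving graph size easy to inspect.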
See Also
- GNN Encoder - Full GraphNeuralNetwork implementation
- Architecture - Complete STGNN pipeline
- Training Configuration - Hyperparameter tuning guidelines