Overview
GPU Memory Profiler is designed with a modular, extensible architecture that supports both PyTorch and TensorFlow while maintaining a clean separation of concerns.
High-level architecture
Core components
Profiler
The main profiling engine that coordinates memory monitoring and data collection. Responsibilities:
- Initialize profiling sessions
- Coordinate data collection from framework layers
- Manage profiling state and configuration
- Provide high-level API for users
Key classes: GPUMemoryProfiler (PyTorch, gpumemprof.profiler) and TFMemoryProfiler (TensorFlow, tfmemprof.profiler).
Source: profiler.py in the respective package.
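The coordination pattern above can be sketched in a few lines. This is a minimal, self-contained illustration of a profiler that manages session state and pulls samples from a pluggable framework collector; the class and method names here are hypothetical, not the actual GPUMemoryProfiler API.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ProfilingSession:
    """Holds the state of one profiling run."""
    started_at: float = field(default_factory=time.time)
    samples: list = field(default_factory=list)

class ProfilerSketch:
    """Coordinates sampling from a framework-specific collector callable."""
    def __init__(self, collector):
        # collector would wrap e.g. a CUDA allocated-bytes query
        self._collector = collector
        self._session = None

    def start(self):
        self._session = ProfilingSession()

    def sample(self):
        if self._session is None:
            raise RuntimeError("profiling session not started")
        self._session.samples.append(self._collector())

    def stop(self):
        session, self._session = self._session, None
        return session

# Usage with a fake collector standing in for a device memory query:
fake_allocated = iter([100, 250, 180])
p = ProfilerSketch(lambda: next(fake_allocated))
p.start()
for _ in range(3):
    p.sample()
result = p.stop()
print(result.samples)  # -> [100, 250, 180]
```

Keeping the collector a plain callable is what lets the same coordination logic serve both framework layers.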
Tracker
Real-time memory tracking with background monitoring capabilities. Responsibilities:
- Continuous memory monitoring
- Alert system for memory thresholds
- Background data collection
- Memory leak detection
Key classes: MemoryTracker (exported from both packages), TrackingEvent (gpumemprof) / TrackingResult (tfmemprof), and MemoryWatchdog (internal; not re-exported from the package __init__).
Source: tracker.py in the respective package.
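The background-monitoring and alert responsibilities can be modeled with a polling thread and a threshold callback. This is a simplified sketch, not the MemoryTracker implementation; the fake reader stands in for a real device memory query.

```python
import threading

class TrackerSketch:
    """Polls a memory-reading callable on a background thread and fires an
    alert callback whenever a reading crosses the threshold."""
    def __init__(self, read_memory, threshold, on_alert, interval=0.01):
        self._read = read_memory
        self._threshold = threshold
        self._on_alert = on_alert
        self._interval = interval
        self._stop = threading.Event()
        self._thread = None
        self.samples = []

    def start(self):
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self._thread.start()

    def _poll(self):
        while not self._stop.is_set():
            value = self._read()
            self.samples.append(value)
            if value > self._threshold:
                self._on_alert(value)
            self._stop.wait(self._interval)  # interruptible sleep

    def stop(self):
        self._stop.set()
        self._thread.join()

# Usage with a fake, steadily growing memory reading:
usage = iter(range(0, 1000, 100))
alerts = []
alert_seen = threading.Event()

def on_alert(value):
    alerts.append(value)
    alert_seen.set()

tracker = TrackerSketch(lambda: next(usage, 900), threshold=500,
                        on_alert=on_alert)
tracker.start()
alert_seen.wait(timeout=5)
tracker.stop()
print(alerts[0])  # first reading above the threshold -> 600
```

Using `Event.wait` for the sleep lets `stop()` interrupt the polling loop promptly instead of waiting out a full interval.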
Visualizer
Data visualization and reporting capabilities. Responsibilities:
- Generate memory timeline plots
- Create heatmaps and charts
- Interactive dashboards
- Export visualizations
Key classes: MemoryVisualizer (requires the [viz] extra; uses matplotlib, seaborn, and plotly internally).
Source: visualizer.py in the respective package.
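The export responsibility can be illustrated without the plotting libraries. This sketch writes a sampled memory timeline to CSV; the (timestamp, allocated_bytes) data shape is an assumption for illustration, and the real MemoryVisualizer renders plots rather than raw CSV.

```python
import csv
import io

def export_timeline_csv(samples, stream):
    """Write (timestamp, allocated_bytes) samples as CSV rows."""
    writer = csv.writer(stream)
    writer.writerow(["timestamp", "allocated_bytes"])
    for ts, allocated in samples:
        writer.writerow([ts, allocated])

# Usage: export two fake samples to an in-memory buffer.
buf = io.StringIO()
export_timeline_csv([(0.0, 1024), (0.5, 2048)], buf)
print(buf.getvalue())
```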
Analyzer
Advanced analysis and optimization recommendations. Responsibilities:
- Memory leak detection algorithms
- Performance analysis
- Optimization suggestions
- Pattern recognition
Key classes: MemoryAnalyzer and GapFinding (hidden-memory gap analysis).
Source: analyzer.py in the respective package.
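One common leak-detection heuristic is to flag a steady upward trend in allocated memory. The sketch below fits a least-squares line to a series of samples and flags a leak when the slope exceeds a threshold; it is a hypothetical stand-in for MemoryAnalyzer's actual algorithms, whose details are not documented here.

```python
def detect_leak(samples, min_slope=1.0):
    """Flag a potential leak when memory samples trend steadily upward.

    Fits a least-squares line to (index, bytes) pairs and compares the
    slope (bytes per sample) against min_slope.
    """
    n = len(samples)
    if n < 2:
        return False
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope >= min_slope

print(detect_leak([100, 110, 120, 130, 140]))  # steadily growing -> True
print(detect_leak([100, 105, 100, 95, 100]))   # stable -> False
```

A slope threshold (rather than a simple first-vs-last comparison) makes the heuristic robust to transient spikes in individual samples.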
Context profiler
Context-aware profiling with decorators and context managers. Responsibilities:
- Function-level profiling
- Context manager support
- Decorator implementations
- Scope-based memory tracking
Key classes: profile_function (decorator), profile_context (context manager), MemoryProfiler / ProfiledModule (gpumemprof), and TensorFlowProfiler / ProfiledLayer (tfmemprof).
Source: context_profiler.py in the respective package.
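The decorator and context-manager forms typically share one before/after measurement. Here is a self-contained sketch of that pattern; the `_read_allocated` helper is a fake stand-in for a device memory query, and the names are illustrative rather than the real profile_function / profile_context signatures.

```python
import functools
from contextlib import contextmanager

# Fake memory reader standing in for a device allocated-bytes query.
_current = {"allocated": 0}

def _read_allocated():
    return _current["allocated"]

@contextmanager
def profile_context_sketch(label, log):
    """Record allocated-memory delta across the enclosed block."""
    before = _read_allocated()
    try:
        yield
    finally:
        log.append((label, _read_allocated() - before))

def profile_function_sketch(log):
    """Decorator form: wrap a function in the same before/after measurement."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with profile_context_sketch(fn.__name__, log):
                return fn(*args, **kwargs)
        return wrapper
    return decorate

# Usage: the decorator reuses the context manager internally.
log = []

@profile_function_sketch(log)
def allocate():
    _current["allocated"] += 512  # simulate a 512-byte allocation

allocate()
print(log)  # -> [('allocate', 512)]
```

Measuring in a `finally` clause ensures the delta is recorded even when the profiled code raises.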
Utils
Utility functions and system information gathering. Responsibilities:
- System information collection
- Memory formatting
- Framework detection
- Error handling
Key functions: get_gpu_info() (gpumemprof) / get_system_info() (tfmemprof), format_bytes(), convert_bytes(), and detect_torch_runtime_backend() (gpumemprof).
Source: utils.py in the respective package.
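As an illustration of the memory-formatting responsibility, here is one plausible implementation of a byte formatter. The unit choice (binary units) and rounding are assumptions; the real format_bytes() may differ.

```python
def format_bytes_sketch(num_bytes):
    """Render a byte count as a human-readable string, e.g. '1.5 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we support.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024

print(format_bytes_sketch(1536))         # -> 1.5 KiB
print(format_bytes_sketch(3 * 1024**3))  # -> 3.0 GiB
```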
CLI
Command-line interface for standalone usage. Responsibilities:
- Command-line argument parsing
- Real-time monitoring interface
- Data export and analysis
- System information display
Commands:
- info - System information
- monitor - Real-time monitoring
- track - Background tracking
- analyze - Results analysis
- diagnose - Diagnostic bundle generation
Source: cli.py in the respective package.
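The subcommand layout above maps naturally onto argparse subparsers. The commands mirror the documented list, but the individual flags shown (such as --interval) are illustrative assumptions, not the real CLI's options.

```python
import argparse

def build_parser():
    """Build a parser with one subparser per documented command."""
    parser = argparse.ArgumentParser(prog="gpumemprof")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("info", help="System information")
    monitor = sub.add_parser("monitor", help="Real-time monitoring")
    monitor.add_argument("--interval", type=float, default=1.0)  # assumed flag
    sub.add_parser("track", help="Background tracking")
    analyze = sub.add_parser("analyze", help="Results analysis")
    analyze.add_argument("results_file")  # assumed positional
    sub.add_parser("diagnose", help="Diagnostic bundle generation")
    return parser

# Usage: parse a monitor invocation.
args = build_parser().parse_args(["monitor", "--interval", "0.5"])
print(args.command, args.interval)  # -> monitor 0.5
```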
OOM flight recorder
Captures memory state before out-of-memory crashes for post-mortem analysis. Key classes: OOMFlightRecorder, OOMFlightRecorderConfig, and OOMExceptionClassification.
Source: oom_flight_recorder.py in gpumemprof.
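The flight-recorder idea is to keep only the most recent samples in a bounded ring buffer, so the final moments before an OOM are cheap to retain and available for post-mortem inspection. The sketch below is a simplified model of that idea, not the OOMFlightRecorder API.

```python
from collections import deque

class FlightRecorderSketch:
    """Keeps the last N memory samples so the moments before an OOM
    can be dumped for post-mortem analysis."""
    def __init__(self, capacity=5):
        # deque(maxlen=...) silently discards the oldest sample on overflow
        self._buffer = deque(maxlen=capacity)

    def record(self, allocated_bytes):
        self._buffer.append(allocated_bytes)

    def run(self, step):
        """Invoke step() repeatedly; on MemoryError, return the snapshot."""
        try:
            while True:
                self.record(step())
        except MemoryError:
            return list(self._buffer)  # post-mortem snapshot

# Usage: simulate memory growth that ends in an OOM.
growth = iter(range(0, 800, 100))

def step():
    value = next(growth, None)
    if value is None:
        raise MemoryError("out of memory")  # simulated OOM
    return value

snapshot = FlightRecorderSketch(capacity=5).run(step)
print(snapshot)  # last five samples before the simulated OOM
```

The bounded buffer is the key design choice: recording cost stays constant no matter how long the workload runs before crashing.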
Device collectors
Backend-aware device memory sampling across CUDA, ROCm, and MPS. Key classes: DeviceMemoryCollector (abstract base), CudaDeviceCollector, ROCmDeviceCollector, MPSDeviceCollector, and DeviceMemorySample.
Source: device_collectors.py in gpumemprof.
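The shape of the abstraction can be sketched as an abstract base with one backend subclass per device family. The class names, fields, and sample values below are illustrative assumptions, not the gpumemprof definitions; real collectors would query the CUDA/ROCm/MPS runtimes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class SampleSketch:
    """One memory reading tagged with the backend that produced it."""
    backend: str
    allocated_bytes: int

class CollectorSketch(ABC):
    """Each backend implements sample(); callers stay backend-agnostic."""
    backend = "unknown"

    @abstractmethod
    def sample(self) -> SampleSketch: ...

class FakeCudaCollector(CollectorSketch):
    backend = "cuda"
    def sample(self):
        # A real CUDA collector would query the CUDA runtime / NVML here.
        return SampleSketch(self.backend, 4096)

class FakeMpsCollector(CollectorSketch):
    backend = "mps"
    def sample(self):
        # A real MPS collector would query the Metal allocator here.
        return SampleSketch(self.backend, 2048)

# Usage: the profiling pipeline iterates collectors without knowing backends.
samples = [c.sample() for c in (FakeCudaCollector(), FakeMpsCollector())]
print([(s.backend, s.allocated_bytes) for s in samples])
```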
Telemetry
Structured telemetry event schema for profiling data interchange. Key classes: TelemetryEventV2.
Source: telemetry.py in gpumemprof; see also the telemetry schema documentation.
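A structured event schema for interchange typically pairs a versioned dataclass with JSON serialization. The fields below are illustrative only; the actual TelemetryEventV2 schema is defined in the telemetry schema documentation and is not assumed here.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class EventSketch:
    """Illustrative versioned event shape for profiling-data interchange."""
    schema_version: int
    timestamp: float
    backend: str
    allocated_bytes: int

# Usage: serialize one event to a stable, sorted JSON payload.
event = EventSketch(2, 1700000000.0, "cuda", 1048576)
payload = json.dumps(asdict(event), sort_keys=True)
print(payload)
```

Carrying an explicit schema_version in every event is what lets downstream consumers evolve alongside the schema.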
Framework-specific architecture
PyTorch profiler
- Tensor lifecycle tracking
- CUDA memory management integration
- PyTorch-specific optimizations
- Autograd memory profiling
TensorFlow profiler
- Session-based memory tracking
- Graph execution monitoring
- Keras model profiling
- Mixed precision support
Data flow
Initialization flow
Profiling flow
Monitoring flow
Analysis flow
Design principles
Modularity
Each component has a single responsibility and can be used independently.
Extensibility
The architecture supports easy extension through the device-collector abstraction.
Thread safety
All components are designed to be thread-safe for concurrent use.
Performance
Minimal-overhead design with configurable sampling intervals.
Configuration management
Configuration is handled through constructor arguments and CLI flags. There is no external configuration file or environment-variable interface at this time.
Error handling
Graceful degradation
Testing architecture
Test structure
Tests live in a flat tests/ directory with framework-specific prefixes.
Markers (defined in pyproject.toml): unit, integration, slow, tui_pilot, tui_pty, tui_snapshot.
Mock strategy
Future extensibility
Plugin system
Custom visualizations
Framework support
New frameworks can implement a DeviceMemoryCollector and integrate with the existing profiling pipeline.