The `profiler` module provides comprehensive GPU memory profiling capabilities for PyTorch operations.
## Classes
### MemorySnapshot

Represents a memory snapshot at a specific point in time.

**Attributes**

- Unix timestamp when the snapshot was taken
- Allocated GPU memory in bytes
- Reserved GPU memory in bytes
- Maximum allocated memory since the last reset
- Maximum reserved memory since the last reset
- Active memory in the allocator
- Inactive memory in the allocator
- CPU memory usage in bytes
- GPU device ID
- Name of the operation when the snapshot was taken
- Stack trace at snapshot time (if enabled)
**Methods**

`to_dict()`

Convert the snapshot to a dictionary. Returns a dictionary representation of the snapshot.
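A minimal sketch of what such a snapshot record and its `to_dict()` might look like. The field names below (`timestamp`, `allocated`, `reserved`, `device_id`, `operation`) are illustrative assumptions, not the module's actual attribute names:

```python
import time
from dataclasses import dataclass, field, asdict

@dataclass
class MemorySnapshot:
    # Field names are assumptions for illustration only.
    timestamp: float = field(default_factory=time.time)  # Unix timestamp
    allocated: int = 0    # allocated GPU memory, bytes
    reserved: int = 0     # reserved GPU memory, bytes
    device_id: int = 0    # GPU device ID
    operation: str = ""   # operation name at snapshot time

    def to_dict(self) -> dict:
        # Dictionary representation of the snapshot
        return asdict(self)

snap = MemorySnapshot(allocated=1024, reserved=2048, operation="forward")
print(sorted(snap.to_dict()))
```

On a CUDA machine the allocated/reserved readings would come from `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`.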
### ProfileResult

Results from profiling a function or operation.

**Attributes**

- Name of the profiled function
- Execution time in seconds
- Memory snapshot before execution
- Memory snapshot after execution
- Peak memory snapshot during execution
- Total memory allocated during execution
- Total memory freed during execution
- Number of tensors created
- Number of tensors deleted
- Number of times the function was called
**Methods**

`memory_diff()`

Calculate the memory difference between the before and after snapshots. Returns the net memory change in bytes.

`peak_memory_usage()`

Get peak memory usage during execution. Returns the peak allocated memory in bytes.

`to_dict()`

Convert the result to a dictionary. Returns a dictionary representation including all metrics.
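A sketch of how `memory_diff()` and `peak_memory_usage()` might be derived from the before/after/peak readings. The field names here are assumptions used only to make the example self-contained:

```python
from dataclasses import dataclass

@dataclass
class ProfileResult:
    # Field names are illustrative assumptions, not the module's real API.
    function_name: str
    execution_time: float   # seconds
    allocated_before: int   # bytes, from the "before" snapshot
    allocated_after: int    # bytes, from the "after" snapshot
    peak_allocated: int     # bytes, from the peak snapshot

    def memory_diff(self) -> int:
        # Net memory change in bytes (positive means memory grew)
        return self.allocated_after - self.allocated_before

    def peak_memory_usage(self) -> int:
        # Peak allocated memory in bytes during execution
        return self.peak_allocated

r = ProfileResult("forward", 0.012, 1_000_000, 1_500_000, 2_000_000)
print(r.memory_diff())  # 500000
```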
### GPUMemoryProfiler

Comprehensive GPU memory profiler for PyTorch operations.

**Constructor parameters**

- GPU device to profile; if `None`, the current CUDA device is auto-detected
- Whether to track tensor creation and deletion
- Whether to track CPU memory usage alongside GPU memory
- Whether to collect stack traces for operations (impacts performance)
**Methods**

`profile_function()`

Profile a single function call. Parameters:

- Function to profile
- Arguments to pass to the function
- Keyword arguments to pass to the function

Returns profiling results including memory and timing information.
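The core of such a wrapper can be sketched with standard-library timing alone. This GPU-free sketch only measures wall-clock time; on a CUDA system one would additionally call `torch.cuda.synchronize()` and record `torch.cuda.memory_allocated()` before and after the call:

```python
import time

def profile_function(fn, *args, **kwargs):
    # Wall-clock-only sketch; a real GPU profiler would also snapshot
    # torch.cuda.memory_allocated() around the call.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

value, seconds = profile_function(sum, range(1000))
```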
`profile_context()`

Context manager for profiling a block of code. Takes a name for the profiled context.
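A minimal sketch of a named profiling context built on `contextlib.contextmanager`. The `timings` dictionary and the `name` parameter are assumptions for illustration; a real profiler would snapshot GPU memory on entry and exit as well:

```python
import time
from contextlib import contextmanager

timings = {}  # illustrative store for per-context results

@contextmanager
def profile_context(name: str):
    # Record elapsed wall-clock time under `name`, even if the body raises.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

with profile_context("matmul"):
    _ = [i * i for i in range(10_000)]
```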
`start_monitoring()`

Start continuous memory monitoring in the background. Takes the monitoring interval in seconds.
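Background monitoring like this is typically built on a daemon thread that samples at a fixed interval until signaled to stop. The class and sample contents below are a sketch under that assumption; each sample would be a real memory reading (e.g. `torch.cuda.memory_allocated()`) on a GPU machine:

```python
import threading
import time

class MemoryMonitor:
    def __init__(self, interval: float = 0.1):
        self.interval = interval
        self.samples = []
        self._stop = threading.Event()
        self._thread = None

    def start_monitoring(self):
        # Launch the sampling loop on a daemon thread.
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while not self._stop.is_set():
            self.samples.append(time.time())  # placeholder for a memory reading
            self._stop.wait(self.interval)    # interruptible sleep

    def stop_monitoring(self):
        # Signal the loop to exit and wait for the thread to finish.
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Using `Event.wait()` instead of `time.sleep()` lets `stop_monitoring()` interrupt a sleeping sampler immediately.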
`stop_monitoring()`

Stop continuous memory monitoring.

`get_summary()`

Get a comprehensive summary of all profiling results. Summary statistics include:

- `device`: device being profiled
- `total_functions_profiled`: number of unique functions
- `total_function_calls`: total profiling operations
- `peak_memory_usage`: peak memory usage across all operations
- `current_memory_usage`: current memory state
- `function_summaries`: per-function statistics
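The per-function aggregation behind a summary like this can be sketched as a fold over recorded results. The tuple layout and summary keys beyond those listed above are assumptions made to keep the example self-contained:

```python
def get_summary(results):
    # results: list of (function_name, execution_time, memory_diff) tuples
    # (an assumed layout for this sketch).
    summaries = {}
    for name, elapsed, diff in results:
        s = summaries.setdefault(
            name, {"calls": 0, "total_time": 0.0, "net_memory": 0}
        )
        s["calls"] += 1
        s["total_time"] += elapsed
        s["net_memory"] += diff
    return {
        "total_functions_profiled": len(summaries),
        "total_function_calls": sum(s["calls"] for s in summaries.values()),
        "function_summaries": summaries,
    }

summary = get_summary([
    ("forward", 0.01, 1024),
    ("forward", 0.02, -512),
    ("backward", 0.03, 2048),
])
```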
`clear_results()`

Clear all profiling results and reset state.

**Context Manager Support**

The profiler can be used as a context manager.

### TensorTracker

Tracks tensor creation and deletion for memory profiling.

**Methods**
`count_tensors()`

Count the current number of tracked tensors. Returns the number of CUDA tensors currently in memory.
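One common way to implement such a count is to scan the garbage collector's live objects. This GPU-free sketch counts instances of an arbitrary class; a real tracker would instead check `isinstance(obj, torch.Tensor)` and `obj.is_cuda`:

```python
import gc

def count_live_instances(cls) -> int:
    # Scan all objects the garbage collector knows about and count
    # live instances of `cls` (stand-in for CUDA tensors in this sketch).
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))

class FakeTensor:
    # Stand-in for torch.Tensor so the sketch runs without a GPU.
    pass

tensors = [FakeTensor() for _ in range(3)]
```

Note that `gc.get_objects()` is a full-heap scan, so production trackers usually maintain weak references to tensors instead of scanning on every call.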