The profiler module provides comprehensive GPU memory profiling capabilities for PyTorch operations.

Classes

MemorySnapshot

Represents a memory snapshot at a specific point in time.
from gpumemprof import MemorySnapshot

Attributes

timestamp (float): Unix timestamp when the snapshot was taken
allocated_memory (int): Allocated GPU memory in bytes
reserved_memory (int): Reserved GPU memory in bytes
max_memory_allocated (int): Maximum allocated memory since last reset
max_memory_reserved (int): Maximum reserved memory since last reset
active_memory (int): Active memory in the allocator
inactive_memory (int): Inactive memory in the allocator
cpu_memory (int): CPU memory usage in bytes
device_id (int, default: 0): GPU device ID
operation (Optional[str], default: None): Name of the operation when the snapshot was taken
stack_trace (Optional[str], default: None): Stack trace at snapshot time (if enabled)

Methods

to_dict()
Convert the snapshot to a dictionary.
snapshot = MemorySnapshot(...)
data = snapshot.to_dict()
Returns Dict[str, Any]: a dictionary representation of the snapshot

ProfileResult

Results from profiling a function or operation.
from gpumemprof import ProfileResult

Attributes

function_name (str): Name of the profiled function
execution_time (float): Execution time in seconds
memory_before (MemorySnapshot): Memory snapshot before execution
memory_after (MemorySnapshot): Memory snapshot after execution
memory_peak (MemorySnapshot): Peak memory snapshot during execution
memory_allocated (int): Total memory allocated during execution
memory_freed (int): Total memory freed during execution
tensors_created (int): Number of tensors created
tensors_deleted (int): Number of tensors deleted
call_count (int, default: 1): Number of times the function was called

Methods

memory_diff()
Calculate the memory difference between the before and after snapshots.
result = profiler.profile_function(my_function)
memory_change = result.memory_diff()
Returns int: net memory change in bytes

peak_memory_usage()
Get peak memory usage during execution.
peak = result.peak_memory_usage()
Returns int: peak allocated memory in bytes

to_dict()
Convert the result to a dictionary.
data = result.to_dict()
Returns Dict[str, Any]: a dictionary representation including all metrics

GPUMemoryProfiler

Comprehensive GPU memory profiler for PyTorch operations.
from gpumemprof import GPUMemoryProfiler

profiler = GPUMemoryProfiler(device="cuda:0")

Constructor

device (Union[str, int, torch.device, None], default: None): GPU device to profile. If None, auto-detects the current CUDA device.
track_tensors (bool, default: True): Whether to track tensor creation and deletion
track_cpu_memory (bool, default: True): Whether to track CPU memory usage alongside GPU memory
collect_stack_traces (bool, default: False): Whether to collect stack traces for operations (impacts performance)

Methods

profile_function()
Profile a single function call.
def train_step(model, batch):
    output = model(batch)
    return output

result = profiler.profile_function(train_step, model, batch)
print(f"Memory allocated: {result.memory_allocated / 1024**3:.2f} GB")
func (Callable): Function to profile
*args (Any): Arguments to pass to the function
**kwargs (Any): Keyword arguments to pass to the function
Returns ProfileResult: profiling results including memory and timing information
profile_context()
Context manager for profiling a block of code.
with profiler.profile_context("training_loop"):
    for batch in dataloader:
        output = model(batch)
        loss = criterion(output, labels)
        loss.backward()
name (str, default: 'context'): Name for the profiled context
start_monitoring()
Start continuous memory monitoring in the background.
profiler.start_monitoring(interval=0.5)
# ... run your code ...
profiler.stop_monitoring()
interval (float, default: 0.1): Monitoring interval in seconds
stop_monitoring()
Stop continuous memory monitoring.
profiler.stop_monitoring()
get_summary()
Get a comprehensive summary of all profiling results.
summary = profiler.get_summary()
print(f"Peak memory: {summary['peak_memory_usage']}")
print(f"Total functions profiled: {summary['total_functions_profiled']}")
Returns Dict[str, Any]: summary statistics including:
  • device: Device being profiled
  • total_functions_profiled: Number of unique functions
  • total_function_calls: Total profiling operations
  • peak_memory_usage: Peak memory usage across all operations
  • current_memory_usage: Current memory state
  • function_summaries: Per-function statistics
clear_results()
Clear all profiling results and reset state.
profiler.clear_results()

Context Manager Support

The profiler can be used as a context manager:
with GPUMemoryProfiler(device="cuda:0") as profiler:
    profiler.start_monitoring()
    # ... your code ...
    summary = profiler.get_summary()

TensorTracker

Tracks tensor creation and deletion for memory profiling.
from gpumemprof.profiler import TensorTracker

tracker = TensorTracker()

Methods

count_tensors()
Count the current number of tracked tensors.
count = tracker.count_tensors()
Returns int: number of CUDA tensors currently in memory

Example Usage

import torch
from gpumemprof import GPUMemoryProfiler

# Initialize profiler
profiler = GPUMemoryProfiler(device="cuda:0", track_tensors=True)

# Profile a function
def allocate_tensors(size):
    return torch.randn(size, size, device="cuda")

result = profiler.profile_function(allocate_tensors, 1000)
print(f"Execution time: {result.execution_time:.4f}s")
print(f"Memory allocated: {result.memory_allocated / 1024**2:.2f} MB")
print(f"Tensors created: {result.tensors_created}")

# Profile a context
with profiler.profile_context("matrix_operations"):
    a = torch.randn(1000, 1000, device="cuda")
    b = torch.randn(1000, 1000, device="cuda")
    c = torch.matmul(a, b)

# Get summary
summary = profiler.get_summary()
for func_name, stats in summary['function_summaries'].items():
    print(f"{func_name}: {stats['avg_memory_allocated'] / 1024**2:.2f} MB average")
