The profiler module provides comprehensive GPU memory profiling capabilities for PyTorch operations.

Classes

MemorySnapshot

Represents a memory snapshot at a specific point in time.
from gpumemprof import MemorySnapshot

Attributes

timestamp (float): Unix timestamp when the snapshot was taken
allocated_memory (int): Allocated GPU memory in bytes
reserved_memory (int): Reserved GPU memory in bytes
max_memory_allocated (int): Maximum allocated memory since last reset
max_memory_reserved (int): Maximum reserved memory since last reset
active_memory (int): Active memory in the allocator
inactive_memory (int): Inactive memory in the allocator
cpu_memory (int): CPU memory usage in bytes
device_id (int, default: 0): GPU device ID
operation (Optional[str], default: None): Name of the operation when the snapshot was taken
stack_trace (Optional[str], default: None): Stack trace at snapshot time (if enabled)

Methods

to_dict()
Convert the snapshot to a dictionary.
snapshot = MemorySnapshot(...)
data = snapshot.to_dict()
Returns Dict[str, Any]: a dictionary representation of the snapshot

ProfileResult

Results from profiling a function or operation.
from gpumemprof import ProfileResult

Attributes

function_name (str): Name of the profiled function
execution_time (float): Execution time in seconds
memory_before (MemorySnapshot): Memory snapshot before execution
memory_after (MemorySnapshot): Memory snapshot after execution
memory_peak (MemorySnapshot): Peak memory snapshot during execution
memory_allocated (int): Total memory allocated during execution
memory_freed (int): Total memory freed during execution
tensors_created (int): Number of tensors created
tensors_deleted (int): Number of tensors deleted
call_count (int, default: 1): Number of times the function was called

Methods

memory_diff()
Calculate the memory difference between the before and after snapshots.
result = profiler.profile_function(my_function)
memory_change = result.memory_diff()
Returns int: net memory change in bytes

peak_memory_usage()
Get peak memory usage during execution.
peak = result.peak_memory_usage()
Returns int: peak allocated memory in bytes

to_dict()
Convert the result to a dictionary.
data = result.to_dict()
Returns Dict[str, Any]: a dictionary representation including all metrics

GPUMemoryProfiler

Comprehensive GPU memory profiler for PyTorch operations.
from gpumemprof import GPUMemoryProfiler

profiler = GPUMemoryProfiler(device="cuda:0")

Constructor

device (Union[str, int, torch.device, None], default: None): GPU device to profile. If None, auto-detects the current CUDA device.
track_tensors (bool, default: True): Whether to track tensor creation and deletion
track_cpu_memory (bool, default: True): Whether to track CPU memory usage alongside GPU memory
collect_stack_traces (bool, default: False): Whether to collect stack traces for operations (impacts performance)

Methods

profile_function()
Profile a single function call.
def train_step(model, batch):
    output = model(batch)
    return output

result = profiler.profile_function(train_step, model, batch)
print(f"Memory allocated: {result.memory_allocated / 1024**3:.2f} GB")
func (Callable): Function to profile
*args (Any): Arguments to pass to the function
**kwargs (Any): Keyword arguments to pass to the function
Returns ProfileResult: profiling results including memory and timing information
profile_context()
Context manager for profiling a block of code.
with profiler.profile_context("training_loop"):
    for batch in dataloader:
        output = model(batch)
        loss = criterion(output, labels)
        loss.backward()
name (str, default: 'context'): Name for the profiled context
start_monitoring()
Start continuous memory monitoring in the background.
profiler.start_monitoring(interval=0.5)
# ... run your code ...
profiler.stop_monitoring()
interval (float, default: 0.1): Monitoring interval in seconds
stop_monitoring()
Stop continuous memory monitoring.
profiler.stop_monitoring()
get_summary()
Get a comprehensive summary of all profiling results.
summary = profiler.get_summary()
print(f"Peak memory: {summary['peak_memory_usage']}")
print(f"Total functions profiled: {summary['total_functions_profiled']}")
Returns Dict[str, Any]: summary statistics including:
  • device: Device being profiled
  • total_functions_profiled: Number of unique functions
  • total_function_calls: Total profiling operations
  • peak_memory_usage: Peak memory usage across all operations
  • current_memory_usage: Current memory state
  • function_summaries: Per-function statistics
clear_results()
Clear all profiling results and reset state.
profiler.clear_results()

Context Manager Support

The profiler can be used as a context manager:
with GPUMemoryProfiler(device="cuda:0") as profiler:
    profiler.start_monitoring()
    # ... your code ...
    summary = profiler.get_summary()

TensorTracker

Tracks tensor creation and deletion for memory profiling.
from gpumemprof.profiler import TensorTracker

tracker = TensorTracker()

Methods

count_tensors()
Count the current number of tracked tensors.
count = tracker.count_tensors()
Returns int: number of CUDA tensors currently in memory

Example Usage

import torch
from gpumemprof import GPUMemoryProfiler

# Initialize profiler
profiler = GPUMemoryProfiler(device="cuda:0", track_tensors=True)

# Profile a function
def allocate_tensors(size):
    return torch.randn(size, size, device="cuda")

result = profiler.profile_function(allocate_tensors, 1000)
print(f"Execution time: {result.execution_time:.4f}s")
print(f"Memory allocated: {result.memory_allocated / 1024**2:.2f} MB")
print(f"Tensors created: {result.tensors_created}")

# Profile a context
with profiler.profile_context("matrix_operations"):
    a = torch.randn(1000, 1000, device="cuda")
    b = torch.randn(1000, 1000, device="cuda")
    c = torch.matmul(a, b)

# Get summary
summary = profiler.get_summary()
for func_name, stats in summary['function_summaries'].items():
    print(f"{func_name}: {stats['avg_memory_allocated'] / 1024**2:.2f} MB average")
