Prerequisites

Before you begin, make sure you have:
  • Python 3.10 or higher
  • pip package manager
  • (Optional) CUDA-enabled GPU for GPU profiling
  • (Optional) PyTorch 1.8+ or TensorFlow 2.4+
GPU Memory Profiler works on CPU-only systems too! It automatically falls back to CPU memory tracking when CUDA isn’t available.
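
A quick way to confirm the interpreter requirement is to ask Python directly. This check is generic and not part of gpu-memory-profiler itself:

```python
import sys

# True when the running interpreter satisfies the Python 3.10+ requirement
meets_requirement = sys.version_info >= (3, 10)
print(f"Python {sys.version.split()[0]} - 3.10+ requirement met: {meets_requirement}")
```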

Installation

Install GPU Memory Profiler with support for your preferred framework (quote the extras so shells like zsh don't expand the brackets):
pip install "gpu-memory-profiler[torch]"
For visualization support, add the viz extra:
pip install "gpu-memory-profiler[torch,viz]"

PyTorch quick start

1. Import the profiler

from gpumemprof import GPUMemoryProfiler
import torch
import torch.nn as nn

# Initialize the profiler
profiler = GPUMemoryProfiler(track_tensors=True)

2. Profile your training step

Create a simple model and profile its training:
# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleModel().cuda()  # requires CUDA; drop .cuda() on CPU-only systems
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Profile a training step
def train_step(model, data, target):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    return loss.item()

# Create sample data
data = torch.randn(32, 784).cuda()
target = torch.randint(0, 10, (32,)).cuda()

# Profile the function
profile = profiler.profile_function(train_step, model, data, target)
print(f"Function: {profile.function_name}")

3. Use a context manager for epochs

Profile entire training loops with context managers:
for epoch in range(3):
    with profiler.profile_context(f"epoch_{epoch+1}"):
        loss = train_step(model, data, target)
        print(f"Epoch {epoch+1} loss: {loss:.4f}")

4. View the summary

Get detailed memory statistics:
summary = profiler.get_summary()
print(f"Peak memory: {summary['peak_memory_usage'] / (1024**3):.2f} GB")
print(f"Average memory: {summary['average_memory_usage'] / (1024**3):.2f} GB")
print(f"Total snapshots: {summary['total_snapshots']}")
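
The summary pattern above (snapshots, peak, average) can be sketched in plain Python with the standard library's tracemalloc module. This is a conceptual stand-in only: tracemalloc tracks Python heap allocations rather than GPU memory, and the SnapshotProfiler class below is hypothetical, not part of gpumemprof.

```python
import tracemalloc
from contextlib import contextmanager

class SnapshotProfiler:
    """Hypothetical sketch of a snapshot-based memory profiler (Python heap only)."""
    def __init__(self):
        self.snapshots = []  # (label, current_bytes, peak_bytes)

    @contextmanager
    def profile_context(self, label):
        tracemalloc.start()
        try:
            yield
        finally:
            current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            self.snapshots.append((label, current, peak))

    def get_summary(self):
        peaks = [peak for _, _, peak in self.snapshots]
        return {
            "peak_memory_usage": max(peaks),
            "average_memory_usage": sum(peaks) / len(peaks),
            "total_snapshots": len(self.snapshots),
        }

profiler = SnapshotProfiler()
for epoch in range(3):
    with profiler.profile_context(f"epoch_{epoch + 1}"):
        buf = [0.0] * 100_000  # stand-in for a training step's allocations

summary = profiler.get_summary()
print(f"Total snapshots: {summary['total_snapshots']}")
```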

TensorFlow quick start

1. Import the profiler

import tensorflow as tf
from tfmemprof import TFMemoryProfiler

# Initialize the profiler
profiler = TFMemoryProfiler(enable_tensor_tracking=True)

2. Create and profile a model

Build a simple model and profile training:
# Define a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Create sample data
x_train = tf.random.normal((1000, 784))
y_train = tf.random.uniform((1000,), minval=0, maxval=10, dtype=tf.int32)

3. Profile with a context manager

# Profile the training
with profiler.profile_context("training"):
    model.fit(x_train, y_train, epochs=3, batch_size=32, verbose=0)
    print("Training complete!")

4. View the results

results = profiler.get_results()
print(f"Duration: {results.duration:.3f} seconds")
print(f"Peak memory: {results.peak_memory_mb:.2f} MB")
print(f"Average memory: {results.average_memory_mb:.2f} MB")
print(f"Snapshots captured: {len(results.snapshots)}")

CLI usage

GPU Memory Profiler includes powerful command-line tools for both frameworks.

System information

gpumemprof info

Real-time monitoring

Monitor GPU memory usage in real time:
# Monitor for 30 seconds with 0.5s interval
gpumemprof monitor --duration 30 --interval 0.5

# Monitor and save to file
gpumemprof monitor --duration 30 --output memory_log.csv

Diagnose issues

Run diagnostics to identify memory problems:
gpumemprof diagnose

Export data

# Export to CSV
gpumemprof monitor --duration 10 --format csv --output metrics.csv

# Export to JSON
gpumemprof monitor --duration 10 --format json --output metrics.json
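
The exported CSV can then be post-processed with ordinary tooling. The column name below (memory_used_mb) is an assumption for illustration; check the header of your exported file, as the actual schema may differ.

```python
import csv
import io

# Hypothetical sample export; real column names may differ, so inspect your file's header.
sample = """timestamp,memory_used_mb
0.0,512.0
0.5,1024.0
1.0,768.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
readings = [float(r["memory_used_mb"]) for r in rows]
print(f"Peak: {max(readings):.1f} MB, average: {sum(readings) / len(readings):.1f} MB")
```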

Interactive terminal UI

Launch the interactive dashboard for real-time monitoring:
# Install TUI dependencies
pip install "gpu-memory-profiler[tui]"

# Launch the dashboard
gpu-profiler
The TUI provides:
  • Live GPU memory monitoring
  • PyTorch and TensorFlow quick actions
  • Visualizations and charts
  • Export functionality
  • CLI command execution
The terminal UI includes tabs for Overview, PyTorch, TensorFlow, Monitoring, Visualizations, and CLI actions.

Next steps

  • Core concepts: learn about profilers, trackers, and context managers
  • API reference: explore the complete Python API documentation
  • Leak detection: detect and prevent memory leaks in your models
  • Visualizations: create timeline plots, heatmaps, and dashboards
