## Prerequisites

Before you begin, make sure you have:

- Python 3.10 or higher
- pip package manager
- (Optional) CUDA-enabled GPU for GPU profiling
- (Optional) PyTorch 1.8+ or TensorFlow 2.4+
> **Note:** GPU Memory Profiler works on CPU-only systems too. It automatically falls back to CPU memory tracking when CUDA isn't available.
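If you want your own training code to degrade the same way, the standard device-selection pattern keeps a script runnable with or without a GPU. A minimal sketch in plain PyTorch, independent of the profiler:

```python
import torch

# Prefer CUDA when present; otherwise fall back to the CPU, so the
# same script runs on GPU and CPU-only machines alike.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors created on `device` behave identically in both cases.
model_input = torch.randn(32, 784, device=device)
print(f"Running on: {device.type}")
```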
## Installation

Install GPU Memory Profiler with your preferred framework support: PyTorch, TensorFlow, both frameworks, or a basic install with no framework extras. For example, with PyTorch support:

```bash
pip install gpu-memory-profiler[torch]
```

For visualization support, add the `viz` extra:

```bash
pip install gpu-memory-profiler[torch,viz]
```
## PyTorch quick start

### Import the profiler

```python
from gpumemprof import GPUMemoryProfiler
import torch
import torch.nn as nn

# Initialize the profiler
profiler = GPUMemoryProfiler(track_tensors=True)
```
### Profile your training step

Create a simple model and profile its training:

```python
# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleModel().cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Profile a training step
def train_step(model, data, target):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    return loss.item()

# Create sample data
data = torch.randn(32, 784).cuda()
target = torch.randint(0, 10, (32,)).cuda()

# Profile the function
profile = profiler.profile_function(train_step, model, data, target)
print(f"Function: {profile.function_name}")
```
### Use context manager for epochs

Profile entire training loops with context managers:

```python
for epoch in range(3):
    with profiler.profile_context(f"epoch_{epoch + 1}"):
        loss = train_step(model, data, target)
    print(f"Epoch {epoch + 1} loss: {loss:.4f}")
```
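Profiling context managers like this follow the standard `contextlib` pattern: record state on entry, measure and report on exit. The sketch below is illustrative only (it is not the library's implementation) and uses wall-clock time as a stand-in for memory readings:

```python
import time
from contextlib import contextmanager

@contextmanager
def profile_context(name):
    # On entry: take the starting measurement.
    start = time.perf_counter()
    try:
        yield
    finally:
        # On exit: report the delta, even if the body raised.
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.4f}s")

with profile_context("epoch_1"):
    sum(range(1_000_000))  # stand-in for a training step
```

The `try`/`finally` ensures a measurement is always emitted, mirroring how a profiler must still record memory when a training step throws.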
### View the summary

Get detailed memory statistics:

```python
summary = profiler.get_summary()
print(f"Peak memory: {summary['peak_memory_usage'] / (1024 ** 3):.2f} GB")
print(f"Average memory: {summary['average_memory_usage'] / (1024 ** 3):.2f} GB")
print(f"Total snapshots: {summary['total_snapshots']}")
```
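The summary values are reported in bytes, which is why the snippet divides by `1024 ** 3` to get GiB. If you want a general human-readable formatter, a small helper like this (not part of the library) works:

```python
def format_bytes(n: float) -> str:
    """Render a byte count with a binary-prefix unit."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.2f} {unit}"
        n /= 1024  # step up to the next unit

print(format_bytes(3_221_225_472))  # → 3.00 GiB
```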
## TensorFlow quick start

### Import the profiler

```python
import tensorflow as tf
from tfmemprof import TFMemoryProfiler

# Initialize the profiler
profiler = TFMemoryProfiler(enable_tensor_tracking=True)
```
### Create and profile a model

Build a simple model and profile training:

```python
# Define a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Create sample data
x_train = tf.random.normal((1000, 784))
y_train = tf.random.uniform((1000,), minval=0, maxval=10, dtype=tf.int32)
```
### Profile with context manager

```python
# Profile the training
with profiler.profile_context("training"):
    model.fit(x_train, y_train, epochs=3, batch_size=32, verbose=0)

print("Training complete!")
```
### View the results

```python
results = profiler.get_results()
print(f"Duration: {results.duration:.3f} seconds")
print(f"Peak memory: {results.peak_memory_mb:.2f} MB")
print(f"Average memory: {results.average_memory_mb:.2f} MB")
print(f"Snapshots captured: {len(results.snapshots)}")
```
## CLI usage

GPU Memory Profiler includes command-line tools for both frameworks: `gpumemprof` for PyTorch and `tfmemprof` for TensorFlow.

### PyTorch CLI

**Real-time monitoring.** Monitor GPU memory usage in real time:

```bash
# Monitor for 30 seconds with a 0.5s interval
gpumemprof monitor --duration 30 --interval 0.5

# Monitor and save to file
gpumemprof monitor --duration 30 --output memory_log.csv
```
**Diagnose issues.** Run diagnostics to identify memory problems.

**Export data.**

```bash
# Export to CSV
gpumemprof monitor --duration 10 --format csv --output metrics.csv

# Export to JSON
gpumemprof monitor --duration 10 --format json --output metrics.json
```
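Exported CSVs can be post-processed with the Python standard library. The column names used below (`timestamp`, `memory_used`) are assumptions about the export format, not documented guarantees; substitute the headers you actually see in your file:

```python
import csv
import io

# Stand-in for a file exported by `gpumemprof monitor --format csv`;
# the column names here are hypothetical.
sample = """timestamp,memory_used
0.0,1048576
0.5,4194304
1.0,2097152
"""

rows = list(csv.DictReader(io.StringIO(sample)))
peak = max(int(r["memory_used"]) for r in rows)
print(f"Peak: {peak / 1024**2:.1f} MiB")  # → Peak: 4.0 MiB
```

For a real export, replace `io.StringIO(sample)` with `open("metrics.csv")`.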
### TensorFlow CLI

**Monitor training.**

```bash
# Monitor with a custom interval
tfmemprof monitor --duration 30 --interval 1.0

# Save monitoring data
tfmemprof monitor --output tf_memory.csv
```

**Profile a script.**

```bash
tfmemprof profile my_training_script.py
```
## Interactive terminal UI

Launch the interactive dashboard for real-time monitoring:

```bash
# Install TUI dependencies
pip install gpu-memory-profiler[tui]

# Launch the dashboard
gpu-profiler
```

The TUI provides:

- Live GPU memory monitoring
- PyTorch and TensorFlow quick actions
- Visualizations and charts
- Export functionality
- CLI command execution

The terminal UI includes tabs for Overview, PyTorch, TensorFlow, Monitoring, Visualizations, and CLI actions.
## Next steps

- **Core concepts:** learn about profilers, trackers, and context managers
- **API reference:** explore the complete Python API documentation
- **Leak detection:** detect and prevent memory leaks in your models
- **Visualizations:** create timeline plots, heatmaps, and dashboards