Tracker

MemoryTracker

Real-time TensorFlow GPU memory tracker with configurable sampling and alerts.

Constructor

MemoryTracker(
    sampling_interval: float = 1.0,
    alert_threshold_mb: Optional[float] = None,
    device: Optional[str] = None,
    enable_logging: bool = True
)

sampling_interval

float

default:"1.0"

Time between memory samples in seconds

alert_threshold_mb

float

Memory threshold in MB for triggering alerts. Defaults to None.

device

str

TensorFlow device to monitor (e.g., '/GPU:0'). Defaults to '/GPU:0' when not specified.

enable_logging

bool

default:"True"

Whether to log tracking events

Methods

start_tracking

Start real-time memory tracking.

def start_tracking(self) -> None

stop_tracking

Stop tracking and return results.

def stop_tracking(self) -> TrackingResult

TrackingResult

object

Object containing memory usage history, timestamps, events, and alerts
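The start/stop pair suggests a background sampling loop. The following is a minimal sketch of that pattern under stated assumptions, not the tfmemprof implementation: `SamplerSketch` and its `read_mb` parameter are hypothetical names, and the memory reader is injected so the example runs without TensorFlow or a GPU.

```python
import threading
from typing import Callable, List, Optional


class SamplerSketch:
    """Hypothetical background sampler; not the tfmemprof implementation."""

    def __init__(self, read_mb: Callable[[], float], sampling_interval: float = 1.0):
        self._read_mb = read_mb
        self._interval = sampling_interval
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.samples: List[float] = []

    def _run(self) -> None:
        while True:
            # Take one sample, then sleep until the next tick or a stop request.
            self.samples.append(self._read_mb())
            # wait() returns True once stop is requested, which ends the loop.
            if self._stop.wait(self._interval):
                break

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> List[float]:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        return self.samples


# A constant reader stands in for a real GPU memory query.
sampler = SamplerSketch(read_mb=lambda: 1024.0, sampling_interval=0.01)
sampler.start()
samples = sampler.stop()
```

Waiting on an event rather than calling `time.sleep` lets `stop` return promptly even when the sampling interval is long.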

get_current_memory

Get current memory usage without starting tracking.

def get_current_memory(self) -> float

float

Current memory usage in MB

set_alert_threshold

Update the alert threshold during tracking.

def set_alert_threshold(self, threshold_mb: float) -> None

threshold_mb

float

New threshold in MB

add_alert_callback

Add callback function for memory alerts.

def add_alert_callback(self, callback: Callable[[Dict[str, Any]], None]) -> None

callback

Callable

Function to call when alert is triggered. Receives alert dictionary with timestamp, memory_mb, and threshold_mb

Example:

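def alert_handler(alert):
    # Uses the documented alert keys: timestamp, memory_mb, threshold_mb
    print(f"Alert: memory exceeded the {alert['threshold_mb']:.2f} MB threshold")
    print(f"Memory: {alert['memory_mb']:.2f} MB")

tracker.add_alert_callback(alert_handler)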
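A callback can also collect alerts for later inspection instead of printing them. The sketch below is illustrative only: `make_alert_recorder` is a hypothetical helper, and the alert dictionary is built by hand using the documented keys (`timestamp`, `memory_mb`, `threshold_mb`) so the example runs without a tracker.

```python
from typing import Any, Callable, Dict, List, Tuple


def make_alert_recorder() -> Tuple[Callable[[Dict[str, Any]], None], List[str]]:
    """Return a callback plus the list it appends formatted alerts to."""
    messages: List[str] = []

    def record(alert: Dict[str, Any]) -> None:
        # How far the reading is above the configured threshold, in MB.
        overshoot = alert["memory_mb"] - alert["threshold_mb"]
        messages.append(
            f"{alert['memory_mb']:.1f} MB used, {overshoot:.1f} MB over threshold"
        )

    return record, messages


record, messages = make_alert_recorder()
# Simulated alert; in real use, `record` would be passed to add_alert_callback.
record({"timestamp": 0.0, "memory_mb": 4250.0, "threshold_mb": 4000.0})
```

Returning the list alongside the callback keeps the recorded state out of module-level globals.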

check_alerts

Check if any alerts have been triggered recently.

def check_alerts(self) -> bool

bool

True if any alert was triggered in the last 10 seconds
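The 10-second window can be pictured as a timestamp comparison. This is a minimal sketch, not the library's implementation: `has_recent_alert` and its `window_s` and `now` parameters are hypothetical, and it assumes each alert dictionary carries a `timestamp` in seconds, as described for alert callbacks.

```python
import time
from typing import Any, Dict, List, Optional


def has_recent_alert(
    alerts: List[Dict[str, Any]],
    window_s: float = 10.0,
    now: Optional[float] = None,
) -> bool:
    """Return True if any alert's timestamp falls within the last window_s seconds."""
    # Hypothetical helper: illustrates the idea only, not tfmemprof internals.
    current = time.time() if now is None else now
    return any(current - alert["timestamp"] <= window_s for alert in alerts)


alerts = [
    {"timestamp": 100.0, "memory_mb": 4200.0, "threshold_mb": 4000.0},
    {"timestamp": 125.0, "memory_mb": 4350.0, "threshold_mb": 4000.0},
]
recent = has_recent_alert(alerts, now=130.0)  # newest alert is 5 s old
stale = has_recent_alert(alerts, now=200.0)   # newest alert is 75 s old
```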

get_tracking_results

Get current tracking results without stopping.

def get_tracking_results(self) -> TrackingResult

TrackingResult

Results from real-time memory tracking.

start_time

float

Start timestamp

end_time

float

End timestamp

memory_usage

List[float]

List of memory samples in MB

timestamps

List[float]

Corresponding timestamps for each sample

events

List[Dict]

Telemetry events captured during tracking

alerts_triggered

List[Dict]

List of triggered alerts

peak_memory

float

Peak memory usage in MB

average_memory

float

Average memory usage in MB

Properties

duration

Total tracking duration in seconds

memory_growth_rate

Memory growth rate in MB/second
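To make the derived values concrete, here is a small sketch of how they could be computed from the documented fields. `TrackingResultSketch` is a hypothetical stand-in, and the growth-rate formula (last sample minus first sample, divided by duration) is an assumption; the library may compute it differently, for example with a linear fit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrackingResultSketch:
    """Hypothetical stand-in mirroring a few documented TrackingResult fields."""

    start_time: float
    end_time: float
    memory_usage: List[float] = field(default_factory=list)

    @property
    def duration(self) -> float:
        # Total tracking duration in seconds.
        return self.end_time - self.start_time

    @property
    def memory_growth_rate(self) -> float:
        # Assumption: (last sample - first sample) / duration, in MB/second.
        if len(self.memory_usage) < 2 or self.duration <= 0:
            return 0.0
        return (self.memory_usage[-1] - self.memory_usage[0]) / self.duration


sketch = TrackingResultSketch(
    start_time=0.0,
    end_time=4.0,
    memory_usage=[1000.0, 1100.0, 1150.0, 1200.0],
)
peak = max(sketch.memory_usage)
average = sum(sketch.memory_usage) / len(sketch.memory_usage)
```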

MemoryWatchdog

Automatic memory management and cleanup for TensorFlow.

Constructor

MemoryWatchdog(
    max_memory_mb: float = 8000,
    cleanup_threshold_mb: float = 6000,
    check_interval: float = 5.0
)

max_memory_mb

float

default:"8000"

Maximum memory in MB before forced cleanup

cleanup_threshold_mb

float

default:"6000"

Memory threshold in MB that triggers cleanup

check_interval

float

default:"5.0"

Time between memory checks in seconds
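The two thresholds describe a tiered response: a regular cleanup once usage reaches `cleanup_threshold_mb`, and a forced cleanup once it reaches `max_memory_mb`. The sketch below illustrates that decision with a hypothetical `watchdog_action` function; the comparison semantics (`>=` rather than `>`) and the returned labels are assumptions, not documented behavior.

```python
def watchdog_action(
    current_mb: float,
    cleanup_threshold_mb: float = 6000,
    max_memory_mb: float = 8000,
) -> str:
    """Classify a memory reading against the two documented thresholds."""
    # Hypothetical decision logic; the real watchdog may differ in detail.
    if current_mb >= max_memory_mb:
        return "force_cleanup"
    if current_mb >= cleanup_threshold_mb:
        return "cleanup"
    return "ok"


# Readings below, between, and above the default thresholds.
actions = [watchdog_action(mb) for mb in (3000, 6500, 8200)]
```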

Methods

start

Start memory watchdog monitoring.

def start(self) -> None

stop

Stop memory watchdog.

def stop(self) -> None

force_cleanup

Force immediate memory cleanup.

def force_cleanup(self) -> None

add_cleanup_callback

Add custom cleanup callback function.

def add_cleanup_callback(self, callback: Callable[[], None]) -> None

callback

Callable

Function to call during cleanup operations

Example

from tfmemprof.tracker import MemoryTracker, MemoryWatchdog
import tensorflow as tf

# Basic tracking
tracker = MemoryTracker(
    sampling_interval=0.5,
    alert_threshold_mb=4000
)

# Add alert callback
def on_alert(alert):
    print(f"Memory alert: {alert['memory_mb']:.2f} MB (threshold {alert['threshold_mb']:.2f} MB)")
    
tracker.add_alert_callback(on_alert)

# Start tracking
tracker.start_tracking()

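# Run your code (synthetic data stands in for a real dataset)
x_train = tf.random.normal((32, 224, 224, 3))
y_train = tf.random.uniform((32,), maxval=1000, dtype=tf.int32)
model = tf.keras.applications.ResNet50(weights=None)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, epochs=10)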

# Stop and get results
results = tracker.stop_tracking()

print(f"Peak memory: {results.peak_memory:.2f} MB")
print(f"Average memory: {results.average_memory:.2f} MB")
print(f"Alerts triggered: {len(results.alerts_triggered)}")

# Memory watchdog for automatic cleanup
watchdog = MemoryWatchdog(
    max_memory_mb=8000,
    cleanup_threshold_mb=6000
)

# Add custom cleanup
def custom_cleanup():
    print("Running custom cleanup")
    # Clear caches, etc.
    
watchdog.add_cleanup_callback(custom_cleanup)
watchdog.start()

# Your training code
# Watchdog automatically cleans up when thresholds exceeded

watchdog.stop()
