
MemoryTracker

Real-time TensorFlow GPU memory tracker with configurable sampling and alerts.

Constructor

MemoryTracker(
    sampling_interval: float = 1.0,
    alert_threshold_mb: Optional[float] = None,
    device: Optional[str] = None,
    enable_logging: bool = True
)
sampling_interval
float
default:"1.0"
Time between memory samples in seconds
alert_threshold_mb
float
Memory threshold for triggering alerts in MB
device
str
TensorFlow device to monitor (e.g., '/GPU:0'). Defaults to '/GPU:0' when not set.
enable_logging
bool
default:"True"
Whether to log tracking events

Methods

start_tracking

Start real-time memory tracking.
def start_tracking(self) -> None

stop_tracking

Stop tracking and return results.
def stop_tracking(self) -> TrackingResult
TrackingResult
object
Object containing memory usage history, timestamps, events, and alerts
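The start/stop pair above can be pictured as a background polling loop driven by sampling_interval. The following is a minimal sketch under stated assumptions, not the library's implementation: `SamplerSketch` and its `read_mb` argument are hypothetical names, and `read_mb` stands in for whatever memory query the tracker actually performs.

```python
import threading
import time
from typing import Callable, List, Optional


class SamplerSketch:
    """Hypothetical polling loop; read_mb stands in for the real memory query."""

    def __init__(self, read_mb: Callable[[], float], sampling_interval: float = 1.0) -> None:
        self._read_mb = read_mb
        self._interval = sampling_interval
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.memory_usage: List[float] = []
        self.timestamps: List[float] = []

    def _run(self) -> None:
        # Sample immediately, then wait; Event.wait returns True once stop is set.
        while True:
            self.timestamps.append(time.time())
            self.memory_usage.append(self._read_mb())
            if self._stop.wait(self._interval):
                break

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> List[float]:
        # Signal the loop to exit and wait for the worker before returning samples.
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        return self.memory_usage


sampler = SamplerSketch(read_mb=lambda: 1234.0, sampling_interval=0.01)
sampler.start()
time.sleep(0.05)
samples = sampler.stop()
print(f"Collected {len(samples)} samples")
```

Waiting on an Event rather than calling time.sleep lets stop return promptly even when the sampling interval is long.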

get_current_memory

Get current memory usage without starting tracking.
def get_current_memory(self) -> float
float
float
Current memory usage in MB

set_alert_threshold

Update the alert threshold during tracking.
def set_alert_threshold(self, threshold_mb: float) -> None
threshold_mb
float
New threshold in MB

add_alert_callback

Add callback function for memory alerts.
def add_alert_callback(self, callback: Callable[[Dict[str, Any]], None]) -> None
callback
Callable
Function to call when an alert is triggered. Receives an alert dictionary with timestamp, memory_mb, and threshold_mb. The examples on this page also read a message key from it.
Example:
def alert_handler(alert):
    print(f"Alert: {alert['message']}")
    print(f"Memory: {alert['memory_mb']:.2f} MB")

tracker.add_alert_callback(alert_handler)
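A callback can also accumulate alerts for later inspection. The sketch below assumes the callback may be invoked from a background thread (the source does not say which thread calls it), so it guards its list with a lock. `AlertRecorder` is a hypothetical helper, and it reads only the documented memory_mb key.

```python
import threading
from typing import Any, Dict, List


class AlertRecorder:
    """Hypothetical helper: collects alert dictionaries for later inspection."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._alerts: List[Dict[str, Any]] = []

    def __call__(self, alert: Dict[str, Any]) -> None:
        # Copy so later mutation by the caller does not affect our record.
        with self._lock:
            self._alerts.append(dict(alert))

    def worst(self) -> float:
        """Return the highest memory_mb seen, or 0.0 if no alert was recorded."""
        with self._lock:
            return max((a["memory_mb"] for a in self._alerts), default=0.0)


recorder = AlertRecorder()
# With a real tracker this would be: tracker.add_alert_callback(recorder)
recorder({"timestamp": 1.0, "memory_mb": 4100.0, "threshold_mb": 4000.0})
recorder({"timestamp": 2.0, "memory_mb": 4350.5, "threshold_mb": 4000.0})
print(f"Worst alert: {recorder.worst():.1f} MB")
```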

check_alerts

Check if any alerts have been triggered recently.
def check_alerts(self) -> bool
bool
bool
True if any alerts were triggered in the last 10 seconds
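The 10-second window can be illustrated with a small standalone check over a list of alert dictionaries. This is a sketch, assuming alert timestamps are seconds on the same clock as time.time(); `has_recent_alert` is a hypothetical function, not part of the library.

```python
import time
from typing import Any, Dict, List, Optional

RECENT_WINDOW_S = 10.0  # window quoted in the check_alerts description


def has_recent_alert(
    alerts: List[Dict[str, Any]],
    now: Optional[float] = None,
    window_s: float = RECENT_WINDOW_S,
) -> bool:
    """Return True if any alert's timestamp falls within the last window_s seconds."""
    if now is None:
        now = time.time()
    return any(now - alert["timestamp"] <= window_s for alert in alerts)


alerts = [{"timestamp": 100.0, "memory_mb": 4200.0, "threshold_mb": 4000.0}]
print(has_recent_alert(alerts, now=105.0))  # alert is 5 s old
print(has_recent_alert(alerts, now=120.0))  # alert is 20 s old
```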

get_tracking_results

Get current tracking results without stopping.
def get_tracking_results(self) -> TrackingResult

TrackingResult

Results from real-time memory tracking.
start_time
float
Start timestamp
end_time
float
End timestamp
memory_usage
List[float]
List of memory samples in MB
timestamps
List[float]
Corresponding timestamps for each sample
events
List[Dict]
Telemetry events captured during tracking
alerts_triggered
List[Dict]
List of triggered alerts
peak_memory
float
Peak memory usage in MB
average_memory
float
Average memory usage in MB

Properties

duration: Total tracking duration in seconds
memory_growth_rate: Memory growth rate in MB/second
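The source does not say how these two properties are derived. One plausible reading, shown here as a sketch only, is that duration is end_time minus start_time and the growth rate is the net change between the first and last sample divided by that duration. `ResultSketch` is a hypothetical stand-in for TrackingResult that carries only the fields used here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResultSketch:
    """Hypothetical stand-in for TrackingResult; only the fields used here."""

    start_time: float
    end_time: float
    memory_usage: List[float] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

    @property
    def memory_growth_rate(self) -> float:
        # Assumption: net change between first and last sample over the duration.
        if len(self.memory_usage) < 2 or self.duration <= 0:
            return 0.0
        return (self.memory_usage[-1] - self.memory_usage[0]) / self.duration


r = ResultSketch(start_time=0.0, end_time=10.0, memory_usage=[1000.0, 1200.0, 1500.0])
print(f"duration={r.duration} s, growth={r.memory_growth_rate} MB/s")
```

A least-squares slope over all samples would be less sensitive to a noisy first or last reading; the endpoint form is used here only because it is the simplest.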

MemoryWatchdog

Automatic memory management and cleanup for TensorFlow.

Constructor

MemoryWatchdog(
    max_memory_mb: float = 8000,
    cleanup_threshold_mb: float = 6000,
    check_interval: float = 5.0
)
max_memory_mb
float
default:"8000"
Maximum memory in MB before forced cleanup
cleanup_threshold_mb
float
default:"6000"
Memory threshold in MB that triggers cleanup
check_interval
float
default:"5.0"
Time between memory checks in seconds
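The two thresholds suggest a tiered decision on each check. The sketch below is an assumption about how they might interact, not the watchdog's actual logic: `Action` and `decide_action` are hypothetical names, and whether the real comparisons are inclusive is not stated in the source.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    CLEANUP = "cleanup"
    FORCED_CLEANUP = "forced_cleanup"


def decide_action(
    current_mb: float,
    cleanup_threshold_mb: float = 6000.0,
    max_memory_mb: float = 8000.0,
) -> Action:
    """Map a memory reading to an action using the two documented thresholds."""
    if cleanup_threshold_mb > max_memory_mb:
        raise ValueError("cleanup_threshold_mb should not exceed max_memory_mb")
    # Assumption: inclusive comparisons, checked from the most severe tier down.
    if current_mb >= max_memory_mb:
        return Action.FORCED_CLEANUP
    if current_mb >= cleanup_threshold_mb:
        return Action.CLEANUP
    return Action.NONE


for reading in (5000.0, 6500.0, 8200.0):
    print(reading, decide_action(reading).value)
```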

Methods

start

Start memory watchdog monitoring.
def start(self) -> None

stop

Stop memory watchdog.
def stop(self) -> None

force_cleanup

Force immediate memory cleanup.
def force_cleanup(self) -> None

add_cleanup_callback

Add custom cleanup callback function.
def add_cleanup_callback(self, callback: Callable[[], None]) -> None
callback
Callable
Function to call during cleanup operations
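A cleanup callback takes no arguments and returns nothing, so any zero-argument function fits. As a minimal sketch, the hypothetical callback below runs Python's garbage collector and counts its invocations; it is invoked directly here so the example runs without the library.

```python
import gc

cleanup_runs = 0  # hypothetical counter, only for illustration


def release_python_garbage() -> None:
    """Hypothetical cleanup callback: reclaim unreachable Python objects."""
    global cleanup_runs
    gc.collect()
    cleanup_runs += 1
    # In a TensorFlow process, tf.keras.backend.clear_session() is another
    # option here; it is omitted so this sketch runs without TensorFlow.


# With a real watchdog: watchdog.add_cleanup_callback(release_python_garbage)
release_python_garbage()
print(f"Cleanup ran {cleanup_runs} time(s)")
```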

Example

from tfmemprof.tracker import MemoryTracker, MemoryWatchdog
import tensorflow as tf

# Basic tracking
tracker = MemoryTracker(
    sampling_interval=0.5,
    alert_threshold_mb=4000
)

# Add alert callback
def on_alert(alert):
    print(f"Memory alert: {alert['message']}")
    
tracker.add_alert_callback(on_alert)

# Start tracking
tracker.start_tracking()

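# Run your code (x_train and y_train are assumed to be defined elsewhere)
model = tf.keras.applications.ResNet50()
# Keras models must be compiled before fit; pick a loss that matches your labels
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(x_train, y_train, epochs=10)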

# Stop and get results
results = tracker.stop_tracking()

print(f"Peak memory: {results.peak_memory:.2f} MB")
print(f"Average memory: {results.average_memory:.2f} MB")
print(f"Alerts triggered: {len(results.alerts_triggered)}")

# Memory watchdog for automatic cleanup
watchdog = MemoryWatchdog(
    max_memory_mb=8000,
    cleanup_threshold_mb=6000
)

# Add custom cleanup
def custom_cleanup():
    print("Running custom cleanup")
    # Clear caches, etc.
    
watchdog.add_cleanup_callback(custom_cleanup)
watchdog.start()

# Your training code
# Watchdog automatically cleans up when thresholds are exceeded

watchdog.stop()
