The GPU Memory Profiler also supports CPU-only profiling, for systems without GPU access and for profiling CPU-bound operations.

When to use CPU mode

CPU profiling is useful when:
  • Developing on machines without a GPU
  • Testing code before GPU deployment
  • Profiling data preprocessing pipelines
  • Monitoring CPU memory in hybrid workflows
  • Running on Apple Silicon without Metal support
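Whether CPU mode is needed can be detected up front. A minimal sketch, assuming PyTorch is the GPU backend; the helper name cpu_mode_needed is illustrative, not part of the library:

```python
import importlib.util

def cpu_mode_needed() -> bool:
    """Heuristic: fall back to CPU profiling when no CUDA-capable torch is importable."""
    if importlib.util.find_spec("torch") is None:
        return True  # torch is not installed at all
    import torch
    return not torch.cuda.is_available()

print("CPU mode" if cpu_mode_needed() else "GPU mode")
```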

CPU profiler

The CPUMemoryProfiler mirrors the GPU profiler API:
1. Initialize profiler

Create a CPU memory profiler:
from gpumemprof import CPUMemoryProfiler

profiler = CPUMemoryProfiler()
No device configuration is needed; the profiler automatically attaches to the current process.
2. Profile functions

Profile function memory usage:
def data_preprocessing(data):
    # CPU-intensive preprocessing
    processed = [transform(item) for item in data]
    return processed

# Profile the function
result = profiler.profile_function(
    data_preprocessing,
    large_dataset
)

print(f"Function: {result.name}")
print(f"Duration: {result.duration:.3f}s")
print(f"Memory diff: {result.memory_diff() / (1024**2):.2f} MB")
print(f"Peak RSS: {result.peak_rss / (1024**2):.2f} MB")
The result includes:
  • name: Function name
  • duration: Execution time in seconds
  • snapshot_before: Memory state before execution
  • snapshot_after: Memory state after execution
  • peak_rss: Peak resident set size
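For orientation, the result object can be pictured roughly like this. This is a sketch, assuming memory_diff() is the RSS delta between the two snapshots; the field types are guesses, not the library's actual definitions:

```python
from dataclasses import dataclass

@dataclass
class CPUMemorySnapshot:
    rss: int            # resident set size, bytes
    vms: int            # virtual memory size, bytes
    cpu_percent: float  # process CPU utilization

@dataclass
class ProfileResult:
    name: str
    duration: float
    snapshot_before: CPUMemorySnapshot
    snapshot_after: CPUMemorySnapshot
    peak_rss: int

    def memory_diff(self) -> int:
        # RSS growth across the profiled call (can be negative)
        return self.snapshot_after.rss - self.snapshot_before.rss
```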
3. Use context manager

Profile code blocks:
with profiler.profile_context("data_loading"):
    # Load large dataset
    data = load_dataset('large_file.csv')
    data = preprocess(data)
    features = extract_features(data)

# Get profiling results
for result in profiler.results:
    print(f"Operation: {result.name}")
    print(f"RSS before: {result.snapshot_before.rss / (1024**2):.2f} MB")
    print(f"RSS after: {result.snapshot_after.rss / (1024**2):.2f} MB")
    print(f"CPU percent: {result.snapshot_after.cpu_percent:.1f}%")
4. Real-time monitoring

Monitor CPU memory over time:
# Start monitoring
profiler.start_monitoring(interval=0.5)  # 500ms intervals

# Your CPU-intensive code
for epoch in range(10):
    for batch in data_batches:
        process_batch(batch)

# Stop and get summary
profiler.stop_monitoring()
summary = profiler.get_summary()

print(f"Snapshots collected: {summary['snapshots_collected']}")
print(f"Peak memory: {summary['peak_memory_usage'] / (1024**2):.2f} MB")
print(f"Memory change: {summary['memory_change_from_baseline'] / (1024**2):.2f} MB")

# Review snapshots
for snapshot in profiler.snapshots[-10:]:
    print(f"Time: {snapshot.timestamp:.2f}, RSS: {snapshot.rss / (1024**2):.2f} MB")
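Under the hood, this kind of monitoring is essentially a background sampling loop. A stdlib sketch of the idea, not the library's implementation (the resource module is POSIX-only, and ru_maxrss is in kilobytes on Linux but bytes on macOS):

```python
import resource
import threading
import time

class RssSampler:
    """Background thread that records peak RSS at a fixed interval."""

    def __init__(self, interval=0.5):
        self.interval = interval
        self.samples = []                 # (timestamp, ru_maxrss) pairs
        self._stop = threading.Event()
        self._thread = None

    def _run(self):
        # wait() returns False on timeout (take a sample) and True when stopped
        while not self._stop.wait(self.interval):
            peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            self.samples.append((time.time(), peak))

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

sampler = RssSampler(interval=0.05)
sampler.start()
_ = [0] * 500_000   # do some work worth sampling
time.sleep(0.2)
sampler.stop()
print(f"collected {len(sampler.samples)} samples")
```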

CPU memory tracker

The CPUMemoryTracker provides real-time tracking with event history:
from gpumemprof import CPUMemoryTracker

tracker = CPUMemoryTracker(
    sampling_interval=0.5,   # Sample every 500ms
    max_events=10000,        # Keep 10k events
    enable_alerts=True       # Enable threshold alerts
)

# Start tracking
tracker.start_tracking()

try:
    # Your code here
    for i in range(100):
        # Allocate memory
        data = [0] * 1000000
        # Process data
        result = process(data)
        # Release
        del data
except Exception as e:
    print(f"Error: {e}")
finally:
    tracker.stop_tracking()

# Get statistics
stats = tracker.get_statistics()
print(f"Mode: {stats['mode']}")
print(f"Total events: {stats['total_events']}")
print(f"Peak memory: {stats['peak_memory'] / (1024**2):.2f} MB")
print(f"Current RSS: {stats['current_memory_allocated'] / (1024**2):.2f} MB")
print(f"Tracking duration: {stats['tracking_duration_seconds']:.1f}s")
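Process-level RSS includes the interpreter, shared libraries, and allocator overhead. To attribute growth to Python objects specifically, the stdlib tracemalloc module is a useful cross-check alongside the tracker:

```python
import tracemalloc

tracemalloc.start()

data = [0] * 1_000_000          # ~8 MB of pointers on 64-bit CPython
current, peak = tracemalloc.get_traced_memory()
print(f"Python-heap current: {current / (1024**2):.2f} MB, peak: {peak / (1024**2):.2f} MB")

del data
tracemalloc.stop()
```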

Event filtering

Query specific events:
# Get all events
all_events = tracker.get_events()

# Get only allocation events
allocations = tracker.get_events(event_type='allocation')
print(f"Total allocations: {len(allocations)}")

# Get recent events
recent = tracker.get_events(last_n=10)

# Get events since a timestamp
import time
threshold_time = time.time() - 60  # Last 60 seconds
recent_time = tracker.get_events(since=threshold_time)

# Analyze events
for event in allocations[:5]:
    print(f"Time: {event.timestamp:.2f}")
    print(f"Type: {event.event_type}")
    print(f"Memory change: {event.memory_change / (1024**2):.2f} MB")
    print(f"Total RSS: {event.memory_allocated / (1024**2):.2f} MB")
    print(f"Context: {event.context}")
    print()
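Events with these fields are easy to aggregate. A sketch that tallies event types and the net memory change; the events here are stand-ins with the same attributes, not real tracker output:

```python
from collections import Counter, namedtuple

Event = namedtuple("Event", "timestamp event_type memory_change")

def summarize_events(events):
    """Count events by type and sum the net memory change in bytes."""
    counts = Counter(e.event_type for e in events)
    net_change = sum(e.memory_change for e in events)
    return counts, net_change

events = [
    Event(0.0, "allocation", 10 * 1024**2),
    Event(0.5, "allocation", 5 * 1024**2),
    Event(1.0, "deallocation", -8 * 1024**2),
]
counts, net = summarize_events(events)
print(counts["allocation"], net / (1024**2))  # 2 allocations, net +7 MB
```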

Memory timeline

Get memory usage over time:
# Get timeline data
timeline = tracker.get_memory_timeline(interval=1.0)

# The timeline dict contains parallel lists:
print(f"Timestamps: {len(timeline['timestamps'])} points")
print(f"Allocated: {len(timeline['allocated'])} values")
print(f"Reserved: {len(timeline['reserved'])} values")

# Plot timeline
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(timeline['timestamps'], 
         [a / (1024**2) for a in timeline['allocated']],
         label='RSS')
plt.xlabel('Time (s)')
plt.ylabel('Memory (MB)')
plt.title('CPU Memory Usage Timeline')
plt.legend()
plt.grid(True)
plt.savefig('cpu_memory_timeline.png')

Export tracking data

Export CPU tracking events:
# Export to JSON
tracker.export_events('cpu_tracking.json', format='json')
JSON structure:
[
  {
    "timestamp": 1234567890.123,
    "event_type": "allocation",
    "memory_allocated": 524288000,
    "memory_reserved": 524288000,
    "memory_change": 10485760,
    "device_id": -1,
    "context": "RSS increased by 10.00 MB",
    "collector": "gpumemprof.cpu_tracker",
    "sampling_interval_ms": 500,
    "pid": 12345,
    "host": "machine-name"
  }
]
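Once exported, the file can be analyzed with nothing but the standard library. A sketch that loads the events and sums allocation growth; the field names follow the structure above, and the demo writes a small file of the same shape:

```python
import json
import tempfile

def total_allocation_growth(path):
    """Sum memory_change over allocation events in an exported file."""
    with open(path) as f:
        events = json.load(f)
    return sum(e["memory_change"] for e in events if e["event_type"] == "allocation")

# Demo with a file shaped like the export above
sample = [
    {"timestamp": 1234567890.123, "event_type": "allocation", "memory_change": 10485760},
    {"timestamp": 1234567891.123, "event_type": "deallocation", "memory_change": -5242880},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
    path = f.name

print(f"Allocation growth: {total_allocation_growth(path) / (1024**2):.2f} MB")  # 10.00 MB
```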

CLI usage

The CLI automatically uses CPU mode when GPU is unavailable:
# CPU info
gpumemprof info
# Output:
# CUDA is not available. Falling back to CPU-only profiling.
# Process RSS: 512.00 MB
# Process VMS: 2.50 GB
# CPU Count: 8 physical / 16 logical

# CPU monitoring
gpumemprof monitor --duration 30 --interval 0.5 --output cpu_monitor.csv
# Mode: CPU

# CPU tracking
gpumemprof track --duration 60 --output cpu_track.json --format json
# Running CPU memory tracker (no GPU backend available).

Metrics explained

Resident Set Size (RSS)

The portion of memory occupied by a process that is held in RAM:
snapshot = profiler._take_snapshot()
print(f"RSS: {snapshot.rss / (1024**2):.2f} MB")
  • Includes all allocated pages in RAM
  • Excludes swapped pages
  • Can include shared libraries
Virtual Memory Size (VMS)

Total virtual memory allocated to the process:
snapshot = profiler._take_snapshot()
print(f"VMS: {snapshot.vms / (1024**2):.2f} MB")
  • Includes memory not yet paged in
  • Larger than RSS
  • Represents maximum potential memory usage
CPU percent

CPU utilization of the process:
snapshot = profiler._take_snapshot()
print(f"CPU: {snapshot.cpu_percent:.1f}%")
  • Percentage of one CPU core
  • Can exceed 100% on multi-core systems
  • Averaged over the interval
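These metrics ultimately come from the operating system. A stdlib way to sanity-check peak RSS without any profiler (POSIX-only; note that ru_maxrss is reported in kilobytes on Linux but bytes on macOS):

```python
import resource
import sys

usage = resource.getrusage(resource.RUSAGE_SELF)
# ru_maxrss units differ by platform: bytes on macOS, kilobytes on Linux
scale = 1 if sys.platform == "darwin" else 1024
peak_rss_bytes = usage.ru_maxrss * scale
print(f"Peak RSS so far: {peak_rss_bytes / (1024**2):.2f} MB")
```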

Integration with PyTorch/TensorFlow

Combine CPU profiling with framework profiling:
import torch
from gpumemprof import CPUMemoryProfiler

profiler = CPUMemoryProfiler()

with profiler.profile_context("cpu_tensor_operations"):
    # Large CPU tensor operations
    x = torch.randn(10000, 10000)
    y = torch.randn(10000, 10000)
    result = x @ y
    result = result.sum()

summary = profiler.get_summary()
print(f"Peak memory: {summary['peak_memory_usage'] / (1024**2):.2f} MB")

Best practices

Use for development

Profile locally before GPU deployment:
# Development (CPU)
from gpumemprof import CPUMemoryProfiler
profiler = CPUMemoryProfiler()

# Production (GPU)
from gpumemprof import GPUMemoryProfiler
profiler = GPUMemoryProfiler()

Monitor preprocessing

Profile data pipelines:
tracker = CPUMemoryTracker()
tracker.start_tracking()

# Data pipeline
data = load_data()
data = clean_data(data)
features = extract_features(data)

tracker.stop_tracking()

Export for analysis

Save tracking data:
tracker.export_events(
    'cpu_profile.json',
    format='json'
)

Use consistent APIs

Same API as GPU profiling:
# Works for both CPU and GPU
profiler.profile_function(fn)
profiler.start_monitoring()
profiler.get_summary()

Next steps

  • PyTorch guide: GPU profiling for PyTorch
  • TensorFlow guide: GPU profiling for TensorFlow
  • CLI usage: Command-line profiling tools
  • Visualization: Generate memory usage plots
