The telemetry module provides a standardized schema for memory profiling events and utilities for converting legacy formats.
## Constants

- Current telemetry schema version
- Sentinel value for unknown process ID
- Sentinel value for unknown hostname
## Classes

### TelemetryEventV2

Canonical telemetry event payload used by tracker exports.
```python
from gpumemprof.telemetry import TelemetryEventV2

event = TelemetryEventV2(
    schema_version=2,
    timestamp_ns=1709481600000000000,
    event_type="allocation",
    collector="gpumemprof.cuda_tracker",
    sampling_interval_ms=100,
    pid=12345,
    host="gpu-server-01",
    device_id=0,
    allocator_allocated_bytes=5368709120,
    allocator_reserved_bytes=6442450944,
    allocator_active_bytes=5100273664,
    allocator_inactive_bytes=268435456,
    allocator_change_bytes=134217728,
    device_used_bytes=5500000000,
    device_free_bytes=11000000000,
    device_total_bytes=16500000000,
    context="training_step",
    metadata={"batch_size": 32, "epoch": 5},
)
```
#### Attributes

- `schema_version`: Schema version (always 2)
- `timestamp_ns`: Unix timestamp in nanoseconds
- `event_type`: Event type (e.g., "allocation", "deallocation", "peak", "warning")
- `collector`: Collector identifier (e.g., "gpumemprof.cuda_tracker")
- `sampling_interval_ms`: Sampling interval in milliseconds
- `allocator_allocated_bytes`: Bytes allocated by the memory allocator
- `allocator_reserved_bytes`: Bytes reserved by the memory allocator
- `allocator_active_bytes`: Active bytes in the allocator
- `allocator_inactive_bytes`: Inactive bytes in the allocator
- `allocator_change_bytes`: Memory change in bytes since the last event
- `metadata` (`Dict[str, Any]`, default `{}`): Additional metadata
## Functions

### telemetry_event_to_dict()

Serialize a telemetry event to a plain dictionary.

```python
from gpumemprof.telemetry import telemetry_event_to_dict

event_dict = telemetry_event_to_dict(event)
# Returns a dict with all event fields
```

Returns a dictionary representation of the event.
### validate_telemetry_record()

Validate a v2 telemetry record.

```python
from gpumemprof.telemetry import validate_telemetry_record

record = {
    "schema_version": 2,
    "timestamp_ns": 1709481600000000000,
    "event_type": "allocation",
    # ... all required fields
}

try:
    validate_telemetry_record(record)
    print("Record is valid")
except ValueError as e:
    print(f"Validation error: {e}")
```

Raises `ValueError` if the record is invalid or has missing fields.
### telemetry_event_from_record()

Create a v2 telemetry event from v2 or legacy tracker records.

```python
from gpumemprof.telemetry import telemetry_event_from_record

# From a v2 record
v2_record = {
    "schema_version": 2,
    "timestamp_ns": 1709481600000000000,
    "event_type": "allocation",
    # ... other fields
}
event = telemetry_event_from_record(v2_record)

# From a legacy record (auto-converted)
legacy_record = {
    "timestamp": 1709481600.0,
    "memory_allocated": 5368709120,
    "memory_reserved": 6442450944,
    "device": "cuda:0",
}
event = telemetry_event_from_record(
    legacy_record,
    permissive_legacy=True,
    default_collector="gpumemprof.cuda_tracker",
)
```

Parameters:

- `record`: Record to convert (v2 or legacy format)
- `permissive_legacy`: Whether to allow legacy format conversion
- `default_collector` (`str`, default `'legacy.unknown'`): Default collector name for legacy records
- `default_sampling_interval_ms`: Default sampling interval for legacy records

Returns the normalized telemetry event.
### load_telemetry_events()

Load telemetry events from JSON and normalize them to v2 payloads.

```python
from gpumemprof.telemetry import load_telemetry_events

# Load from file
events = load_telemetry_events("memory_events.json")

# With a custom events key
events = load_telemetry_events(
    "export.json",
    events_key="profiling_data",
)

# Strict mode (no legacy conversion)
events = load_telemetry_events(
    "v2_events.json",
    permissive_legacy=False,
)

print(f"Loaded {len(events)} events")
for event in events[:5]:
    print(f"  {event.event_type}: {event.allocator_allocated_bytes} bytes")
```

Parameters:

- Path to the JSON file containing telemetry events
- `permissive_legacy`: Whether to allow legacy format conversion
- `events_key` (`Optional[str]`, default `None`): JSON key containing the events array (auto-detects if `None`)

Returns a list of normalized telemetry events.

Supported JSON formats:

- Array of events: `[{event1}, {event2}, ...]`
- Object with events key: `{"events": [{event1}, {event2}]}`
- Single event object: `{event}`
- Custom key: `{"custom_key": [{event1}, {event2}]}`
## Required Fields

All v2 telemetry records must include these fields:

- `schema_version`: Must be 2
- `timestamp_ns`: Unix timestamp in nanoseconds (>= 0)
- `event_type`: Non-empty string
- `collector`: Non-empty string
- `sampling_interval_ms`: Integer >= 0
- `pid`: Process ID (>= -1)
- `host`: Non-empty hostname
- `device_id`: Device identifier
- `allocator_allocated_bytes`: Allocated bytes (>= 0)
- `allocator_reserved_bytes`: Reserved bytes (>= 0)
- `allocator_active_bytes`: Active bytes (>= 0 or null)
- `allocator_inactive_bytes`: Inactive bytes (>= 0 or null)
- `allocator_change_bytes`: Memory change
- `device_used_bytes`: Device memory used (>= 0)
- `device_free_bytes`: Device memory free (>= 0 or null)
- `device_total_bytes`: Total device memory (>= 0 or null)
- `context`: Context string (or null)
- `metadata`: Metadata dictionary
## Legacy Conversion

The module automatically converts legacy formats from:

- PyTorch GPU profiler
- TensorFlow memory profiler
- Custom tracking events

Legacy field mappings:

- `timestamp` (seconds) → `timestamp_ns` (nanoseconds)
- `memory_allocated` → `allocator_allocated_bytes`
- `memory_reserved` → `allocator_reserved_bytes`
- `memory_change` → `allocator_change_bytes`
- `device` string → `device_id` integer
- `backend` → collector inference
## Example Usage

```python
import json
import os
import socket
import time

from gpumemprof.telemetry import (
    TelemetryEventV2,
    telemetry_event_to_dict,
    telemetry_event_from_record,
    load_telemetry_events,
    validate_telemetry_record,
)

# Create an event from scratch (byte counts must be integers)
event = TelemetryEventV2(
    schema_version=2,
    timestamp_ns=time.time_ns(),
    event_type="allocation",
    collector="gpumemprof.cuda_tracker",
    sampling_interval_ms=100,
    pid=os.getpid(),
    host=socket.gethostname(),
    device_id=0,
    allocator_allocated_bytes=5 * 1024**3,
    allocator_reserved_bytes=6 * 1024**3,
    allocator_active_bytes=int(4.8 * 1024**3),
    allocator_inactive_bytes=200 * 1024**2,
    allocator_change_bytes=512 * 1024**2,
    device_used_bytes=int(5.5 * 1024**3),
    device_free_bytes=int(10.5 * 1024**3),
    device_total_bytes=16 * 1024**3,
    context="training_batch_47",
    metadata={"batch_size": 32, "learning_rate": 0.001},
)

# Convert to dict
event_dict = telemetry_event_to_dict(event)

# Validate
validate_telemetry_record(event_dict)

# Save to file
with open("events.json", "w") as f:
    json.dump([event_dict], f, indent=2)

# Load and convert
events = load_telemetry_events("events.json")
for event in events:
    gb_allocated = event.allocator_allocated_bytes / 1024**3
    print(f"{event.event_type}: {gb_allocated:.2f} GB allocated")

# Convert a legacy-format record
legacy_data = {
    "timestamp": 1709481600.0,
    "memory_allocated": 5368709120,
    "memory_reserved": 6442450944,
    "device": "cuda:0",
    "backend": "cuda",
}
event = telemetry_event_from_record(
    legacy_data,
    default_collector="gpumemprof.cuda_tracker",
    default_sampling_interval_ms=100,
)
print(f"Converted event: {event.timestamp_ns} ns")
print(f"Allocated: {event.allocator_allocated_bytes / 1024**3:.2f} GB")
```
## Integration with Tracker

The telemetry schema is used by `MemoryTracker.export_events()`:

```python
from gpumemprof import MemoryTracker
from gpumemprof.telemetry import load_telemetry_events

tracker = MemoryTracker(device="cuda:0")
tracker.start_tracking()
# ... run code ...
tracker.stop_tracking()

# Export uses the TelemetryEventV2 format
tracker.export_events("memory_events.json", format="json")

# Load and analyze
events = load_telemetry_events("memory_events.json")
for event in events:
    if event.event_type == "warning":
        print(f"Warning at {event.timestamp_ns}: {event.context}")
```