Matrix version: v0.2
Last verified: 2026-02-16
Source of truth
- `pyproject.toml` (requires-python >=3.10, framework dependency floors)
- CLI entry points in `pyproject.toml`
- Runtime backend detection in `gpumemprof/device_collectors.py` and `tfmemprof/utils.py`
- CI smoke checks in `.github/workflows/ci.yml`
Runtime and version support
| Surface | Supported | Notes |
|---|---|---|
| Python | 3.10, 3.11, 3.12 | Python 3.8/3.9 are no longer supported for v0.2 |
| PyTorch package floor | torch>=1.8.0 | Runtime backend can be CUDA, ROCm, MPS, or CPU |
| TensorFlow package floor | tensorflow>=2.4.0 | Runtime backend can be CUDA, ROCm, Metal, or CPU |
| OS | Linux, macOS, Windows | Backend availability depends on installed framework/runtime |
The minimum PyTorch version is 1.8.0, but newer versions are recommended for full feature support.
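Tooling that embeds the profilers can fail fast when run on an interpreter older than the v0.2 floor. A minimal sketch (the function name and message are illustrative, not part of the package API):

```python
import sys

def check_python_floor(version_info=sys.version_info):
    """Raise if the interpreter is older than the v0.2 floor (Python 3.10)."""
    if version_info < (3, 10):
        raise RuntimeError("gpumemprof/tfmemprof v0.2 requires Python >= 3.10")
    return True
```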
CLI feature availability by environment
CUDA (NVIDIA)
All CLI commands fully supported:
- `gpumemprof info/monitor/track/analyze/diagnose`
- `tfmemprof info/monitor/track/analyze/diagnose`
ROCm (AMD Linux)
All CLI commands fully supported:
- `gpumemprof info/monitor/track/analyze/diagnose`
- `tfmemprof info/monitor/track/analyze/diagnose`
Apple Metal / MPS
All CLI commands fully supported:
- `gpumemprof info/monitor/track/analyze/diagnose`
- `tfmemprof info/monitor/track/analyze/diagnose`
CPU-only host
All CLI commands with CPU fallback:
- `gpumemprof info/monitor/track/analyze/diagnose`
- `tfmemprof info/monitor/track/analyze/diagnose`
Complete CLI availability table
| Environment | gpumemprof info | gpumemprof monitor | gpumemprof track | gpumemprof analyze | gpumemprof diagnose | tfmemprof info | tfmemprof monitor | tfmemprof track | tfmemprof analyze | tfmemprof diagnose |
|---|---|---|---|---|---|---|---|---|---|---|
| CUDA (NVIDIA) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ROCm (AMD Linux) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Apple Metal / MPS | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CPU-only host | ✅ | ✅ (CPU fallback) | ✅ (CPU fallback) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Backend capability details
PyTorch (gpumemprof)
| Runtime backend | Typical platform | Telemetry collector | device_total/free support | Notes |
|---|---|---|---|---|
| `cuda` | NVIDIA + CUDA | `gpumemprof.cuda_tracker` | ✅ | Uses `torch.cuda.memory_*` |
| `rocm` | AMD + ROCm (Linux) | `gpumemprof.rocm_tracker` | ✅ | Uses HIP-backed `torch.cuda.memory_*` |
| `mps` | Apple Silicon (macOS) | `gpumemprof.mps_tracker` | Partial | Depends on `torch.mps.recommended_max_memory()` availability |
| `cpu` | Any host | `gpumemprof.cpu_tracker` | N/A | `CPUMemoryProfiler` / `CPUMemoryTracker` fallback |
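The table above maps each runtime backend to a collector. `detect_torch_runtime_backend()` (in `gpumemprof/device_collectors.py`) selects among them at runtime; the actual implementation is not shown here, but the selection logic might be sketched as follows. Note that ROCm builds of PyTorch reuse the `torch.cuda.*` API surface, so they must be distinguished via the build metadata:

```python
def detect_torch_runtime_backend():
    """Best-effort guess of the active PyTorch runtime backend.

    Sketch only -- the real detect_torch_runtime_backend() in
    gpumemprof/device_collectors.py may differ.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        # ROCm builds expose HIP via torch.version.hip but present
        # the same torch.cuda.* API surface as CUDA builds.
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

On a host without any GPU framework installed, the sketch degrades gracefully to `"cpu"`, matching the CPU fallback row in the table.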
CUDA backend details
Full support for device memory queries including:
- Total device memory
- Free device memory
- Allocated memory
- Reserved memory
- Active/inactive segments
Uses `torch.cuda.memory_*` APIs

ROCm backend details
Full support through HIP-backed PyTorch CUDA APIs:
- Same API surface as CUDA backend
- Automatic detection via `detect_torch_runtime_backend()`
- AMD GPU memory management through ROCm
Uses `torch.cuda.memory_*` APIs

MPS backend details
Partial support for Apple Silicon:
- Memory allocation tracking
- Limited device total/free memory
- Depends on `torch.mps.recommended_max_memory()` availability

Uses `torch.mps` APIs when available

CPU fallback details
CPU-only mode when no GPU is available:
- Tracks system memory usage
- No device memory queries
- Full profiling and analysis features still available
Uses `psutil` for memory monitoring

TensorFlow (tfmemprof)
| Runtime backend | Typical platform | Telemetry collector | Notes |
|---|---|---|---|
| `cuda` | NVIDIA + CUDA | `tfmemprof.memory_tracker` | Build/runtime diagnostics shown in `tfmemprof info` |
| `rocm` | AMD + ROCm (Linux) | `tfmemprof.memory_tracker` | Build/runtime diagnostics shown in `tfmemprof info` |
| `metal` | Apple Silicon | `tfmemprof.memory_tracker` | Counters can be runtime-dependent on Metal stack |
| `cpu` | Any host | `tfmemprof.memory_tracker` | Full CLI surface remains available |
CUDA backend details
Full support for CUDA devices:
- GPU memory profiling
- Session-based tracking
- Keras model profiling
Use `tfmemprof info` to see the CUDA build configuration

ROCm backend details
Full support for AMD GPUs:
- ROCm-enabled TensorFlow builds
- Same profiling capabilities as CUDA
Use `tfmemprof info` to see the ROCm build configuration

Metal backend details
Apple Silicon support:
- Native Metal GPU acceleration
- Memory tracking dependent on Metal stack
- Use `tensorflow-metal` for GPU acceleration
CPU fallback details
CPU-only mode:
- Full CLI surface available
- Memory profiling for CPU operations
- Session and graph execution monitoring
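Both CPU fallbacks sample process memory on the host rather than querying a device. A minimal sketch of such a sampler, preferring `psutil` (which the PyTorch CPU fallback uses, per the notes above) with a POSIX stdlib fallback; the function name is illustrative:

```python
import sys

def sample_rss_bytes():
    """Current process resident set size, in bytes (best effort)."""
    try:
        import psutil  # optional dependency; used by the CPU fallback
        return psutil.Process().memory_info().rss
    except ImportError:
        import resource  # POSIX-only stdlib fallback (not on Windows)
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS, kilobytes on Linux
        return peak if sys.platform == "darwin" else peak * 1024
```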
Backend capability metadata
Tracker exports may include backend capability hints under `metadata`:
Metadata fields
| Field | Type | Description |
|---|---|---|
| `backend` | string | Backend type: `"cuda"`, `"rocm"`, `"mps"`, `"cpu"`, `"metal"` |
| `supports_device_total` | boolean | Whether total device memory query is supported |
| `supports_device_free` | boolean | Whether free device memory query is supported |
| `sampling_source` | string | Source of memory samples (e.g., `torch.cuda.memory_stats`) |
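A consumer of an exported tracking file can read these hints defensively, since the fields are optional ("may include"). A hypothetical sketch; the surrounding export layout shown here is illustrative, only the `metadata` fields come from the table above:

```python
import json

def read_backend_capabilities(export_json):
    """Extract backend capability hints from a tracker export's metadata."""
    data = json.loads(export_json)
    meta = data.get("metadata", {})
    return {
        "backend": meta.get("backend", "cpu"),
        "supports_device_total": bool(meta.get("supports_device_total", False)),
        "supports_device_free": bool(meta.get("supports_device_free", False)),
        "sampling_source": meta.get("sampling_source"),
    }

example = (
    '{"metadata": {"backend": "cuda", "supports_device_total": true, '
    '"supports_device_free": true, "sampling_source": "torch.cuda.memory_stats"}}'
)
caps = read_backend_capabilities(example)
```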
Validation notes
CLI examples validation
CLI examples smoke validation is part of CI (`examples.cli.quickstart`) in `.github/workflows/ci.yml`.

Platform-specific considerations
Linux
- Best support for CUDA and ROCm backends
- Full feature availability
- Recommended for production use

macOS
- MPS support on Apple Silicon
- Metal support for TensorFlow

Windows
- CUDA support available
- Full PyTorch and TensorFlow support
Installation examples
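The project's own install command is not reproduced here; as a starting point, the framework floors from the support table above can be installed directly (a sketch using the documented version floors — consult the project README for the package's actual install command):

```shell
# Framework floors from pyproject.toml (newer versions recommended)
pip install "torch>=1.8.0"
pip install "tensorflow>=2.4.0"

# Apple Silicon GPU acceleration for TensorFlow (see Metal backend details)
pip install tensorflow-metal
```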
See also
- Architecture - System architecture and design
- Telemetry Schema - Event format specification
- Troubleshooting - Common issues and solutions