CUTLASS Python provides various utility functions for device management, logging, memory management, and type conversion.
Device Management
device_cc()
import cutlass
from cutlass.backend.utils.device import device_cc
cc = device_cc()
print(f"Compute capability: {cc}") # e.g., 90 for H100
Returns the compute capability of the current CUDA device.
Returns: int - Compute capability (e.g., 70, 75, 80, 86, 89, 90, 100)
Use cases:
- Auto-detecting device capabilities
- Selecting appropriate kernel implementations
- Validating feature support
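A typical use is dispatching to a kernel variant based on the reported capability. The sketch below is pure Python and the kernel names are illustrative, not real CUTLASS identifiers:

```python
# Hypothetical dispatch on the value returned by device_cc().
# Kernel names are placeholders for whatever variants your code provides.
def select_kernel(cc: int) -> str:
    if cc >= 90:
        return "hopper_tma_gemm"    # Hopper: TMA-based kernels
    if cc >= 80:
        return "ampere_async_gemm"  # Ampere: cp.async pipelines
    if cc >= 70:
        return "volta_mma_gemm"     # Volta/Turing: mma.sync kernels
    raise RuntimeError(f"Unsupported compute capability: {cc}")

print(select_kernel(90))  # hopper_tma_gemm
```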
device_id()
import cutlass
device = cutlass.device_id()
print(f"Using CUDA device: {device}")
Returns the CUDA device ID being used by CUTLASS.
Returns: int - CUDA device ID (default: 0)
Environment variable: Set CUTLASS_CUDA_DEVICE_ID to override default device.
export CUTLASS_CUDA_DEVICE_ID=1
python my_script.py
initialize_cuda_context()
import cutlass
cutlass.initialize_cuda_context()
Explicitly initializes the CUDA context. This is called automatically by most CUTLASS operations, but can be called manually for early initialization.
Side effects:
- Creates CUDA context if not already created
- Initializes RMM memory pool if enabled
- Validates CUDA and NVCC versions match
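The version check compares the toolkit version that NVCC reports against the installed CUDA version. A minimal sketch of such a comparison (an assumption about the mechanism, not CUTLASS's actual code) is:

```python
# Illustrative major.minor compatibility check, as described above.
def versions_match(nvcc_version: str, cuda_version: str) -> bool:
    # Compare only the major and minor components, e.g. "12.0.140" vs "12.0"
    return nvcc_version.split(".")[:2] == cuda_version.split(".")[:2]

print(versions_match("12.0.140", "12.0"))  # True
```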
Logging
set_log_level()
import logging
import cutlass
# Set to DEBUG for verbose output
cutlass.set_log_level(logging.DEBUG)
# Standard Python logging levels
cutlass.set_log_level(logging.INFO)
cutlass.set_log_level(logging.WARNING)
cutlass.set_log_level(logging.ERROR)
Sets the logging level for CUTLASS operations.
Parameter: level - a Python logging level constant from the logging module.
Available levels:
logging.DEBUG (10) - Detailed diagnostic information
logging.INFO (20) - General informational messages
logging.WARNING (30) - Warning messages (default)
logging.ERROR (40) - Error messages only
logging.CRITICAL (50) - Critical errors only
Example output:
import logging
import torch
import cutlass
from cutlass.op import Gemm
cutlass.set_log_level(logging.DEBUG)
plan = Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
plan.compile()
# Output:
# DEBUG:cutlass:Initializing option registry
# DEBUG:cutlass:Compiling kernel for sm_90
# DEBUG:cutlass:Selected tile shape: 128x128x32
Memory Management
get_memory_pool()
import cutlass
pool = cutlass.get_memory_pool()
Returns the RMM (RAPIDS Memory Manager) memory pool if enabled, otherwise returns None.
Requirements:
- Python 3.9+
- RMM package installed
Returns: RMM memory pool or None
Note: The memory pool is created on first access with default settings:
- Initial pool size: 1 GB (2^30 bytes)
- Maximum pool size: 4 GB (2^32 bytes)
create_memory_pool()
from cutlass.backend import create_memory_pool
pool = create_memory_pool(
init_pool_size=2**30, # 1 GB initial
max_pool_size=2**32 # 4 GB maximum
)
Creates a custom RMM memory pool with specified sizes.
Parameters:
- init_pool_size (int) - Initial memory pool size in bytes
- max_pool_size (int) - Maximum memory pool size in bytes
Returns: RMM memory pool object
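The size arguments are plain byte counts; the defaults noted earlier work out to:

```python
# Default pool sizes written out as byte counts.
GiB = 2 ** 30
init_pool_size = 1 * GiB  # 1 GiB initial (2**30 bytes)
max_pool_size = 4 * GiB   # 4 GiB maximum (2**32 bytes)
```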
Type Utilities
library_type()
from cutlass.utils import datatypes
import torch
import numpy as np
# Convert torch dtype to CUTLASS DataType
cutlass_type = datatypes.library_type(torch.float16)
print(cutlass_type) # DataType.f16
# Convert numpy dtype to CUTLASS DataType
cutlass_type = datatypes.library_type(np.float32)
print(cutlass_type) # DataType.f32
Converts native Python/framework data types to CUTLASS DataType enum values.
Supported input types:
- PyTorch dtypes:
torch.float16, torch.float32, torch.float64, torch.bfloat16, torch.int8, torch.int32
- NumPy dtypes:
numpy.float16, numpy.float32, numpy.float64, numpy.int8, numpy.int32
- CUTLASS DataType: passed through unchanged
Returns: cutlass.DataType enum value
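Conceptually, the conversion is a lookup from framework dtype to enum value. The stand-in below uses plain strings so it runs without PyTorch or NumPy; the target names mirror the CUTLASS DataType enum members (f16, bf16, s8, etc.):

```python
# Illustrative stand-in for the mapping performed by datatypes.library_type().
DTYPE_MAP = {
    "float16": "f16",
    "bfloat16": "bf16",
    "float32": "f32",
    "float64": "f64",
    "int8": "s8",
    "int32": "s32",
}

def to_library_type(name: str) -> str:
    try:
        return DTYPE_MAP[name]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {name}") from None
```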
Compiler Configuration
nvcc_version()
import cutlass
version = cutlass.nvcc_version()
print(f"NVCC version: {version}") # e.g., "12.0"
Returns the version of NVCC (NVIDIA CUDA Compiler) being used.
Returns: str - NVCC version string
cuda_install_path()
import cutlass
path = cutlass.cuda_install_path()
print(f"CUDA installation: {path}") # e.g., "/usr/local/cuda"
Returns the path to the CUDA installation.
Returns: str - Path to CUDA installation directory
Detection order:
1. The CUDA_INSTALL_PATH environment variable
2. The path derived from the location of nvcc
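The detection logic can be sketched as follows; this is an assumption about the mechanism (the helper name is hypothetical), shown for clarity:

```python
import os
import shutil
from typing import Optional

def find_cuda_install_path() -> Optional[str]:
    # 1. Explicit override via the CUDA_INSTALL_PATH environment variable
    path = os.environ.get("CUDA_INSTALL_PATH")
    if path:
        return path
    # 2. Derive from the nvcc executable location, e.g.
    #    /usr/local/cuda/bin/nvcc -> /usr/local/cuda
    nvcc = shutil.which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    return None
```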
Environment Variables
CUTLASS_PATH
export CUTLASS_PATH=/path/to/cutlass
python my_script.py
Overrides the CUTLASS source code location.
Default: Auto-detected based on package installation
CUDA_INSTALL_PATH
export CUDA_INSTALL_PATH=/usr/local/cuda-12.0
python my_script.py
Overrides the CUDA installation path.
Default: Derived from nvcc location
CUTLASS_CUDA_DEVICE_ID
export CUTLASS_CUDA_DEVICE_ID=1
python my_script.py
Sets the CUDA device to use.
Default: 0 (first GPU)
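These variables can also be set from Python. Presumably they must be set before cutlass is first imported, since the descriptions above suggest they are read at initialization time:

```python
import os

# Set configuration before importing cutlass (assumption: the package
# reads these environment variables when it initializes).
os.environ["CUTLASS_CUDA_DEVICE_ID"] = "1"
os.environ["CUDA_INSTALL_PATH"] = "/usr/local/cuda-12.0"
os.environ["CUTLASS_PATH"] = "/path/to/cutlass"
```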
__version__
import cutlass
print(f"CUTLASS Python version: {cutlass.__version__}")
# e.g., 4.4.1
Returns the CUTLASS Python package version string.
Cache Management
CUTLASS caches compiled kernels to disk to avoid recompilation.
Cache Location
Compiled kernels are cached in:
~/.cutlass/compiled_cache.db
Clearing Cache
To force recompilation, delete the cache file:
rm ~/.cutlass/compiled_cache.db
Or programmatically:
from pathlib import Path
cache_file = Path.home() / ".cutlass" / "compiled_cache.db"
cache_file.unlink(missing_ok=True)
Error Checking
check()
from cutlass.utils import check
from cuda import cuda
# Check CUDA error codes returned by the CUDA Python driver API
err, count = cuda.cuDeviceGetCount()
check.check_cuda_errors(err)
# Check for None/invalid values
check.check_value(tensor, "tensor A")
Utility functions for error checking and validation.
Constants
SharedMemPerCC
from cutlass_library import SharedMemPerCC
# Maximum shared memory per compute capability
max_smem_sm70 = SharedMemPerCC[70] # Volta
max_smem_sm80 = SharedMemPerCC[80] # Ampere
max_smem_sm90 = SharedMemPerCC[90] # Hopper
Dictionary mapping compute capability to maximum shared memory in bytes.
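A common use is checking whether a GEMM tile's shared-memory staging buffers fit on the target device. The sketch below uses a placeholder dictionary with illustrative byte limits; in real code, read the limits from cutlass_library.SharedMemPerCC:

```python
# Placeholder limits (bytes); real values come from SharedMemPerCC.
SMEM_PER_CC = {70: 96 << 10, 80: 163 << 10, 90: 228 << 10}

def tile_fits(cc: int, m: int, n: int, k: int,
              elem_bytes: int = 2, stages: int = 2) -> bool:
    # Per pipeline stage, an MxK A-tile and a KxN B-tile are staged
    # in shared memory (a simplifying assumption for illustration).
    smem_needed = stages * (m * k + k * n) * elem_bytes
    return smem_needed <= SMEM_PER_CC[cc]

print(tile_fits(80, 128, 128, 32))  # 2*(128*32 + 32*128)*2 = 32 KiB -> True
```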
Complete Example
import logging
import torch
import cutlass
from cutlass.op import Gemm
from cutlass.backend.utils.device import device_cc
from cutlass.utils import datatypes
# Configure logging
cutlass.set_log_level(logging.INFO)
# Get device info
cc = device_cc()
device = cutlass.device_id()
print(f"Using GPU {device} with compute capability {cc}")
# Check CUTLASS version
print(f"CUTLASS Python version: {cutlass.__version__}")
# Get CUDA info
print(f"NVCC version: {cutlass.nvcc_version()}")
print(f"CUDA path: {cutlass.cuda_install_path()}")
# Convert types
torch_dtype = torch.float16
cutlass_dtype = datatypes.library_type(torch_dtype)
print(f"Torch {torch_dtype} -> CUTLASS {cutlass_dtype}")
# Create and run GEMM
M, N, K = 1024, 1024, 1024
A = torch.randn((M, K), device='cuda', dtype=torch.float16)
B = torch.randn((K, N), device='cuda', dtype=torch.float16)
C = torch.zeros((M, N), device='cuda', dtype=torch.float16)
D = torch.zeros((M, N), device='cuda', dtype=torch.float16)
plan = Gemm(A=A, B=B, C=C, D=D)
plan.run()
print("GEMM completed successfully")
Source Code
- Main module: cutlass/python/cutlass_cppgen/__init__.py
- Device utilities: cutlass/python/cutlass_cppgen/backend/utils/device.py
- Type utilities: cutlass/python/cutlass_cppgen/utils/datatypes.py
See Also