CUTLASS Python provides various utility functions for device management, logging, memory management, and type conversion.

Device Management

device_cc()

import cutlass
from cutlass.backend.utils.device import device_cc

cc = device_cc()
print(f"Compute capability: {cc}")  # e.g., 90 for H100
Returns the compute capability of the current CUDA device. Returns: int - Compute capability (e.g., 70, 75, 80, 86, 89, 90, 100) Use cases:
  • Auto-detecting device capabilities
  • Selecting appropriate kernel implementations
  • Validating feature support
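A capability check like this typically gates kernel selection. The sketch below (plain Python, no GPU required) shows the dispatch pattern; the architecture tags are illustrative stand-ins for real kernel choices, not CUTLASS API:

```python
def select_arch_tag(cc: int) -> str:
    """Pick an architecture tag from a compute capability, as returned
    by device_cc(). The tags here are illustrative placeholders."""
    if cc >= 90:
        return "sm_90"  # Hopper and newer
    if cc >= 80:
        return "sm_80"  # Ampere / Ada
    if cc >= 70:
        return "sm_70"  # Volta / Turing
    raise ValueError(f"Compute capability {cc} not supported")

print(select_arch_tag(86))  # an sm_86 device falls in the sm_80 tier
```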

device_id()

import cutlass

device = cutlass.device_id()
print(f"Using CUDA device: {device}")
Returns the CUDA device ID being used by CUTLASS. Returns: int - CUDA device ID (default: 0) Environment variable: Set CUTLASS_CUDA_DEVICE_ID to override default device.
export CUTLASS_CUDA_DEVICE_ID=1
python my_script.py

initialize_cuda_context()

import cutlass

cutlass.initialize_cuda_context()
Explicitly initializes the CUDA context. This is called automatically by most CUTLASS operations, but can be called manually for early initialization. Side effects:
  • Creates CUDA context if not already created
  • Initializes RMM memory pool if enabled
  • Validates CUDA and NVCC versions match

Logging

set_log_level()

import logging
import cutlass

# Set to DEBUG for verbose output
cutlass.set_log_level(logging.DEBUG)

# Standard Python logging levels
cutlass.set_log_level(logging.INFO)
cutlass.set_log_level(logging.WARNING)
cutlass.set_log_level(logging.ERROR)
Sets the logging level for CUTLASS operations.
Parameters:
  • level (int, required) - Python logging level constant from the logging module.
Available levels:
  • logging.DEBUG (10) - Detailed diagnostic information
  • logging.INFO (20) - General informational messages
  • logging.WARNING (30) - Warning messages (default)
  • logging.ERROR (40) - Error messages only
  • logging.CRITICAL (50) - Critical errors only
Example output:
import logging
import torch
import cutlass
from cutlass.op import Gemm

cutlass.set_log_level(logging.DEBUG)

plan = Gemm(element=torch.float32)
plan.compile()
# Output:
# DEBUG:cutlass:Initializing option registry
# DEBUG:cutlass:Compiling kernel for sm_90
# DEBUG:cutlass:Selected tile shape: 128x128x32
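CUTLASS logs through the standard logging module under the "cutlass" logger name (visible in the DEBUG:cutlass: prefix above), so records can be routed with ordinary handlers. A sketch that mirrors them to a file as well:

```python
import logging

# Attach an extra handler to the "cutlass" logger so records go to a
# file in addition to wherever the root configuration sends them.
logger = logging.getLogger("cutlass")
file_handler = logging.FileHandler("cutlass_debug.log")
file_handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(file_handler)
logger.setLevel(logging.DEBUG)
```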

Memory Management

get_memory_pool()

import cutlass

pool = cutlass.get_memory_pool()
Returns the RMM (RAPIDS Memory Manager) memory pool if enabled, otherwise returns None. Requirements:
  • Python 3.9+
  • RMM package installed
Returns: RMM memory pool or None Note: The memory pool is created on first access with default settings:
  • Initial pool size: 1 GB (2^30 bytes)
  • Maximum pool size: 4 GB (2^32 bytes)

create_memory_pool()

from cutlass.backend import create_memory_pool

pool = create_memory_pool(
    init_pool_size=2**30,   # 1 GB initial
    max_pool_size=2**32     # 4 GB maximum
)
Creates a custom RMM memory pool with specified sizes.
Parameters:
  • init_pool_size (int, default: 2**30) - Initial memory pool size in bytes.
  • max_pool_size (int, default: 2**32) - Maximum memory pool size in bytes.
Returns: RMM memory pool object

Type Utilities

library_type()

from cutlass.utils import datatypes
import torch
import numpy as np

# Convert torch dtype to CUTLASS DataType
cutlass_type = datatypes.library_type(torch.float16)
print(cutlass_type)  # DataType.f16

# Convert numpy dtype to CUTLASS DataType  
cutlass_type = datatypes.library_type(np.float32)
print(cutlass_type)  # DataType.f32
Converts native Python/framework data types to CUTLASS DataType enum values. Supported input types:
  • PyTorch dtypes: torch.float16, torch.float32, torch.float64, torch.bfloat16, torch.int8, torch.int32
  • NumPy dtypes: numpy.float16, numpy.float32, numpy.float64, numpy.int8, numpy.int32
  • CUTLASS DataType: passed through unchanged
Returns: cutlass.DataType enum value
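Conceptually, library_type() is a table lookup from framework dtype to DataType member. The stdlib-only sketch below uses string stand-ins on both sides (the real function operates on the dtype objects themselves, and this table is illustrative, not its actual mapping code):

```python
# String stand-ins for torch/numpy dtypes (keys) and cutlass.DataType
# members (values).
_DTYPE_TO_LIBRARY = {
    "float16": "DataType.f16",
    "bfloat16": "DataType.bf16",
    "float32": "DataType.f32",
    "float64": "DataType.f64",
    "int8": "DataType.s8",
    "int32": "DataType.s32",
}

def library_type_sketch(dtype_name: str) -> str:
    """Look up the library type, rejecting unsupported dtypes."""
    try:
        return _DTYPE_TO_LIBRARY[dtype_name]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {dtype_name}") from None

print(library_type_sketch("float16"))  # DataType.f16
```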

Compiler Configuration

nvcc_version()

import cutlass

version = cutlass.nvcc_version()
print(f"NVCC version: {version}")  # e.g., "12.0"
Returns the version of NVCC (NVIDIA CUDA Compiler) being used. Returns: str - NVCC version string

cuda_install_path()

import cutlass

path = cutlass.cuda_install_path()
print(f"CUDA installation: {path}")  # e.g., "/usr/local/cuda"
Returns the path to the CUDA installation. Returns: str - Path to CUDA installation directory Detection order:
  1. CUDA_INSTALL_PATH environment variable
  2. Path derived from nvcc location
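The detection order above can be reproduced with the standard library alone; the sketch below mirrors the two steps (environment variable first, then the prefix derived from nvcc's location):

```python
import os
import shutil
from typing import Optional

def find_cuda_install_path() -> Optional[str]:
    """Mimic the documented detection order: CUDA_INSTALL_PATH wins;
    otherwise strip bin/nvcc from nvcc's resolved location
    (e.g. /usr/local/cuda/bin/nvcc -> /usr/local/cuda)."""
    env_path = os.environ.get("CUDA_INSTALL_PATH")
    if env_path:
        return env_path
    nvcc = shutil.which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    return None  # no CUDA toolchain found
```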

Environment Variables

CUTLASS_PATH

export CUTLASS_PATH=/path/to/cutlass
python my_script.py
Overrides the CUTLASS source code location. Default: Auto-detected based on package installation

CUDA_INSTALL_PATH

export CUDA_INSTALL_PATH=/usr/local/cuda-12.0
python my_script.py
Overrides the CUDA installation path. Default: Derived from nvcc location

CUTLASS_CUDA_DEVICE_ID

export CUTLASS_CUDA_DEVICE_ID=1
python my_script.py
Sets the CUDA device to use. Default: 0 (first GPU)
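If you prefer to set these overrides from Python rather than the shell, do it at the top of the script, before importing cutlass. A sketch (the values here are examples only):

```python
import os

# Set overrides before `import cutlass`; setdefault leaves any value
# already exported in the shell untouched.
os.environ.setdefault("CUTLASS_CUDA_DEVICE_ID", "1")
os.environ.setdefault("CUDA_INSTALL_PATH", "/usr/local/cuda-12.0")
```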

Version Information

__version__

import cutlass

print(f"CUTLASS Python version: {cutlass.__version__}")
# Output: 4.4.1
The CUTLASS Python package version string, exposed as the __version__ attribute.

Cache Management

CUTLASS caches compiled kernels to disk to avoid recompilation.

Cache Location

Compiled kernels are cached in:
~/.cutlass/compiled_cache.db

Clearing Cache

To force recompilation, delete the cache file:
rm ~/.cutlass/compiled_cache.db
Or programmatically:
from pathlib import Path

cache_file = Path.home() / ".cutlass" / "compiled_cache.db"
cache_file.unlink(missing_ok=True)  # no error if the cache file is absent

Error Checking

check()

from cutlass.utils import check
from cuda import cuda

# Check CUDA error codes
err, result = cuda.cuDeviceGetCount()
check.check_cuda_errors(err)

# Check for None/invalid values
check.check_value(tensor, "tensor A")
Utility functions for error checking and validation.
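The general pattern behind these helpers: CUDA driver calls return a status code first, and the checker raises when it is nonzero. A stand-in sketch of that pattern (not the actual cutlass.utils.check implementation):

```python
def check_status(code: int, what: str = "CUDA call") -> None:
    """Raise if a driver-style status code is nonzero (0 == success)."""
    if code != 0:
        raise RuntimeError(f"{what} failed with error code {code}")

check_status(0)  # success: no exception raised
```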

Constants

SharedMemPerCC

from cutlass_library import SharedMemPerCC

# Maximum shared memory per compute capability
max_smem_sm70 = SharedMemPerCC[70]  # Volta
max_smem_sm80 = SharedMemPerCC[80]  # Ampere
max_smem_sm90 = SharedMemPerCC[90]  # Hopper
Dictionary mapping compute capability to maximum shared memory in bytes.
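A typical use is validating that a kernel's shared-memory demand fits the target device. The sketch below uses a stand-in dictionary with placeholder byte values, not the library's actual table:

```python
# Illustrative stand-in for SharedMemPerCC; the byte values are
# placeholders, not the real per-architecture limits.
shared_mem_per_cc = {
    70: 96 * 1024,
    80: 163 * 1024,
    90: 227 * 1024,
}

def smem_fits(cc: int, required_bytes: int) -> bool:
    """Check whether a kernel's shared-memory requirement fits."""
    return required_bytes <= shared_mem_per_cc[cc]

print(smem_fits(80, 128 * 1024))  # True
```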

Complete Example

import logging
import torch
import cutlass
from cutlass.op import Gemm
from cutlass.backend.utils.device import device_cc
from cutlass.utils import datatypes

# Configure logging
cutlass.set_log_level(logging.INFO)

# Get device info
cc = device_cc()
device = cutlass.device_id()
print(f"Using GPU {device} with compute capability {cc}")

# Check CUTLASS version
print(f"CUTLASS Python version: {cutlass.__version__}")

# Get CUDA info
print(f"NVCC version: {cutlass.nvcc_version()}")
print(f"CUDA path: {cutlass.cuda_install_path()}")

# Convert types
torch_dtype = torch.float16
cutlass_dtype = datatypes.library_type(torch_dtype)
print(f"Torch {torch_dtype} -> CUTLASS {cutlass_dtype}")

# Create and run GEMM
M, N, K = 1024, 1024, 1024
A = torch.randn((M, K), device='cuda', dtype=torch.float16)
B = torch.randn((K, N), device='cuda', dtype=torch.float16)
C = torch.zeros((M, N), device='cuda', dtype=torch.float16)
D = torch.zeros((M, N), device='cuda', dtype=torch.float16)

plan = Gemm(A=A, B=B, C=C, D=D)
plan.run()

print("GEMM completed successfully")

Source Code

  • Main module: cutlass/python/cutlass_cppgen/__init__.py
  • Device utilities: cutlass/python/cutlass_cppgen/backend/utils/device.py
  • Type utilities: cutlass/python/cutlass_cppgen/utils/datatypes.py
