CUTLASS Python provides various utility functions for device management, logging, memory management, and type conversion.

Device Management

device_cc()

import cutlass
from cutlass.backend.utils.device import device_cc

cc = device_cc()
print(f"Compute capability: {cc}")  # e.g., 90 for H100
Returns the compute capability of the current CUDA device. Returns: int - Compute capability (e.g., 70, 75, 80, 86, 89, 90, 100) Use cases:
  • Auto-detecting device capabilities
  • Selecting appropriate kernel implementations
  • Validating feature support
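A capability check like this typically gates kernel selection. The sketch below (plain Python, no GPU required) shows the dispatch pattern; the architecture tags are illustrative stand-ins for real kernel choices, not CUTLASS API:

```python
def select_arch_tag(cc: int) -> str:
    """Pick an architecture tag from a compute capability, as returned
    by device_cc(). The tags here are illustrative placeholders."""
    if cc >= 90:
        return "sm_90"  # Hopper and newer
    if cc >= 80:
        return "sm_80"  # Ampere / Ada
    if cc >= 70:
        return "sm_70"  # Volta / Turing
    raise ValueError(f"Compute capability {cc} not supported")

print(select_arch_tag(86))  # an sm_86 device falls in the sm_80 tier
```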

device_id()

import cutlass

device = cutlass.device_id()
print(f"Using CUDA device: {device}")
Returns the CUDA device ID being used by CUTLASS. Returns: int - CUDA device ID (default: 0) Environment variable: Set CUTLASS_CUDA_DEVICE_ID to override default device.
export CUTLASS_CUDA_DEVICE_ID=1
python my_script.py

initialize_cuda_context()

import cutlass

cutlass.initialize_cuda_context()
Explicitly initializes the CUDA context. This is called automatically by most CUTLASS operations, but can be called manually for early initialization. Side effects:
  • Creates CUDA context if not already created
  • Initializes RMM memory pool if enabled
  • Validates CUDA and NVCC versions match

Logging

set_log_level()

import logging
import cutlass

# Set to DEBUG for verbose output
cutlass.set_log_level(logging.DEBUG)

# Standard Python logging levels
cutlass.set_log_level(logging.INFO)
cutlass.set_log_level(logging.WARNING)
cutlass.set_log_level(logging.ERROR)
Sets the logging level for CUTLASS operations.
Parameters:
  • level (int, required) - Python logging level constant from the logging module.
Available levels:
  • logging.DEBUG (10) - Detailed diagnostic information
  • logging.INFO (20) - General informational messages
  • logging.WARNING (30) - Warning messages (default)
  • logging.ERROR (40) - Error messages only
  • logging.CRITICAL (50) - Critical errors only
Example output:
import logging
import torch
import cutlass
from cutlass.op import Gemm

cutlass.set_log_level(logging.DEBUG)

plan = Gemm(element=torch.float32)
plan.compile()
# Output:
# DEBUG:cutlass:Initializing option registry
# DEBUG:cutlass:Compiling kernel for sm_90
# DEBUG:cutlass:Selected tile shape: 128x128x32
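CUTLASS logs through the standard logging module under the "cutlass" logger name (visible in the DEBUG:cutlass: prefix above), so records can be routed with ordinary handlers. A sketch that mirrors them to a file as well:

```python
import logging

# Attach an extra handler to the "cutlass" logger so records go to a
# file in addition to wherever the root configuration sends them.
logger = logging.getLogger("cutlass")
file_handler = logging.FileHandler("cutlass_debug.log")
file_handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(file_handler)
logger.setLevel(logging.DEBUG)
```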

Memory Management

get_memory_pool()

import cutlass

pool = cutlass.get_memory_pool()
Returns the RMM (RAPIDS Memory Manager) memory pool if enabled, otherwise returns None. Requirements:
  • Python 3.9+
  • RMM package installed
Returns: RMM memory pool or None Note: The memory pool is created on first access with default settings:
  • Initial pool size: 1 GB (2^30 bytes)
  • Maximum pool size: 4 GB (2^32 bytes)

create_memory_pool()

from cutlass.backend import create_memory_pool

pool = create_memory_pool(
    init_pool_size=2**30,   # 1 GB initial
    max_pool_size=2**32     # 4 GB maximum
)
Creates a custom RMM memory pool with specified sizes.
Parameters:
  • init_pool_size (int, default: 2**30) - Initial memory pool size in bytes.
  • max_pool_size (int, default: 2**32) - Maximum memory pool size in bytes.
Returns: RMM memory pool object

Type Utilities

library_type()

from cutlass.utils import datatypes
import torch
import numpy as np

# Convert torch dtype to CUTLASS DataType
cutlass_type = datatypes.library_type(torch.float16)
print(cutlass_type)  # DataType.f16

# Convert numpy dtype to CUTLASS DataType  
cutlass_type = datatypes.library_type(np.float32)
print(cutlass_type)  # DataType.f32
Converts native Python/framework data types to CUTLASS DataType enum values. Supported input types:
  • PyTorch dtypes: torch.float16, torch.float32, torch.float64, torch.bfloat16, torch.int8, torch.int32
  • NumPy dtypes: numpy.float16, numpy.float32, numpy.float64, numpy.int8, numpy.int32
  • CUTLASS DataType: passed through unchanged
Returns: cutlass.DataType enum value
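Conceptually, library_type() is a table lookup from framework dtype to DataType member. The stdlib-only sketch below uses string stand-ins on both sides (the real function operates on the dtype objects themselves, and this table is illustrative, not its actual mapping code):

```python
# String stand-ins for torch/numpy dtypes (keys) and cutlass.DataType
# members (values).
_DTYPE_TO_LIBRARY = {
    "float16": "DataType.f16",
    "bfloat16": "DataType.bf16",
    "float32": "DataType.f32",
    "float64": "DataType.f64",
    "int8": "DataType.s8",
    "int32": "DataType.s32",
}

def library_type_sketch(dtype_name: str) -> str:
    """Look up the library type, rejecting unsupported dtypes."""
    try:
        return _DTYPE_TO_LIBRARY[dtype_name]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {dtype_name}") from None

print(library_type_sketch("float16"))  # DataType.f16
```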

Compiler Configuration

nvcc_version()

import cutlass

version = cutlass.nvcc_version()
print(f"NVCC version: {version}")  # e.g., "12.0"
Returns the version of NVCC (NVIDIA CUDA Compiler) being used. Returns: str - NVCC version string

cuda_install_path()

import cutlass

path = cutlass.cuda_install_path()
print(f"CUDA installation: {path}")  # e.g., "/usr/local/cuda"
Returns the path to the CUDA installation. Returns: str - Path to CUDA installation directory Detection order:
  1. CUDA_INSTALL_PATH environment variable
  2. Path derived from nvcc location
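The detection order above can be reproduced with the standard library alone; the sketch below mirrors the two steps (environment variable first, then the prefix derived from nvcc's location):

```python
import os
import shutil
from typing import Optional

def find_cuda_install_path() -> Optional[str]:
    """Mimic the documented detection order: CUDA_INSTALL_PATH wins;
    otherwise strip bin/nvcc from nvcc's resolved location
    (e.g. /usr/local/cuda/bin/nvcc -> /usr/local/cuda)."""
    env_path = os.environ.get("CUDA_INSTALL_PATH")
    if env_path:
        return env_path
    nvcc = shutil.which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    return None  # no CUDA toolchain found
```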

Environment Variables

CUTLASS_PATH

export CUTLASS_PATH=/path/to/cutlass
python my_script.py
Overrides the CUTLASS source code location. Default: Auto-detected based on package installation

CUDA_INSTALL_PATH

export CUDA_INSTALL_PATH=/usr/local/cuda-12.0
python my_script.py
Overrides the CUDA installation path. Default: Derived from nvcc location

CUTLASS_CUDA_DEVICE_ID

export CUTLASS_CUDA_DEVICE_ID=1
python my_script.py
Sets the CUDA device to use. Default: 0 (first GPU)
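If you prefer to set these overrides from Python rather than the shell, do it at the top of the script, before importing cutlass. A sketch (the values here are examples only):

```python
import os

# Set overrides before `import cutlass`; setdefault leaves any value
# already exported in the shell untouched.
os.environ.setdefault("CUTLASS_CUDA_DEVICE_ID", "1")
os.environ.setdefault("CUDA_INSTALL_PATH", "/usr/local/cuda-12.0")
```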

Version Information

__version__

import cutlass

print(f"CUTLASS Python version: {cutlass.__version__}")
# Output: 4.4.1
The CUTLASS Python package version string, exposed as the __version__ attribute.

Cache Management

CUTLASS caches compiled kernels to disk to avoid recompilation.

Cache Location

Compiled kernels are cached in:
~/.cutlass/compiled_cache.db

Clearing Cache

To force recompilation, delete the cache file:
rm ~/.cutlass/compiled_cache.db
Or programmatically:
from pathlib import Path

cache_file = Path.home() / ".cutlass" / "compiled_cache.db"
cache_file.unlink(missing_ok=True)  # no error if the cache file is absent

Error Checking

check()

from cutlass.utils import check
from cuda import cuda

# Check CUDA error codes
err, result = cuda.cuDeviceGetCount()
check.check_cuda_errors(err)

# Check for None/invalid values
check.check_value(tensor, "tensor A")
Utility functions for error checking and validation.
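The general pattern behind these helpers: CUDA driver calls return a status code first, and the checker raises when it is nonzero. A stand-in sketch of that pattern (not the actual cutlass.utils.check implementation):

```python
def check_status(code: int, what: str = "CUDA call") -> None:
    """Raise if a driver-style status code is nonzero (0 == success)."""
    if code != 0:
        raise RuntimeError(f"{what} failed with error code {code}")

check_status(0)  # success: no exception raised
```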

Constants

SharedMemPerCC

from cutlass_library import SharedMemPerCC

# Maximum shared memory per compute capability
max_smem_sm70 = SharedMemPerCC[70]  # Volta
max_smem_sm80 = SharedMemPerCC[80]  # Ampere
max_smem_sm90 = SharedMemPerCC[90]  # Hopper
Dictionary mapping compute capability to maximum shared memory in bytes.
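A typical use is validating that a kernel's shared-memory demand fits the target device. The sketch below uses a stand-in dictionary with placeholder byte values, not the library's actual table:

```python
# Illustrative stand-in for SharedMemPerCC; the byte values are
# placeholders, not the real per-architecture limits.
shared_mem_per_cc = {
    70: 96 * 1024,
    80: 163 * 1024,
    90: 227 * 1024,
}

def smem_fits(cc: int, required_bytes: int) -> bool:
    """Check whether a kernel's shared-memory requirement fits."""
    return required_bytes <= shared_mem_per_cc[cc]

print(smem_fits(80, 128 * 1024))  # True
```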

Complete Example

import logging
import torch
import cutlass
from cutlass.op import Gemm
from cutlass.backend.utils.device import device_cc
from cutlass.utils import datatypes

# Configure logging
cutlass.set_log_level(logging.INFO)

# Get device info
cc = device_cc()
device = cutlass.device_id()
print(f"Using GPU {device} with compute capability {cc}")

# Check CUTLASS version
print(f"CUTLASS Python version: {cutlass.__version__}")

# Get CUDA info
print(f"NVCC version: {cutlass.nvcc_version()}")
print(f"CUDA path: {cutlass.cuda_install_path()}")

# Convert types
torch_dtype = torch.float16
cutlass_dtype = datatypes.library_type(torch_dtype)
print(f"Torch {torch_dtype} -> CUTLASS {cutlass_dtype}")

# Create and run GEMM
M, N, K = 1024, 1024, 1024
A = torch.randn((M, K), device='cuda', dtype=torch.float16)
B = torch.randn((K, N), device='cuda', dtype=torch.float16)
C = torch.zeros((M, N), device='cuda', dtype=torch.float16)
D = torch.zeros((M, N), device='cuda', dtype=torch.float16)

plan = Gemm(A=A, B=B, C=C, D=D)
plan.run()

print("GEMM completed successfully")

Source Code

  • Main module: cutlass/python/cutlass_cppgen/__init__.py
  • Device utilities: cutlass/python/cutlass_cppgen/backend/utils/device.py
  • Type utilities: cutlass/python/cutlass_cppgen/utils/datatypes.py
