Type conversion
numpy_to_torch_dtype_dict
Mapping from NumPy dtypes to PyTorch dtypes.
float64 → torch.float64
float32 → torch.float32
float16 → torch.float16
uint64 → torch.uint64
uint32 → torch.uint32
uint16 → torch.uint16
uint8 → torch.uint8
int64 → torch.int64
int32 → torch.int32
int16 → torch.int16
int8 → torch.int8
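A minimal sketch of how such a mapping is typically used: looking up the PyTorch dtype that matches a NumPy array so a tensor buffer can be allocated with the same element type. The dictionary `np_to_torch` and the helper `torch_dtype_for` are hypothetical stand-ins written for this example; they copy only a few entries and are not the library's own objects.

```python
import numpy as np
import torch

# Hypothetical local copy of a few entries (assumption: the real
# numpy_to_torch_dtype_dict is keyed in a compatible way).
np_to_torch = {
    np.dtype("float32"): torch.float32,
    np.dtype("int64"): torch.int64,
    np.dtype("uint8"): torch.uint8,
}


def torch_dtype_for(array):
    """Return the PyTorch dtype matching a NumPy array's dtype."""
    return np_to_torch[array.dtype]


# Allocate a tensor buffer with the same shape and element type as `obs`.
obs = np.zeros((4, 3), dtype=np.float32)
buf = torch.zeros(obs.shape, dtype=torch_dtype_for(obs))
```

Keying the dictionary by `np.dtype` objects lets `array.dtype` be used directly as the lookup key.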
Layer initialization
layer_init
CleanRL’s default layer initialization with orthogonal weights.
PyTorch layer to initialize.
Standard deviation for orthogonal initialization.
Constant value for bias initialization.
Initialized layer (same object, modified in-place).
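The behavior described above can be sketched as follows. This is an illustration in the style of CleanRL's helper, not the library source: the function name `layer_init_sketch`, the parameter names `std` and `bias_const`, and their defaults are assumptions.

```python
import math

import torch
from torch import nn


def layer_init_sketch(layer, std=math.sqrt(2), bias_const=0.0):
    """Orthogonal weights scaled by `std`, constant bias (names are assumed)."""
    nn.init.orthogonal_(layer.weight, gain=std)
    nn.init.constant_(layer.bias, bias_const)
    return layer  # same object, modified in place


# With std=1.0 the rows of a (4, 8) weight matrix are orthonormal.
fc = layer_init_sketch(nn.Linear(8, 4), std=1.0)
gram = (fc.weight @ fc.weight.T).detach()
```

Because the layer is returned after being modified in place, the call can wrap a layer constructor directly inside `nn.Sequential(...)`.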
Action sampling
sample_logits
Sample actions from logits and compute log probabilities and entropy.
Action logits (discrete), Normal distribution (continuous), or tuple of logits (multi-discrete).
Optional pre-sampled actions. If provided, the log probability of these actions is computed.
Sampled actions with shape (batch_size,) for discrete or (batch_size, action_dim) for continuous.
Log probabilities of actions with shape (batch_size,).
Entropy of the action distribution with shape (batch_size,).
log_prob
Compute log probability of discrete actions from logits.
Action logits with shape (batch_size, num_actions).
Action indices with shape (batch_size,).
Log probabilities with shape (batch_size,).
entropy
Compute entropy from action logits.
Action logits with shape (batch_size, num_actions).
Entropy values with shape (batch_size,).
entropy_probs
Compute entropy from logits and pre-computed probabilities.
Action logits with shape (batch_size, num_actions).
Action probabilities with shape (batch_size, num_actions).
Entropy values with shape (batch_size,).
Native dtype utilities
nativize_dtype
Convert emulated observation dtype to native PyTorch dtype information.
Emulated environment dictionary containing:
observation_dtype: Sample dtype from environment
emulated_observation_dtype: Structured NumPy dtype
Native dtype specification as tuple (dtype, shape, offset, delta) or nested dict for structured observations.
nativize_tensor
Convert byte observation tensor to native PyTorch tensors using dtype specification.
Byte tensor from environment with shape (batch_size, num_bytes).
Native dtype specification from nativize_dtype.
Native tensor or dict of tensors with proper dtypes and shapes.
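To make the byte-to-native conversion concrete, here is a simplified sketch of the idea for a flat (non-nested) structured observation. The helpers `spec_from_structured` and `tensor_from_bytes` are hypothetical, and `offset` and `delta` are assumed here to mean byte offset and byte length; the library may define those two fields differently.

```python
import numpy as np
import torch

# Hypothetical subset of the NumPy-to-PyTorch dtype mapping.
NP_TO_TORCH = {np.dtype("float32"): torch.float32, np.dtype("uint8"): torch.uint8}


def spec_from_structured(structured):
    """Build {field: (torch dtype, shape, byte offset, byte length)}."""
    spec = {}
    for name in structured.names:
        field_dtype, offset = structured.fields[name][:2]
        # Sub-array fields expose (base dtype, shape); scalar fields do not.
        base, shape = field_dtype.subdtype if field_dtype.subdtype else (field_dtype, (1,))
        spec[name] = (NP_TO_TORCH[base], shape, offset, field_dtype.itemsize)
    return spec


def tensor_from_bytes(byte_obs, spec):
    """Slice each field's bytes and reinterpret them with the native dtype."""
    out = {}
    for name, (dtype, shape, offset, delta) in spec.items():
        chunk = byte_obs[:, offset:offset + delta].contiguous()
        out[name] = chunk.view(dtype).view(byte_obs.shape[0], *shape)
    return out


structured = np.dtype([("pos", np.float32, (2,)), ("hp", np.uint8)])
records = np.zeros(3, dtype=structured)
records["pos"] = [[1, 2], [3, 4], [5, 6]]
records["hp"] = [10, 20, 30]

# Shape (batch_size, num_bytes): each row holds one record as raw bytes.
byte_obs = torch.from_numpy(records.view(np.uint8).reshape(3, -1).copy())
native = tensor_from_bytes(byte_obs, spec_from_structured(structured))
```

Reinterpreting bytes with `Tensor.view(dtype)` avoids a per-element conversion, which is why the specification is computed once and then reused for every batch.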
flattened_tensor_size
Compute total number of elements in a native dtype specification.
Native dtype specification.
Total number of elements.
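One way to picture the computation is a recursive walk over the specification that multiplies out each leaf's shape and sums the results. The function `flattened_size_sketch` and the example specification below are hypothetical, and the offset and delta values in the leaves are placeholders that the sketch does not inspect.

```python
import math

import torch


def flattened_size_sketch(spec):
    """Sum the element counts of every leaf in a (possibly nested) spec."""
    if isinstance(spec, tuple):
        _dtype, shape, _offset, _delta = spec
        return math.prod(shape)
    return sum(flattened_size_sketch(sub) for sub in spec.values())


# Illustrative spec: leaves are (dtype, shape, offset, delta) tuples.
spec = {
    "image": (torch.uint8, (3, 4, 4), 0, 48),
    "state": {
        "pos": (torch.float32, (2,), 48, 8),
        "hp": (torch.uint8, (1,), 56, 1),
    },
}
total = flattened_size_sketch(spec)  # 48 + 2 + 1 elements
```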
Usage examples
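As a sketch of what the discrete helpers log_prob, entropy, and entropy_probs compute, the functions below reimplement the underlying formulas in plain PyTorch. The `*_sketch` names are hypothetical, and the explicit normalization of the logits is an assumption made so the example also works on unnormalized input.

```python
import torch


def log_prob_sketch(logits, actions):
    """Log probability of each chosen action; returns shape (batch_size,)."""
    log_pmf = logits - logits.logsumexp(dim=-1, keepdim=True)
    return log_pmf.gather(-1, actions.long().unsqueeze(-1)).squeeze(-1)


def entropy_sketch(logits):
    """Entropy -sum(p * log p) per row; returns shape (batch_size,)."""
    log_pmf = torch.log_softmax(logits, dim=-1)
    return -(log_pmf.exp() * log_pmf).sum(dim=-1)


def entropy_probs_sketch(logits, probs):
    """Same entropy, reusing probabilities the caller already computed."""
    log_pmf = torch.log_softmax(logits, dim=-1)
    return -(probs * log_pmf).sum(dim=-1)


logits = torch.zeros(2, 4)  # uniform distribution over 4 actions
actions = torch.tensor([0, 3])

lp = log_prob_sketch(logits, actions)
ent = entropy_sketch(logits)
ent_from_probs = entropy_probs_sketch(logits, torch.softmax(logits, dim=-1))
```

For a uniform distribution over four actions, every log probability is log(1/4) and every entropy is log(4), which makes the example easy to check by hand.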
Advanced usage
Custom action distributions
The sample_logits function handles multiple action distribution types:
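The dispatch over the three input types can be sketched with `torch.distributions`. This is an illustration under stated assumptions rather than the library's implementation: `sample_logits_sketch` is a hypothetical name, and stacking multi-discrete actions as (batch_size, num_heads) while summing log probabilities and entropies across heads is an assumed convention.

```python
import torch
from torch.distributions import Categorical, Normal


def sample_logits_sketch(logits, action=None):
    """Return (action, log_prob, entropy) for three kinds of input."""
    if isinstance(logits, Normal):
        # Continuous: sum per-dimension terms so outputs are (batch_size,).
        if action is None:
            action = logits.sample()
        return action, logits.log_prob(action).sum(-1), logits.entropy().sum(-1)
    if isinstance(logits, torch.Tensor):
        # Discrete: a single categorical distribution per batch row.
        dist = Categorical(logits=logits)
        if action is None:
            action = dist.sample()
        return action, dist.log_prob(action), dist.entropy()
    # Multi-discrete: one categorical per head (assumed layout).
    dists = [Categorical(logits=head) for head in logits]
    if action is None:
        action = torch.stack([d.sample() for d in dists], dim=-1)
    per_head = [d.log_prob(a) for d, a in zip(dists, action.unbind(-1))]
    logprob = torch.stack(per_head, dim=-1).sum(-1)
    entropy = torch.stack([d.entropy() for d in dists], dim=-1).sum(-1)
    return action, logprob, entropy


batch = 5
disc_logits = torch.randn(batch, 3)
a, lp, ent = sample_logits_sketch(disc_logits)
# Passing the sampled actions back in re-evaluates their log probability.
_, lp_again, _ = sample_logits_sketch(disc_logits, action=a)

c_a, c_lp, c_ent = sample_logits_sketch(Normal(torch.zeros(batch, 2), torch.ones(batch, 2)))
m_a, m_lp, m_ent = sample_logits_sketch((torch.randn(batch, 3), torch.randn(batch, 4)))
```

Supplying `action` is the path a policy-gradient update would take when it needs log probabilities of actions collected earlier instead of fresh samples.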