Type conversion

numpy_to_torch_dtype_dict

Mapping from NumPy dtypes to PyTorch dtypes.
import numpy as np
from pufferlib.pytorch import numpy_to_torch_dtype_dict

torch_dtype = numpy_to_torch_dtype_dict[np.dtype('float32')]
# Returns: torch.float32
Supported conversions:
  • float64 → torch.float64
  • float32 → torch.float32
  • float16 → torch.float16
  • uint64 → torch.uint64
  • uint32 → torch.uint32
  • uint16 → torch.uint16
  • uint8 → torch.uint8
  • int64 → torch.int64
  • int32 → torch.int32
  • int16 → torch.int16
  • int8 → torch.int8

Layer initialization

layer_init

CleanRL’s default layer initialization with orthogonal weights.
from pufferlib.pytorch import layer_init
import torch.nn as nn
import numpy as np

layer = layer_init(nn.Linear(128, 64), std=np.sqrt(2), bias_const=0.0)
Parameters:
  • layer (nn.Module, required): PyTorch layer to initialize.
  • std (float, default: np.sqrt(2)): Standard deviation for orthogonal initialization.
  • bias_const (float, default: 0.0): Constant value for bias initialization.
Returns:
  • layer (nn.Module): Initialized layer (same object, modified in-place).
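To build intuition for what orthogonal initialization does, here is a minimal numpy sketch (the helper name is hypothetical, not pufferlib's internals): the QR decomposition of a random Gaussian matrix yields an orthogonal matrix, which is then scaled by std.

```python
import numpy as np

def orthogonal_weights(n, std=np.sqrt(2), seed=0):
    # Hypothetical helper: QR of a random Gaussian matrix gives an
    # orthogonal matrix Q (rows and columns are orthonormal).
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return std * q

w = orthogonal_weights(4, std=1.0)
# Rows are orthonormal, so w @ w.T is (approximately) the identity
```

Orthogonal weight matrices preserve the norm of activations passing through the layer, which is why they are a common default for deep policy networks.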

Action sampling

sample_logits

Sample actions from logits and compute log probabilities and entropy.
from pufferlib.pytorch import sample_logits

action, logprob, entropy = sample_logits(logits, action=None)
Parameters:
  • logits (torch.Tensor | torch.distributions.Normal | tuple, required): Action logits (discrete), Normal distribution (continuous), or tuple of logits (multi-discrete).
  • action (torch.Tensor, default: None): Optional pre-sampled actions. If provided, computes log probability of these actions.
Returns:
  • action (torch.Tensor): Sampled actions with shape (batch_size,) for discrete or (batch_size, action_dim) for continuous.
  • logprob (torch.Tensor): Log probabilities of actions with shape (batch_size,).
  • entropy (torch.Tensor): Entropy of the action distribution with shape (batch_size,).
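The discrete path can be illustrated with a small numpy sketch (the function name is hypothetical, not pufferlib's implementation): apply a numerically stable log-softmax, sample from the resulting categorical distribution, then read off the log probability and entropy.

```python
import numpy as np

def sample_logits_sketch(logits, seed=0):
    # Hypothetical numpy analogue of the discrete path:
    # log-softmax -> categorical sample -> logprob and entropy.
    rng = np.random.default_rng(seed)
    z = logits - logits.max(axis=-1, keepdims=True)   # subtract max for stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    actions = np.array([rng.choice(len(p), p=p) for p in probs])
    logprob = log_probs[np.arange(len(actions)), actions]
    entropy = -(probs * log_probs).sum(axis=-1)
    return actions, logprob, entropy
```

For uniform logits over 4 actions, each sampled action has log probability log(1/4) and the distribution's entropy is log(4).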

log_prob

Compute log probability of discrete actions from logits.
from pufferlib.pytorch import log_prob

log_probs = log_prob(logits, value)
Parameters:
  • logits (torch.Tensor, required): Action logits with shape (batch_size, num_actions).
  • value (torch.Tensor, required): Action indices with shape (batch_size,).
Returns:
  • log_probs (torch.Tensor): Log probabilities with shape (batch_size,).
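The underlying computation is a log-softmax followed by a gather of each row's chosen action. A minimal numpy sketch (hypothetical name, not pufferlib's code):

```python
import numpy as np

def log_prob_sketch(logits, value):
    # Hypothetical numpy analogue: stable log-softmax, then pick out
    # the entry for each row's action index.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return log_probs[np.arange(len(value)), value]
```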

entropy

Compute entropy from action logits.
from pufferlib.pytorch import entropy

entropy_values = entropy(logits)
Parameters:
  • logits (torch.Tensor, required): Action logits with shape (batch_size, num_actions).
Returns:
  • entropy_values (torch.Tensor): Entropy values with shape (batch_size,).
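The entropy of the softmax distribution is H = -Σ p·log p. A numpy sketch of the same math (hypothetical name, not pufferlib's code):

```python
import numpy as np

def entropy_sketch(logits):
    # Hypothetical numpy analogue: H = -sum(p * log p) over the
    # softmax distribution, computed via a stable log-softmax.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(np.exp(log_probs) * log_probs).sum(axis=-1)
```

Uniform logits over n actions give the maximum entropy log(n); a strongly peaked distribution gives entropy near 0.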

entropy_probs

Compute entropy from logits and pre-computed probabilities.
from pufferlib.pytorch import entropy_probs

entropy_values = entropy_probs(logits, probs)
Parameters:
  • logits (torch.Tensor, required): Action logits with shape (batch_size, num_actions).
  • probs (torch.Tensor, required): Action probabilities with shape (batch_size, num_actions).
Returns:
  • entropy_values (torch.Tensor): Entropy values with shape (batch_size,).
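Accepting pre-computed probabilities avoids recomputing the softmax when the caller already has it. A numpy sketch of the idea (hypothetical name, not pufferlib's code):

```python
import numpy as np

def entropy_probs_sketch(logits, probs):
    # Hypothetical analogue: reuse the caller's probs; only the
    # log-softmax term is computed here.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(probs * log_probs).sum(axis=-1)
```

When probs is exactly softmax(logits), this matches the entropy computed from logits alone.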

Native dtype utilities

nativize_dtype

Convert emulated observation dtype to native PyTorch dtype information.
from pufferlib.pytorch import nativize_dtype

native_dtype = nativize_dtype(emulated)
Parameters:
  • emulated (dict, required): Emulated environment dictionary containing:
      • observation_dtype: Sample dtype from environment
      • emulated_observation_dtype: Structured numpy dtype
Returns:
  • native_dtype (NativeDType): Native dtype specification as tuple (dtype, shape, offset, delta) or nested dict for structured observations.

nativize_tensor

Convert byte observation tensor to native PyTorch tensors using dtype specification.
from pufferlib.pytorch import nativize_tensor

native_obs = nativize_tensor(observation, native_dtype)
Parameters:
  • observation (torch.Tensor, required): Byte tensor from environment with shape (batch_size, num_bytes).
  • native_dtype (NativeDType, required): Native dtype specification from nativize_dtype.
Returns:
  • native_obs (torch.Tensor | dict): Native tensor or dict of tensors with proper dtypes and shapes.
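The core idea can be sketched in numpy (hypothetical names, not pufferlib's implementation): slice each field's byte range out of the flat buffer, reinterpret the bytes as the field's dtype, and reshape.

```python
import numpy as np

def nativize_bytes_sketch(byte_obs, spec):
    # Hypothetical numpy analogue: byte_obs is (batch, num_bytes); spec maps
    # field name -> (dtype, shape, offset). Slice, reinterpret, reshape.
    out = {}
    for name, (dtype, shape, offset) in spec.items():
        nbytes = np.dtype(dtype).itemsize * int(np.prod(shape, dtype=int))
        chunk = byte_obs[:, offset:offset + nbytes]
        out[name] = np.ascontiguousarray(chunk).view(dtype).reshape(len(byte_obs), *shape)
    return out
```

This round-trips a structured batch: packing records into raw bytes and back recovers the original field values.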

flattened_tensor_size

Compute total number of elements in a native dtype specification.
from pufferlib.pytorch import flattened_tensor_size

size = flattened_tensor_size(native_dtype)
Parameters:
  • native_dtype (NativeDType, required): Native dtype specification.
Returns:
  • size (int): Total number of elements.
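The computation amounts to summing element counts over a possibly nested specification. A numpy sketch (hypothetical names, using the simplified (dtype, shape, offset) tuples from the nativize examples above rather than pufferlib's exact format):

```python
import numpy as np

def flattened_size_sketch(spec):
    # Hypothetical analogue: recurse into nested dicts, multiply out
    # each field's shape; a scalar field (shape ()) counts as 1 element.
    if isinstance(spec, dict):
        return sum(flattened_size_sketch(v) for v in spec.values())
    shape = spec[1]
    return int(np.prod(shape, dtype=int))
```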

Usage examples

import torch.nn as nn
import numpy as np
from pufferlib.pytorch import layer_init

# Initialize actor head with small std
actor = layer_init(
    nn.Linear(256, num_actions),
    std=0.01
)

# Initialize critic with default std
critic = layer_init(
    nn.Linear(256, 1),
    std=1.0
)

# Initialize hidden layer
hidden = layer_init(
    nn.Linear(128, 256),
    std=np.sqrt(2)
)

Advanced usage

Custom action distributions

The sample_logits function handles multiple action distribution types:
import torch
import torch.nn as nn
from pufferlib.pytorch import sample_logits

class MultiDistributionPolicy(nn.Module):
    def forward(self, obs):
        hidden = self.encoder(obs)
        
        # Discrete actions
        discrete_logits = self.discrete_head(hidden)
        action, logprob, entropy = sample_logits(discrete_logits)
        
        # Multi-discrete actions
        action_heads = [
            self.head1(hidden),
            self.head2(hidden),
            self.head3(hidden)
        ]
        action, logprob, entropy = sample_logits(tuple(action_heads))
        
        # Continuous actions
        mean = self.mean_head(hidden)
        logstd = self.logstd.expand_as(mean)
        dist = torch.distributions.Normal(mean, torch.exp(logstd))
        action, logprob, entropy = sample_logits(dist)
        
        return action, logprob, entropy

Batched dtype conversion

Handle structured observations efficiently:
import torch
from pufferlib.pytorch import nativize_dtype, nativize_tensor

# Setup once at initialization
native_dtype = nativize_dtype(env.emulated)

# Convert many observations in training loop
for epoch in range(num_epochs):
    byte_obs = collect_observations()  # (batch_size, num_bytes)
    
    # Fast conversion using pre-computed dtype
    native_obs = nativize_tensor(
        torch.from_numpy(byte_obs),
        native_dtype
    )
    
    # Use in policy
    logits, values = policy(native_obs)
