Type conversion

numpy_to_torch_dtype_dict

Mapping from NumPy dtypes to PyTorch dtypes.
import numpy as np
from pufferlib.pytorch import numpy_to_torch_dtype_dict

torch_dtype = numpy_to_torch_dtype_dict[np.dtype('float32')]
# Returns: torch.float32
Supported conversions:
  • float64 → torch.float64
  • float32 → torch.float32
  • float16 → torch.float16
  • uint64 → torch.uint64
  • uint32 → torch.uint32
  • uint16 → torch.uint16
  • uint8 → torch.uint8
  • int64 → torch.int64
  • int32 → torch.int32
  • int16 → torch.int16
  • int8 → torch.int8

Layer initialization

layer_init

CleanRL’s default layer initialization with orthogonal weights.
from pufferlib.pytorch import layer_init
import torch.nn as nn
import numpy as np

layer = layer_init(nn.Linear(128, 64), std=np.sqrt(2), bias_const=0.0)
Parameters:
  • layer (nn.Module, required): PyTorch layer to initialize.
  • std (float, default: np.sqrt(2)): Standard deviation for orthogonal initialization.
  • bias_const (float, default: 0.0): Constant value for bias initialization.
Returns:
  • layer (nn.Module): Initialized layer (same object, modified in-place).
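To build intuition for what orthogonal initialization does, here is a minimal numpy sketch (the helper name is hypothetical, not pufferlib's internals): the QR decomposition of a random Gaussian matrix yields an orthogonal matrix, which is then scaled by std.

```python
import numpy as np

def orthogonal_weights(n, std=np.sqrt(2), seed=0):
    # Hypothetical helper: QR of a random Gaussian matrix gives an
    # orthogonal matrix Q (rows and columns are orthonormal).
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return std * q

w = orthogonal_weights(4, std=1.0)
# Rows are orthonormal, so w @ w.T is (approximately) the identity
```

Orthogonal weight matrices preserve the norm of activations passing through the layer, which is why they are a common default for deep policy networks.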

Action sampling

sample_logits

Sample actions from logits and compute log probabilities and entropy.
from pufferlib.pytorch import sample_logits

action, logprob, entropy = sample_logits(logits, action=None)
Parameters:
  • logits (torch.Tensor | torch.distributions.Normal | tuple, required): Action logits (discrete), Normal distribution (continuous), or tuple of logits (multi-discrete).
  • action (torch.Tensor, default: None): Optional pre-sampled actions. If provided, computes log probability of these actions.
Returns:
  • action (torch.Tensor): Sampled actions with shape (batch_size,) for discrete or (batch_size, action_dim) for continuous.
  • logprob (torch.Tensor): Log probabilities of actions with shape (batch_size,).
  • entropy (torch.Tensor): Entropy of the action distribution with shape (batch_size,).
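The discrete path can be illustrated with a small numpy sketch (the function name is hypothetical, not pufferlib's implementation): apply a numerically stable log-softmax, sample from the resulting categorical distribution, then read off the log probability and entropy.

```python
import numpy as np

def sample_logits_sketch(logits, seed=0):
    # Hypothetical numpy analogue of the discrete path:
    # log-softmax -> categorical sample -> logprob and entropy.
    rng = np.random.default_rng(seed)
    z = logits - logits.max(axis=-1, keepdims=True)   # subtract max for stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    actions = np.array([rng.choice(len(p), p=p) for p in probs])
    logprob = log_probs[np.arange(len(actions)), actions]
    entropy = -(probs * log_probs).sum(axis=-1)
    return actions, logprob, entropy
```

For uniform logits over 4 actions, each sampled action has log probability log(1/4) and the distribution's entropy is log(4).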

log_prob

Compute log probability of discrete actions from logits.
from pufferlib.pytorch import log_prob

log_probs = log_prob(logits, value)
Parameters:
  • logits (torch.Tensor, required): Action logits with shape (batch_size, num_actions).
  • value (torch.Tensor, required): Action indices with shape (batch_size,).
Returns:
  • log_probs (torch.Tensor): Log probabilities with shape (batch_size,).
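The underlying computation is a log-softmax followed by a gather of each row's chosen action. A minimal numpy sketch (hypothetical name, not pufferlib's code):

```python
import numpy as np

def log_prob_sketch(logits, value):
    # Hypothetical numpy analogue: stable log-softmax, then pick out
    # the entry for each row's action index.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return log_probs[np.arange(len(value)), value]
```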

entropy

Compute entropy from action logits.
from pufferlib.pytorch import entropy

entropy_values = entropy(logits)
Parameters:
  • logits (torch.Tensor, required): Action logits with shape (batch_size, num_actions).
Returns:
  • entropy_values (torch.Tensor): Entropy values with shape (batch_size,).
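The entropy of the softmax distribution is H = -Σ p·log p. A numpy sketch of the same math (hypothetical name, not pufferlib's code):

```python
import numpy as np

def entropy_sketch(logits):
    # Hypothetical numpy analogue: H = -sum(p * log p) over the
    # softmax distribution, computed via a stable log-softmax.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(np.exp(log_probs) * log_probs).sum(axis=-1)
```

Uniform logits over n actions give the maximum entropy log(n); a strongly peaked distribution gives entropy near 0.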

entropy_probs

Compute entropy from logits and pre-computed probabilities.
from pufferlib.pytorch import entropy_probs

entropy_values = entropy_probs(logits, probs)
Parameters:
  • logits (torch.Tensor, required): Action logits with shape (batch_size, num_actions).
  • probs (torch.Tensor, required): Action probabilities with shape (batch_size, num_actions).
Returns:
  • entropy_values (torch.Tensor): Entropy values with shape (batch_size,).
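Accepting pre-computed probabilities avoids recomputing the softmax when the caller already has it. A numpy sketch of the idea (hypothetical name, not pufferlib's code):

```python
import numpy as np

def entropy_probs_sketch(logits, probs):
    # Hypothetical analogue: reuse the caller's probs; only the
    # log-softmax term is computed here.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(probs * log_probs).sum(axis=-1)
```

When probs is exactly softmax(logits), this matches the entropy computed from logits alone.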

Native dtype utilities

nativize_dtype

Convert emulated observation dtype to native PyTorch dtype information.
from pufferlib.pytorch import nativize_dtype

native_dtype = nativize_dtype(emulated)
Parameters:
  • emulated (dict, required): Emulated environment dictionary containing:
      • observation_dtype: Sample dtype from environment
      • emulated_observation_dtype: Structured numpy dtype
Returns:
  • native_dtype (NativeDType): Native dtype specification as tuple (dtype, shape, offset, delta) or nested dict for structured observations.

nativize_tensor

Convert byte observation tensor to native PyTorch tensors using dtype specification.
from pufferlib.pytorch import nativize_tensor

native_obs = nativize_tensor(observation, native_dtype)
Parameters:
  • observation (torch.Tensor, required): Byte tensor from environment with shape (batch_size, num_bytes).
  • native_dtype (NativeDType, required): Native dtype specification from nativize_dtype.
Returns:
  • native_obs (torch.Tensor | dict): Native tensor or dict of tensors with proper dtypes and shapes.
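The core idea can be sketched in numpy (hypothetical names, not pufferlib's implementation): slice each field's byte range out of the flat buffer, reinterpret the bytes as the field's dtype, and reshape.

```python
import numpy as np

def nativize_bytes_sketch(byte_obs, spec):
    # Hypothetical numpy analogue: byte_obs is (batch, num_bytes); spec maps
    # field name -> (dtype, shape, offset). Slice, reinterpret, reshape.
    out = {}
    for name, (dtype, shape, offset) in spec.items():
        nbytes = np.dtype(dtype).itemsize * int(np.prod(shape, dtype=int))
        chunk = byte_obs[:, offset:offset + nbytes]
        out[name] = np.ascontiguousarray(chunk).view(dtype).reshape(len(byte_obs), *shape)
    return out
```

This round-trips a structured batch: packing records into raw bytes and back recovers the original field values.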

flattened_tensor_size

Compute total number of elements in a native dtype specification.
from pufferlib.pytorch import flattened_tensor_size

size = flattened_tensor_size(native_dtype)
Parameters:
  • native_dtype (NativeDType, required): Native dtype specification.
Returns:
  • size (int): Total number of elements.
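The computation amounts to summing element counts over a possibly nested specification. A numpy sketch (hypothetical names, using the simplified (dtype, shape, offset) tuples from the nativize examples above rather than pufferlib's exact format):

```python
import numpy as np

def flattened_size_sketch(spec):
    # Hypothetical analogue: recurse into nested dicts, multiply out
    # each field's shape; a scalar field (shape ()) counts as 1 element.
    if isinstance(spec, dict):
        return sum(flattened_size_sketch(v) for v in spec.values())
    shape = spec[1]
    return int(np.prod(shape, dtype=int))
```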

Usage examples

import torch.nn as nn
import numpy as np
from pufferlib.pytorch import layer_init

# Initialize actor head with small std
actor = layer_init(
    nn.Linear(256, num_actions),
    std=0.01
)

# Initialize critic with default std
critic = layer_init(
    nn.Linear(256, 1),
    std=1.0
)

# Initialize hidden layer
hidden = layer_init(
    nn.Linear(128, 256),
    std=np.sqrt(2)
)

Advanced usage

Custom action distributions

The sample_logits function handles multiple action distribution types:
import torch
import torch.nn as nn
from pufferlib.pytorch import sample_logits

class MultiDistributionPolicy(nn.Module):
    def forward(self, obs):
        hidden = self.encoder(obs)
        
        # Discrete actions
        discrete_logits = self.discrete_head(hidden)
        action, logprob, entropy = sample_logits(discrete_logits)
        
        # Multi-discrete actions
        action_heads = [
            self.head1(hidden),
            self.head2(hidden),
            self.head3(hidden)
        ]
        action, logprob, entropy = sample_logits(tuple(action_heads))
        
        # Continuous actions
        mean = self.mean_head(hidden)
        logstd = self.logstd.expand_as(mean)
        dist = torch.distributions.Normal(mean, torch.exp(logstd))
        action, logprob, entropy = sample_logits(dist)
        
        return action, logprob, entropy

Batched dtype conversion

Handle structured observations efficiently:
import torch
from pufferlib.pytorch import nativize_dtype, nativize_tensor

# Setup once at initialization
native_dtype = nativize_dtype(env.emulated)

# Convert many observations in training loop
for epoch in range(num_epochs):
    byte_obs = collect_observations()  # (batch_size, num_bytes)
    
    # Fast conversion using pre-computed dtype
    native_obs = nativize_tensor(
        torch.from_numpy(byte_obs),
        native_dtype
    )
    
    # Use in policy
    logits, values = policy(native_obs)
