The sampling script provides several parameters to control how text is generated. These parameters affect the randomness, diversity, and length of the generated outputs.

Core parameters

num_samples
int
default:"10"
Number of independent samples to generate. Each sample starts from the same prompt but produces different output due to randomness.
max_new_tokens
int
default:"500"
Maximum number of tokens to generate in each sample. This controls the length of the generated text.
start
string
default:"\n"
Starting prompt for generation. Can be:
  • Direct text: "Once upon a time"
  • Special token: "<|endoftext|>"
  • File reference: "FILE:prompt.txt" (reads prompt from file)

Sampling control

Temperature

temperature
float
default:"0.8"
Controls randomness in token selection:
  • 1.0 - No change to the model’s predictions
  • < 1.0 - Less random, more conservative outputs
  • > 1.0 - More random, more diverse outputs
  • Approaching 0.0 - Nearly deterministic (always picks most likely token)
Temperature affects the probability distribution over tokens. Lower temperatures make the model more confident and repetitive, while higher temperatures increase diversity but may reduce coherence.
python sample.py \
    --out_dir=out-shakespeare-char \
    --temperature=0.5 \
    --num_samples=3
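Conceptually, temperature divides the logits before the softmax, sharpening or flattening the resulting distribution. A minimal plain-Python sketch (the logit values here are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.5 the most likely token absorbs most of the probability mass; at 2.0 the distribution moves toward uniform, which is why high temperatures produce more diverse but less coherent text.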

Top-k sampling

top_k
int
default:"200"
Limits token selection to the top-k most likely tokens. All other tokens have their probability set to zero.
  • Higher values (e.g., 200, 500) - More diversity
  • Lower values (e.g., 10, 50) - More focused outputs
  • None - No top-k filtering (use full vocabulary)
Top-k sampling prevents the model from selecting very unlikely tokens, which can improve output quality.
python sample.py \
    --out_dir=out-shakespeare-char \
    --top_k=50 \
    --temperature=0.8
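The filtering step itself is simple: keep the k largest logits and push everything else to negative infinity so those tokens receive zero probability after the softmax. A plain-Python sketch of the idea (nanoGPT does the equivalent with `torch.topk` inside `model.generate`; the logit values here are illustrative):

```python
import math

def top_k_filter(logits, k):
    """Keep the k largest logits; set the rest to -inf."""
    if k is None or k >= len(logits):
        return list(logits)  # no filtering needed
    threshold = sorted(logits, reverse=True)[k - 1]
    return [l if l >= threshold else -math.inf for l in logits]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 1.5, 0.2, -1.0, -2.5]  # hypothetical next-token logits
probs = softmax(top_k_filter(logits, 2))
print([round(p, 3) for p in probs])  # only the top-2 tokens keep any mass
```

This sketch ignores ties at the threshold, which a production implementation would need to handle; it is only meant to show why unlikely tokens can never be sampled once top-k filtering is applied.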

Reproducibility

seed
int
default:"1337"
Random seed for reproducible generation. Using the same seed with the same parameters will produce identical outputs.
Set the seed for reproducible results:
python sample.py \
    --seed=42 \
    --out_dir=out-shakespeare-char \
    --num_samples=5
The seed affects both CPU and CUDA random number generators:
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)

Complete example

Here’s how the generation loop works in sample.py:
# encode the beginning of the prompt
if start.startswith('FILE:'):
    with open(start[5:], 'r', encoding='utf-8') as f:
        start = f.read()
start_ids = encode(start)
x = (torch.tensor(start_ids, dtype=torch.long, device=device)[None, ...])

# run generation
with torch.no_grad():
    with ctx:
        for k in range(num_samples):
            y = model.generate(x, max_new_tokens, temperature=temperature, top_k=top_k)
            print(decode(y[0].tolist()))
            print('---------------')

Configuration file

You can override parameters using the configurator system:
  1. Create a config file (e.g., my_sample_config.py):
out_dir = 'out-shakespeare-char'
start = "ROMEO:"
num_samples = 5
max_new_tokens = 300
temperature = 0.7
top_k = 100
seed = 42
device = 'cuda'
  2. Run sampling with the config:
python sample.py my_sample_config.py
Command-line arguments will override config file values:
python sample.py my_sample_config.py --temperature=0.9 --num_samples=10
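Under the hood, the configurator execs the config file and then applies `--key=value` overrides on top of it. A simplified sketch of the override behavior (the `apply_overrides` helper is illustrative, not nanoGPT's actual code):

```python
import ast

def apply_overrides(config, argv):
    """Apply --key=value command-line overrides to a config dict."""
    for arg in argv:
        if not arg.startswith('--') or '=' not in arg:
            continue  # bare arguments are treated as config file names elsewhere
        key, val = arg[2:].split('=', 1)
        if key not in config:
            raise KeyError(f"unknown config key: {key}")
        try:
            parsed = ast.literal_eval(val)  # numbers, booleans, quoted strings
        except (ValueError, SyntaxError):
            parsed = val  # fall back to a plain string
        config[key] = parsed
    return config

# values as if loaded from the config file
config = {'temperature': 0.7, 'num_samples': 5}
apply_overrides(config, ['--temperature=0.9', '--num_samples=10'])
print(config)  # {'temperature': 0.9, 'num_samples': 10}
```

Because overrides are applied after the config file is read, the command line always wins, which matches the precedence described above.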

Parameter recommendations

For coherent, focused text

python sample.py \
    --temperature=0.6 \
    --top_k=50 \
    --max_new_tokens=500

For creative, diverse text

python sample.py \
    --temperature=1.0 \
    --top_k=300 \
    --max_new_tokens=500

For long-form generation

python sample.py \
    --max_new_tokens=2000 \
    --temperature=0.8 \
    --top_k=200 \
    --num_samples=1
The optimal parameters depend on your model, training data, and use case. Experiment with different combinations to find what works best.

Tokenization

The script automatically handles tokenization based on your model:

Custom datasets

If you trained on a custom dataset with a meta.pkl file:
# Character-level encoding from meta.pkl
stoi, itos = meta['stoi'], meta['itos']
encode = lambda s: [stoi[c] for c in s]
decode = lambda l: ''.join([itos[i] for i in l])
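As a self-contained illustration, here is how such a character-level vocabulary behaves when built from a tiny made-up corpus; text round-trips through encode and decode exactly:

```python
corpus = "hello world"  # stand-in for the training data
chars = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for i, ch in enumerate(chars)}  # integer id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda l: ''.join([itos[i] for i in l])

ids = encode("hello")
print(ids)          # one integer id per character
print(decode(ids))  # "hello"
```

Note that encoding only works for characters seen when the vocabulary was built; a prompt containing an unseen character would raise a `KeyError`, so prompts must stay within the training alphabet.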

GPT-2 models

For pre-trained GPT-2 models or when no meta.pkl exists:
# BPE encoding using tiktoken
enc = tiktoken.get_encoding("gpt2")
encode = lambda s: enc.encode(s, allowed_special={"<|endoftext|>"})
decode = lambda l: enc.decode(l)
The <|endoftext|> token is a special token in GPT-2 that indicates the end of a document. You can use it as a starting prompt to generate from scratch.
