The sample.py script generates text from either OpenAI's pre-trained GPT-2 models or models you've trained yourself. It supports various configuration options for controlling the sampling process.

Basic usage

Generate samples from a trained model by pointing to the output directory:
python sample.py --out_dir=out-shakespeare-char
This will generate text using the default parameters and print the results to the console.
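Each generation step in nanoGPT scales the model's logits by a temperature, keeps only the top-k candidates, and samples from the resulting distribution (the script's defaults are temperature 0.8 and top_k 200). A minimal pure-Python sketch of that per-step filtering, independent of any model code:

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=200):
    """Pick the next token id from raw logits.

    Sketch of the filtering sample.py applies each step: divide logits by
    the temperature, keep only the top_k candidates, then draw from the
    resulting softmax distribution.
    """
    scaled = [l / temperature for l in logits]
    k = min(top_k, len(scaled))
    cutoff = sorted(scaled, reverse=True)[k - 1]
    # mask everything below the k-th largest logit
    filtered = [l if l >= cutoff else float("-inf") for l in scaled]
    m = max(filtered)
    exps = [math.exp(l - m) for l in filtered]  # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    return random.choices(range(len(weights)), weights=weights)[0]
```

Lower temperatures and smaller top_k values make output more deterministic; with top_k=1 this reduces to greedy decoding.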

Sample from pre-trained GPT-2 models

You can sample from any of OpenAI’s pre-trained GPT-2 models:
python sample.py \
    --init_from=gpt2-xl \
    --start="What is the answer to life, the universe, and everything?" \
    --num_samples=5 --max_new_tokens=100
Available GPT-2 variants:
  • gpt2 (124M parameters)
  • gpt2-medium (350M parameters)
  • gpt2-large (774M parameters)
  • gpt2-xl (1558M parameters)

Input prompts

Text prompt

Provide a starting prompt directly:
python sample.py --start="ANGELO:" --out_dir=out-shakespeare-char

File-based prompt

Load the prompt from a text file:
python sample.py --start=FILE:prompt.txt
The script will read the contents of prompt.txt and use it as the starting prompt.
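The FILE: prefix is just a convention for interpreting the --start value; a small sketch of the dispatch (resolve_start is an illustrative helper name, not part of sample.py):

```python
def resolve_start(start):
    # sample.py's convention: a value beginning with "FILE:" names a file
    # whose contents become the prompt; anything else is the prompt itself
    if start.startswith("FILE:"):
        with open(start[len("FILE:"):], "r", encoding="utf-8") as f:
            return f.read()
    return start
```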

Model initialization

init_from
string
default:"resume"
Model initialization method:
  • resume - Load from a checkpoint in out_dir
  • gpt2, gpt2-medium, gpt2-large, gpt2-xl - Load a pre-trained GPT-2 model
out_dir
string
default:"out"
Directory containing the model checkpoint (ckpt.pt). Only used when init_from='resume'.
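The interaction between the two parameters can be summarized as follows (resolve_checkpoint is a hypothetical helper for illustration, not a function in sample.py):

```python
import os

def resolve_checkpoint(init_from, out_dir="out"):
    """Where the script looks for weights, per init_from (sketch).

    'resume' reads ckpt.pt from out_dir; the gpt2* names instead trigger a
    download of OpenAI's released weights, so out_dir is ignored.
    """
    if init_from == "resume":
        return os.path.join(out_dir, "ckpt.pt")
    if init_from in ("gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"):
        return None  # pre-trained weights are fetched, not read from out_dir
    raise ValueError(f"unknown init_from: {init_from}")
```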

Example outputs

Character-level Shakespeare model

After training on Shakespeare for ~3 minutes on a GPU:
ANGELO:
And cowards it be strawn to my bed,
And thrust the gates of my threats,
Because he that ale away, and hang'd
An one with him.

DUKE VINCENTIO:
I thank your eyes against it.

DUKE VINCENTIO:
Then will answer him to save the malm:
And what have you tyrannous shall do this?

DUKE VINCENTIO:
If you have done evils of all disposition
To end his power, the day of thrust for a common men
That I leave, to fight with over-liking
Hasting in a roseman.

CPU-trained Shakespeare model

With a smaller model trained on CPU for ~3 minutes:
GLEORKEN VINGHARD III:
Whell's the couse, the came light gacks,
And the for mought you in Aut fries the not high shee
bot thou the sought bechive in that to doth groan you,
No relving thee post mose the wear

Fine-tuned GPT-2 on Shakespeare

After fine-tuning a pre-trained GPT-2 model:
THEODORE:
Thou shalt sell me to the highest bidder: if I die,
I sell thee to the first; if I go mad,
I sell thee to the second; if I
lie, I sell thee to the third; if I slay,
I sell thee to the fourth: so buy or sell,
I tell thee again, thou shalt not sell my
possession.

JULIET:
And if thou steal, thou shalt not sell thyself.

THEODORE:
I do not steal; I sell the stolen goods.

THEODORE:
Thou know'st not what thou sell'st; thou, a woman,
Thou art ever a victim, a thing of no worth:
Thou hast no right, no right, but to be sold.
The quality and style of generated text depend heavily on the training data, model size, training duration, and generation parameters.

Hardware configuration

device
string
default:"cuda"
Device to run inference on. Options:
  • cuda - Use the default GPU
  • cuda:0, cuda:1, etc. - Use a specific GPU
  • cpu - Use CPU
  • mps - Use Apple Silicon GPU (Metal Performance Shaders)
dtype
string
default:"auto"
Data type for model weights:
  • float32 - Full precision
  • float16 - Half precision
  • bfloat16 - Brain floating point (if supported)
Default behavior: uses bfloat16 if CUDA is available and the GPU supports bfloat16, otherwise float16.
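The default selection logic can be sketched as a standalone function (pick_dtype is illustrative; in the script the two conditions come from torch.cuda.is_available() and torch.cuda.is_bf16_supported()):

```python
def pick_dtype(cuda_available, bf16_supported):
    # mirrors the documented default: bfloat16 on CUDA GPUs that support it,
    # float16 otherwise
    if cuda_available and bf16_supported:
        return "bfloat16"
    return "float16"
```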
compile
boolean
default:"false"
Enable PyTorch 2.0 compilation for faster inference. Requires PyTorch 2.0 or later.

Performance optimization

Using PyTorch 2.0 compile

Enable model compilation for faster generation:
python sample.py --compile=True --out_dir=out-shakespeare-char

GPU acceleration

For Apple Silicon Macs with recent PyTorch versions:
python sample.py --device=mps --out_dir=out-shakespeare-char
This uses the on-chip GPU and can significantly accelerate sampling (2-3X speedup).

Mixed precision

The script automatically selects the best precision format for your hardware. It enables TF32 acceleration on supported GPUs:
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
On CPU, use --device=cpu --compile=False to avoid compilation overhead.
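The precision handling boils down to choosing a context manager for the generation loop: a no-op context on CPU, an automatic mixed-precision context on GPU. A sketch of that choice, with autocast_factory standing in for torch.amp.autocast so the example runs without PyTorch installed:

```python
from contextlib import nullcontext

def autocast_context(device, dtype, autocast_factory):
    # any device string containing "cuda" (e.g. "cuda", "cuda:1") gets
    # mixed precision; everything else (cpu, mps) runs in a no-op context
    device_type = "cuda" if "cuda" in device else "cpu"
    if device_type == "cpu":
        return nullcontext()
    return autocast_factory(device_type=device_type, dtype=dtype)
```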
