The sample.py script allows you to generate text either from pre-trained GPT-2 models or from models you’ve trained yourself. It supports various configuration options for controlling the sampling process.
Basic usage
Generate samples from a trained model by pointing the script at the model's output directory.

Sample from pre-trained GPT-2 models

You can sample from any of OpenAI’s pre-trained GPT-2 models:

- gpt2 (124M parameters)
- gpt2-medium (350M parameters)
- gpt2-large (774M parameters)
- gpt2-xl (1558M parameters)
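Both modes can be sketched as command lines; the flag names mirror the init_from and out_dir options documented below, and the out directory is assumed to be the default checkpoint location:

```shell
# Sample from a model you trained yourself (checkpoint in out/):
python sample.py --init_from=resume --out_dir=out

# Sample from a pre-trained GPT-2 model:
python sample.py --init_from=gpt2-medium
```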
Input prompts
Text prompt
Provide a starting prompt directly on the command line.

File-based prompt

Load the prompt from a text file such as prompt.txt and use its contents as the starting prompt.
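A sketch of both prompt styles, assuming the start parameter and the FILE: prefix convention that nanoGPT's sample.py uses to read a prompt from disk:

```shell
# Inline prompt:
python sample.py --init_from=gpt2 --start="Once upon a time"

# Prompt read from prompt.txt via the FILE: prefix:
python sample.py --init_from=gpt2 --start=FILE:prompt.txt
```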
Model initialization
Model initialization method (init_from):

- resume - Load from a checkpoint in out_dir
- gpt2, gpt2-medium, gpt2-large, gpt2-xl - Load a pre-trained GPT-2 model
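For example, resuming from a checkpoint in a custom directory (the directory name here is hypothetical):

```shell
python sample.py --init_from=resume --out_dir=out-shakespeare-char
```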
Directory containing the model checkpoint (out_dir, expected to hold ckpt.pt). Only used when init_from='resume'.

Example outputs
Character-level Shakespeare model
After training on Shakespeare for ~3 minutes on a GPU:

CPU-trained Shakespeare model
With a smaller model trained on CPU for ~3 minutes:

Fine-tuned GPT-2 on Shakespeare
After fine-tuning a pre-trained GPT-2 model:

The quality and style of generated text depend heavily on the training data, model size, training duration, and generation parameters.
Hardware configuration
Device to run inference on (device). Options:

- cuda - Use the default GPU
- cuda:0, cuda:1, etc. - Use a specific GPU
- cpu - Use the CPU
- mps - Use the Apple Silicon GPU (Metal Performance Shaders)
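For example, to pin inference to the second GPU (a sketch; the model choice is arbitrary):

```shell
python sample.py --init_from=gpt2 --device=cuda:1
```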
Data type for model weights (dtype):

- float32 - Full precision
- float16 - Half precision
- bfloat16 - Brain floating point (if supported)

The default is bfloat16 if CUDA is available and supports it, otherwise float16.

Enable PyTorch 2.0 compilation for faster inference with the compile option. Requires PyTorch 2.0 or later.
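To override the automatic choice, the dtype option can be set explicitly (a sketch):

```shell
# Force full precision, e.g. when debugging numerical issues:
python sample.py --init_from=gpt2 --dtype=float32
```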
Performance optimization
Using PyTorch 2.0 compile
Enable model compilation for faster generation by passing --compile=True.

GPU acceleration

For Apple Silicon Macs with recent PyTorch versions, use --device=mps.

Mixed precision
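Both optimizations written out as full command lines (a sketch; the model choice is arbitrary):

```shell
# Compiled inference, requires PyTorch 2.0+:
python sample.py --init_from=gpt2 --compile=True

# Apple Silicon GPU via Metal Performance Shaders:
python sample.py --init_from=gpt2 --device=mps
```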
The script automatically selects the best precision format for your hardware and enables TF32 acceleration on supported GPUs.

On CPU, use --device=cpu --compile=False to avoid compilation overhead.
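A complete CPU invocation might therefore look like this (the checkpoint directory is assumed):

```shell
python sample.py --init_from=resume --out_dir=out --device=cpu --compile=False
```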