Prerequisites
Before installing nanoGPT, ensure you have:

- Python 3.8+ installed on your system
- Git for cloning the repository
- (Optional) CUDA-capable GPU for faster training
Clone the repository
First, clone the nanoGPT repository from GitHub:
Install dependencies

Install all required Python packages using pip.

Dependency details
pytorch - Deep learning framework
The core framework for building and training neural networks. nanoGPT uses PyTorch for all model operations, automatic differentiation, and GPU acceleration. For GPU support, install the CUDA-enabled version from pytorch.org.
numpy - Numerical computing
Used for efficient array operations, particularly for data loading with memory-mapped files (np.memmap).

transformers - Hugging Face library
Required to load pretrained GPT-2 checkpoints from OpenAI. The GPT.from_pretrained() method uses this library to download and convert GPT-2 weights.

datasets - Dataset utilities
Needed if you want to download and preprocess OpenWebText or other datasets from Hugging Face’s dataset hub.
tiktoken - Fast BPE tokenization
OpenAI’s fast tokenizer implementation for GPT-2’s Byte Pair Encoding. Used when finetuning with GPT-2 tokenization.
wandb - Experiment tracking
Optional logging tool for tracking training metrics. Can be disabled by setting wandb_log=False in config files.

tqdm - Progress bars
Provides visual progress indicators for data preparation scripts.
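Taken together, the dependencies described above install with a single pip command (a sketch; exact pinned versions may differ):

```shell
pip install torch numpy transformers datasets tiktoken wandb tqdm
```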
Platform-specific setup
- Linux / Windows with GPU
- macOS (Apple Silicon)
- CPU only
For GPU acceleration, install PyTorch with CUDA support, then install the remaining dependencies:
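A sketch of those two steps, assuming pip and a CUDA 11.8 wheel (pick the index URL on pytorch.org that matches your CUDA version):

```shell
# install the CUDA-enabled PyTorch build (cu118 is an example; match your driver)
pip install torch --index-url https://download.pytorch.org/whl/cu118
# then install the remaining dependencies
pip install numpy transformers datasets tiktoken wandb tqdm
```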
nanoGPT uses PyTorch 2.0+ features like torch.compile() for performance. Make sure you have PyTorch 2.0 or later installed.

Verify installation
Verify that PyTorch is installed correctly and can detect your GPU (if available):
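A minimal check along these lines (a sketch) prints the PyTorch version and whether a CUDA GPU is visible:

```python
# report the installed PyTorch version and CUDA availability
try:
    import torch
    status = f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"
except ImportError:
    status = "PyTorch is not installed"
print(status)
```

The version printed should be 2.0 or later; CUDA availability should read True on a machine with a working GPU driver.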
Optional: Set up Weights & Biases

If you want to track training metrics with Weights & Biases:

Create a W&B account
Sign up at wandb.ai if you don’t have an account.
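Once you have an account, authenticate with `wandb login` on the command line, then enable logging in your config. A sketch of the config side — `wandb_log` is the flag this guide names; the project and run-name keys are assumptions:

```python
# hypothetical config override file, e.g. config/my_wandb_run.py
wandb_log = True             # enable W&B logging (wandb_log=False is the default)
wandb_project = "nanogpt"    # assumed key: project name shown in the W&B UI
wandb_run_name = "gpt2-run"  # assumed key: label for this training run
```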
Weights & Biases is completely optional. You can train models without it by keeping wandb_log=False (the default).

Troubleshooting
torch.compile() errors on Windows

PyTorch 2.0’s torch.compile() is not fully supported on Windows. Disable it by setting compile to False. This will slow down training but ensure compatibility.
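Assuming nanoGPT's usual config-override style, the fix is a one-line flag (the config file itself is hypothetical):

```python
# in your config file, or pass --compile=False on the train.py command line
compile = False  # skip torch.compile(); slower, but avoids Windows incompatibilities
```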
Out of memory errors
If you encounter CUDA out of memory errors, try:
- Reducing batch_size
- Reducing block_size (context length)
- Using a smaller model (fewer n_layer, smaller n_embd)
- Enabling gradient accumulation with gradient_accumulation_steps
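As a sketch, those knobs map onto config overrides like the following (values are illustrative, not recommendations):

```python
# illustrative config overrides for tight GPU memory
batch_size = 4                   # fewer sequences per optimizer step
block_size = 256                 # shorter context length
n_layer = 6                      # fewer transformer layers
n_embd = 384                     # smaller embedding width
gradient_accumulation_steps = 8  # recover effective batch size across steps
```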
Slow training on GPU
Ensure that:
- PyTorch 2.0+ is installed
- torch.compile() is enabled (the default)
- You’re using mixed precision (bfloat16 or float16)
- Flash Attention is available (PyTorch 2.0+)
Next steps
Quickstart
Ready to train your first model? Follow the quickstart guide to train a GPT on Shakespeare in minutes.