Get nanoGPT up and running on your system in just a few minutes.

Prerequisites

Before installing nanoGPT, ensure you have:
  • Python 3.8+ installed on your system
  • Git for cloning the repository
  • (Optional) CUDA-capable GPU for faster training

Clone the repository

First, clone the nanoGPT repository from GitHub:
git clone https://github.com/karpathy/nanoGPT.git
cd nanoGPT

Install dependencies

Install all required Python packages using pip:
pip install torch numpy transformers datasets tiktoken wandb tqdm

Dependency details

torch — The core framework for building and training neural networks. nanoGPT uses PyTorch for all model operations, automatic differentiation, and GPU acceleration. For GPU support, install the CUDA-enabled version from pytorch.org.
numpy — Used for efficient array operations, particularly for data loading with memory-mapped files (np.memmap).
transformers — Required to load pretrained GPT-2 checkpoints from OpenAI. The GPT.from_pretrained() method uses this library to download and convert GPT-2 weights.
datasets — Needed if you want to download and preprocess OpenWebText or other datasets from Hugging Face's dataset hub.
tiktoken — OpenAI's fast tokenizer implementation for GPT-2's Byte Pair Encoding. Used when finetuning with GPT-2 tokenization.
wandb — Optional logging tool for tracking training metrics. Can be disabled by setting wandb_log=False in config files.
tqdm — Provides visual progress indicators for data preparation scripts.
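
The memory-mapped loading mentioned above can be sketched with toy data. This simplified get_batch mirrors the pattern nanoGPT's train.py uses, but is not its exact code:

```python
import os
import tempfile
import numpy as np

# Build a tiny stand-in for nanoGPT's train.bin (tokens stored as uint16 on disk).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "train.bin")
np.arange(1000, dtype=np.uint16).tofile(path)

# Memory-map the file: tokens are paged in from disk on demand,
# so the full dataset never has to fit in RAM.
data = np.memmap(path, dtype=np.uint16, mode="r")

def get_batch(data, batch_size=4, block_size=8, rng=np.random.default_rng(0)):
    """Sample random contiguous windows (simplified from nanoGPT's get_batch)."""
    ix = rng.integers(0, len(data) - block_size, size=batch_size)
    x = np.stack([np.asarray(data[i:i + block_size]) for i in ix])          # inputs
    y = np.stack([np.asarray(data[i + 1:i + 1 + block_size]) for i in ix])  # targets, shifted by one
    return x, y

x, y = get_batch(data)
print(x.shape, y.shape)
```

Because the targets are the inputs shifted by one token, each position in a window trains the model to predict the next token.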

Platform-specific setup

For GPU acceleration, install PyTorch with CUDA support:
# For CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121
Then install the remaining dependencies:
pip install numpy transformers datasets tiktoken wandb tqdm
nanoGPT uses PyTorch 2.0+ features like torch.compile() for performance. Make sure you have PyTorch 2.0 or later installed.

Verify installation

Verify that PyTorch is installed correctly and can detect your GPU (if available):
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
Expected output with GPU:
PyTorch version: 2.1.0
CUDA available: True
CUDA version: 12.1
GPU: NVIDIA A100-SXM4-40GB

Optional: Set up Weights & Biases

If you want to track training metrics with Weights & Biases:
1. Create a W&B account

Sign up at wandb.ai if you don’t have an account.

2. Log in via CLI

Run the following command and paste your API key when prompted:
wandb login

3. Enable logging in config

Set wandb_log=True in your config file or pass it as a command-line argument:
python train.py config/train_shakespeare_char.py --wandb_log=True
Weights & Biases is completely optional. You can train models without it by keeping wandb_log=False (the default).
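
nanoGPT configs are plain Python files, so enabling logging is just a few assignments. A minimal config fragment (the wandb_run_name value here is an arbitrary example; the variable names follow nanoGPT's config convention):

```python
# Config fragment: enable Weights & Biases logging.
wandb_log = True                      # turn on W&B logging (default is False)
wandb_project = 'nanogpt'             # W&B project to log runs under
wandb_run_name = 'shakespeare-char'   # display name for this run
```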

Troubleshooting

PyTorch 2.0’s torch.compile() is not fully supported on Windows. Disable it with:
python train.py --compile=False
This will slow down training but ensure compatibility.
If you encounter CUDA out of memory errors, try:
  • Reducing batch_size
  • Reducing block_size (context length)
  • Using a smaller model (fewer n_layer, smaller n_embd)
  • Enabling gradient accumulation with gradient_accumulation_steps
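
Gradient accumulation is worth a closer look, since it recovers a large effective batch size without the memory cost. A toy sketch of the technique (a plain linear model standing in for the GPT; not nanoGPT's actual training loop):

```python
import torch

torch.manual_seed(0)

# Toy setup: 4 micro-batches of 8 samples behave like one batch of 32,
# which is what nanoGPT's gradient_accumulation_steps option achieves.
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4

optimizer.zero_grad()
for micro_step in range(accum_steps):
    x = torch.randn(8, 16)
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale the loss so the accumulated gradients average over micro-batches.
    (loss / accum_steps).backward()
optimizer.step()  # a single optimizer update after accumulating
```

Only one micro-batch of activations is held in memory at a time, so peak memory stays at the micro-batch level while the gradient used for the update reflects the full effective batch.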
If training is slower than expected, ensure that:
  • PyTorch 2.0+ is installed
  • torch.compile() is enabled (default)
  • You’re using mixed precision (bfloat16 or float16)
  • Flash Attention is available (PyTorch 2.0+)
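
The checklist above can be spot-checked from Python. A small diagnostic sketch (not part of nanoGPT):

```python
import torch

# Check the PyTorch version and feature availability relevant to training speed.
torch_major = int(torch.__version__.split(".")[0])
has_sdpa = hasattr(torch.nn.functional, "scaled_dot_product_attention")

print(f"PyTorch 2.0+: {torch_major >= 2}")
print(f"Fused attention (SDPA / Flash Attention path) available: {has_sdpa}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"bfloat16 supported: {torch.cuda.is_bf16_supported()}")
```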

Next steps

Quickstart

Ready to train your first model? Follow the quickstart guide to train a GPT on Shakespeare in minutes.
