How configuration works
nanoGPT uses a minimalist configuration system built around Python's `globals()`. The configurator lets you override default training parameters through config files and command-line arguments.
Configuration hierarchy
Configuration values are applied in this order:

- Default values - defined in `train.py` (lines 32-75)
- Config file overrides - Python files in the `config/` directory
- Command-line overrides - arguments passed with `--key=value` syntax
Later values override earlier ones. Command-line arguments have the highest priority.
The configurator system
The configurator (defined in `configurator.py`) is a lightweight system that modifies global variables at runtime. It reads command-line arguments and executes config files to override defaults.
How it works
`train.py:77` runs the configurator (sketched below), which:

- Iterates through `sys.argv[1:]` arguments
- Identifies config files (no `--` prefix) and executes them
- Parses key-value pairs (with `--` prefix) and updates globals
- Validates that types match the existing global variable types
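The heart of the configurator is a single loop over `sys.argv[1:]`. The sketch below is a simplified approximation of `configurator.py`, not a verbatim copy (messages and edge-case handling differ). Because `train.py` runs it with `exec()`, `globals()` here refers to the defaults defined in `train.py`:

```python
import sys
from ast import literal_eval

for arg in sys.argv[1:]:
    if '=' not in arg:
        # no '=' and no '--' prefix: treat the argument as a config file
        # and execute it so its assignments overwrite the defaults
        print(f"Overriding config with {arg}:")
        with open(arg) as f:
            exec(f.read())
    else:
        # a --key=value override
        assert arg.startswith('--')
        key, val = arg[2:].split('=', 1)
        if key in globals():
            try:
                # literal_eval turns "32" into 32, "False" into False, ...
                attempt = literal_eval(val)
            except (SyntaxError, ValueError):
                # not a Python literal (e.g. "cuda"): keep the raw string
                attempt = val
            # the override must have the same type as the default
            assert type(attempt) == type(globals()[key])
            print(f"Overriding: {key} = {attempt}")
            globals()[key] = attempt
        else:
            raise ValueError(f"Unknown config key: {key}")
```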
Using config files
Config files are Python scripts that override default parameters by setting variables.

Basic usage
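For example, to train with one of the repository's example configs:

```bash
python train.py config/train_shakespeare_char.py
```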
This will:

- Print the contents of the config file
- Execute it to override defaults
- Display the override with an `Overriding config with...` message
Example config file
`config/train_shakespeare_char.py`:
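An abridged sketch of that file is shown below; it matches the repository's intent (a small character-level model), but individual values may differ from the current source:

```python
# train a miniature character-level Shakespeare model
out_dir = 'out-shakespeare-char'
eval_interval = 250
always_save_checkpoint = False

dataset = 'shakespeare_char'
batch_size = 64
block_size = 256  # context length in characters

# a small 6-layer, 384-dim transformer
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3
max_iters = 5000
lr_decay_iters = 5000
```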
Command-line overrides
You can override individual parameters without creating a config file.

Syntax
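Pass `--key=value` pairs on the command line; each key must match a default defined in `train.py`. For example:

```bash
python train.py --batch_size=32 --learning_rate=1e-3 --device=cuda
```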
Type validation
The configurator enforces type safety: an override must have the same type as the default it replaces. If you try to set `batch_size="hello"`, it will fail because the default is an `int`.
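A minimal reproduction of that failure (the default of `12` mirrors `train.py`, but treat it as illustrative):

```python
from ast import literal_eval

batch_size = 12  # int default, as in train.py

val = "hello"  # from a hypothetical --batch_size=hello override
try:
    attempt = literal_eval(val)  # "hello" is not a Python literal...
except (SyntaxError, ValueError):
    attempt = val                # ...so it stays a str

# str != int, so this assertion fails and the run aborts immediately
assert type(attempt) == type(batch_size)
```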
Value parsing
The configurator uses `ast.literal_eval()` to parse values:

- Numbers: `--batch_size=32` → `32` (int)
- Booleans: `--compile=False` → `False` (bool)
- Strings: `--device=cuda` → `"cuda"` (str)
- Floats: `--learning_rate=1e-3` → `0.001` (float)
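You can reproduce the parsing rules in a Python shell. Bare words like `cuda` are not valid literals, which is why the configurator falls back to keeping the raw string:

```python
from ast import literal_eval

literal_eval("32")     # -> 32 (int)
literal_eval("False")  # -> False (bool)
literal_eval("1e-3")   # -> 0.001 (float)
literal_eval("cuda")   # raises ValueError; the configurator keeps the str "cuda"
```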
Combining approaches
You can combine config files and command-line arguments. The command shown after this list will:

- Load defaults from `train.py`
- Apply overrides from `config/train_gpt2.py`
- Override `batch_size` to `16`
- Override `wandb_log` to `True`
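Concretely, since later values override earlier ones, the `--key=value` overrides come after the config file:

```bash
python train.py config/train_gpt2.py --batch_size=16 --wandb_log=True
```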
Config file examples
The `config/` directory contains several example configurations:
Training configs
- `train_shakespeare_char.py` - Small character-level model (6 layers, 384 dim)
- `train_gpt2.py` - GPT-2 124M on 8x A100 GPUs
- `finetune_shakespeare.py` - Finetune GPT-2 XL on Shakespeare
Evaluation configs
- `eval_gpt2.py` - Evaluate GPT-2 base model
- `eval_gpt2_medium.py` - Evaluate GPT-2 medium (350M)
- `eval_gpt2_large.py` - Evaluate GPT-2 large (774M)
- `eval_gpt2_xl.py` - Evaluate GPT-2 XL (1.5B)
Configuration best practices
Use config files
For experiments with multiple parameter changes, create a dedicated config file instead of long command lines.
Name descriptively
Use descriptive names like `train_small_char.py` or `finetune_gpt2_shakespeare.py`.

Document settings
Add comments explaining why you chose specific values, especially for hyperparameters.
Version control
Commit config files to track your experiments and reproduce results.
Next steps
- Training parameters - Explore all available training configuration options
- Model parameters - Learn about model architecture configuration