Welcome to nanochat
nanochat is the simplest experimental harness for training large language models (LLMs). It is designed to run on a single GPU node with minimal, hackable code that covers all major LLM stages: tokenization, pretraining, finetuning, evaluation, inference, and a chat UI. Train your own GPT-2 capability LLM for about **$72** (3 hours on an 8xH100 GPU node), then talk to it in a familiar ChatGPT-like web UI. On a spot instance, the total cost can be closer to **$20**.
Key Features
One Complexity Dial
Set a single --depth parameter (the number of transformer layers) and all other hyperparameters are calculated automatically in an optimal way.

Complete Pipeline
Covers tokenization, pretraining, supervised fine-tuning (SFT), reinforcement learning (RL), evaluation, and chat UI - all in one repo.
Minimal & Hackable
Clean, readable PyTorch code with no giant configuration objects or if-then-else monsters. Designed to be maximally forkable.
Compute Optimal
Automatically trains compute-optimal models at various sizes by sweeping the depth parameter - no manual hyperparameter tuning needed.
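The "one complexity dial" idea above can be sketched in a few lines. The specific formulas below are illustrative assumptions, not nanochat's actual scaling rules: the point is that one depth value determines width, head count, parameter count, and token budget.

```python
# Illustrative sketch of a "one complexity dial": every hyperparameter
# derived from a single depth value. The formulas here are assumptions
# for illustration, not nanochat's actual ones.

def config_from_depth(depth: int) -> dict:
    model_dim = depth * 64                  # width scales linearly with depth
    n_heads = max(1, model_dim // 128)      # assume a fixed head size of 128
    n_params = 12 * depth * model_dim ** 2  # standard transformer param estimate
    n_tokens = 20 * n_params                # Chinchilla-style ~20 tokens/param
    return {
        "depth": depth,
        "model_dim": model_dim,
        "n_heads": n_heads,
        "n_params": n_params,
        "n_tokens": n_tokens,
    }

cfg = config_from_depth(20)
print(cfg["model_dim"], cfg["n_heads"])  # 1280 10
```

With a scheme like this, sweeping a single flag traces out a family of compute-optimal models, which is what makes the depth parameter a useful "dial".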
The Complete Pipeline
nanochat guides you through the entire journey of creating a ChatGPT-like model:

Supervised Fine-Tuning (SFT)
Teach the model conversation patterns, tool use, and multiple choice through supervised learning
Reinforcement Learning (RL)
Further align the model through reinforcement learning techniques (optional)
What You’ll Build
By following the quickstart, you’ll train a GPT-2 grade capability model (4e19 FLOPs of training compute) that can:
- Write stories and poems
- Answer questions about the world
- Engage in conversational dialogue
- Use tools and execute Python code
- Handle multiple choice questions
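To get a feel for the 4e19 FLOPs budget, a standard back-of-envelope check uses the common approximation that training compute is about 6 × parameters × tokens. The parameter count below is an assumed example value, not nanochat's exact model size:

```python
# Back-of-envelope check of the 4e19 FLOPs training budget using the
# standard C ~ 6 * N * D approximation. N here is an assumed example
# parameter count for illustration only.

C = 4e19          # total training compute (FLOPs), from the text
N = 560e6         # assumed parameter count (illustrative)
D = C / (6 * N)   # implied number of training tokens
print(f"{D:.2e} tokens")  # ~1.19e10, i.e. roughly 12B tokens
```

Under these assumptions, the budget corresponds to training a model of a few hundred million parameters on roughly ten billion tokens.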
Time-to-GPT-2 Leaderboard
nanochat maintains a leaderboard for the “GPT-2 speedrun”: the wall-clock time required to train a model to GPT-2 grade capability (DCLM CORE score > 0.2565) on an 8xH100 GPU node:

| # | Time | Val BPB | CORE Score | Description |
|---|---|---|---|---|
| 0 | 168h | - | 0.2565 | Original OpenAI GPT-2 (2019) |
| 1 | 3.04h | 0.74833 | 0.2585 | d24 baseline, slightly overtrained |
| 2 | 2.91h | 0.74504 | 0.2578 | d26 slightly undertrained + fp8 |
| 3 | 2.76h | 0.74645 | 0.2602 | Bump total batch size to 1M tokens |
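For context on the "Val BPB" column: bits per byte normalizes the validation loss by the average number of bytes each token spans, making scores comparable across tokenizers. A minimal sketch of the conversion, with assumed example numbers (not values from the table):

```python
import math

# Bits per byte (BPB) from a per-token cross-entropy loss:
# bpb = loss_in_nats / (ln(2) * bytes_per_token).
# The inputs below are assumed values for illustration only.

def bits_per_byte(loss_nats: float, bytes_per_token: float) -> float:
    return loss_nats / (math.log(2) * bytes_per_token)

# e.g. a per-token loss of 2.2 nats with ~4.24 bytes per token on average
print(bits_per_byte(2.2, 4.24))  # roughly 0.75 bits per byte
```

Because BPB is tokenizer-independent, it is a fairer metric than raw per-token loss when comparing runs that use different vocabularies.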
Ready to Start?
Quickstart
Train your own GPT-2 in 3 hours and start chatting with it
Community & Support
For questions about the repo:
- Use DeepWiki to ask questions about the codebase
- Join the #nanochat channel on Discord
- Check the Discussions tab on GitHub