
System requirements

Hardware requirements

Minimum (smoke test)

  • CPU: Any modern x86-64 or ARM64 processor
  • RAM: 8GB
  • Storage: 10GB free space
  • GPU: Not required (CPU-only mode available)

Recommended (full training)

  • CPU: 8+ core modern processor
  • RAM: 16GB+
  • Storage: 50GB+ SSD
  • GPU: NVIDIA RTX 3060 (12GB VRAM) or better
The 253M-parameter model was successfully trained on an RTX 3060 (12GB VRAM) using the included configuration. Larger GPUs such as the RTX 4090 or A100 allow bigger models and faster training.

Software requirements

  • Operating System: Linux, macOS, or Windows
  • Python: 3.9 or higher (3.10+ recommended)
  • CUDA: 11.8 or higher (for GPU support)
  • Git: For cloning the repository
Python 3.8 and earlier are not supported due to dependencies on modern type hints and dataclass features.
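For example, dataclasses in the codebase annotate fields with built-in generics (`list[int]`, `dict[str, str]`), which only parse on Python 3.9+. A minimal illustration (the `Sample` class is hypothetical, not part of the repository):

```python
from dataclasses import dataclass

# Built-in generics (PEP 585) require Python 3.9+;
# on 3.8 this module fails at import time with a TypeError.
@dataclass
class Sample:
    tokens: list[int]
    meta: dict[str, str]

s = Sample(tokens=[1, 2, 3], meta={"split": "train"})
print(s.tokens)  # [1, 2, 3]
```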

Installation methods
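
As a rough sketch, a from-source install typically looks like the following (the repository URL is a placeholder; adjust the activation command for your platform):

```shell
# clone the repository (placeholder URL) and enter it
git clone <repo-url> modern-llm
cd modern-llm

# create and activate an isolated environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# install dependencies, then the package itself in editable mode
pip install -r requirements.txt
pip install -e .
```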

Verify installation

After installation, verify that everything is set up correctly:

Check Python version

python --version
Should output Python 3.9 or higher.
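The same check can be done programmatically, which is handy inside setup scripts:

```python
import sys

# fail fast if the interpreter is too old for this project
assert sys.version_info >= (3, 9), f"Python 3.9+ required, found {sys.version.split()[0]}"
print(f"Python {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")
```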

Check PyTorch installation

python -c "import torch; print(f'PyTorch {torch.__version__}')"
Expected output (the exact version and CUDA suffix may differ):
PyTorch 2.3.0+cu118

Check CUDA availability

python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
python -c "import torch; print(f'CUDA version: {torch.version.cuda}')"
python -c "import torch; print(f'GPU count: {torch.cuda.device_count()}')"
On a single-GPU system with CUDA 11.8, you should see:
CUDA available: True
CUDA version: 11.8
GPU count: 1
If CUDA is not available but you have an NVIDIA GPU, you may need to install or update your CUDA drivers.
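The one-liners above assume PyTorch imports cleanly. A small diagnostic that degrades gracefully when it does not (a sketch, not part of the repository):

```python
import importlib.util

def cuda_report() -> dict:
    """Return a small diagnostic dict; works even when torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None, "cuda": False, "gpus": 0}
    import torch
    return {
        "torch": torch.__version__,
        "cuda": torch.cuda.is_available(),
        "gpus": torch.cuda.device_count() if torch.cuda.is_available() else 0,
    }

print(cuda_report())
```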

Check package imports

Verify all key modules import correctly:
python -c "from modern_llm.models import ModernDecoderLM; print('✓ Models')"
python -c "from modern_llm.config import ModernLLMConfig; print('✓ Config')"
python -c "from modern_llm.training import run_training; print('✓ Training')"
python -c "from transformers import AutoTokenizer; print('✓ Transformers')"
python -c "from datasets import load_dataset; print('✓ Datasets')"
All checks should print a green checkmark.
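The five one-liners can also be collapsed into a single loop that reports every failure instead of stopping at the first (same module names as above):

```python
# check each module and report all failures rather than aborting on the first
modules = [
    "modern_llm.models",
    "modern_llm.config",
    "modern_llm.training",
    "transformers",
    "datasets",
]

failed = []
for name in modules:
    try:
        __import__(name)
        print(f"✓ {name}")
    except ImportError as exc:
        failed.append(name)
        print(f"✗ {name}: {exc}")
```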

Run setup verification script

Use the included script to perform comprehensive checks:
python scripts/setup_check.py
This script verifies:
  • Python version
  • PyTorch installation and version
  • CUDA availability and version
  • All required packages
  • GPU memory availability
  • Model initialization
  • Dataset loading
Expected output:
=== Modern LLM Setup Check ===

✓ Python 3.10.12
✓ PyTorch 2.3.0+cu118  
✓ CUDA available: True
✓ CUDA version: 11.8
✓ GPU: NVIDIA GeForce RTX 3060 (12GB)
✓ All required packages installed
✓ Model initialization OK
✓ Dataset loading OK

=== Setup Complete ===

Configuration

Download datasets

The first time you run training or evaluation, datasets will be automatically downloaded from Hugging Face:
  • WikiText-103 (~200MB) - Pretraining corpus
  • TinyStories (~500MB) - Pretraining corpus
  • Alpaca (~50MB) - Instruction tuning dataset
  • HH-RLHF (~500MB) - Preference alignment dataset
  • GSM8K (~10MB) - Math reasoning benchmark
To pre-download datasets:
python scripts/verify_datasets.py
Datasets are cached in ~/.cache/huggingface/datasets by default. Set the HF_DATASETS_CACHE environment variable to use a different location.
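The override can also be set from Python, as long as it runs before the first `import datasets`; the path below is just an example:

```python
import os

# must run before the first `import datasets`; the path is an example
os.environ.setdefault("HF_DATASETS_CACHE", "/data/hf_datasets")
print(os.environ["HF_DATASETS_CACHE"])
```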

Environment variables

Optional environment variables for customization:
# Cache directory for Hugging Face datasets
export HF_DATASETS_CACHE="/path/to/datasets"

# Cache directory for Hugging Face models
export HF_HOME="/path/to/models"

# Number of threads for PyTorch
export OMP_NUM_THREADS=8

# CUDA device (if multiple GPUs)
export CUDA_VISIBLE_DEVICES=0
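OMP_NUM_THREADS can likewise be resolved in Python, falling back to the machine's core count when unset (a sketch):

```python
import os

# respect OMP_NUM_THREADS when set, otherwise fall back to the core count
threads = int(os.environ.get("OMP_NUM_THREADS", os.cpu_count() or 1))
print(f"Using {threads} CPU threads")
```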

Hardware-specific configurations

The repository includes optimized configurations for different hardware:

RTX 3060 (12GB)

# Use the included RTX 3060 config
python scripts/run_pipeline.py \
  --config configs/lm_max_rtx3060.json \
  --stage pretrain
Configuration:
  • d_model: 768
  • n_layers: 12
  • batch_size: 64 (with gradient accumulation)
  • micro_batch_size: 2
  • mixed_precision: bf16
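These numbers imply 32 gradient-accumulation steps per optimizer update, since the effective batch size is the micro-batch size times the number of accumulation steps:

```python
# effective_batch = micro_batch_size * grad_accum_steps
batch_size = 64        # effective (per optimizer step) batch size
micro_batch_size = 2   # what fits in 12GB VRAM per forward/backward pass
grad_accum_steps = batch_size // micro_batch_size
print(grad_accum_steps)  # 32
```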

High-end GPU (24GB+)

# Use the GPU preset
python scripts/run_pipeline.py \
  --config gpu \
  --stage pretrain
Configuration:
  • d_model: 1024
  • n_layers: 24
  • batch_size: 128
  • micro_batch_size: 8
  • mixed_precision: bf16

Troubleshooting

If you see CUDA version errors:
RuntimeError: CUDA version mismatch
Solution: Reinstall PyTorch matching your CUDA version:
# For CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121
If pip runs out of memory during installation:
# Install packages one at a time
pip install torch
pip install transformers
pip install datasets
# ... etc
Or use --no-cache-dir:
pip install --no-cache-dir -r requirements.txt
If you see permission errors when installing:
# Use --user flag (not recommended with venv)
pip install --user -r requirements.txt

# Or fix venv permissions
sudo chown -R $USER:$USER .venv
Using --user with a virtual environment can cause conflicts. Prefer fixing permissions.
If dataset downloads are very slow:
  1. Use a different Hugging Face mirror:
    export HF_ENDPOINT="https://hf-mirror.com"
    
  2. Download datasets manually:
    from datasets import load_dataset
    load_dataset("wikitext", "wikitext-103-raw-v1")
    load_dataset("roneneldan/TinyStories")
    load_dataset("tatsu-lab/alpaca")
    load_dataset("Anthropic/hh-rlhf")
    load_dataset("gsm8k", "main")
    
If imports fail even after installation:
  1. Verify virtual environment is activated:
    which python  # Should point to .venv/bin/python
    
  2. Reinstall in development mode:
    pip install -e .
    
  3. Check Python path:
    python -c "import sys; print('\n'.join(sys.path))"
    

Next steps

Once installation is complete:

Run quick start

Try the 5-minute smoke test to verify everything works

Train a model

Start training your first model with the pipeline

Explore architecture

Learn about the model architecture and components

Configuration guide

Customize model size and training parameters

Package dependencies

Complete list of dependencies from requirements.txt:
torch>=2.3.0
torchvision>=0.18.0
torchaudio>=2.3.0
transformers>=4.44.0
datasets>=3.0.0
accelerate>=1.0.0
peft>=0.12.0         # Parameter-efficient fine-tuning
trl>=0.9.0           # Transformer Reinforcement Learning
sentencepiece>=0.2.0 # Tokenization
evaluate>=0.4.2
rouge-score>=0.1.2
scikit-learn>=1.4.0
numpy>=1.26.0
pandas>=2.2.0
tqdm>=4.66.0
matplotlib>=3.8.0
seaborn>=0.13.0
pyyaml>=6.0.0
pytest>=8.0.0
