PipelineConfig

Unified configuration for end-to-end training pipelines: pretrain → SFT → DPO → verifier. Combines model, training, hardware, and data configs into a single JSON-serializable structure for orchestration. Each stage has its own training config, but they share the same model architecture and hardware settings.

Model architecture parameters

vocab_size
int
default:"50257"
Size of the vocabulary.
d_model
int
default:"768"
Hidden dimension of the model.
n_layers
int
default:"12"
Number of transformer layers.
n_heads
int
default:"12"
Number of attention heads.
ffn_hidden_size
int
default:"3072"
Hidden size of the feedforward layer.
max_seq_len
int
default:"1024"
Maximum sequence length.
dropout
float
default:"0.1"
Dropout probability.
use_rope
bool
default:"True"
Whether to use Rotary Position Embeddings.
use_attention_sinks
bool
default:"True"
Whether to use attention sinks.
num_attention_sinks
int
default:"4"
Number of attention sink tokens.
use_swiglu
bool
default:"True"
Whether to use SwiGLU activation.
tie_embeddings
bool
default:"True"
Whether to share input/output embeddings.
use_gqa
bool
default:"False"
Whether to use Grouped Query Attention.
gqa_groups
Optional[int]
default:"None"
Number of key/value head groups for GQA.
use_moe
bool
default:"False"
Whether to use Mixture-of-Experts.
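
For example, enabling Grouped Query Attention means setting gqa_groups alongside use_gqa. The snippet below is a minimal sketch using only the fields documented above; whether the library validates the head/group relationship is not specified here.

from modern_llm.config import PipelineConfig

config = PipelineConfig(
    d_model=768,
    n_heads=12,
    use_gqa=True,
    gqa_groups=4,  # 12 query heads share 4 key/value heads (3 query heads per group)
)
model_cfg = config.get_model_config()  # ModernLLMConfig with these settings applied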

Hardware configuration

hardware_preset
str
default:"auto"
Hardware preset: “auto”, “local”, “rtx3060”, “a100”, or “h100”.

Data configuration

data_preset
str
default:"small"
Data scale preset: “small”, “medium”, “large”, or “xl”.
pretrain_datasets
Optional[List[str]]
default:"None"
List of dataset names for pretraining. If None, uses the default datasets from the data preset.
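
A short sketch combining a hardware preset with an explicit pretraining dataset list; the dataset names below are the ones already used by the gpu_full_config preset and are only illustrative.

from modern_llm.config import PipelineConfig

config = PipelineConfig(
    hardware_preset="rtx3060",
    data_preset="medium",
    pretrain_datasets=["wikitext-103-raw-v1", "openwebtext"],  # overrides the preset's default datasets
)
hardware_cfg = config.get_hardware_config()
data_cfg = config.get_data_config()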

Pretraining parameters

pretrain_max_steps
int
default:"20000"
Maximum training steps for pretraining.
pretrain_lr
float
default:"3e-4"
Learning rate for pretraining.
pretrain_batch_size
int
default:"64"
Global batch size for pretraining.
pretrain_micro_batch_size
int
default:"2"
Micro batch size for pretraining.
pretrain_warmup_steps
int
default:"500"
Warmup steps for pretraining.
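
If the trainer derives gradient accumulation from the global and micro batch sizes (a common pattern, stated here as an assumption rather than documented behavior), the defaults above imply 32 accumulation steps per optimizer update on a single GPU:

# Assumed relationship: global batch = micro batch * accumulation steps * num GPUs
pretrain_batch_size = 64
pretrain_micro_batch_size = 2
grad_accum_steps = pretrain_batch_size // pretrain_micro_batch_size  # 32 on one GPU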

SFT parameters

sft_max_steps
int
default:"5000"
Maximum training steps for supervised fine-tuning.
sft_lr
float
default:"1e-5"
Learning rate for SFT.
sft_batch_size
int
default:"32"
Global batch size for SFT.
sft_micro_batch_size
int
default:"2"
Micro batch size for SFT.
sft_dataset
str
default:"tatsu-lab/alpaca"
Dataset for SFT (used if sft_datasets is None).
sft_datasets
Optional[List[str]]
default:"None"
List of SFT datasets. Overrides sft_dataset if provided.
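
For example, to fine-tune on several instruction datasets at once, pass sft_datasets; it takes precedence over the single sft_dataset field. A minimal sketch using dataset names that already appear on this page:

from modern_llm.config import PipelineConfig

config = PipelineConfig(
    sft_max_steps=8000,
    sft_datasets=[
        "tatsu-lab/alpaca",
        "databricks/databricks-dolly-15k",
    ],  # sft_dataset is ignored when this list is provided
)
sft_cfg = config.get_sft_config()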

DPO parameters

dpo_max_steps
int
default:"2000"
Maximum training steps for Direct Preference Optimization.
dpo_lr
float
default:"5e-6"
Learning rate for DPO.
dpo_batch_size
int
default:"16"
Global batch size for DPO.
dpo_micro_batch_size
int
default:"1"
Micro batch size for DPO.
dpo_beta
float
default:"0.1"
Beta parameter for the DPO loss; higher values keep the policy closer to the reference model.
dpo_dataset
str
default:"Anthropic/hh-rlhf"
Dataset for DPO.
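
For reference, dpo_beta is the beta of the standard DPO objective: it scales the policy's log-probability margin over the reference model, with higher values keeping the policy closer to the reference. The sketch below is the textbook formulation for a single preference pair, not necessarily the exact implementation in this codebase.

import math

def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # beta-scaled margin between the chosen and rejected log-probability ratios
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))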

Verifier parameters

verifier_max_steps
int
default:"3000"
Maximum training steps for the verifier.
verifier_lr
float
default:"1e-4"
Learning rate for the verifier.
verifier_batch_size
int
default:"32"
Global batch size for the verifier.
verifier_micro_batch_size
int
default:"4"
Micro batch size for the verifier.

Output and logging

output_dir
str
default:"experiments/runs"
Base directory for all outputs.
run_name
str
default:"modern-llm-pipeline"
Base name for the run. Stage suffixes are added automatically.
tokenizer_name
str
default:"gpt2"
Tokenizer to use across all stages.
seed
int
default:"42"
Random seed for reproducibility.
mixed_precision
str
default:"bf16"
Mixed precision dtype: “bf16”, “fp16”, or “fp32”.
eval_every
int
default:"500"
Evaluate every N steps.
save_every
int
default:"2000"
Save checkpoint every N steps.
log_every
int
default:"100"
Log metrics every N steps.

Methods

get_model_config

Build a ModernLLMConfig from pipeline settings.
def get_model_config() -> ModernLLMConfig

get_hardware_config

Get hardware config from preset or auto-detect.
def get_hardware_config() -> HardwareConfig

get_data_config

Get data config from preset.
def get_data_config() -> DataConfig

get_pretrain_config

Build TrainingConfig for pretraining stage.
def get_pretrain_config() -> TrainingConfig

get_sft_config

Build TrainingConfig for SFT stage.
def get_sft_config() -> TrainingConfig

get_dpo_config

Build TrainingConfig for DPO stage.
def get_dpo_config() -> TrainingConfig

get_verifier_config

Build TrainingConfig for verifier training.
def get_verifier_config() -> TrainingConfig

save

Save config to JSON file.
def save(path: Path | str) -> None

load

Load config from JSON file.
@classmethod
def load(cls, path: Path | str) -> PipelineConfig

to_dict

Serialize to dictionary.
def to_dict() -> dict[str, Any]

from_dict

Create config from dictionary.
@classmethod
def from_dict(cls, data: dict[str, Any]) -> PipelineConfig
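
A small round-trip sketch using only the serialization methods listed above; file paths are illustrative.

from modern_llm.config import PipelineConfig

config = PipelineConfig(run_name="roundtrip-demo")

# File round-trip
config.save("configs/roundtrip.json")
restored = PipelineConfig.load("configs/roundtrip.json")

# Dictionary round-trip
rebuilt = PipelineConfig.from_dict(config.to_dict())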

Preset configurations

local_smoke_config

Minimal config for quick smoke testing on a local machine.
from modern_llm.config import local_smoke_config

config = local_smoke_config()
# d_model=256, n_layers=4, 100 pretrain steps
Configuration:
  • d_model: 256
  • n_layers: 4
  • n_heads: 4
  • ffn_hidden_size: 512
  • max_seq_len: 256
  • hardware_preset: “local”
  • data_preset: “small”
  • pretrain_max_steps: 100
  • sft_max_steps: 50
  • dpo_max_steps: 50
  • verifier_max_steps: 50

local_full_config

Full config for RTX 3060 training.
from modern_llm.config import local_full_config

config = local_full_config()
# GPT-2 small architecture, 20K steps
Configuration:
  • d_model: 768
  • n_layers: 12
  • n_heads: 12
  • ffn_hidden_size: 3072
  • max_seq_len: 1024
  • hardware_preset: “local”
  • data_preset: “medium”
  • pretrain_max_steps: 20000
  • sft_max_steps: 5000
  • dpo_max_steps: 2000
  • verifier_max_steps: 3000

gpu_smoke_config

Minimal config for GPU smoke testing.
from modern_llm.config import gpu_smoke_config

config = gpu_smoke_config()
# Same as local_smoke but with auto hardware detection
Configuration:
  • Same as local_smoke_config but with hardware_preset="auto"

gpu_full_config

Full config for high-end GPU training (A100/H100).
from modern_llm.config import gpu_full_config

config = gpu_full_config()
# Optimized for quality with diverse datasets
# Estimated time on H100: ~42 hours
Configuration:
  • d_model: 1024
  • n_layers: 12
  • n_heads: 16
  • ffn_hidden_size: 4096
  • max_seq_len: 1024
  • use_attention_sinks: False (for Flash Attention compatibility)
  • hardware_preset: “auto”
  • data_preset: “large”
  • pretrain_datasets: [“wikitext-103-raw-v1”, “openwebtext”, “wikipedia”, “roneneldan/TinyStories:100000”]
  • pretrain_max_steps: 80000
  • pretrain_batch_size: 128
  • pretrain_micro_batch_size: 32
  • sft_datasets: [“tatsu-lab/alpaca”, “databricks/databricks-dolly-15k”, “Open-Orca/OpenOrca:50000”]
  • sft_max_steps: 10000
  • dpo_max_steps: 3000
  • verifier_max_steps: 3000

get_pipeline_preset

Get a pipeline preset by name.
def get_pipeline_preset(name: str) -> PipelineConfig
Parameters:
  • name: One of “local-smoke”, “local”, “gpu-smoke”, or “gpu”
Example:
from modern_llm.config import get_pipeline_preset

# Get preset by name
config = get_pipeline_preset("gpu")

# Customize it
config.pretrain_max_steps = 100000
config.run_name = "my-custom-run"

# Save for later
config.save("configs/my_config.json")

Complete example

from modern_llm.config import PipelineConfig, get_pipeline_preset

# Create custom pipeline config
config = PipelineConfig(
    # Model architecture
    d_model=1024,
    n_layers=16,
    n_heads=16,
    ffn_hidden_size=4096,
    max_seq_len=2048,
    use_gqa=True,
    gqa_groups=4,
    
    # Hardware and data
    hardware_preset="a100",
    data_preset="large",
    
    # Pretraining
    pretrain_max_steps=50000,
    pretrain_lr=3e-4,
    pretrain_batch_size=128,
    pretrain_datasets=[
        "wikitext-103-raw-v1",
        "openwebtext",
    ],
    
    # SFT with multiple datasets
    sft_max_steps=8000,
    sft_lr=1e-5,
    sft_datasets=[
        "tatsu-lab/alpaca",
        "databricks/databricks-dolly-15k",
    ],
    
    # Output
    output_dir="experiments/my-run",
    run_name="gqa-1b",
)

# Get stage-specific configs
model_cfg = config.get_model_config()
hardware_cfg = config.get_hardware_config()
pretrain_cfg = config.get_pretrain_config()
sft_cfg = config.get_sft_config()
dpo_cfg = config.get_dpo_config()
verifier_cfg = config.get_verifier_config()

# Save and load
config.save("configs/gqa-1b.json")
loaded = PipelineConfig.load("configs/gqa-1b.json")

# Or use a preset as starting point
preset = get_pipeline_preset("gpu")
preset.pretrain_max_steps = 100000
preset.run_name = "extended-pretrain"
