Overview
olmOCR training is configured using YAML files and the `TrainConfig` dataclass. This page documents all available configuration options.
Configuration Structure
The configuration is organized into nested sections.

Model Configuration
ModelConfig
Controls how the model is loaded and initialized. Defined in `train/core/config.py`.

- **Model name or path**: The model to load. Must be compatible with HuggingFace transformers. Examples: `Qwen/Qwen2-VL-7B-Instruct`, `allenai/Molmo-7B-O-0924`, `/path/to/local/model`
- **Architecture**: The model architecture type. Options: `causal`, `vllm`
- **Precision**: Precision for model weights. Options: `bfloat16`, `float16`, `float32`
- **Flash attention**: Whether to use flash attention for faster training. Requires a compatible GPU (Ampere or newer).
- **Trust remote code**: Whether to trust remote code when loading models. Set to `true` for models requiring custom code.
- **Revision**: Specific model revision/commit to use from the HuggingFace Hub.
Example
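A minimal `model` section might look like the following sketch. The exact field names are illustrative assumptions; the authoritative names are the `ModelConfig` dataclass fields in `train/core/config.py`.

```yaml
model:
  name_or_path: Qwen/Qwen2-VL-7B-Instruct  # HF Hub ID or local path
  arch: causal                             # causal or vllm
  dtype: bfloat16                          # bfloat16, float16, or float32
  use_flash_attn: true                     # requires Ampere or newer GPU
  trust_remote_code: false
```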
LoRA Configuration
LoraConfig
Configures Low-Rank Adaptation for parameter-efficient fine-tuning. Defined in `train/core/config.py`.

- **Rank**: The rank of the LoRA decomposition. Higher values mean more parameters and capacity. Recommended values:
  - `16`: Lightweight, good for simple tasks
  - `32`: Balanced (recommended for most use cases)
  - `64`: Maximum capacity for complex tasks
- **Alpha**: LoRA scaling parameter. Typically set equal to the rank. Formula: `scaling = alpha / rank`
- **Dropout**: Dropout probability for LoRA layers. Helps prevent overfitting.
- **Bias**: Bias configuration. Options: `none`, `all`, `lora_only`
- **Task type**: The task type for PEFT. Use `CAUSAL_LM` for language modeling.
- **Target modules**: List of module names to apply LoRA adapters to. Supports regex patterns.
Target Modules Examples
- Qwen2-VL
- Molmo
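As a sketch, target module lists for the two architectures might look like this. The module names below are assumptions based on common Qwen2-VL and Molmo layer naming; verify them against the actual model's `named_modules()` output before training.

```yaml
# Qwen2-VL (assumed module names)
lora:
  target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
---
# Molmo (assumed module names)
lora:
  target_modules: [att_proj, attn_out, ff_proj, ff_out]
```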
Set `lora: null` or omit the section entirely to perform full fine-tuning (not recommended due to memory requirements).

Data Configuration
DataConfig
Configures training and validation datasets. Defined in `train/core/config.py`.

- **Seed**: Random seed for data shuffling and augmentation.
- **Cache location**: Local directory to cache downloaded PDFs. Improves data loading speed. Example: `/data/pdf_cache`
- **Best-checkpoint metric**: Metric name for selecting the best checkpoint. Format: `{source_name}_loss`. Example: `validation_data_loss`
- **Sources**: List of data sources to load.
SourceConfig
Configures individual data sources. Defined in `train/core/config.py`.

- **Name**: Identifier for this data source.
- **Query glob path**: Glob pattern for OpenAI batch response JSON files. Supports S3 and local paths. Examples: `s3://bucket/train/*.json`, `/data/responses/*.json`
- **Image resolution**: Image resolution(s) to render PDF pages to. A value is randomly selected from the list during training. Examples: `[1024]` (fixed 1024 px), `[768, 1024, 1280]` (random augmentation)
- **Anchor text length**: Target length(s) for anchor text extraction. Randomly selected from the list. Examples: `[6000]` (fixed 6000 characters), `[4000, 6000, 8000]` (variable length)
Example
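A data section with one source might look like the sketch below. Field names (`cache_location`, `query_glob_path`, `target_longest_image_dim`, `target_anchor_text_len`) are illustrative assumptions; check the `DataConfig` and `SourceConfig` dataclasses in `train/core/config.py` for the exact names.

```yaml
train_data:
  seed: 1337
  cache_location: /data/pdf_cache
  sources:
    - name: openai_batch_data
      query_glob_path: s3://bucket/train/*.json
      target_longest_image_dim: [1024]
      target_anchor_text_len: [6000]
```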
Hyperparameters
HyperparamConfig
Controls training dynamics. Defined in `train/core/config.py`.

- **Batch size**: Batch size per GPU. For vision models, typically set to `1`.
- **Eval batch size**: Evaluation batch size. Defaults to the same value as the training batch size.
- **Learning rate**: Initial learning rate for the optimizer. Recommended values:
  - `1e-4`: Conservative, stable
  - `3e-4`: More aggressive
  - `5e-5`: Very conservative
- **Max steps**: Maximum number of training steps. `-1` trains for full epochs.
- **Gradient accumulation steps**: Number of steps to accumulate gradients. Effective batch size = `batch_size * gradient_accumulation_steps * num_gpus`
- **Gradient checkpointing**: Enable gradient checkpointing to reduce memory at the cost of ~20% slower training.
- **Warmup steps**: Number of warmup steps. Mutually exclusive with the warmup ratio.
- **Warmup ratio**: Fraction of training spent in warmup. E.g., `0.03` = 3% warmup.
- **Weight decay**: Weight decay coefficient for regularization. Typical: `0.01`
- **Max grad norm**: Maximum gradient norm. `0.0` disables clipping. Typical: `1.0`
- **Optimizer**: Optimizer to use. Options: `adamw_torch`, `adamw_hf`, `sgd`, `adafactor`
- **LR scheduler**: Learning rate scheduler. Options: `linear`, `cosine`, `constant`, `polynomial`
- **Logging steps**: Log training metrics every N steps.
- **Eval steps**: Run evaluation every N steps.
- **Find unused parameters**: For DDP, whether to find unused parameters. Required for Molmo; should be `false` for Qwen2-VL.

Example
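A hyperparameter section might look like the following sketch, using the recommended defaults from above. Field names are illustrative assumptions; consult the `HyperparamConfig` dataclass in `train/core/config.py` for the exact names.

```yaml
hyperparams:
  batch_size: 1                    # per GPU; typical for vision models
  gradient_accumulation_steps: 4   # effective batch = 1 * 4 * num_gpus
  gradient_checkpointing: true     # trades ~20% speed for memory
  learning_rate: 1e-4
  max_steps: 10000                 # -1 = train full epochs
  warmup_ratio: 0.03
  weight_decay: 0.01
  clip_grad_norm: 1.0
  optim: adamw_torch
  lr_scheduler: cosine
  log_every_steps: 10
  eval_every_steps: 500
  find_unused_parameters: false    # set true for Molmo
```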
Generation Configuration
GenerateConfig
Controls sequence length and generation parameters. Defined in `train/core/config.py`.

- **Max length**: Maximum sequence length for training. Significantly affects memory usage. Common values:
  - `4096`: Standard for most documents
  - `8192`: Long documents
  - `2048`: Memory-constrained settings
- **Temperature**: Sampling temperature (used during inference, not training).
Example
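A sketch of this section (field names and the temperature value are illustrative assumptions; see `GenerateConfig` in `train/core/config.py`):

```yaml
generate:
  max_length: 4096   # standard for most documents
  temperature: 0.8   # only used at inference time
```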
Save Configuration
SaveConfig
Controls checkpoint saving behavior. Defined in `train/core/config.py`.

- **Output path**: Output directory for checkpoints. Supports S3 paths. Examples: `s3://bucket/models/`, `/data/checkpoints/`
- **Checkpoint limit**: Maximum number of checkpoints to keep. Older checkpoints are deleted.
- **Save frequency**: Save a checkpoint every N steps. Supports OmegaConf interpolation.
Example
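A sketch of a save section, including an OmegaConf interpolation that ties the save frequency to the eval frequency (field names are illustrative assumptions; see `SaveConfig` in `train/core/config.py`):

```yaml
save:
  path: s3://bucket/models/my-run/
  save_every_steps: ${hyperparams.eval_every_steps}  # OmegaConf interpolation
  limit: 3                                           # keep newest 3 checkpoints
```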
AWS Configuration
AwsConfig
Credentials for S3 access. Defined in `train/core/config.py`.

- **Profile**: AWS profile name from `~/.aws/credentials`.
- **Access key ID**: Can also be set via the `AWS_ACCESS_KEY_ID` environment variable.
- **Secret access key**: Can also be set via the `AWS_SECRET_ACCESS_KEY` environment variable.
- **Region**: Default AWS region.
Example
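A sketch of an AWS section using a named profile (field names are illustrative assumptions; see `AwsConfig` in `train/core/config.py`):

```yaml
aws:
  profile: default
  region: us-east-1
  # Prefer the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment
  # variables over putting credentials directly in a config file.
```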
WandB Configuration
WandbConfig
Weights & Biases experiment tracking. Defined in `train/core/config.py`.

- **Entity**: WandB team/entity name.
- **Project**: WandB project name.
- **API key**: Can also be set via the `WANDB_API_KEY` environment variable.
- **Mode**: Logging mode. Options: `online`, `offline`, `disabled`

Example
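A sketch of a WandB section (entity and project values are placeholders; field names are assumptions to be checked against `WandbConfig` in `train/core/config.py`):

```yaml
wandb:
  entity: my-team
  project: olmocr-finetune
  mode: online        # online, offline, or disabled
```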
Complete Configuration Example
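A complete configuration tying the sections above together might look like the following sketch. All field names are illustrative assumptions; the authoritative schema is the set of dataclasses in `train/core/config.py`.

```yaml
model:
  name_or_path: Qwen/Qwen2-VL-7B-Instruct
  arch: causal
  use_flash_attn: true

lora:
  rank: 32
  alpha: 32
  dropout: 0.05
  task_type: CAUSAL_LM
  target_modules: [q_proj, k_proj, v_proj, o_proj]

train_data:
  seed: 1337
  sources:
    - name: train_set
      query_glob_path: s3://bucket/train/*.json
      target_longest_image_dim: [1024]
      target_anchor_text_len: [6000]

generate:
  max_length: 4096

hyperparams:
  batch_size: 1
  gradient_accumulation_steps: 4
  learning_rate: 1e-4
  max_steps: 10000
  warmup_ratio: 0.03
  lr_scheduler: cosine
  eval_every_steps: 500

save:
  path: s3://bucket/models/my-run/
  save_every_steps: ${hyperparams.eval_every_steps}

wandb:
  project: olmocr-finetune
  mode: online
```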
Command-Line Overrides
You can override any configuration value from the command line.

Next Steps
Training Overview
Learn about the training pipeline
Qwen2-VL Training
Fine-tune Qwen2-VL models
Molmo Training
Fine-tune Molmo models
Data Preparation
Prepare training datasets