Overview
GenerationConfig controls how the model generates text, including sampling strategies, length constraints, and decoding methods. These parameters can be set globally on the model or passed per generation request.

Loading Configuration
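The global-versus-per-request behavior can be sketched in plain Python (a simplified stand-in for the real configuration object; the names and values here are illustrative, not official defaults):

```python
# Simplified stand-in for a generation config: defaults set once
# on the model, with per-request settings taking precedence.
GLOBAL_DEFAULTS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_new_tokens": 256,
}

def effective_config(per_request=None):
    """Merge per-request settings over the global defaults."""
    merged = dict(GLOBAL_DEFAULTS)
    merged.update(per_request or {})
    return merged

# A request that only overrides temperature keeps the other defaults.
cfg = effective_config({"temperature": 0.2})
```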
Core Parameters
Temperature
Controls randomness in generation:
0.0 - 0.01: Nearly deterministic (use top_k=1 instead)
0.7 - 0.9: Balanced creativity and coherence
1.0+: More random and creative
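The effect of temperature can be sketched as scaling the logits before the softmax (a minimal, self-contained illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before softmax: lower temperature
    sharpens the distribution, higher temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
cool = softmax_with_temperature(logits, 0.5)   # sharper: top token dominates
warm = softmax_with_temperature(logits, 1.5)   # flatter: probabilities closer
```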
A common recommendation is to tune top_p instead of temperature.

Top-p (Nucleus Sampling)
Nucleus sampling probability threshold. Model considers only tokens whose cumulative probability exceeds top_p.
0.1: Very conservative, deterministic
0.7 - 0.9: Balanced (recommended)
0.95 - 1.0: More diverse outputs
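The nucleus selection step can be sketched as: keep the smallest set of most likely tokens whose cumulative probability reaches top_p (a simplified illustration of the filtering, not the full sampling loop):

```python
def top_p_filter(probs, top_p):
    """Return indices of the smallest set of tokens whose cumulative
    probability reaches top_p, scanning from most to least likely."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

# With probabilities [0.5, 0.3, 0.15, 0.05] and top_p=0.7,
# only the two most likely tokens survive (0.5 + 0.3 >= 0.7).
nucleus = top_p_filter([0.5, 0.3, 0.15, 0.05], 0.7)
```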
Top-k Sampling
Limits sampling to the k most likely tokens:
1: Greedy decoding (deterministic)
10-50: Conservative sampling
50-100: More diverse sampling
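The top-k filter itself is simple enough to sketch directly:

```python
def top_k_filter(probs, k):
    """Return the indices of the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(order[:k])

# Keep only the 2 most likely tokens out of four candidates.
shortlist = top_k_filter([0.1, 0.4, 0.3, 0.2], 2)
```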
top_k=1 is equivalent to greedy decoding.

Length Control
Max Length
Maximum total sequence length (input + output tokens). Generation stops when this limit is reached.
Max New Tokens
Maximum number of tokens to generate (excluding input). Takes precedence over max_length.

Min Length
Minimum total sequence length. Model will not generate EOS token before reaching this length.
Min New Tokens
Minimum number of new tokens to generate.
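The interaction between max_length and max_new_tokens can be sketched as a budget calculation (a simplified illustration; the function name is made up for this example):

```python
def new_token_budget(input_len, max_length=None, max_new_tokens=None):
    """How many new tokens may be generated. max_new_tokens, when set,
    takes precedence over the total-length limit max_length."""
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        return max(0, max_length - input_len)
    raise ValueError("set max_length or max_new_tokens")

# A 100-token prompt with max_length=150 leaves room for 50 new tokens,
# but an explicit max_new_tokens wins.
budget = new_token_budget(100, max_length=150, max_new_tokens=30)
```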
Stopping Criteria
Stop Strings
Stop generation when specific sequences are encountered.

EOS Token
Token ID(s) that trigger end of generation. For Qwen:
151643: Default EOS token ID
<|im_end|>: ChatML format end token
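A stopping check combining both criteria (EOS token IDs and stop strings) can be sketched like this; the helper name is illustrative, and 151643 is the Qwen EOS ID mentioned above:

```python
def should_stop(generated_ids, generated_text, eos_ids, stop_strings):
    """Stop once an EOS token id is emitted or any stop string
    appears in the decoded output."""
    if generated_ids and generated_ids[-1] in eos_ids:
        return True
    return any(s in generated_text for s in stop_strings)

# Stops on the EOS id even with no stop strings configured.
done = should_stop([9906, 151643], "Hello", {151643}, [])
```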
Pad Token
Token ID used for padding sequences in batched generation. For Qwen, typically set to tokenizer.eod_id.

Repetition Control
Repetition Penalty
Penalty for repeating tokens:
1.0: No penalty
1.1 - 1.3: Mild discouragement of repetition
> 1.5: Strong penalty (may harm coherence)
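The penalty is typically applied to the logits of previously seen tokens (the CTRL-style formulation used by common implementations; sketched here in plain Python):

```python
def apply_repetition_penalty(logits, previous_ids, penalty):
    """Discourage tokens that already appeared: positive logits are
    divided by the penalty, negative logits are multiplied by it."""
    out = list(logits)
    for t in set(previous_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

# Token 0 was already generated, so its logit is pushed down.
penalized = apply_repetition_penalty([2.0, 1.0, -0.5], [0], 1.2)
```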
No Repeat N-gram Size
Prevent repeating n-grams of this size. Set to 0 to disable.

Beam Search
Num Beams
Number of beams for beam search:
1: No beam search (faster)
4-10: Beam search (slower but potentially higher quality)
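A toy beam search over a stand-in next-token model shows the core idea: keep the num_beams highest-scoring partial sequences at each step (a simplified sketch, not a production decoder):

```python
import math

def beam_search(next_log_probs, bos, eos, num_beams=3, max_new_tokens=5):
    """Toy beam search: expand each beam with every candidate token,
    then keep the num_beams highest-scoring sequences."""
    beams = [([bos], 0.0)]
    for _ in range(max_new_tokens):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:                   # finished beams carry over
                candidates.append((seq, score))
                continue
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(seq[-1] == eos for seq, _ in beams):  # early stopping
            break
    return beams[0][0]

# Stand-in "model": always prefers token 1, with EOS (0) least likely.
def toy_model(seq):
    return {1: math.log(0.6), 2: math.log(0.3), 0: math.log(0.1)}

best = beam_search(toy_model, bos=-1, eos=0, num_beams=2, max_new_tokens=3)
```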
Advanced Parameters
Do Sample
Whether to use sampling (True) or greedy/beam search (False)
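The interaction between do_sample and num_beams can be summarized as a small decision table (a simplified view of common generation semantics; the function is illustrative):

```python
def decoding_mode(do_sample, num_beams=1):
    """Which decoding strategy a flag combination selects (simplified)."""
    if do_sample:
        return "beam sampling" if num_beams > 1 else "sampling"
    return "beam search" if num_beams > 1 else "greedy"

mode = decoding_mode(do_sample=False, num_beams=4)
```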
Early Stopping
Stop beam search when all beams reach EOS token
Use Cache
Use KV cache for faster generation. Should be True for inference.
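Putting the parameters above together, a balanced chat-style configuration might look like this (Hugging Face-style keyword names are assumed; the values are illustrative, not official defaults):

```python
# Illustrative generation settings combining the parameters described above.
generation_kwargs = {
    "do_sample": True,            # enable sampling
    "temperature": 0.7,           # balanced creativity
    "top_p": 0.9,                 # nucleus sampling
    "top_k": 50,                  # conservative shortlist
    "max_new_tokens": 512,        # output-length cap
    "repetition_penalty": 1.1,    # mild repetition discouragement
    "no_repeat_ngram_size": 0,    # n-gram blocking disabled
    "num_beams": 1,               # no beam search when sampling
    "eos_token_id": 151643,       # Qwen default EOS id from above
    "use_cache": True,            # KV cache on for inference
}
```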