The `SamplingParams` dataclass configures how tokens are sampled during text generation.
Class Definition
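The class definition itself is not reproduced here; the following is a hedged sketch consistent with the fields and property described below. The default values are assumptions for illustration, not necessarily the library's actual defaults.

```python
from dataclasses import dataclass

@dataclass
class SamplingParams:
    # Defaults below are illustrative assumptions; check the source for
    # the actual values.
    temperature: float = 1.0   # randomness of sampling
    top_k: int = -1            # -1 disables top-k filtering
    top_p: float = 1.0         # 1.0 disables nucleus filtering
    ignore_eos: bool = False   # keep generating past EOS if True
    max_tokens: int = 16       # cap on generated tokens

    @property
    def is_greedy(self) -> bool:
        # Deterministic decoding: the distribution is collapsed by
        # temperature or top_k, and top_p filtering is disabled.
        return (self.temperature <= 0.0 or self.top_k == 1) and self.top_p == 1.0
```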
Fields
temperature

Controls randomness in sampling:

- `0.0`: Greedy decoding (always pick the most likely token)
- `0.0 < temperature < 1.0`: Less random
- `1.0`: Sample according to the model's probabilities
- `> 1.0`: More random
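The effect of temperature can be illustrated with a plain softmax over a list of logits. This is an illustrative sketch (real engines operate on logit tensors), showing how lower temperatures sharpen the distribution and higher temperatures flatten it:

```python
import math

def apply_temperature(logits, temperature):
    # Scale logits by 1/temperature before softmax. Lower temperature
    # sharpens the distribution; higher temperature flattens it.
    if temperature <= 0.0:
        # Greedy: put all probability mass on the argmax token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```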
top_k

Limits sampling to the K most likely tokens:

- `-1`: Disabled (consider all tokens)
- `1`: Equivalent to greedy decoding
- `> 1`: Sample from the top K tokens only
top_p

Nucleus sampling: samples from the smallest set of tokens whose cumulative probability exceeds P:

- `1.0`: Disabled (consider all tokens)
- `0.0 < top_p < 1.0`: Sample from the top tokens with cumulative probability P
ignore_eos

Whether to ignore end-of-sequence (EOS) tokens:

- `False`: Stop generation when an EOS token is sampled
- `True`: Continue generating even after an EOS token
max_tokens

Maximum number of tokens to generate (excluding the input prompt). Generation stops when either:

- `max_tokens` tokens have been generated, or
- an EOS token is sampled (unless `ignore_eos=True`)
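The stopping conditions above can be sketched as a small helper. The function name and signature are hypothetical, for illustration only:

```python
def should_stop(num_generated, token_id, eos_token_id, max_tokens, ignore_eos=False):
    # Stop when the token budget is exhausted...
    if num_generated >= max_tokens:
        return True
    # ...or when EOS is sampled, unless ignore_eos is set.
    if token_id == eos_token_id and not ignore_eos:
        return True
    return False
```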
Properties
is_greedy
True if the configuration will result in greedy (deterministic) decoding.
Conditions:

- `temperature <= 0.0` or `top_k == 1`, AND
- `top_p == 1.0`
Usage Examples
Greedy Decoding
Balanced Sampling
Creative Generation
Force Complete Generation
Constrained Sampling
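The example headings above can be sketched as the following configurations. Parameter values are illustrative, inferred from the field descriptions and the Common Patterns table below; a minimal stand-in dataclass is defined so the snippet is self-contained:

```python
from dataclasses import dataclass

# Minimal stand-in for SamplingParams; defaults are assumptions.
@dataclass
class SamplingParams:
    temperature: float = 1.0
    top_k: int = -1
    top_p: float = 1.0
    ignore_eos: bool = False
    max_tokens: int = 16

# Greedy decoding: deterministic, always the most likely token.
greedy = SamplingParams(temperature=0.0, max_tokens=128)

# Balanced sampling: moderate randomness for general chat.
balanced = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

# Creative generation: higher diversity for open-ended writing.
creative = SamplingParams(temperature=0.9, top_k=50, top_p=0.95, max_tokens=512)

# Force complete generation: keep going even after an EOS token.
forced = SamplingParams(temperature=0.8, ignore_eos=True, max_tokens=64)

# Constrained sampling: restrict sampling to a small candidate pool.
constrained = SamplingParams(temperature=0.6, top_k=10, max_tokens=128)
```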
Common Patterns
| Use Case | Temperature | Top-K | Top-P | Notes |
|---|---|---|---|---|
| Factual answers | 0.0 | -1 | 1.0 | Greedy, deterministic |
| Code generation | 0.2 | -1 | 0.95 | Low randomness |
| General chat | 0.7 | -1 | 0.9 | Balanced |
| Creative writing | 0.9 | 50 | 0.95 | High diversity |
| Brainstorming | 1.0 | 100 | 0.98 | Maximum creativity |
Notes
- When both `top_k` and `top_p` are set, both filters are applied sequentially
- `temperature=0.0` is equivalent to `top_k=1` (but more efficient)
- For reproducible results, use `temperature=0.0`
- Very high temperature values (e.g. > 1.5) can lead to nonsensical outputs
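The sequential top-k then top-p filtering can be sketched over a toy probability table. This is an illustrative sketch over a token-to-probability dict; real engines apply these filters to logit tensors:

```python
def filter_token_probs(probs, top_k=-1, top_p=1.0):
    # probs: token -> probability. Apply top-k first, then nucleus (top-p),
    # mirroring the sequential filtering described above, and renormalize.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[:top_k]            # keep only the K most likely tokens
    if top_p < 1.0:
        kept, cum = [], 0.0
        for tok, p in items:             # smallest prefix whose mass >= top_p
            kept.append((tok, p))
            cum += p
            if cum >= top_p:
                break
        items = kept
    total = sum(p for _, p in items)
    return {tok: p / total for tok, p in items}
```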