Context and Generation
NRVNA_MAX_CTX
Default: 8192
The context window size - maximum number of tokens the model can process at once. This includes both the input prompt and generated output.
- Larger values allow longer prompts and conversations
- Limited by model architecture and available memory
- Common values:
2048, 4096, 8192, 16384, 32768
NRVNA_PREDICT
Default: 2048
Maximum number of tokens to generate in the response. Acts as a hard limit on output length.
NRVNA_BATCH
Default: 2048
Batch size for token processing. This controls how many tokens are processed together during inference.
- Higher values may improve throughput on GPUs
- Limited by available memory
- Typically matches context size for optimal performance
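A minimal sketch of setting these three variables before launching the server. This assumes the process reads them from its environment at startup; the specific values here are illustrative, not recommendations from this document.

```shell
# Raise the context window for long conversations, keep the
# generation cap and batch size at their documented defaults.
export NRVNA_MAX_CTX=16384   # input + output must fit in this window
export NRVNA_PREDICT=2048    # hard limit on generated tokens
export NRVNA_BATCH=2048      # tokens processed together during inference
echo "ctx=$NRVNA_MAX_CTX predict=$NRVNA_PREDICT batch=$NRVNA_BATCH"
```

Note that NRVNA_MAX_CTX must cover the prompt plus the NRVNA_PREDICT output, so a long prompt with a 16384-token window leaves at most 16384 minus the prompt length for generation.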
GPU Configuration
NRVNA_GPU_LAYERS
Default: 99 (macOS) / 0 (Linux/Windows)
Number of model layers to offload to GPU for acceleration. Higher values use more VRAM but increase inference speed.
- 0 = CPU-only inference
- 99 = attempt to offload all layers (typical for full GPU usage)
- Adjust based on available VRAM
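The platform-dependent default above can be reproduced explicitly in a launch script. This is a sketch; the detection logic is an assumption about how you might pick a value, not part of the documented behavior.

```shell
# Pick a GPU offload value per platform, mirroring the defaults:
# full offload on macOS (unified memory), CPU-only elsewhere.
case "$(uname -s)" in
  Darwin) export NRVNA_GPU_LAYERS=99 ;;  # offload all layers
  *)      export NRVNA_GPU_LAYERS=0  ;;  # raise this if VRAM allows
esac
echo "gpu_layers=$NRVNA_GPU_LAYERS"
```

On Linux/Windows with a discrete GPU, increase the value until VRAM is nearly full; each offloaded layer trades VRAM for speed.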
Sampling Parameters
These control randomness and diversity in the generated text.
NRVNA_TEMP
Default: 0.8
Sampling temperature controls randomness:
- 0.0 = deterministic, always picks the most likely token
- 0.1-0.5 = focused, coherent output
- 0.6-0.9 = balanced creativity and coherence
- 1.0+ = highly random, creative but potentially incoherent
NRVNA_TOP_K
Default: 40
Limits sampling to the top K most likely tokens. Lower values make output more focused.
- 1 = always pick the most likely token (greedy decoding, deterministic)
- 20-50 = balanced
- Higher values increase diversity
NRVNA_TOP_P
Default: 0.9
Nucleus sampling - samples from the smallest set of tokens whose cumulative probability exceeds this threshold.
- 0.5 = very focused
- 0.9 = balanced (common default)
- 1.0 = consider all tokens
NRVNA_MIN_P
Default: 0.05
Minimum probability threshold. Tokens with probability below this are excluded from sampling.
- 0.0 = no minimum
- 0.05 = exclude very unlikely tokens
- Higher values increase focus
NRVNA_REPEAT_PENALTY
Default: 1.1
Penalty applied to tokens that were recently generated, reducing repetition.
- 1.0 = no penalty
- 1.1-1.2 = mild penalty (typical)
- Higher values strongly discourage repetition
NRVNA_REPEAT_LAST_N
Default:64
Number of previous tokens to consider when applying the repeat penalty. The model looks back this many tokens to detect repetition.
- Lower values = shorter repetition detection window
- Higher values = detect repetition further back in the output
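The sampling variables above are usually set together. Below is a sketch of two illustrative presets; the names "focused" and "creative" and the specific numbers are assumptions chosen from the ranges this document describes, not documented presets.

```shell
# Illustrative sampling presets built from the variables above.
preset=focused
if [ "$preset" = focused ]; then
  # Factual answers: low temperature, tight token selection.
  export NRVNA_TEMP=0.3 NRVNA_TOP_K=20 NRVNA_TOP_P=0.5
  export NRVNA_MIN_P=0.1 NRVNA_REPEAT_PENALTY=1.1
else
  # Open-ended writing: more randomness and diversity.
  export NRVNA_TEMP=1.0 NRVNA_TOP_K=100 NRVNA_TOP_P=1.0
  export NRVNA_MIN_P=0.0 NRVNA_REPEAT_PENALTY=1.1
fi
echo "temp=$NRVNA_TEMP top_k=$NRVNA_TOP_K top_p=$NRVNA_TOP_P"
```

The samplers compose: TOP_K, TOP_P, and MIN_P each prune the candidate set, and temperature then reshapes the probabilities of whatever survives.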
NRVNA_SEED
Default: 0
Random seed for reproducible generation.
- 0 = random seed each time (non-deterministic)
- Any other value = fixed seed for reproducibility
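To make a run repeatable for debugging, pin the seed; the value 42 here is arbitrary. Note that identical output also assumes identical settings (temperature, top-k, etc.) and the same model.

```shell
# Pin the seed: same seed + same sampling settings => same output.
export NRVNA_SEED=42
# export NRVNA_SEED=0   # revert to a fresh random seed per run
echo "seed=$NRVNA_SEED"
```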