Control how the model generates text by adjusting these parameters via environment variables.

Context and Generation

NRVNA_MAX_CTX

Default: 8192

The context window size: the maximum number of tokens the model can process at once. This includes both the input prompt and the generated output.
  • Larger values allow longer prompts and conversations
  • Limited by model architecture and available memory
  • Common values: 2048, 4096, 8192, 16384, 32768
export NRVNA_MAX_CTX=16384  # For long-context models

NRVNA_PREDICT

Default: 2048

Maximum number of tokens to generate in the response. Acts as a hard limit on output length.
export NRVNA_PREDICT=4096  # Allow longer responses
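Because the context window covers both the prompt and the generated output, the two limits interact. A quick sanity check (illustrative Python, not part of the tool; the values mirror the example exports above):

```python
MAX_CTX = 16384  # NRVNA_MAX_CTX
PREDICT = 4096   # NRVNA_PREDICT

def fits_in_context(prompt_tokens, predict=PREDICT, max_ctx=MAX_CTX):
    """Prompt plus generated output must fit inside the context window."""
    return prompt_tokens + predict <= max_ctx

print(fits_in_context(10000))  # True: 10000 + 4096 <= 16384
print(fits_in_context(13000))  # False: 13000 + 4096 > 16384
```

If the prompt alone approaches NRVNA_MAX_CTX, either raise the context size or lower NRVNA_PREDICT.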

NRVNA_BATCH

Default: 2048

Batch size for token processing. This controls how many tokens are processed together during inference.
  • Higher values may improve throughput on GPUs
  • Limited by available memory
  • Typically matches context size for optimal performance
export NRVNA_BATCH=512  # Smaller batches for memory-constrained systems

GPU Configuration

NRVNA_GPU_LAYERS

Default: 99 (macOS) / 0 (Linux/Windows)

Number of model layers to offload to the GPU for acceleration. Higher values use more VRAM but increase inference speed.
  • 0 = CPU-only inference
  • 99 = attempt to offload all layers (typical for full GPU usage)
  • Adjust based on available VRAM
export NRVNA_GPU_LAYERS=35  # Partial GPU offloading

Sampling Parameters

These control randomness and diversity in the generated text.

NRVNA_TEMP

Default: 0.8

Sampling temperature controls randomness:
  • 0.0 = deterministic, always picks most likely token
  • 0.1-0.5 = focused, coherent output
  • 0.6-0.9 = balanced creativity and coherence
  • 1.0+ = highly random, creative but potentially incoherent
export NRVNA_TEMP=0.7  # Slightly more focused
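Conceptually, temperature divides the model's logits before the softmax, so low values sharpen the distribution and high values flatten it. A minimal sketch (not the engine's actual implementation):

```python
import math

def softmax_with_temperature(logits, temp):
    """Scale logits by 1/temp, then softmax. Lower temp sharpens the distribution."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
focused = softmax_with_temperature(logits, 0.3)   # most mass on the top token
creative = softmax_with_temperature(logits, 1.5)  # more mass spread to the tail
```

At temp=0.3 the top token takes almost all the probability mass; at temp=1.5 the alternatives become genuinely likely picks.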

NRVNA_TOP_K

Default: 40

Limits sampling to the K most likely tokens. Lower values make output more focused.
  • 1 = greedy decoding; always picks the most likely token
  • 20-50 = balanced
  • Higher values increase diversity
export NRVNA_TOP_K=20  # More focused sampling
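The effect of top-K can be sketched as keeping the K highest-probability tokens and renormalizing (illustrative only):

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, then renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

probs = [0.5, 0.3, 0.15, 0.05]
result = top_k_filter(probs, 2)  # keeps tokens 0 and 1, renormalized
```

With k=2, only the two strongest candidates remain in the sampling pool.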

NRVNA_TOP_P

Default: 0.9

Nucleus sampling: restricts sampling to the smallest set of tokens whose cumulative probability reaches this threshold.
  • 0.5 = very focused
  • 0.9 = balanced (common default)
  • 1.0 = consider all tokens
export NRVNA_TOP_P=0.95  # Slightly more diverse
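A sketch of the nucleus step (illustrative, not the engine's code): walk the tokens in descending probability and stop once the cumulative mass reaches p.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

probs = [0.5, 0.3, 0.15, 0.05]
result = top_p_filter(probs, 0.9)  # 0.5 + 0.3 < 0.9, so token 2 is also kept
```

Unlike top-K, the pool size adapts: a confident distribution keeps few tokens, an uncertain one keeps many.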

NRVNA_MIN_P

Default: 0.05

Minimum probability threshold. Tokens with a probability below this are excluded from sampling.
  • 0.0 = no minimum
  • 0.05 = exclude very unlikely tokens
  • Higher values increase focus
export NRVNA_MIN_P=0.1  # More aggressive filtering
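Following the description above (an absolute probability cutoff), the filter can be sketched as:

```python
def min_p_filter(probs, min_p):
    """Exclude tokens whose probability falls below min_p, renormalize the rest."""
    keep = {i: p for i, p in enumerate(probs) if p >= min_p}
    total = sum(keep.values())
    return {i: p / total for i, p in keep.items()}

probs = [0.6, 0.3, 0.06, 0.04]
result = min_p_filter(probs, 0.05)  # drops the 0.04 token, keeps the rest
```

This trims the long tail of barely-plausible tokens without limiting how many strong candidates survive.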

NRVNA_REPEAT_PENALTY

Default: 1.1

Penalty applied to tokens that were recently generated, reducing repetition.
  • 1.0 = no penalty
  • 1.1-1.2 = mild penalty (typical)
  • Higher values strongly discourage repetition
export NRVNA_REPEAT_PENALTY=1.15  # Stronger anti-repetition

NRVNA_REPEAT_LAST_N

Default: 64

Number of previous tokens to consider when applying the repeat penalty. The model looks back this many tokens to detect repetition.
  • Lower values = shorter repetition detection window
  • Higher values = detect repetition further back in the output
export NRVNA_REPEAT_LAST_N=128  # Detect repetition further back
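How the two repetition settings interact can be sketched as follows (illustrative Python using a common llama.cpp-style convention, which may differ from this engine's exact formula): tokens seen in the last-N window get their logits pushed down by the penalty.

```python
def apply_repeat_penalty(logits, recent_tokens, penalty, last_n):
    """Penalize tokens seen in the last `last_n` generated tokens.
    Positive logits are divided by the penalty, negative ones multiplied,
    so repeated tokens always become less likely."""
    window = set(recent_tokens[-last_n:])
    out = list(logits)
    for t in window:
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, 1.0, -0.5]
penalized = apply_repeat_penalty(logits, recent_tokens=[0, 2], penalty=1.1, last_n=64)
# tokens 0 and 2 are pushed down; token 1 is untouched
```

A larger NRVNA_REPEAT_LAST_N widens the window of `recent_tokens` that get penalized; a larger NRVNA_REPEAT_PENALTY deepens the push.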

NRVNA_SEED

Default: 0

Random seed for reproducible generation.
  • 0 = random seed each time (non-deterministic)
  • Any other value = fixed seed for reproducibility
export NRVNA_SEED=42  # Reproducible outputs

Sampling Presets

Creative Writing

export NRVNA_TEMP=0.9
export NRVNA_TOP_K=50
export NRVNA_TOP_P=0.95
export NRVNA_MIN_P=0.03

Factual/Technical

export NRVNA_TEMP=0.3
export NRVNA_TOP_K=20
export NRVNA_TOP_P=0.85
export NRVNA_MIN_P=0.1

Balanced (Default)

export NRVNA_TEMP=0.8
export NRVNA_TOP_K=40
export NRVNA_TOP_P=0.9
export NRVNA_MIN_P=0.05
