## Core Configuration
| Variable | Default | Description |
|---|---|---|
| NRVNA_WORKERS | 4 | Number of worker threads for parallel inference |
| NRVNA_LOG_LEVEL | info | Log verbosity level (see Logging) |
| NRVNA_MODELS_DIR | ./models/ | Directory path to search for model files |
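As a minimal sketch, the core variables above can be set in the shell before launching; the binary name `nrvna` is an assumption here, not confirmed by this document:

```shell
# Core configuration (variable names from the table above)
export NRVNA_WORKERS=8                  # raise worker threads for parallel inference
export NRVNA_LOG_LEVEL=debug            # verbose logs while testing
export NRVNA_MODELS_DIR="$HOME/models"  # where model files are searched for
# nrvna   # hypothetical binary name -- launch command not specified in this doc
```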
## Model Parameters

See Model Parameters for detailed explanations.

| Variable | Default | Description |
|---|---|---|
| NRVNA_GPU_LAYERS | 99 (Mac) / 0 (other) | Number of model layers to offload to GPU |
| NRVNA_PREDICT | 2048 | Maximum number of tokens to generate |
| NRVNA_MAX_CTX | 8192 | Context window size (max tokens model can process) |
| NRVNA_TEMP | 0.8 | Sampling temperature (0.0 - 2.0) |
| NRVNA_TOP_K | 40 | Top-K sampling parameter |
| NRVNA_TOP_P | 0.9 | Top-P (nucleus) sampling parameter |
| NRVNA_MIN_P | 0.05 | Minimum probability threshold |
| NRVNA_REPEAT_PENALTY | 1.1 | Penalty for repeating tokens |
| NRVNA_REPEAT_LAST_N | 64 | Number of previous tokens to consider for repeat penalty |
| NRVNA_SEED | 0 | Random seed for reproducibility (0 = random) |
| NRVNA_BATCH | 2048 | Batch size for token processing |
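A common tuning pattern is lowering the temperature and fixing the seed for more deterministic, reproducible output. The values below are illustrative only, not recommendations from this document:

```shell
# Sketch: more deterministic generation using the sampling variables above
export NRVNA_TEMP=0.2     # lower temperature narrows the sampling distribution
export NRVNA_TOP_K=20     # restrict sampling to the 20 most likely tokens
export NRVNA_SEED=42      # fixed nonzero seed for reproducible runs (0 = random)
export NRVNA_PREDICT=512  # cap generation at 512 tokens
```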
## llama.cpp Logging
| Variable | Default | Description |
|---|---|---|
| LLAMA_LOG_LEVEL | - | llama.cpp log level: error, warn, info, debug |
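Since `LLAMA_LOG_LEVEL` is unset by default, the underlying llama.cpp log level can be chosen independently of `NRVNA_LOG_LEVEL`; a small sketch:

```shell
# Surface only warnings and errors from the llama.cpp layer,
# while keeping NRVNA's own logging at its default
export LLAMA_LOG_LEVEL=warn
```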