Overview
Model parameters control every aspect of how your AI behaves, from creativity and randomness to memory and processing speed. Understanding these settings helps you get the most out of both local and cloud models.

You can adjust parameters per conversation or set permanent defaults for each model. Changes apply immediately.
Accessing Settings
There are multiple ways to configure model behavior:
- Per-Conversation
- Permanent Defaults
- Model Capabilities
Adjust settings for a specific chat:
- Open any conversation
- Click the gear icon next to your selected model
- Modify parameters in the sidebar
- Changes apply to this conversation only
Core Parameters
Temperature
Controls randomness and creativity in responses.

| Value | Behavior | Best For |
|---|---|---|
| 0.0 - 0.3 | Deterministic, focused, factual | Code generation, math, factual Q&A |
| 0.4 - 0.7 | Balanced creativity and coherence | General chat, explanations, summaries |
| 0.8 - 1.0 | Creative, varied, exploratory | Creative writing, brainstorming, storytelling |
| 1.0+ | Highly random, experimental | Experimental or artistic purposes |
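Under the hood, temperature rescales the model's logits before they are turned into probabilities: lower values sharpen the distribution toward the top token, higher values flatten it. A minimal sketch (the logit values here are illustrative, not from any real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.2)  # sharp: top token dominates
warm = softmax_with_temperature(logits, 1.5)  # flat: more variety
```

At temperature 0.2 nearly all probability mass lands on the highest-logit token; at 1.5 the alternatives get a real chance of being sampled.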
Top P (Nucleus Sampling)
Controls diversity by restricting sampling to the smallest set of tokens whose cumulative probability reaches the threshold.

- 0.9 (default): Samples from the top 90% of cumulative probability - a good balance
- 0.95: Slightly more diverse outputs
- 0.5: Very focused, less variety
- 1.0: Considers all tokens
Top P works together with temperature. Both control randomness, but in different ways. Most users can leave Top P at default (0.9-0.95).
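For intuition, nucleus sampling keeps the highest-probability tokens until their cumulative probability reaches Top P, and discards the rest. A sketch with made-up probabilities:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability token indices
    whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return sorted(kept)

probs = [0.5, 0.3, 0.15, 0.05]
kept = top_p_filter(probs, 0.9)  # → [0, 1, 2]
```

With Top P at 0.9, the 0.05 tail token is cut; at 0.5, only the single most likely token would survive.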
Top K
Limits the model to choosing from the K most likely next tokens.

- 40 (default): Moderate variety
- 20: More focused outputs
- 80-100: More diverse outputs
Top K and Top P serve similar purposes. Many models work well with defaults - adjust only if you notice issues with repetition or randomness.
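Top K is the simpler of the two filters: it keeps a fixed number of candidates regardless of how probability is distributed among them. A sketch:

```python
def top_k_filter(probs, k):
    """Keep the indices of the k highest-probability tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(order[:k])

kept = top_k_filter([0.1, 0.4, 0.3, 0.2], 2)  # → [1, 2]
```

In practice engines apply Top K and Top P together, then sample from whatever tokens survive both filters.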
Max Tokens
Maximum number of tokens the model can generate in a single response.

- 512: Short responses (a few paragraphs)
- 2048: Medium responses (most use cases)
- 4096+: Long-form content (essays, code, detailed explanations)
- -1: Unlimited (continues until natural stopping point)
Memory Settings
Context Length (Context Size)
How much conversation history the model remembers.

| Tokens | Approximate Words | Best For |
|---|---|---|
| 2048 | ~1,500 words | Short Q&A, limited RAM |
| 4096 | ~3,000 words | Standard conversations |
| 8192 | ~6,000 words | Long discussions (default) |
| 16384+ | ~12,000+ words | Very long conversations, document analysis |
| 32768+ | ~24,000+ words | Entire documents, extensive context |
Jan defaults to 8192 tokens or your model’s maximum (whichever is smaller). This handles most conversations well.
- Longer context = more RAM/VRAM usage
- Each message in history consumes context
- Tools and system prompts also use context
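Since the system prompt, message history, and the reply all share the same window, it helps to budget tokens before sending a request. A rough sketch using the common ~4-characters-per-token heuristic for English (a heuristic only; real tokenizers vary by model):

```python
def estimate_tokens(text):
    # Heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_in_context(system_prompt, messages, context_length, reply_budget):
    # System prompt, history, and the expected reply all share one window.
    used = estimate_tokens(system_prompt) + sum(estimate_tokens(m) for m in messages)
    return used + reply_budget <= context_length

ok = fits_in_context(
    "You are a helpful assistant.",
    ["Hi!", "Hello! How can I help?"],
    context_length=4096,
    reply_budget=2048,
)
```

When the check fails, either raise the context length (at the cost of RAM/VRAM) or trim older messages from the history.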
Hardware Acceleration
GPU Layers (ngl)
Controls how many model layers run on your GPU vs CPU.

| Setting | Performance | Memory | When to Use |
|---|---|---|---|
| Max (e.g., 40) | Fastest | High VRAM usage | You have a powerful GPU with plenty of VRAM |
| Moderate (20) | Balanced | Medium VRAM | GPU has limited memory |
| 0 | Slowest | No VRAM usage | CPU-only inference |
Apple Silicon (M1/M2/M3/M4): model layers run in unified memory shared by the CPU and GPU, so you can simply maximize the GPU Layers setting.
Continuous Batching
Processes multiple requests or tokens simultaneously for better throughput.

- Enabled (recommended): Better performance for multiple conversations or tool calls
- Disabled: Simpler processing, slightly lower memory usage
Repetition Control
Repeat Penalty
Reduces the likelihood of repeating the same words or phrases.

- 1.0: No penalty (default for some models)
- 1.1 - 1.3: Reduces repetition (recommended for most users)
- 1.5+: Strongly discourages repetition (may affect coherence)
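One common implementation (used by llama.cpp-style engines, though details vary) penalizes the logits of tokens that already appeared in the output. A sketch:

```python
def apply_repeat_penalty(logits, generated_ids, penalty):
    """Divide positive logits of already-seen tokens by the penalty,
    multiply negative ones by it; both make the token less likely."""
    out = list(logits)
    for tid in set(generated_ids):
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out

penalized = apply_repeat_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1], penalty=1.3)
```

Tokens 0 and 1 were already generated, so their logits drop; token 2 is untouched. This is why very high penalties (1.5+) can hurt coherence: even necessary words get suppressed once used.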
Frequency Penalty
Reduces repetition based on how often tokens appear.

- 0: No penalty
- 0.1 - 0.3: Mild discouragement of repeated words
- 0.5+: Strong penalty (can reduce coherence)
Presence Penalty
Encourages the model to explore new topics.

- 0: No penalty (sticks to current topic)
- 0.1 - 0.5: Encourages topic variety
- 1.0+: Strongly pushes for new topics (may lose focus)
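These two penalties are additive adjustments in the OpenAI-style formulation: frequency penalty scales with how many times a token has appeared, while presence penalty is a flat deduction for any token that appeared at all. A sketch:

```python
from collections import Counter

def apply_frequency_presence(logits, generated_ids, frequency_penalty, presence_penalty):
    # OpenAI-style: subtract count * frequency_penalty plus a flat
    # presence_penalty from every token that has already appeared.
    counts = Counter(generated_ids)
    out = list(logits)
    for tid, count in counts.items():
        out[tid] -= count * frequency_penalty + presence_penalty
    return out

adjusted = apply_frequency_presence(
    [1.0, 1.0], generated_ids=[0, 0],
    frequency_penalty=0.2, presence_penalty=0.5,
)
```

Token 0 appeared twice, so it loses 2 × 0.2 + 0.5 = 0.9 from its logit; token 1 never appeared and is unchanged.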
Advanced Settings
Min P
Minimum probability threshold for token selection, relative to the most likely token.

- 0.05 (default): Prevents very unlikely tokens
- Lower values: Allow more unlikely tokens (more creative)
- Higher values: Restrict to likely tokens (more focused)
Most users don’t need to adjust Min P. It works well at default settings.
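As commonly implemented (e.g., in llama.cpp), the Min P threshold scales with the top token's probability: a candidate survives only if its probability is at least Min P times that of the most likely token. A sketch:

```python
def min_p_filter(probs, min_p):
    # Keep tokens whose probability is at least min_p times the top probability.
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

kept = min_p_filter([0.6, 0.3, 0.02], min_p=0.05)  # threshold 0.03 drops index 2
```

Because the cutoff is relative, Min P adapts automatically: when the model is confident the pool shrinks, and when probabilities are spread out more candidates survive.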
Stop Sequences
Tokens or strings that tell the model to stop generating.

- Model stops when it generates any of these strings
- Useful for controlling output format
- Often used in chat templates
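Conceptually, the runtime scans the generated text for any stop sequence and truncates there. A simplified sketch of that check over a token stream (real streamers also hold back partial matches that span chunk boundaries):

```python
def stream_until_stop(pieces, stop_sequences):
    # Accumulate streamed text and truncate at the first stop sequence found.
    buffer = ""
    for piece in pieces:
        buffer += piece
        for stop in stop_sequences:
            idx = buffer.find(stop)
            if idx != -1:
                return buffer[:idx]
    return buffer

text = stream_until_stop(["Hello", " world", "\nUser:", " ignored"], ["\nUser:"])
```

Here generation halts as soon as the `\nUser:` marker appears, which is exactly how chat templates keep a model from writing the user's next turn.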
Prompt Template
Defines how the model interprets system messages, user input, and assistant responses.

Use Case Presets
- General Chat
- Code Generation
- Creative Writing
- Factual Q&A
- Long Documents
General Chat, for example, uses balanced settings for everyday conversations.
Troubleshooting
Responses Too Repetitive
Symptoms: Model keeps repeating words, phrases, or ideas

Solutions:
- Increase Temperature to 0.8-1.0
- Increase Repeat Penalty to 1.2-1.3
- Add Frequency Penalty around 0.2
- Try a different model (some are more prone to repetition)
Responses Too Random/Incoherent
Symptoms: Output doesn’t make sense or goes off-topic

Solutions:
- Lower Temperature to 0.3-0.5
- Reduce Top P to 0.85
- Lower Top K to 30-40
- Check your prompt for clarity
Out of Memory Errors
Symptoms: Model fails to load or crashes mid-generation

Solutions:
- Reduce GPU Layers by 5-10 at a time
- Lower Context Length to 4096 or less
- Close other memory-intensive applications
- Try a smaller model or lower quantization (Q4 instead of Q8)
- Reduce Max Tokens if generating very long responses
Very Slow Responses
Symptoms: Model takes forever to generate text

Solutions:
- Increase GPU Layers to maximum (if you have a GPU)
- Verify you’re using the correct backend (CUDA for NVIDIA, Metal for Apple)
- Enable Continuous Batching
- Reduce Context Length if very high (32k+ can be slow)
- Close background applications
- Try a smaller/faster model
Model Won't Use Tools/MCP
Symptoms: Model ignores available tools

Solutions:
- Enable Tools capability in model settings (click edit button)
- Lower Temperature to 0.3-0.5 for more consistent tool calling
- Be explicit: “Use the search tool to find…”
- Try a model known for good tool calling (Jan v1, Claude, GPT-4)
Model Capabilities
Beyond parameters, you can enable special features:

Vision
Lets the model analyze images you share.

Supported Models: GPT-4o, Claude Opus/Sonnet 4, Gemini Pro, LLaVA (local), Jan v2 VL
Tools (MCP)
Enables external tool calling via the Model Context Protocol (MCP).

See the MCP Integration guide for setting up tools and configuring MCP servers.
Reasoning
Enables step-by-step thinking for complex problems.

- Best for: Math, logic puzzles, multi-step reasoning
- Models: o3, o1, Claude Opus, specialized reasoning models
- May increase response time but improves accuracy
Embeddings
Generates vector representations of text for semantic search and RAG.

- Enable for: Document search, semantic similarity, RAG applications
- Not needed for regular chat
- Specialized embedding models (e.g., bge-large-en) work best
Best Practices
Change One Thing at a Time
When tuning, adjust one parameter at a time so you understand its impact.
Next Steps
Local Models
Learn which models work best for different parameter configurations
MCP Integration
Enhance models with external tools and data sources
API Server
Use parameter-tuned models via API
Cloud Integration
Apply parameters to cloud models like GPT-4 and Claude