Model Hub
Jan provides access to a curated collection of open-source AI models.
Browsing Models
- Click Models in the sidebar (or navigate to Settings > Models)
- Browse available models by:
- Capability - Text, vision, code
- Size - Parameter count and memory requirements
- Provider - Model architecture (LLaMA, Mistral, etc.)
Model Information
Each model listing shows:
- Model name and version
- Parameter size (7B, 13B, 70B, etc.)
- Quantization - Model compression level (Q4, Q5, Q8, etc.)
- Memory requirements - RAM/VRAM needed
- Capabilities - Text, vision, tools, code
- Download size - File size for download
Downloading Models
From the Model Hub
- Find a model you want to use
- Click the Download icon
- Monitor download progress in the notification area
- Once complete, the model appears in your model list
Downloads can be paused and resumed. Jan saves partial downloads if interrupted.
Quantization Levels
Models are available in different quantization levels:

| Level | Quality | Speed | Memory | Best For |
|---|---|---|---|---|
| Q4 | Good | Fast | Lowest | Limited resources |
| Q5 | Better | Moderate | Low | Balanced performance |
| Q6 | High | Slower | Medium | Quality on mid-range hardware |
| Q8 | Very High | Slow | High | Maximum quality |
| F16 | Best | Slowest | Highest | Professional use, high-end GPUs |
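As a rough rule of thumb, a model's memory footprint is its parameter count times the bits per weight, plus runtime overhead. The sketch below illustrates this; the bits-per-weight figures and the overhead multiplier are approximations, not values taken from Jan's documentation:

```python
def estimate_model_memory_gb(params_billion: float, bits_per_weight: float,
                             overhead: float = 1.2) -> float:
    """Rough RAM/VRAM estimate for a quantized model.

    bits_per_weight: roughly 4.5 for Q4, 5.5 for Q5, 8.5 for Q8, 16 for F16
    (approximate, since GGUF quant formats store extra scale data).
    overhead: multiplier for KV cache and runtime buffers (an assumption).
    """
    bytes_for_weights = params_billion * 1e9 * bits_per_weight / 8
    return bytes_for_weights * overhead / 1e9

# A 7B model at Q4 comes out to roughly 4-5 GB with this estimate.
q4_gb = estimate_model_memory_gb(7, 4.5)
f16_gb = estimate_model_memory_gb(7, 16)
```

This is why the same 7B model can run comfortably at Q4 on a laptop but needs a high-end GPU at F16.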
Vision Models
For image understanding capabilities:
- Download a model with vision support
- Jan automatically downloads the required mmproj file
- The model will be available for image input
Once loaded, vision models can:
- Describe images
- Answer questions about visual content
- Extract text from images (OCR)
- Analyze screenshots and diagrams
Importing Models
Import from Hugging Face
- Go to Settings > Models
- Click Import Model
- Enter the Hugging Face model repository URL
- (Optional) Add your Hugging Face token for gated models
- Select the specific model file to import
- Click Import
Import Local Models
- Go to Settings > Models
- Click Import Model
- Select Browse to choose a local GGUF file
- Verify model settings
- Click Import
Configuring Models
Model Settings
Edit model settings to optimize performance:
- Go to Settings > Models
- Click on a downloaded model
- Click the Edit icon
- Adjust settings:
Context Length (ctx_len)
Maximum number of tokens the model can process.
- Default: 4096-8192 tokens
- Higher values: More context, more memory usage
- Lower values: Less memory, shorter conversations
GPU Layers (ngl)
Number of model layers to run on GPU.
- 0 - CPU only (slower but works on any system)
- Max - All layers on GPU (fastest, requires sufficient VRAM)
- Partial - Split between CPU and GPU
Jan auto-detects optimal GPU layer count based on available VRAM. Adjust manually for fine-tuning.
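If you do tune this by hand, the arithmetic behind a partial offload is simple: divide the model size by its layer count, then see how many layers fit in free VRAM after reserving headroom. A minimal sketch (the reserve value and inputs are assumptions, not Jan's internal logic):

```python
def gpu_layers_that_fit(model_gb: float, n_layers: int, free_vram_gb: float,
                        reserve_gb: float = 1.0) -> int:
    """Estimate how many layers fit in VRAM, keeping some headroom
    (reserve_gb) for the KV cache and runtime buffers."""
    per_layer_gb = model_gb / n_layers
    usable = max(free_vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable / per_layer_gb))

# A ~4.7 GB Q4 7B model fits entirely on an 8 GB GPU;
# a ~13 GB model only partially.
full = gpu_layers_that_fit(4.7, 32, 8.0)
partial = gpu_layers_that_fit(13.0, 40, 8.0)
```

Starting below the estimate and increasing until VRAM is nearly full avoids out-of-memory errors mid-generation.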
Parallel Processing (n_parallel)
Number of simultaneous sequences to process.
- 1 - Single request at a time
- 2-4 - Multiple requests (uses more memory)
CPU Threads
Number of CPU threads for inference.
- Auto - Uses system optimal value
- Manual - Set specific thread count
Prompt Template
Defines how messages are formatted for the model.
- Most models include default templates
- Fine-tuned models may need custom templates
- Follow the model’s instruction format
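To make "how messages are formatted" concrete, here is a sketch of one common instruction format, ChatML. This is purely illustrative: whether a given model uses ChatML or another template is stated on its model card, and the right template must match the model's training format:

```python
def render_chatml(messages: list[dict]) -> str:
    """Render a message list in ChatML, one widely used template.
    Each turn is wrapped in <|im_start|>role ... <|im_end|> markers,
    and the prompt ends with an open assistant turn for the model to fill."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
```

Using the wrong template typically still produces output, just noticeably worse output, which is why mismatched templates are a common cause of "this model seems bad" reports.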
Runtime Parameters
These can be set per-model or per-conversation:
- Temperature - Response randomness (0.0-2.0)
- Max Tokens - Maximum response length
- Top P - Nucleus sampling threshold
- Top K - Token selection limit
- Presence Penalty - Reduces topic repetition
- Frequency Penalty - Reduces phrase repetition
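These parameters map directly onto the fields of an OpenAI-style chat request, which is the shape Jan's OpenAI-compatible local server accepts. A sketch of such a request body (the model id is hypothetical, and the exact endpoint and port should be checked in Jan's local API server settings):

```python
import json

# Illustrative request body for an OpenAI-compatible chat endpoint.
# The model id below is a placeholder, not a guaranteed Jan model name.
payload = {
    "model": "llama3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Summarize this paragraph."}],
    "temperature": 0.7,        # response randomness (0.0-2.0)
    "max_tokens": 512,         # maximum response length
    "top_p": 0.9,              # nucleus sampling threshold
    "top_k": 40,               # token selection limit
    "presence_penalty": 0.0,   # reduces topic repetition
    "frequency_penalty": 0.2,  # reduces phrase repetition
}
body = json.dumps(payload)
```

Per-conversation overrides simply change these fields on each request, while per-model settings act as the defaults.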
Model Providers
LlamaCpp (Built-in)
Jan’s default engine for running GGUF models locally:
- Supports: Most open-source models
- Formats: GGUF
- Features: CPU/GPU acceleration, vision models
- Configuration: Settings > Providers > LlamaCpp
External Providers
Connect to external AI services:
- Go to Settings > Providers
- Click Add Provider
- Configure provider settings:
- API endpoint
- API key (if required)
- Model mappings
- Save and test connection
Supported external providers include:
- OpenAI API
- Anthropic Claude
- Custom OpenAI-compatible endpoints
Managing Model Storage
Viewing Storage Usage
- Go to Settings > Advanced
- View Jan Data Folder location
- See total storage used by models
Deleting Models
- Go to Settings > Models
- Find the model to remove
- Click the Delete icon
- Confirm deletion
Changing Model Location
To move models to a different drive:
- Go to Settings > Advanced
- Click Change next to Jan Data Folder
- Select new location
- Jan will move existing models (or prompt to restart)
Performance Optimization
System Resources
Monitor resource usage:
- CPU - Processing load
- RAM - Memory usage
- GPU - VRAM and utilization
Optimization Tips
Low RAM Systems (<16GB)
- Use Q4 quantized models
- Choose smaller parameter counts (7B or less)
- Reduce context length to 2048-4096
- Disable parallel processing
- Run fewer GPU layers
Mid-Range Systems (16-32GB)
- Use Q5 or Q6 models
- Try 7B-13B parameter models
- Context length: 4096-8192
- Enable partial GPU acceleration
- Use 2 parallel sequences if needed
High-End Systems (32GB+)
- Use Q8 or F16 models
- Run 13B-70B parameter models
- Context length: 8192-32768
- Full GPU acceleration
- Enable continuous batching
GPU Acceleration
For NVIDIA GPUs:
- Ensure CUDA drivers are installed
- Jan automatically detects CUDA support
- Set GPU layers in model settings
- Monitor VRAM usage to avoid out-of-memory errors
For Apple Silicon:
- Jan uses Metal acceleration automatically
- GPU layers apply to Apple Neural Engine
- Unified memory is shared between CPU and GPU
Model Capabilities
Text Generation
All models support basic text generation:
- Conversational responses
- Content creation
- Question answering
- Summarization
Vision
Models with vision support can:
- Analyze images
- Extract text (OCR)
- Describe visual content
- Answer questions about images
Tool Calling
Models with tool support can:
- Search documents (RAG)
- Execute functions
- Use MCP servers
- Call external APIs
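Tool calling works by sending the model a schema of the functions it may invoke, typically in the OpenAI-style tool format that OpenAI-compatible servers expect. A sketch of one such definition (the tool name and fields are illustrative, not part of Jan's documented API):

```python
# Illustrative OpenAI-style tool definition: the model sees this schema
# and can respond with a structured call naming the function and arguments.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# In a request body, tools are passed as a list alongside the messages:
tools = [weather_tool]
```

The model never executes anything itself; the client (Jan, or an MCP server) runs the call and feeds the result back as a new message.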
Code Generation
Code-specialized models excel at:
- Writing functions
- Debugging code
- Code review
- Technical explanations