Model Hub

Jan provides access to a curated collection of open-source AI models.

Browsing Models

  1. Click Models in the sidebar (or navigate to Settings > Models)
  2. Browse available models by:
    • Capability - Text, vision, code
    • Size - Parameter count and memory requirements
    • Provider - Model architecture (LLaMA, Mistral, etc.)

Model Information

Each model listing shows:
  • Model name and version
  • Parameter size (7B, 13B, 70B, etc.)
  • Quantization - Model compression level (Q4, Q5, Q8, etc.)
  • Memory requirements - RAM/VRAM needed
  • Capabilities - Text, vision, tools, code
  • Download size - File size for download

Downloading Models

From the Model Hub

  1. Find a model you want to use
  2. Click the Download icon
  3. Monitor download progress in the notification area
  4. Once complete, the model appears in your model list
Downloads can be paused and resumed. Jan saves partial downloads if interrupted.

Quantization Levels

Models are available in different quantization levels:
Level | Quality   | Speed    | Memory  | Best For
------|-----------|----------|---------|--------------------------------
Q4    | Good      | Fast     | Lowest  | Limited resources
Q5    | Better    | Moderate | Low     | Balanced performance
Q6    | High      | Slower   | Medium  | Quality on mid-range hardware
Q8    | Very High | Slow     | High    | Maximum quality
F16   | Best      | Slowest  | Highest | Professional use, high-end GPUs
Start with Q4 or Q5 models to test performance, then upgrade to higher quantization if your system handles it well.
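As a rough rule of thumb, file size scales with parameter count times bits per weight. The sketch below estimates download size; the bits-per-weight averages are illustrative assumptions, not Jan's internals (real GGUF quantization schemes mix precisions per layer and add metadata):

```python
# Rough GGUF size estimate: parameters * average bits per weight / 8.
# These averages are assumptions; real schemes (Q4_K_M, Q5_K_M, ...)
# use slightly different effective bits per weight.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q6": 6.5, "Q8": 8.5, "F16": 16.0}

def estimate_size_gb(params_billion: float, quant: str) -> float:
    """Estimated file size in GB for a model at a given quantization."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

# A 7B model at Q4 lands near 4 GB; the same model at F16 is about 14 GB.
```

This is why a Q4 7B model fits comfortably on a laptop while its F16 counterpart may not.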

Vision Models

For image understanding capabilities:
  1. Download a model with vision support
  2. Jan automatically downloads the required mmproj file
  3. The model will be available for image input
Vision models can:
  • Describe images
  • Answer questions about visual content
  • Extract text from images (OCR)
  • Analyze screenshots and diagrams

Importing Models

Import from Hugging Face

  1. Go to Settings > Models
  2. Click Import Model
  3. Enter the Hugging Face model repository URL
  4. (Optional) Add your Hugging Face token for gated models
  5. Select the specific model file to import
  6. Click Import
Example repository URLs:
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
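A repository URL maps to a Hugging Face repo id of the form owner/name. A small sketch of that mapping, assuming standard huggingface.co model URLs:

```python
from urllib.parse import urlparse

def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repo id from a huggingface.co model URL."""
    path = urlparse(url).path.strip("/")
    owner, name = path.split("/")[:2]
    return f"{owner}/{name}"

# repo_id_from_url("https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF")
# -> "TheBloke/Llama-2-7B-Chat-GGUF"
```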

Import Local Models

  1. Go to Settings > Models
  2. Click Import Model
  3. Select Browse to choose a local GGUF file
  4. Verify model settings
  5. Click Import
Only GGUF format models are supported. Other formats (safetensors, PyTorch) must be converted first.

Configuring Models

Model Settings

Edit model settings to optimize performance:
  1. Go to Settings > Models
  2. Click on a downloaded model
  3. Click the Edit icon
  4. Adjust settings:

Context Length (ctx_len)

Maximum number of tokens the model can process.
  • Default: 4096-8192 tokens
  • Higher values: More context, more memory usage
  • Lower values: Less memory, shorter conversations
Jan will suggest increasing context length automatically if you exceed the limit during a conversation.
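Context length drives memory use mainly through the KV cache, which grows linearly with ctx_len. A back-of-envelope sketch; the layer and head counts below are assumptions for a typical 7B model without grouped-query attention, with an fp16 cache:

```python
def kv_cache_bytes(ctx_len: int, n_layers: int = 32, n_kv_heads: int = 32,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Bytes for the key/value cache: 2 tensors (K and V) per layer."""
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_value

# Doubling ctx_len doubles the cache: 4096 tokens -> ~2 GiB with these
# assumed dimensions, 8192 tokens -> ~4 GiB. Models using grouped-query
# attention (fewer KV heads) need considerably less.
```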

GPU Layers (ngl)

Number of model layers to run on GPU.
  • 0 - CPU only (slower but works on any system)
  • Max - All layers on GPU (fastest, requires sufficient VRAM)
  • Partial - Split between CPU and GPU
Jan auto-detects optimal GPU layer count based on available VRAM. Adjust manually for fine-tuning.
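Conceptually, the auto-detection amounts to offloading as many layers as fit in available VRAM. A simplified sketch; the uniform per-layer size is an assumption (real layers vary, and the KV cache also claims VRAM):

```python
def gpu_layers_that_fit(vram_gb: float, model_size_gb: float,
                        total_layers: int, reserve_gb: float = 1.0) -> int:
    """Offload as many equally-sized layers as fit, keeping some VRAM free."""
    per_layer_gb = model_size_gb / total_layers
    usable = max(0.0, vram_gb - reserve_gb)
    return min(total_layers, int(usable / per_layer_gb))

# An 8 GB GPU with a ~4 GB Q4 7B model (32 layers) fits every layer;
# the same card with a ~13 GB 13B model can only offload part of it.
```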

Parallel Processing (n_parallel)

Number of simultaneous sequences to process.
  • 1 - Single request at a time
  • 2-4 - Multiple requests (uses more memory)

CPU Threads

Number of CPU threads for inference.
  • Auto - Uses system optimal value
  • Manual - Set specific thread count

Prompt Template

Defines how messages are formatted for the model.
  • Most models include default templates
  • Custom templates for fine-tuned models
  • Follow the model’s instruction format
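For example, many instruction-tuned models expect the ChatML layout. A sketch of applying such a template by hand; the template string here is the generic ChatML convention, not any specific model's bundled metadata:

```python
def apply_chatml(messages: list[dict]) -> str:
    """Render chat messages with the ChatML template used by many models."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return out + "<|im_start|>assistant\n"  # cue the model to respond

prompt = apply_chatml([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
])
```

Sending a prompt in the wrong format is a common cause of rambling or off-format output, which is why matching the model's instruction format matters.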

Runtime Parameters

These can be set per-model or per-conversation:
  • Temperature - Response randomness (0.0-2.0)
  • Max Tokens - Maximum response length
  • Top P - Nucleus sampling threshold (cumulative probability cutoff)
  • Top K - Samples only from the K most likely tokens
  • Presence Penalty - Reduces topic repetition
  • Frequency Penalty - Reduces phrase repetition
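These parameters map directly onto the fields of an OpenAI-compatible chat request. A sketch of such a payload; the field names follow the OpenAI chat-completions convention and the values are illustrative (top_k is an extension accepted by llama.cpp-style servers, not part of the OpenAI spec):

```python
payload = {
    "model": "llama-2-7b-chat",  # illustrative model id
    "messages": [{"role": "user", "content": "Summarize this article."}],
    "temperature": 0.7,       # 0.0 = deterministic, 2.0 = very random
    "max_tokens": 512,        # cap on the response length
    "top_p": 0.9,             # nucleus sampling threshold
    "top_k": 40,              # sample only from the 40 most likely tokens
    "presence_penalty": 0.0,  # raise to discourage repeating topics
    "frequency_penalty": 0.0, # raise to discourage repeating phrases
}
```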

Model Providers

LlamaCpp (Built-in)

Jan’s default engine for running GGUF models locally:
  • Supports: Most open-source models
  • Formats: GGUF
  • Features: CPU/GPU acceleration, vision models
  • Configuration: Settings > Providers > LlamaCpp

External Providers

Connect to external AI services:
  1. Go to Settings > Providers
  2. Click Add Provider
  3. Configure provider settings:
    • API endpoint
    • API key (if required)
    • Model mappings
  4. Save and test connection
Supported external providers:
  • OpenAI API
  • Anthropic Claude
  • Custom OpenAI-compatible endpoints
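Conceptually, a custom OpenAI-compatible provider entry boils down to a few fields. A sketch of the shape of such a configuration; the field names are illustrative, not Jan's exact settings schema:

```python
# Illustrative provider entry for an OpenAI-compatible endpoint.
provider = {
    "name": "my-local-endpoint",             # hypothetical display name
    "base_url": "http://localhost:8080/v1",  # API root of the server
    "api_key": "",                           # empty if the server needs none
    "models": ["llama-2-7b-chat"],           # model ids to expose in Jan
}
```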

Managing Model Storage

Viewing Storage Usage

  1. Go to Settings > Advanced
  2. View Jan Data Folder location
  3. See total storage used by models
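The total shown is simply the size of every file under the data folder. A small sketch of computing the same number yourself:

```python
import os

def folder_size_bytes(path: str) -> int:
    """Total size of all files under path (what the models actually occupy)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```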

Deleting Models

  1. Go to Settings > Models
  2. Find the model to remove
  3. Click the Delete icon
  4. Confirm deletion
Deleting a model removes all downloaded files. You’ll need to re-download if you want to use it again.

Changing Model Location

To move models to a different drive:
  1. Go to Settings > Advanced
  2. Click Change next to Jan Data Folder
  3. Select new location
  4. Jan will move existing models (or prompt to restart)

Performance Optimization

System Resources

Monitor resource usage:
  • CPU - Processing load
  • RAM - Memory usage
  • GPU - VRAM and utilization
View real-time stats in the System Monitor (top-right corner).

Optimization Tips

For low-spec systems:
  • Use Q4 quantized models
  • Choose smaller parameter counts (7B or less)
  • Reduce context length to 2048-4096
  • Disable parallel processing
  • Run fewer GPU layers
For mid-range systems:
  • Use Q5 or Q6 models
  • Try 7B-13B parameter models
  • Context length: 4096-8192
  • Enable partial GPU acceleration
  • Use 2 parallel sequences if needed
For high-end systems:
  • Use Q8 or F16 models
  • Run 13B-70B parameter models
  • Context length: 8192-32768
  • Full GPU acceleration
  • Enable continuous batching

GPU Acceleration

For NVIDIA GPUs:
  1. Ensure CUDA drivers are installed
  2. Jan automatically detects CUDA support
  3. Set GPU layers in model settings
  4. Monitor VRAM usage to avoid out-of-memory errors
For Apple Silicon (M1/M2/M3):
  1. Jan uses Metal acceleration automatically
  2. GPU layers run on the integrated GPU via Metal
  3. Unified memory shared between CPU and GPU

Model Capabilities

Text Generation

All models support basic text generation:
  • Conversational responses
  • Content creation
  • Question answering
  • Summarization

Vision

Models with vision support can:
  • Analyze images
  • Extract text (OCR)
  • Describe visual content
  • Answer questions about images
Requires downloading mmproj weights.

Tool Calling

Models with tool support can:
  • Search documents (RAG)
  • Execute functions
  • Use MCP servers
  • Call external APIs
Requires explicit tool definitions.
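Tool definitions follow the OpenAI function-calling schema: a JSON description of a function's name, purpose, and parameters. A minimal sketch with a hypothetical function:

```python
# Hypothetical tool definition in the OpenAI function-calling schema.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # example function, not a built-in
        "description": "Get the current weather for a city.",
        "parameters": {          # JSON Schema for the arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
```

The model never executes anything itself; it emits a call matching this schema, and the client (here, Jan) runs the function and returns the result.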

Code Generation

Code-specialized models excel at:
  • Writing functions
  • Debugging code
  • Code review
  • Technical explanations
Use models with “Code” or “Coder” in their name for programming tasks.