List Models
Lists all models currently available in your Jan instance.
Response
Always `"list"`.
Array of model objects available in Jan. Each model contains:
- `id` (string): The model identifier that can be referenced in API endpoints
- `object` (string): Always `"model"`
- `created` (number): Unix timestamp of when the model was created
- `owned_by` (string): The organization or author of the model
- `name` (string): Human-readable name of the model
- `version` (string): Model version
- `format` (string): Model format (e.g., `"gguf"`)
- `engine` (string): Inference engine (e.g., `"llama.cpp"`)
- `description` (string): Model description
- `settings` (object): Model configuration settings
- `parameters` (object): Runtime parameters
- `metadata` (object): Additional model metadata
Example Response
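A sketch of what a response might look like. The wrapper field names and all values shown here are illustrative, not taken from a real Jan instance; the per-model fields follow the list above:

```json
{
  "object": "list",
  "data": [
    {
      "id": "llama3-8b-instruct",
      "object": "model",
      "created": 1715644800,
      "owned_by": "meta-llama",
      "name": "Llama 3 8B Instruct",
      "version": "1.0",
      "format": "gguf",
      "engine": "llama.cpp",
      "description": "An 8B-parameter instruction-tuned chat model.",
      "settings": { "ctx_len": 8192, "ngl": 33 },
      "parameters": { "temperature": 0.7, "max_tokens": 2048, "stream": true },
      "metadata": { "author": "meta-llama", "tags": ["chat"], "size": 4920000000 }
    }
  ]
}
```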
Retrieve Model
Retrieves detailed information about a specific model.
Path Parameters
The ID of the model to retrieve.
Response
Returns a model object with the following fields:
- `id` (string): The model identifier that can be referenced in API endpoints.
- `object` (string): Always `"model"`.
- `created` (number): Unix timestamp (in seconds) of when the model was created.
- `owned_by` (string): The organization or author that created the model.
- `name` (string): Human-readable name used in the UI.
- `version` (string): The version of the model.
- `format` (string): The format of the model file (e.g., `"gguf"`, `"safetensors"`).
- `engine` (string): The inference engine used to run this model (e.g., `"llama.cpp"`, `"onnxruntime"`).
- `description` (string): A description of the model and its capabilities.
Download sources for the model. Each source contains:
- `filename` (string): The filename of the model artifact
- `url` (string): URL where the model can be downloaded
`settings` (object): Model configuration settings. Common settings:
- `ctx_len` (number): Context length/window size
- `ngl` (number): Number of GPU layers to offload
- `embedding` (boolean): Whether this is an embedding model
- `prompt_template` (string): Template for formatting prompts
- `system_prompt` (string): Default system prompt
- `cpu_threads` (number): Number of CPU threads to use
- `n_parallel` (number): Number of parallel sequences
- `temperature` (number): Sampling temperature
- `top_p` (number): Nucleus sampling threshold
- `top_k` (number): Top-k sampling parameter
- `min_p` (number): Minimum probability threshold
- `repeat_penalty` (number): Repetition penalty
- `presence_penalty` (number): Presence penalty
- `frequency_penalty` (number): Frequency penalty
`parameters` (object): Default runtime parameters for inference.
- `temperature` (number): Default sampling temperature
- `top_p` (number): Default nucleus sampling parameter
- `top_k` (number): Default top-k parameter
- `max_tokens` (number): Default maximum tokens to generate
- `stream` (boolean): Whether streaming is enabled by default
- `stop` (array): Default stop sequences
- `frequency_penalty` (number): Default frequency penalty
- `presence_penalty` (number): Default presence penalty
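A model's `parameters` act as defaults that a request can override. The sketch below shows one way a client might merge them before posting to an OpenAI-compatible chat endpoint; the helper function, sample defaults, and model ID are illustrative, not part of Jan's API:

```python
# Sketch: merge a model's default `parameters` with per-request overrides.
# `build_payload` is a hypothetical client-side helper, and the default
# values below are examples, not taken from a real Jan model.
def build_payload(model_id, messages, defaults, **overrides):
    params = {**defaults, **overrides}  # request values win over model defaults
    return {"model": model_id, "messages": messages, **params}

defaults = {"temperature": 0.7, "top_p": 0.95, "max_tokens": 2048, "stream": False}
payload = build_payload(
    "llama3-8b-instruct",
    [{"role": "user", "content": "Hello"}],
    defaults,
    temperature=0.2,  # override the model default for this one request
)
```

Because the overrides dict is merged last, `payload` carries `temperature: 0.2` while keeping the model's default `max_tokens`.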
`metadata` (object): Additional metadata about the model.
- `author` (string): Model author or organization
- `tags` (array): Tags describing the model
- `size` (number): Model file size in bytes
- `cover` (string): URL to model cover image
Example Response
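A sketch of a retrieved model object. All values (and the `sources` key name) are illustrative; the fields follow the descriptions above:

```json
{
  "id": "llama3-8b-instruct",
  "object": "model",
  "created": 1715644800,
  "owned_by": "meta-llama",
  "name": "Llama 3 8B Instruct",
  "version": "1.0",
  "format": "gguf",
  "engine": "llama.cpp",
  "description": "An 8B-parameter instruction-tuned chat model.",
  "sources": [
    {
      "filename": "llama3-8b-instruct.Q4_K_M.gguf",
      "url": "https://example.com/llama3-8b-instruct.Q4_K_M.gguf"
    }
  ],
  "settings": {
    "ctx_len": 8192,
    "ngl": 33,
    "embedding": false,
    "cpu_threads": 8
  },
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 2048,
    "stream": true
  },
  "metadata": {
    "author": "meta-llama",
    "tags": ["chat", "instruct"],
    "size": 4920000000
  }
}
```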
Model Types
Jan supports different types of models:
Chat Models
Models optimized for conversational interactions. These models have:
- `embedding: false`
- Prompt templates for chat formatting
- Support for multi-turn conversations
Examples: `llama3-8b-instruct`, `qwen2.5-7b-instruct`, `mistral-7b-instruct`
Embedding Models
Models that generate vector embeddings for text. These models have:
- `embedding: true`
- A different API endpoint (`/v1/embeddings`)
- Output vector representations instead of text
Examples: `nomic-embed-text`, `sentence-transformers`
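Since the `embedding` flag distinguishes embedding models from chat models, a client can use it to route requests. A minimal sketch, assuming model objects shaped like the ones above (the sample objects are illustrative, not real API responses):

```python
# Sketch: pick the right endpoint from a model's settings.
# `endpoint_for` is a hypothetical client-side helper.
def endpoint_for(model):
    settings = model.get("settings", {})
    if settings.get("embedding"):
        return "/v1/embeddings"    # embedding models output vectors
    return "/v1/chat/completions"  # chat (and vision) models output text

chat_model = {"id": "llama3-8b-instruct", "settings": {"embedding": False}}
embed_model = {"id": "nomic-embed-text", "settings": {"embedding": True}}
```

Treating a missing `embedding` flag as `false` keeps the default path pointed at chat completions.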
Vision Models
Models that can process both text and images. These models have:
- `vision_model: true`
- An `mmproj` setting for vision projection
- Support for multimodal input in chat completions
Examples: `llava-v1.6-7b`, `bakllava`
Model Settings
Key model settings you can configure:
Context Length (`ctx_len`)
The maximum number of tokens the model can process in its context window. Larger values allow longer conversations but require more memory.
GPU Layers (ngl)
Number of model layers to offload to GPU. Higher values improve performance but require more VRAM.
CPU Threads (cpu_threads)
Number of CPU threads to use for inference. More threads can improve performance on CPU.
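The three settings above can be tuned together in a model's `settings` object. The values below are illustrative starting points, not recommendations from Jan:

```python
# Illustrative settings object; every value here is an example.
settings = {
    "ctx_len": 8192,    # context window in tokens; larger = longer chats, more memory
    "ngl": 33,          # layers offloaded to GPU; higher = faster, more VRAM
    "cpu_threads": 8,   # CPU threads used for inference
}
```

Raising `ctx_len` and `ngl` trades memory (RAM and VRAM respectively) for capability and speed, so tune them against the hardware actually available.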
Prompt Template
Defines how messages are formatted before being sent to the model. Different models require different formatting.
Error Responses
Model Not Found
404 Not Found
Unauthorized
401 Unauthorized
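Error bodies for both cases might look like the following sketch, which assumes an OpenAI-style error object; the exact shape and messages may differ by Jan version:

```json
{
  "error": {
    "message": "Model 'unknown-model' not found",
    "type": "invalid_request_error",
    "code": 404
  }
}
```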