
List Models

Lists all models currently available in your Jan instance.
curl http://127.0.0.1:1337/v1/models \
  -H "Authorization: Bearer secret-key-123"

Response

object
string
Always "list".
data
array
Array of model objects available in Jan. Each model contains:
  • id (string): The model identifier that can be referenced in API endpoints
  • object (string): Always "model"
  • created (number): Unix timestamp of when the model was created
  • owned_by (string): The organization or author of the model
  • name (string): Human-readable name of the model
  • version (string): Model version
  • format (string): Model format (e.g., "gguf")
  • engine (string): Inference engine (e.g., "llama.cpp")
  • description (string): Model description
  • settings (object): Model configuration settings
  • parameters (object): Runtime parameters
  • metadata (object): Additional model metadata

Example Response

{
  "object": "list",
  "data": [
    {
      "id": "llama3-8b-instruct",
      "object": "model",
      "created": 1699896916,
      "owned_by": "Meta",
      "name": "Llama 3 8B Instruct",
      "version": "1.0",
      "format": "gguf",
      "engine": "llama.cpp",
      "description": "Meta's Llama 3 8B instruction-tuned model",
      "settings": {
        "ctx_len": 8192,
        "ngl": 33,
        "embedding": false,
        "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
      },
      "parameters": {
        "temperature": 0.7,
        "top_p": 0.95,
        "max_tokens": 4096,
        "stream": true
      },
      "metadata": {
        "author": "Meta",
        "tags": ["instruct", "chat", "8b"],
        "size": 4661211136
      }
    },
    {
      "id": "qwen2.5-7b-instruct",
      "object": "model",
      "created": 1699896917,
      "owned_by": "Alibaba",
      "name": "Qwen 2.5 7B Instruct",
      "version": "1.0",
      "format": "gguf",
      "engine": "llama.cpp",
      "description": "Alibaba's Qwen 2.5 7B instruction-tuned model",
      "settings": {
        "ctx_len": 32768,
        "ngl": 33,
        "embedding": false
      },
      "parameters": {
        "temperature": 0.7,
        "top_p": 0.8,
        "max_tokens": 2048,
        "stream": true
      },
      "metadata": {
        "author": "Alibaba",
        "tags": ["instruct", "chat", "7b"],
        "size": 4370000000
      }
    }
  ]
}
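The response body is plain JSON, so it can be worked with directly once parsed. A small sketch below summarizes the example payload above and picks the model with the largest context window; the dictionaries are trimmed copies of the example response, not live output.

```python
# Summarize a /v1/models response: one line per model, then pick the
# model with the largest context window. Data is taken from the
# example response above (trimmed to the fields used here).
models_response = {
    "object": "list",
    "data": [
        {"id": "llama3-8b-instruct", "engine": "llama.cpp",
         "settings": {"ctx_len": 8192}, "metadata": {"size": 4661211136}},
        {"id": "qwen2.5-7b-instruct", "engine": "llama.cpp",
         "settings": {"ctx_len": 32768}, "metadata": {"size": 4370000000}},
    ],
}

for model in models_response["data"]:
    ctx = model["settings"].get("ctx_len", "?")
    print(f'{model["id"]}: engine={model["engine"]}, ctx_len={ctx}')

# Largest context window wins:
largest = max(models_response["data"], key=lambda m: m["settings"]["ctx_len"])
print(largest["id"])  # qwen2.5-7b-instruct
```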

Retrieve Model

Retrieves detailed information about a specific model.
curl http://127.0.0.1:1337/v1/models/llama3-8b-instruct \
  -H "Authorization: Bearer secret-key-123"

Path Parameters

model_id
string
required
The ID of the model to retrieve.

Response

Returns a model object with the following fields:
id
string
The model identifier that can be referenced in API endpoints.
object
string
Always "model".
created
number
Unix timestamp (in seconds) of when the model was created.
owned_by
string
The organization or author that created the model.
name
string
Human-readable name used in the UI.
version
string
The version of the model.
format
string
The format of the model file (e.g., "gguf", "safetensors").
engine
string
The inference engine used to run this model (e.g., "llama.cpp", "onnxruntime").
description
string
A description of the model and its capabilities.
sources
array
Download sources for the model.
  • filename (string): The filename of the model artifact
  • url (string): URL where the model can be downloaded
settings
object
Model configuration settings. Common settings:
  • ctx_len (number): Context length/window size
  • ngl (number): Number of GPU layers to offload
  • embedding (boolean): Whether this is an embedding model
  • prompt_template (string): Template for formatting prompts
  • system_prompt (string): Default system prompt
  • cpu_threads (number): Number of CPU threads to use
  • n_parallel (number): Number of parallel sequences
  • temperature (number): Sampling temperature
  • top_p (number): Nucleus sampling threshold
  • top_k (number): Top-k sampling parameter
  • min_p (number): Minimum probability threshold
  • repeat_penalty (number): Repetition penalty
  • presence_penalty (number): Presence penalty
  • frequency_penalty (number): Frequency penalty
parameters
object
Default runtime parameters for inference.
  • temperature (number): Default sampling temperature
  • top_p (number): Default nucleus sampling parameter
  • top_k (number): Default top-k parameter
  • max_tokens (number): Default maximum tokens to generate
  • stream (boolean): Whether streaming is enabled by default
  • stop (array): Default stop sequences
  • frequency_penalty (number): Default frequency penalty
  • presence_penalty (number): Default presence penalty
metadata
object
Additional metadata about the model.
  • author (string): Model author or organization
  • tags (array): Tags describing the model
  • size (number): Model file size in bytes
  • cover (string): URL to model cover image

Example Response

{
  "id": "llama3-8b-instruct",
  "object": "model",
  "created": 1699896916,
  "owned_by": "Meta",
  "name": "Llama 3 8B Instruct",
  "version": "1.0",
  "format": "gguf",
  "engine": "llama.cpp",
  "description": "Meta's Llama 3 8B instruction-tuned model optimized for chat and dialogue use cases.",
  "sources": [
    {
      "filename": "llama-3-8b-instruct-q4_k_m.gguf",
      "url": "https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF"
    }
  ],
  "settings": {
    "ctx_len": 8192,
    "ngl": 33,
    "embedding": false,
    "cpu_threads": 4,
    "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "system_prompt": "You are a helpful, respectful and honest assistant."
  },
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_tokens": 4096,
    "stream": true,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stop": ["<|eot_id|>"]
  },
  "metadata": {
    "author": "Meta",
    "tags": ["instruct", "chat", "8b", "llama3"],
    "size": 4661211136
  }
}
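Note that `metadata.size` is a raw byte count. A one-line helper (illustrative, not part of the API) converts it to something human-readable:

```python
# Convert metadata.size (bytes) to GiB for display.
def human_size(num_bytes: int) -> str:
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 2**30:.2f} GiB"

print(human_size(4661211136))  # 4.34 GiB
```

Applied to the example response above, the 4,661,211,136-byte Llama 3 file is about 4.34 GiB.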

Model Types

Jan supports different types of models:

Chat Models

Models optimized for conversational interactions. These models have:
  • embedding: false
  • Prompt templates for chat formatting
  • Support for multi-turn conversations
Examples: llama3-8b-instruct, qwen2.5-7b-instruct, mistral-7b-instruct

Embedding Models

Models that generate vector embeddings for text. These models have:
  • embedding: true
  • Different API endpoint (/v1/embeddings)
  • Output vector representations instead of text
Examples: nomic-embed-text, sentence-transformers

Vision Models

Models that can process both text and images. These models have:
  • vision_model: true
  • mmproj setting for vision projection
  • Support multimodal input in chat completions
Examples: llava-v1.6-7b, bakllava
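A client can route a model to the appropriate endpoint using the flags described above. The sketch below assumes Jan's OpenAI-compatible /v1/chat/completions route for chat and vision models; the `endpoint_for` helper is illustrative.

```python
# Route a model to the right endpoint based on its settings flags.
def endpoint_for(model: dict) -> str:
    settings = model.get("settings", {})
    if settings.get("embedding"):
        return "/v1/embeddings"
    # Chat and vision models both go through chat completions; vision
    # models additionally accept image content in their messages.
    return "/v1/chat/completions"

print(endpoint_for({"settings": {"embedding": True}}))   # /v1/embeddings
print(endpoint_for({"settings": {"embedding": False}}))  # /v1/chat/completions
```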

Model Settings

Key model settings you can configure:

Context Length (ctx_len)

The maximum number of tokens the model can process in its context window. Larger values allow longer conversations but require more memory.

GPU Layers (ngl)

Number of model layers to offload to GPU. Higher values improve performance but require more VRAM.

CPU Threads (cpu_threads)

Number of CPU threads to use for inference. More threads can improve throughput when the model runs on the CPU.

Prompt Template

Defines how messages are formatted before being sent to the model. Different models require different formatting.
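For example, the Llama 3 template shown in the responses above substitutes the `{system_message}` and `{prompt}` placeholders literally. A minimal sketch of that substitution (the `render` helper is illustrative; Jan applies the template internally):

```python
# The Llama 3 prompt template from the example responses above.
LLAMA3_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

def render(template: str, system_message: str, prompt: str) -> str:
    # str.replace is used instead of str.format so that any braces in
    # the user's own text pass through untouched.
    return (template
            .replace("{system_message}", system_message)
            .replace("{prompt}", prompt))

text = render(LLAMA3_TEMPLATE, "You are a helpful assistant.", "Hello!")
```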

Error Responses

Model Not Found

{
  "error": {
    "message": "Model 'invalid-model-id' not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
Status: 404 Not Found

Unauthorized

{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
Status: 401 Unauthorized
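Clients can branch on the `error.code` field rather than parsing the message string. A hedged sketch using the two error bodies above (the `describe_error` helper is illustrative, not part of any client library):

```python
# Map a status code and error body to an actionable description,
# branching on error.code rather than the message text.
def describe_error(status: int, body: dict) -> str:
    err = body.get("error", {})
    code = err.get("code")
    if status == 404 and code == "model_not_found":
        return f"Model missing: {err.get('message')}"
    if status == 401 and code == "invalid_api_key":
        return "Check the Authorization header and your API key."
    return f"Unexpected error ({status}): {err.get('message')}"

not_found = {"error": {"message": "Model 'invalid-model-id' not found",
                       "type": "invalid_request_error",
                       "code": "model_not_found"}}
print(describe_error(404, not_found))
```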
