Create Chat Completion
Creates a model response for the given chat conversation.

Request Body
`model` (string, required): The ID of the model to use. Must match a model available in Jan. Examples: `llama3-8b-instruct`, `qwen2.5-7b-instruct`.

`messages` (array, required): A list of messages comprising the conversation so far. Each message has:

- `role` (string, required): One of `system`, `user`, `assistant`, or `tool`
- `content` (string or array): The message content. Can be a string or an array of content parts (for multimodal messages)
- `name` (string, optional): The name of the message author
- `tool_calls` (array, optional): Tool calls made by the assistant
- `tool_call_id` (string, optional): The ID of the tool call this message is responding to
`temperature` (number, optional): Sampling temperature between 0 and 2. Higher values make output more random, lower values more deterministic.
`max_tokens` (number, optional): The maximum number of tokens to generate. Set to `null` or omit for unlimited generation (up to the context limit).

`top_p` (number, optional): Nucleus sampling: only tokens with cumulative probability up to `top_p` are considered.

`top_k` (number, optional): Only the top K most likely tokens are considered for generation.
`min_p` (number, optional): Minimum probability threshold for token selection.
`stream` (boolean, optional): If `true`, returns a stream of Server-Sent Events (SSE) as the model generates tokens.

`stop` (string or array, optional): Up to 4 sequences where the API will stop generating further tokens.
`presence_penalty` (number, optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.
`frequency_penalty` (number, optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text.
`repeat_penalty` (number, optional): Penalty for repeating tokens. Values greater than 1 discourage repetition.

`repeat_last_n` (number, optional): Number of previous tokens to consider for the repeat penalty.
`seed` (number, optional): Random seed for reproducible generation.
`tools` (array, optional): A list of tools the model may call. Each tool has:

- `type` (string): Currently only `"function"` is supported
- `function` (object): Function definition with `name`, `description`, and `parameters`
`tool_choice` (string or object, optional): Controls which (if any) function is called by the model.

- `"none"`: The model will not call any function
- `"auto"`: The model can pick between generating a message or calling a function
- `"required"`: The model must call one or more functions
- `{"type": "function", "function": {"name": "my_function"}}`: Forces a specific function call
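Putting the core parameters above together, a minimal request body can be sketched as follows. The model ID is illustrative; use a model actually installed in your Jan instance.

```python
import json

# Minimal chat-completion request body using the core parameters above.
# The model ID is an illustrative placeholder, not a guaranteed default.
payload = {
    "model": "llama3-8b-instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 256,
    "stream": False,
}

body = json.dumps(payload)  # serialize for the POST request body
```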
Advanced Parameters
`dynatemp_range` (number, optional): Dynamic temperature range for sampling.

`dynatemp_exponent` (number, optional): Dynamic temperature exponent.

`typical_p` (number, optional): Typical probability mass for sampling.

`mirostat` (number, optional): Enable Mirostat sampling: `0` = disabled, `1` = Mirostat, `2` = Mirostat 2.0.

`mirostat_tau` (number, optional): Mirostat target entropy.

`mirostat_eta` (number, optional): Mirostat learning rate.

`logit_bias` (object, optional): Modify the likelihood of specified tokens appearing. Maps token IDs to bias values (-100 to 100).

`cache_prompt` (boolean, optional): Enable the KV cache for the prompt.
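As a sketch, the advanced parameters ride along in the same request body. The names used here follow llama.cpp server conventions, which Jan's local backend builds on; verify them against your Jan version before relying on them.

```python
# Advanced sampling options merged into a request body. Parameter names
# follow llama.cpp server conventions and are an assumption here --
# check your Jan version's documentation before relying on them.
advanced = {
    "mirostat": 2,                  # 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0
    "mirostat_tau": 5.0,            # target entropy
    "mirostat_eta": 0.1,            # learning rate
    "typical_p": 0.95,              # typical probability mass
    "logit_bias": {"15043": -100},  # token ID -> bias in [-100, 100]
    "cache_prompt": True,           # reuse the KV cache for the prompt
}

payload = {
    "model": "llama3-8b-instruct",  # illustrative model ID
    "messages": [{"role": "user", "content": "Hello!"}],
    **advanced,
}
```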
Response
`id` (string): A unique identifier for the chat completion.

`object` (string): The object type, always `chat.completion`.

`created` (number): Unix timestamp (in seconds) of when the completion was created.

`model` (string): The model used for the completion.

`choices` (array): A list of chat completion choices. Can contain more than one element if `n` is greater than 1. Each choice contains:

- `index` (number): The index of this choice
- `message` (object): The generated message
  - `role` (string): Always `assistant`
  - `content` (string): The content of the message
  - `tool_calls` (array, optional): Tool calls made by the model
- `finish_reason` (string): Why generation stopped (`stop`, `length`, `tool_calls`, `content_filter`)

`usage` (object): Token usage information.

- `prompt_tokens` (number): Number of tokens in the prompt
- `completion_tokens` (number): Number of tokens in the completion
- `total_tokens` (number): Total tokens used

`system_fingerprint` (string): System fingerprint for the backend.
Example Response
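A non-streaming response has roughly this shape; the ID, timestamp, and token counts below are illustrative:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1727000000,
  "model": "llama3-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
```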
Streaming
When `stream` is set to `true`, the API returns Server-Sent Events (SSE) as the model generates tokens.
Streaming Request
cURL
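A streaming request might look like the following. The URL assumes Jan's local API server at its default address; adjust the host and port for your setup.

```bash
curl http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b-instruct",
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
    "stream": true
  }'
```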
Streaming Response
Each chunk is a JSON object prefixed with `data: `:
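For example, a stream might carry chunks like these (abbreviated), ending with a literal `data: [DONE]` line:

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1727000000,"model":"llama3-8b-instruct","choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1727000000,"model":"llama3-8b-instruct","choices":[{"index":0,"delta":{"content":"lo!"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1727000000,"model":"llama3-8b-instruct","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```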
Streaming Response Fields
`id` (string): Unique identifier for the chat completion (consistent across all chunks).

`object` (string): Always `chat.completion.chunk`.

`created` (number): Unix timestamp.

`model` (string): The model used.

`choices` (array): Array of choices. Each choice contains:

- `index` (number): Choice index
- `delta` (object): Content delta
  - `role` (string, optional): Set in the first chunk
  - `content` (string, optional): Incremental content
- `finish_reason` (string | null): Reason for stopping (only in the final chunk)

A Jan-specific field reports prompt-processing progress:

- `cache` (number): Tokens already in the KV cache
- `processed` (number): Tokens processed so far
- `total` (number): Total prompt tokens
- `time_ms` (number): Time spent processing
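Clients reassemble the reply by concatenating each chunk's `delta.content`. A minimal sketch of that loop, operating on already-received SSE lines (transport and error handling omitted):

```python
import json

def assemble_stream(sse_lines):
    """Concatenate delta.content from SSE chunk lines into the full reply."""
    text = []
    finish_reason = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separator / keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choice = chunk["choices"][0]
        text.append(choice["delta"].get("content", ""))
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(text), finish_reason

lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
reply, reason = assemble_stream(lines)  # -> ("Hello!", "stop")
```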
Multimodal Messages
Jan supports vision models that can process images alongside text.

Image Input
cURL
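An image request might be sketched like this, again assuming the default local server address. The vision model ID is hypothetical, and the base64 data URI is truncated for readability:

```bash
curl http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-vl-7b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KG..."}}
      ]
    }],
    "stream": false
  }'
```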
Content Array Format
When using multimodal messages, the `content` field is an array of objects:
`type` (string, required): The type of content: `text`, `image_url`, or `input_audio`.

`text` (string): Text content (when type is `text`).

`image_url` (object): Image content (when type is `image_url`).

- `url` (string): URL or base64-encoded data URI
Function Calling
Jan supports function calling for compatible models.

Request with Tools
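A tools request could look like this; the `get_weather` function schema is a made-up example:

```json
{
  "model": "llama3-8b-instruct",
  "messages": [{"role": "user", "content": "What is the weather in Hanoi?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```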
Response with Tool Call
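When the model decides to call the function, the response carries `tool_calls` instead of text content, with `finish_reason` set to `tool_calls` (IDs below are illustrative):

```json
{
  "id": "chatcmpl-456",
  "object": "chat.completion",
  "created": 1727000000,
  "model": "llama3-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Hanoi\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
```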
Error Handling
Finish Reasons
- `stop`: Natural stop point or stop sequence reached
- `length`: Maximum token limit reached (context overflow)
- `tool_calls`: Model called a function
- `content_filter`: Content was filtered
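Client code typically branches on `finish_reason`. A minimal sketch (the action tags returned here are an arbitrary convention for the example, not part of the API):

```python
import json

def handle_choice(choice):
    """Return an action tuple for a completed choice based on finish_reason."""
    reason = choice["finish_reason"]
    if reason == "tool_calls":
        call = choice["message"]["tool_calls"][0]["function"]
        args = json.loads(call["arguments"])  # arguments arrive as a JSON string
        return ("call_tool", call["name"], args)
    if reason == "length":
        return ("truncated", None, None)  # context or max_tokens limit hit
    return ("done", choice["message"]["content"], None)

choice = {
    "finish_reason": "tool_calls",
    "message": {
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": "{\"city\": \"Hanoi\"}"}}
        ]
    },
}
action = handle_choice(choice)  # -> ("call_tool", "get_weather", {"city": "Hanoi"})
```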
Context Overflow
When the conversation exceeds the model's context window, the API returns `finish_reason: "length"`. You'll need to truncate the conversation history or use a model with a larger context window.
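One common mitigation is to drop the oldest non-system messages before retrying. A naive sketch; a production client would count tokens rather than messages:

```python
def truncate_history(messages, max_messages=8):
    """Keep the system prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

history = [{"role": "system", "content": "Be brief."}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(20)
]
trimmed = truncate_history(history, max_messages=4)  # system prompt + last 4 messages
```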