Spice provides an OpenAI-compatible Chat Completions API at /v1/chat/completions, so you can use the OpenAI SDKs and other OpenAI-compatible libraries to interact with your configured language models.

Endpoint

POST /v1/chat/completions

Authentication

Include your Spice API key in the request headers:
Authorization: Bearer <your-api-key>

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The name of the language model to use (e.g., gpt-4o, gpt-4o-mini) |
| messages | array | Yes | Array of message objects, each with a role and content |
| stream | boolean | No | Whether to stream the response (default: false) |
| temperature | number | No | Sampling temperature between 0 and 2 |
| max_tokens | integer | No | Maximum number of tokens to generate |
| top_p | number | No | Nucleus sampling parameter (alternative to temperature) |
| frequency_penalty | number | No | Penalizes tokens based on how often they have appeared so far |
| presence_penalty | number | No | Penalizes tokens that have already appeared at least once |

Message Roles

  • system - System instructions for the model
  • user - User messages
  • assistant - Assistant responses
  • developer - Developer-level instructions (supported by some models)

Response Format

Non-Streaming Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}

Streaming Response

When stream: true, the response is sent as Server-Sent Events (SSE):
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: [DONE]
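Each data: line (other than the terminal [DONE] sentinel) is a JSON-encoded chat.completion.chunk object. As a minimal sketch, the streamed text can be assembled from raw SSE lines like this, using the two chunks above:

```python
import json

# Raw SSE lines as they arrive on the wire (taken from the example above).
sse_lines = [
    'data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]

def assemble(lines):
    """Concatenate delta.content across chunks until the [DONE] sentinel."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)

print(assemble(sse_lines))  # Hello!
```

In practice the SDKs shown below handle this parsing for you; this is only useful when consuming the SSE stream directly.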

Advanced: Completion Progress Tracking

Spice supports tracking completion progress through an optional header:
x-spiceai-completion-progress: enabled
When enabled with streaming, this includes intermediate progress events in the SSE stream alongside the completion chunks.
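As a sketch using only the Python standard library, the header is set like any other request header (the endpoint, API key, and model name below are placeholders for your own configuration):

```python
import json
import urllib.request

# Hypothetical request body; the model name depends on your Spicepod.
payload = json.dumps({
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}).encode()

req = urllib.request.Request(
    "http://localhost:8090/v1/chat/completions",
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <your-api-key>",
        "x-spiceai-completion-progress": "enabled",  # opt in to progress events
    },
)
# urllib.request.urlopen(req) would then return an SSE stream with progress
# events interleaved among the chat.completion.chunk events.
```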

Examples

cURL

curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms."
      }
    ],
    "stream": false,
    "temperature": 0.7,
    "max_tokens": 500
  }'

OpenAI Python SDK

from openai import OpenAI

# Point the OpenAI client to your Spice instance
client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="<your-api-key>"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=100
)

print(response.choices[0].message.content)

Streaming Example (Python)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="<your-api-key>"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a haiku about data."}
    ],
    stream=True
)

for chunk in stream:
    # Guard against chunks with an empty choices array (e.g., a trailing usage chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

OpenAI Node.js SDK

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1',
  apiKey: '<your-api-key>'
});

const completion = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Tell me a joke about databases.' }
  ],
  temperature: 0.8,
  max_tokens: 150
});

console.log(completion.choices[0].message.content);

Streaming Example (Node.js)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1',
  apiKey: '<your-api-key>'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Count from 1 to 5.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Error Responses

Model Not Found (404)

{
  "error": "model 'gpt-5' not found"
}

API Error (4xx/5xx)

{
  "message": "Invalid API key provided",
  "type": "invalid_request_error",
  "param": null,
  "code": "invalid_api_key"
}

Status codes follow OpenAI conventions:
  • 400 - Invalid request parameters
  • 401 - Invalid API key
  • 402 - Insufficient quota
  • 404 - Model not found
  • 429 - Rate limit exceeded
  • 500 - Internal server error
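When wrapping calls to the API, these codes suggest a simple retry policy: rate limits and server errors are transient and worth retrying with backoff, while client errors are not. A hypothetical helper illustrating that split:

```python
# Transient failures worth retrying, per the status codes listed above.
RETRYABLE_STATUS_CODES = {429, 500}

def should_retry(status_code: int) -> bool:
    """Return True for transient errors (rate limit, server error);
    False for client errors such as bad parameters or an invalid key."""
    return status_code in RETRYABLE_STATUS_CODES

print(should_retry(429))  # True: rate limit exceeded, back off and retry
print(should_retry(401))  # False: invalid API key, fix credentials instead
```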

Supported Models

The available models depend on your Spice configuration. Common models include:
  • OpenAI models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
  • Anthropic models: claude-3-5-sonnet, claude-3-opus, claude-3-haiku
  • Open-source models: Configure custom models in your Spicepod
List available models using the Models API.
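Assuming the Models API (GET /v1/models) returns OpenAI's standard list shape, extracting the available model ids can be sketched as below; the sample payload is illustrative, not an actual response:

```python
import json

# Illustrative response in OpenAI's list shape; a real payload would come from
# GET http://localhost:8090/v1/models with your Authorization header.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "gpt-4o", "object": "model"},
    {"id": "gpt-4o-mini", "object": "model"}
  ]
}
""")

model_ids = [m["id"] for m in sample["data"]]
print(model_ids)  # ['gpt-4o', 'gpt-4o-mini']
```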
