Spice provides an OpenAI-compatible Chat Completions API at /v1/chat/completions, so you can use the OpenAI SDKs and other OpenAI-compatible libraries to interact with your configured language models.

Endpoint

POST /v1/chat/completions

Authentication

Include your Spice API key in the request headers:
Authorization: Bearer <your-api-key>

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The name of the language model to use (e.g., gpt-4o, gpt-4o-mini) |
| messages | array | Yes | Array of message objects, each with a role and content |
| stream | boolean | No | Whether to stream the response (default: false) |
| temperature | number | No | Sampling temperature between 0 and 2 |
| max_tokens | integer | No | Maximum number of tokens to generate |
| top_p | number | No | Nucleus sampling parameter (alternative to temperature) |
| frequency_penalty | number | No | Penalizes tokens based on how often they have appeared so far |
| presence_penalty | number | No | Penalizes tokens that have already appeared at least once |

Message Roles

  • system - System instructions for the model
  • user - User messages
  • assistant - Assistant responses
  • developer - Developer-level instructions (supported by some models)

Response Format

Non-Streaming Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}

Streaming Response

When stream: true, the response is sent as Server-Sent Events (SSE):
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: [DONE]
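Each data: line (other than the terminal [DONE] sentinel) is a JSON-encoded chat.completion.chunk object. As a minimal sketch, the streamed text can be assembled from raw SSE lines like this, using the two chunks above:

```python
import json

# Raw SSE lines as they arrive on the wire (taken from the example above).
sse_lines = [
    'data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]

def assemble(lines):
    """Concatenate delta.content across chunks until the [DONE] sentinel."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)

print(assemble(sse_lines))  # Hello!
```

In practice the SDKs shown below handle this parsing for you; this is only useful when consuming the SSE stream directly.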

Advanced: Completion Progress Tracking

Spice supports tracking completion progress through an optional header:
x-spiceai-completion-progress: enabled
When enabled with streaming, this includes intermediate progress events in the SSE stream alongside the completion chunks.
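As a sketch using only the Python standard library, the header is set like any other request header (the endpoint, API key, and model name below are placeholders for your own configuration):

```python
import json
import urllib.request

# Hypothetical request body; the model name depends on your Spicepod.
payload = json.dumps({
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}).encode()

req = urllib.request.Request(
    "http://localhost:8090/v1/chat/completions",
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <your-api-key>",
        "x-spiceai-completion-progress": "enabled",  # opt in to progress events
    },
)
# urllib.request.urlopen(req) would then return an SSE stream with progress
# events interleaved among the chat.completion.chunk events.
```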

Examples

cURL

curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms."
      }
    ],
    "stream": false,
    "temperature": 0.7,
    "max_tokens": 500
  }'

OpenAI Python SDK

from openai import OpenAI

# Point the OpenAI client to your Spice instance
client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="<your-api-key>"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=100
)

print(response.choices[0].message.content)

Streaming Example (Python)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="<your-api-key>"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a haiku about data."}
    ],
    stream=True
)

for chunk in stream:
    # Guard against chunks with an empty choices array (e.g., a trailing usage chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

OpenAI Node.js SDK

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1',
  apiKey: '<your-api-key>'
});

const completion = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Tell me a joke about databases.' }
  ],
  temperature: 0.8,
  max_tokens: 150
});

console.log(completion.choices[0].message.content);

Streaming Example (Node.js)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1',
  apiKey: '<your-api-key>'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Count from 1 to 5.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Error Responses

Model Not Found (404)

{
  "error": "model 'gpt-5' not found"
}

API Error (4xx/5xx)

{
  "message": "Invalid API key provided",
  "type": "invalid_request_error",
  "param": null,
  "code": "invalid_api_key"
}

Status codes follow OpenAI conventions:
  • 400 - Invalid request parameters
  • 401 - Invalid API key
  • 402 - Insufficient quota
  • 404 - Model not found
  • 429 - Rate limit exceeded
  • 500 - Internal server error
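When wrapping calls to the API, these codes suggest a simple retry policy: rate limits and server errors are transient and worth retrying with backoff, while client errors are not. A hypothetical helper illustrating that split:

```python
# Transient failures worth retrying, per the status codes listed above.
RETRYABLE_STATUS_CODES = {429, 500}

def should_retry(status_code: int) -> bool:
    """Return True for transient errors (rate limit, server error);
    False for client errors such as bad parameters or an invalid key."""
    return status_code in RETRYABLE_STATUS_CODES

print(should_retry(429))  # True: rate limit exceeded, back off and retry
print(should_retry(401))  # False: invalid API key, fix credentials instead
```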

Supported Models

The available models depend on your Spice configuration. Common models include:
  • OpenAI models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
  • Anthropic models: claude-3-5-sonnet, claude-3-opus, claude-3-haiku
  • Open-source models: Configure custom models in your Spicepod
List available models using the Models API.
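Assuming the Models API (GET /v1/models) returns OpenAI's standard list shape, extracting the available model ids can be sketched as below; the sample payload is illustrative, not an actual response:

```python
import json

# Illustrative response in OpenAI's list shape; a real payload would come from
# GET http://localhost:8090/v1/models with your Authorization header.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "gpt-4o", "object": "model"},
    {"id": "gpt-4o-mini", "object": "model"}
  ]
}
""")

model_ids = [m["id"] for m in sample["data"]]
print(model_ids)  # ['gpt-4o', 'gpt-4o-mini']
```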
