Spice provides a REST API for running inference on machine learning models configured in your Spicepod. Use this API to make predictions with your trained models.

Endpoints

Single Model Prediction

GET /v1/models/{name}/predict
Make a prediction using a specific model.

Batch Predictions

POST /v1/predict
Perform predictions using multiple models in a single request. Useful for ensembling or A/B testing.

Authentication

Include your Spice API key in the request headers:
Authorization: Bearer <your-api-key>

Single Model Prediction

Request

GET /v1/models/{model_name}/predict
Replace {model_name} with the name of your configured model.

Response Format

{
  "status": "Success",
  "model_name": "my_model",
  "model_version": "1.0",
  "prediction": [0.45, 0.50, 0.55],
  "duration_ms": 123
}

Response Fields

Field          Type     Description
status         string   Prediction status: Success, BadRequest, or InternalError
model_name     string   Name of the model used
model_version  string   Version of the model
prediction     array    Prediction results as an array of floats
duration_ms    integer  Time taken to complete the prediction, in milliseconds
error_message  string   Error description (only present on failure)
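For robust clients it can help to parse this response into a typed structure rather than indexing raw dictionaries. A minimal sketch: the field names match the table above, but the `PredictionResponse` dataclass itself is illustrative, not part of the Spice API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PredictionResponse:
    status: str
    model_name: str
    duration_ms: int
    model_version: Optional[str] = None
    prediction: Optional[List[float]] = None
    error_message: Optional[str] = None   # only present on failure

    @classmethod
    def from_json(cls, data: dict) -> "PredictionResponse":
        # Keep only the documented fields; ignore anything extra the server adds.
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in data.items() if k in known})

    @property
    def ok(self) -> bool:
        return self.status == "Success"

resp = PredictionResponse.from_json({
    "status": "Success",
    "model_name": "my_model",
    "model_version": "1.0",
    "prediction": [0.45, 0.50, 0.55],
    "duration_ms": 123,
})
print(resp.ok, resp.prediction)
```

Because `error_message` is optional, the dataclass defaults it to `None` so both success and failure payloads parse with the same code path.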

Batch Predictions

Request Body

{
  "predictions": [
    { "model_name": "drive_stats_a" },
    { "model_name": "drive_stats_b" }
  ]
}

Response Format

{
  "duration_ms": 81,
  "predictions": [
    {
      "status": "Success",
      "model_name": "drive_stats_a",
      "model_version": "1.0",
      "prediction": [0.45, 0.5, 0.55],
      "duration_ms": 42
    },
    {
      "status": "Success",
      "model_name": "drive_stats_b",
      "model_version": "1.0",
      "prediction": [0.43, 0.51, 0.53],
      "duration_ms": 42
    }
  ]
}

Response Fields

Field        Type     Description
duration_ms  integer  Total time for all predictions, in milliseconds
predictions  array    Array of individual prediction results
Each prediction object has the same fields as the single model response.
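Because each entry carries its own status, a batch response can mix successes and failures. A small helper to partition them, a sketch only; `split_results` is not part of the API:

```python
def split_results(batch: dict):
    """Partition a batch response into (successes, failures) by per-entry status."""
    ok = [p for p in batch["predictions"] if p["status"] == "Success"]
    failed = [p for p in batch["predictions"] if p["status"] != "Success"]
    return ok, failed

batch = {
    "duration_ms": 81,
    "predictions": [
        {"status": "Success", "model_name": "drive_stats_a",
         "model_version": "1.0", "prediction": [0.45, 0.5, 0.55], "duration_ms": 42},
        {"status": "BadRequest", "model_name": "drive_stats_c",
         "error_message": "Model 'drive_stats_c' not found", "duration_ms": 3},
    ],
}
ok, failed = split_results(batch)
print([p["model_name"] for p in ok])      # models that returned predictions
print([p["model_name"] for p in failed])  # models that need attention
```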

Examples

Single Model Prediction

cURL

curl -X GET http://localhost:8090/v1/models/price_predictor/predict \
  -H "Authorization: Bearer <your-api-key>"

Python

import requests

url = "http://localhost:8090/v1/models/price_predictor/predict"
headers = {
    "Authorization": "Bearer <your-api-key>"
}

response = requests.get(url, headers=headers)
result = response.json()

print(f"Prediction: {result['prediction']}")
print(f"Duration: {result['duration_ms']}ms")

Node.js

const axios = require('axios');

const url = 'http://localhost:8090/v1/models/price_predictor/predict';
const headers = {
  'Authorization': 'Bearer <your-api-key>'
};

axios.get(url, { headers })
  .then(response => {
    console.log('Prediction:', response.data.prediction);
    console.log('Duration:', response.data.duration_ms, 'ms');
  })
  .catch(error => {
    console.error('Error:', error.response ? error.response.data : error.message);
  });

Batch Predictions

cURL

curl -X POST http://localhost:8090/v1/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "predictions": [
      { "model_name": "model_v1" },
      { "model_name": "model_v2" },
      { "model_name": "model_ensemble" }
    ]
  }'

Python

import requests

url = "http://localhost:8090/v1/predict"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your-api-key>"
}
payload = {
    "predictions": [
        {"model_name": "model_v1"},
        {"model_name": "model_v2"}
    ]
}

response = requests.post(url, json=payload, headers=headers)
result = response.json()

print(f"Total duration: {result['duration_ms']}ms")
for pred in result['predictions']:
    print(f"{pred['model_name']}: {pred['prediction']} ({pred['duration_ms']}ms)")

Node.js

const axios = require('axios');

const url = 'http://localhost:8090/v1/predict';
const headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer <your-api-key>'
};
const data = {
  predictions: [
    { model_name: 'model_v1' },
    { model_name: 'model_v2' }
  ]
};

axios.post(url, data, { headers })
  .then(response => {
    console.log('Total duration:', response.data.duration_ms, 'ms');
    response.data.predictions.forEach(pred => {
      console.log(`${pred.model_name}: ${pred.prediction} (${pred.duration_ms}ms)`);
    });
  })
  .catch(error => {
    console.error('Error:', error.response ? error.response.data : error.message);
  });

Use Cases

A/B Testing

Compare predictions from different model versions:
import requests

url = "http://localhost:8090/v1/predict"
headers = {"Authorization": "Bearer <your-api-key>"}
payload = {
    "predictions": [
        {"model_name": "model_v1"},
        {"model_name": "model_v2_experimental"}
    ]
}

response = requests.post(url, json=payload, headers=headers)
results = response.json()

# Compare predictions
v1_pred = results['predictions'][0]['prediction']
v2_pred = results['predictions'][1]['prediction']

print(f"V1 prediction: {v1_pred}")
print(f"V2 prediction: {v2_pred}")
print(f"Difference: {abs(v1_pred[0] - v2_pred[0])}")

Ensemble Predictions

Combine predictions from multiple models:
import numpy as np
import requests

url = "http://localhost:8090/v1/predict"
headers = {"Authorization": "Bearer <your-api-key>"}
payload = {
    "predictions": [
        {"model_name": "model_1"},
        {"model_name": "model_2"},
        {"model_name": "model_3"}
    ]
}

response = requests.post(url, json=payload, headers=headers)
results = response.json()

# Extract predictions and calculate ensemble
predictions = [pred['prediction'] for pred in results['predictions'] 
               if pred['status'] == 'Success']
ensemble_prediction = np.mean(predictions, axis=0)

print(f"Ensemble prediction: {ensemble_prediction}")

Prediction Status

Success

Prediction completed successfully:
{
  "status": "Success",
  "model_name": "my_model",
  "model_version": "1.0",
  "prediction": [0.45, 0.50, 0.55],
  "duration_ms": 123
}

Bad Request (400)

Invalid request or model not found:
{
  "status": "BadRequest",
  "error_message": "Model 'unknown_model' not found",
  "model_name": "unknown_model",
  "duration_ms": 12
}

Internal Error (500)

Server error during prediction:
{
  "status": "InternalError",
  "error_message": "Unable to find column 'y' in inference result",
  "model_name": "my_model",
  "model_version": "1.0",
  "duration_ms": 12
}

Model Configuration

Models must be configured in your Spicepod before they can be used for inference. Example configuration:
models:
  - name: price_predictor
    from: file:///models/price_model.onnx
    datasets:
      - input_data
See the Models documentation for configuration details.

Performance Considerations

  1. Batch predictions: Use batch endpoint for multiple models to reduce network overhead
  2. Model loading: Models are loaded on startup; first predictions may be slower
  3. Concurrency: Batch predictions run concurrently for better performance
  4. Data format: predictions expect Float32Array results (the model's output column 'y')
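To act on point 1, a list of model names can be folded into a single /v1/predict request body instead of issuing one single-model call per name. A sketch of the payload construction (a plain helper, not part of any Spice SDK):

```python
def batch_payload(model_names):
    """Build the /v1/predict request body for a list of configured model names."""
    return {"predictions": [{"model_name": name} for name in model_names]}

payload = batch_payload(["model_v1", "model_v2", "model_ensemble"])
print(payload)
```

The resulting dictionary can be passed directly as the JSON body of a POST to /v1/predict, so N models cost one network round trip instead of N.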

Error Handling

Always check the status field in responses:
import requests

url = "http://localhost:8090"
headers = {"Authorization": "Bearer <your-api-key>"}
model_name = "price_predictor"

response = requests.get(f"{url}/v1/models/{model_name}/predict", headers=headers)
result = response.json()

if result['status'] != 'Success':
    print(f"Prediction failed: {result.get('error_message', 'Unknown error')}")
else:
    print(f"Prediction: {result['prediction']}")
For batch predictions, check individual prediction statuses:
payload = {"predictions": [{"model_name": "model_v1"}, {"model_name": "model_v2"}]}
response = requests.post(f"{url}/v1/predict", json=payload, headers=headers)
results = response.json()

for pred in results['predictions']:
    if pred['status'] == 'Success':
        print(f"{pred['model_name']}: {pred['prediction']}")
    else:
        print(f"{pred['model_name']} failed: {pred.get('error_message')}")
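If you prefer exceptions to inline status checks, the same pattern can be wrapped so that failed responses raise. `PredictionError` and `raise_for_status` are illustrative helpers, not part of any Spice SDK:

```python
class PredictionError(Exception):
    """Raised when a prediction response has a non-Success status."""
    def __init__(self, status: str, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status

def raise_for_status(result: dict) -> dict:
    """Return the response dict if status is Success, else raise PredictionError."""
    if result.get("status") != "Success":
        raise PredictionError(result.get("status", "Unknown"),
                              result.get("error_message", "Unknown error"))
    return result

bad = {"status": "BadRequest",
       "error_message": "Model 'unknown_model' not found",
       "model_name": "unknown_model", "duration_ms": 12}
try:
    raise_for_status(bad)
except PredictionError as e:
    print(e)
```

This mirrors `requests.Response.raise_for_status`, so calling code can use a single try/except around both the HTTP call and the prediction-level check.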
