
POST /predictions

Makes a single prediction.

Request

curl http://localhost:5001/predictions -X POST \
    --header "Content-Type: application/json" \
    --data '{
      "input": {
        "image": "https://example.com/image.jpg",
        "text": "Hello world!"
      }
    }'
input (object, required)
A JSON object with the same keys as the arguments to the predict() function. Any File or Path inputs are passed as URLs.

webhook (string)
URL to receive webhook notifications about prediction status updates. See Webhooks for details.

webhook_events_filter (array)
Array of event types that trigger webhook notifications. Valid values: start, output, logs, completed. Defaults to all events if not specified.

output_file_prefix (string)
Base URL to upload output files to instead of returning them as base64-encoded data URLs. See File Uploads for details.
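Conceptually, the server unpacks the input object as keyword arguments to the model's predict() function. A minimal sketch with a hypothetical predict() whose signature matches the curl example above (plain strings stand in for File/Path inputs to keep the sketch self-contained):

```python
# Hypothetical predict() whose parameters match the keys of the request's
# "input" object. In a real model, File/Path inputs arrive as URLs that
# Cog resolves for you.
def predict(image: str, text: str) -> str:
    return f"processed {text!r} with {image}"

payload = {
    "input": {
        "image": "https://example.com/image.jpg",
        "text": "Hello world!",
    }
}

# The server effectively does this with the request body:
output = predict(**payload["input"])
```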

Response (200 OK)

Synchronous prediction completed successfully.
{
    "status": "succeeded",
    "output": "data:image/png;base64,...",
    "metrics": {
        "predict_time": 4.52
    }
}
status (string, required)
Either succeeded or failed.

output (any)
The return value of the predict() function. Type depends on your model’s output schema.

error (string)
If status is failed, contains the error message.

metrics (object)
An object containing prediction metrics, such as predict_time.
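A client can branch on the documented status field when handling the 200 OK body. A minimal sketch (field names as documented above):

```python
import json

def handle_prediction(body: str):
    """Return the output of a successful prediction, or raise on failure."""
    prediction = json.loads(body)
    if prediction["status"] == "succeeded":
        return prediction["output"]
    # status is "failed": surface the documented error field
    raise RuntimeError(prediction.get("error", "prediction failed"))

result = handle_prediction(
    '{"status": "succeeded", "output": "data:image/png;base64,...",'
    ' "metrics": {"predict_time": 4.52}}'
)
```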

Response (202 Accepted)

Asynchronous prediction started (returned when the Prefer: respond-async header is set).
curl http://localhost:5001/predictions -X POST \
    --header "Content-Type: application/json" \
    --header "Prefer: respond-async" \
    --data '{
      "input": {"prompt": "A picture of an onion with sunglasses"}
    }'
{
    "status": "starting"
}
status (string, required)
Prediction status. Value will be starting when the prediction is first created.
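The asynchronous flow differs from the synchronous one only in the Prefer header. A sketch of constructing such a request with Python's standard library (built but not sent here):

```python
import json
import urllib.request

body = json.dumps({
    "input": {"prompt": "A picture of an onion with sunglasses"}
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:5001/predictions",
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/json",
        # Ask for 202 Accepted instead of blocking until the result is ready
        "Prefer": "respond-async",
    },
)
# urllib.request.urlopen(request) would return the 202 response body,
# {"status": "starting"}, while the prediction runs in the background.
```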

PUT /predictions/<prediction_id>

Makes a single prediction. This is the idempotent version of the POST /predictions endpoint. If a client calls this endpoint more than once with the same ID (for example, due to a network interruption) while the prediction is still running, no new prediction is created. Instead, the client receives a 202 Accepted response with the initial state of the prediction.

Generating Prediction IDs

Clients are responsible for providing unique prediction IDs. We recommend generating a UUIDv4 or UUIDv7, base32-encoding its bytes, and stripping the trailing = padding characters. This produces a random identifier that is 26 ASCII characters long.
from base64 import b32encode
from uuid import uuid4

# Named prediction_id to avoid shadowing the built-in id()
prediction_id = b32encode(uuid4().bytes).decode('utf-8').lower().rstrip('=')
print(prediction_id)  # e.g. 'wjx3whax6rf4vphkegkhcvpv6a'
The server can run only one prediction at a time. The client must ensure that the running prediction is complete before creating a new one with a different ID.
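The idempotency contract can be illustrated with an in-memory stand-in for the server (a simulation for illustration, not Cog's implementation): the first PUT with a given ID creates the prediction, and a repeated PUT with the same ID returns the existing prediction's initial state instead of starting a new run.

```python
predictions = {}  # in-memory stand-in for the server's prediction state

def put_prediction(prediction_id: str, payload: dict):
    """Simulate the idempotent PUT /predictions/<prediction_id> endpoint."""
    if prediction_id in predictions:
        # Repeat of an in-flight request: no new prediction is created;
        # the client gets 202 Accepted with the initial state.
        return 202, predictions[prediction_id]
    predictions[prediction_id] = {"id": prediction_id, "status": "starting"}
    return 202, predictions[prediction_id]

first = put_prediction("wjx3whax6rf4vphkegkhcvpv6a", {"prompt": "onion"})
repeat = put_prediction("wjx3whax6rf4vphkegkhcvpv6a", {"prompt": "onion"})
```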

Request

curl http://localhost:5001/predictions/wjx3whax6rf4vphkegkhcvpv6a -X PUT \
    --header "Content-Type: application/json" \
    --data '{
      "input": {"prompt": "A picture of an onion with sunglasses"}
    }'
prediction_id (string, required)
Unique identifier for the prediction. Must be provided by the client.

input (object, required)
A JSON object with the same keys as the arguments to the predict() function. Any File or Path inputs are passed as URLs.

webhook (string)
URL to receive webhook notifications about prediction status updates. See Webhooks for details.

webhook_events_filter (array)
Array of event types that trigger webhook notifications. Valid values: start, output, logs, completed. Defaults to all events if not specified.

output_file_prefix (string)
Base URL to upload output files to instead of returning them as base64-encoded data URLs. See File Uploads for details.

Response (200 OK)

Synchronous prediction completed successfully.
{
    "status": "succeeded",
    "output": "data:image/png;base64,..."
}
status (string, required)
Either succeeded or failed.

output (any)
The return value of the predict() function. Type depends on your model’s output schema.

error (string)
If status is failed, contains the error message.

Response (202 Accepted)

Asynchronous prediction started (returned when the Prefer: respond-async header is set).
curl http://localhost:5001/predictions/wjx3whax6rf4vphkegkhcvpv6a -X PUT \
    --header "Content-Type: application/json" \
    --header "Prefer: respond-async" \
    --data '{
      "input": {"prompt": "A picture of an onion with sunglasses"}
    }'
{
    "id": "wjx3whax6rf4vphkegkhcvpv6a",
    "status": "starting"
}
id (string, required)
The prediction ID provided in the request.

status (string, required)
Prediction status. Value will be starting when the prediction is first created.

POST /predictions/<prediction_id>/cancel

Cancels an asynchronous prediction. A client can cancel an asynchronous prediction by making a POST /predictions/<prediction_id>/cancel request using the prediction id provided when the prediction was created.
A prediction cannot be canceled if it was created synchronously (without the Prefer: respond-async header) or without a client-provided id.

Request

# First, create an async prediction with an ID
curl http://localhost:5001/predictions -X POST \
    --header "Content-Type: application/json" \
    --header "Prefer: respond-async" \
    --data '{
      "id": "abcd1234",
      "input": {"prompt": "A picture of an onion with sunglasses"}
    }'

# Then cancel it
curl http://localhost:5001/predictions/abcd1234/cancel -X POST
prediction_id (string, required)
The ID of the prediction to cancel.
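Equivalently, a sketch of building the cancelation request with Python's standard library (constructed but not sent here; the endpoint takes no body):

```python
import urllib.request

prediction_id = "abcd1234"
request = urllib.request.Request(
    f"http://localhost:5001/predictions/{prediction_id}/cancel",
    method="POST",
)
# urllib.request.urlopen(request) would return 200 OK if the prediction
# exists, or raise urllib.error.HTTPError with code 404 if it does not.
```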

Response

200 OK - if a prediction exists with the provided id.
404 Not Found - if no prediction exists with the provided id.

Handling Cancellation in Your Model

When a prediction is canceled, Cog raises CancelationException in sync predictors (or asyncio.CancelledError in async predictors). Your model may catch this exception to perform any necessary cleanup. Cleanup should be brief, ideally completing within a few seconds. Afterwards, the exception must be re-raised using a bare raise statement; failing to re-raise it may result in the container being terminated.
from cog import CancelationException, Path

def predict(image: Path) -> Path:
    try:
        return process(image)
    except CancelationException:
        cleanup()
        raise  # always re-raise
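For async predictors, the equivalent pattern catches asyncio.CancelledError and re-raises it the same way. A sketch with hypothetical process() and cleanup() stand-ins:

```python
import asyncio

cleanup_calls = []  # records that cleanup ran, for illustration

async def process(prompt: str) -> str:
    # Stand-in for long-running inference
    await asyncio.sleep(60)
    return f"output for {prompt}"

def cleanup() -> None:
    # Stand-in for the model's brief teardown
    cleanup_calls.append("cleaned")

async def predict(prompt: str) -> str:
    try:
        return await process(prompt)
    except asyncio.CancelledError:
        cleanup()  # must be brief
        raise      # always re-raise so the cancellation propagates
```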
