Create Transcription

curl --request POST \
  --url https://api.example.com/v1/audio/transcriptions \
  --form 'file=@audio.mp3' \
  --form 'model=openai/whisper-1' \
  --form 'language=en' \
  --form 'response_format=verbose_json' \
  --form 'temperature=0'

{
  "text": "<string>",
  "language": "<string>",
  "duration": 123,
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "segments": [
    {
      "id": 123,
      "text": "<string>",
      "start": 123,
      "end": 123,
      "tokens": [
        123
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ],
  "usage": {
    "type": "<string>",
    "seconds": 123,
    "input_tokens": 123,
    "output_tokens": 123,
    "total_tokens": 123
  },
  "logprobs": [
    {}
  ]
}

Transcribes audio files using OpenAI’s Whisper model. Supports multiple audio formats including mp3, mp4, mpeg, mpga, m4a, wav, and webm. Maximum file size is 25 MB.

Method

client.audio.transcriptions.create(params)

Request Parameters

file

File

required

The audio file object (not file name) to transcribe, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.

Maximum file size: 25 MB
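A client can reject obviously unsupported files before uploading them. The sketch below is a hypothetical pre-check (validateAudioFile is not part of the SDK). It checks only the file extension, not the actual audio encoding, and it assumes 25 MB means 25 × 1024 × 1024 bytes, which may differ from how the service counts.

```javascript
// Hypothetical client-side pre-check; the server remains the authority on what it accepts.
const SUPPORTED_EXTENSIONS = ["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"];
// Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes; the service may count differently.
const MAX_BYTES = 25 * 1024 * 1024;

function validateAudioFile(fileName, sizeBytes) {
  // Take the text after the last dot as the extension, case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `unsupported format: ${ext || "(none)"}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: `file is ${sizeBytes} bytes; limit is ${MAX_BYTES}` };
  }
  return { ok: true };
}
```

In Node.js, the size can come from fs.statSync(path).size before the file is passed to client.audio.transcriptions.create.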

model

string

required

Model ID to use for transcription (e.g., openai/whisper-1).

language

string

The language of the input audio in ISO-639-1 format (e.g., en, es, fr). Supplying the input language improves accuracy and latency.

prompt

string

Optional text to guide the model’s style or to continue a previous audio segment. The prompt should be in the same language as the audio.
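Because the prompt can carry context from a previous audio segment, one way to transcribe a long recording split into chunks is to seed each request with the end of the previous chunk's transcript. A minimal sketch, assuming a character budget chosen by the caller: promptFromPrevious is a hypothetical helper, and its 200-character default is illustrative, not a documented limit.

```javascript
// Hypothetical helper: builds a prompt for the next chunk from the tail of the
// previous chunk's transcript. maxChars is an illustrative budget, not a documented limit.
function promptFromPrevious(previousText, maxChars = 200) {
  const trimmed = previousText.trim();
  if (trimmed.length <= maxChars) return trimmed;
  const cut = trimmed.length - maxChars;
  const tail = trimmed.slice(cut);
  // If the cut landed mid-word, drop that partial word (when a later word boundary exists).
  if (trimmed[cut - 1] !== " ") {
    const firstSpace = tail.indexOf(" ");
    if (firstSpace !== -1) return tail.slice(firstSpace + 1);
  }
  return tail.trimStart();
}
```

Each chunk's request would then pass promptFromPrevious(previousTranscription.text) as its prompt.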

response_format

string

The format of the transcript output. Options:

json - JSON response with text only
text - Plain text
srt - SubRip subtitle format
verbose_json - JSON with timestamps, segments, and metadata
vtt - WebVTT subtitle format

Default: json
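The json and verbose_json formats return a JSON object, while text, srt, and vtt are assumed here to return the transcript as a raw text body. A client that reads the HTTP response directly therefore has to branch on the requested format, as in this hypothetical helper (an SDK may already handle this for you).

```javascript
// Formats whose response body is JSON; the others are assumed to be plain text.
const JSON_FORMATS = new Set(["json", "verbose_json"]);
const TEXT_FORMATS = new Set(["text", "srt", "vtt"]);

// Hypothetical helper: turns a raw response body into a usable value.
function parseTranscriptionBody(bodyText, responseFormat = "json") {
  if (JSON_FORMATS.has(responseFormat)) {
    return JSON.parse(bodyText); // an object with at least a `text` field
  }
  if (TEXT_FORMATS.has(responseFormat)) {
    return bodyText; // plain transcript, SubRip, or WebVTT as a string
  }
  throw new Error(`unknown response_format: ${responseFormat}`);
}
```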

temperature

number

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

Default: 0
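The automatic temperature increase described for temperature 0 can be sketched as a loop over rising temperatures that stops at the first acceptable result. This is a simplified illustration modeled on the open-source Whisper package, whose defaults step from 0 to 1 in increments of 0.2 and retry when the compression ratio exceeds 2.4 or the average log probability falls below -1 (its no-speech override is omitted here). The hosted service's actual schedule and thresholds are not documented, and decode is a hypothetical stand-in for one decoding pass.

```javascript
// Simplified sketch of temperature fallback, modeled on open-source Whisper defaults.
const FALLBACK_TEMPERATURES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
const MAX_COMPRESSION_RATIO = 2.4; // above this, output is likely repetitive
const MIN_AVG_LOGPROB = -1.0; // below this, the decoder was not confident

// `decode` is hypothetical: (temperature) => { text, avgLogprob, compressionRatio }.
function decodeWithFallback(decode) {
  let result;
  for (const temperature of FALLBACK_TEMPERATURES) {
    result = { ...decode(temperature), temperature };
    const tooRepetitive = result.compressionRatio > MAX_COMPRESSION_RATIO;
    const tooUncertain = result.avgLogprob < MIN_AVG_LOGPROB;
    if (!tooRepetitive && !tooUncertain) return result;
  }
  return result; // every attempt failed; keep the last (highest-temperature) one
}
```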

Response

text

string

required

The transcribed text.

language

string

The language of the input audio (verbose_json only).

duration

number

The duration of the input audio in seconds (verbose_json only).

words

array

Extracted words and their corresponding timestamps (verbose_json with timestamp granularities).

word

string

The text content of the word.

start

number

Start time of the word in seconds.

end

number

End time of the word in seconds.
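With word-level timestamps, a player can highlight the word being spoken at the current playback position. A minimal sketch using binary search, assuming the words array is sorted by start time and that word intervals do not overlap; wordAtTime is a hypothetical helper.

```javascript
// Returns the word whose [start, end) interval contains `timeSeconds`, or null.
// Assumes `words` is sorted by start and intervals do not overlap.
function wordAtTime(words, timeSeconds) {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (timeSeconds < w.start) hi = mid - 1; // target is earlier
    else if (timeSeconds >= w.end) lo = mid + 1; // target is later
    else return w;
  }
  return null; // time falls in a gap between words, or outside the audio
}
```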

segments

array

Segments of the transcribed text and their corresponding details (verbose_json only).

id

number

Unique identifier of the segment.

text

string

Text content of the segment.

start

number

Start time of the segment in seconds.

end

number

End time of the segment in seconds.

tokens

array

Array of token IDs for the text content.

temperature

number

Temperature parameter used for generating the segment.

avg_logprob

number

Average log probability of the segment. If the value is lower than -1, treat the segment as low-confidence (the log-probability check failed).

compression_ratio

number

Compression ratio of the segment. A value greater than 2.4 usually indicates repetitive output, so treat the compression-ratio check as failed.

no_speech_prob

number

Probability that the segment contains no speech, between 0 and 1. If this value is high and avg_logprob is below -1, consider the segment silent. (The open-source Whisper implementation uses 0.6 as its default no-speech threshold.)
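The thresholds in the segment field descriptions can be applied client-side to flag segments that deserve a second look. In this sketch, -1 and 2.4 come from the descriptions of avg_logprob and compression_ratio; because no_speech_prob is a probability between 0 and 1, the no-speech cutoff of 0.6 is an assumption borrowed from the default no_speech_threshold of the open-source Whisper package. flagSegment is a hypothetical helper.

```javascript
const LOW_CONFIDENCE_LOGPROB = -1.0; // from the avg_logprob description
const HIGH_COMPRESSION_RATIO = 2.4; // from the compression_ratio description
const NO_SPEECH_THRESHOLD = 0.6; // assumption: open-source Whisper's default

// Hypothetical helper: flags a verbose_json segment using the heuristics above.
function flagSegment(segment) {
  const lowConfidence = segment.avg_logprob < LOW_CONFIDENCE_LOGPROB;
  return {
    id: segment.id,
    lowConfidence,
    repetitive: segment.compression_ratio > HIGH_COMPRESSION_RATIO,
    // Silence requires both a high no-speech probability and low confidence.
    likelySilent: segment.no_speech_prob > NO_SPEECH_THRESHOLD && lowConfidence,
  };
}
```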

usage

object

Usage statistics for the request.

type

string

Either duration or tokens, depending on the billing model.

seconds

number

Duration of the input audio in seconds (for duration-based billing).

input_tokens

number

Number of input tokens (for token-based billing).

output_tokens

number

Number of output tokens (for token-based billing).

total_tokens

number

Total number of tokens (for token-based billing).
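When several requests are made, for example one per chunk of a long recording, their usage objects can be totaled per billing type. A minimal sketch; sumUsage is a hypothetical helper that treats missing numeric fields as zero and ignores unrecognized types.

```javascript
// Hypothetical helper: totals a list of usage objects from several requests.
function sumUsage(usages) {
  const totals = { seconds: 0, input_tokens: 0, output_tokens: 0, total_tokens: 0 };
  for (const usage of usages) {
    if (usage.type === "duration") {
      // Duration-based billing reports audio length in seconds.
      totals.seconds += usage.seconds ?? 0;
    } else if (usage.type === "tokens") {
      // Token-based billing reports input, output, and total token counts.
      totals.input_tokens += usage.input_tokens ?? 0;
      totals.output_tokens += usage.output_tokens ?? 0;
      totals.total_tokens += usage.total_tokens ?? 0;
    }
    // Other types are ignored rather than guessed at.
  }
  return totals;
}
```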

logprobs

array

The log probabilities of the tokens in the transcription. Returned only by models such as gpt-4o-transcribe and gpt-4o-mini-transcribe, and only when logprobs is added to the request's include array.
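Log probabilities are easier to read as probabilities, exp(logprob). This sketch lists tokens whose probability falls below a cutoff. It assumes each logprobs entry has token and logprob fields, since the entry shape is not specified here, and the 0.5 default cutoff is arbitrary; lowConfidenceTokens is a hypothetical helper.

```javascript
// Assumption: each logprobs entry looks like { token: string, logprob: number }.
function lowConfidenceTokens(logprobs, minProbability = 0.5) {
  return logprobs
    .map((entry) => ({ token: entry.token, probability: Math.exp(entry.logprob) }))
    .filter((entry) => entry.probability < minProbability);
}
```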

Examples

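import fs from "fs";
import OpenAI from "openai";

// Assumption: `client` is an OpenAI-compatible SDK client (the `openai` npm package)
// pointed at this API; API_KEY is a hypothetical environment variable name.
const client = new OpenAI({ baseURL: "https://api.example.com/v1", apiKey: process.env.API_KEY });

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream("audio.mp3"), // a readable stream, not a file name
  model: "openai/whisper-1",
});

console.log(transcription.text);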
