POST /v1/audio/transcriptions
Create Transcription
curl --request POST \
  --url https://api.example.com/v1/audio/transcriptions \
  --form file=@audio.mp3 \
  --form model='<string>' \
  --form language='<string>' \
  --form prompt='<string>' \
  --form response_format='<string>' \
  --form temperature=0
{
  "text": "<string>",
  "language": "<string>",
  "duration": 123,
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "segments": [
    {
      "id": 123,
      "text": "<string>",
      "start": 123,
      "end": 123,
      "tokens": [
        {}
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ],
  "usage": {
    "type": "<string>",
    "seconds": 123,
    "input_tokens": 123,
    "output_tokens": 123,
    "total_tokens": 123
  },
  "logprobs": [
    {}
  ]
}
Transcribes audio files using OpenAI’s Whisper model. Supports multiple audio formats including mp3, mp4, mpeg, mpga, m4a, wav, and webm. Maximum file size is 25 MB.
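The format and size limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: `checkAudioFile` is a hypothetical helper, it judges the format by file extension only, and it assumes "25 MB" means 25 * 1024 * 1024 bytes, which this page does not specify.

```javascript
// Formats and size limit as listed in the endpoint description.
const SUPPORTED_FORMATS = ["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"];
// Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes; the page does not say which unit is meant.
const MAX_FILE_BYTES = 25 * 1024 * 1024;

// Hypothetical helper: returns a list of problems, empty when the file looks acceptable.
function checkAudioFile(fileName, sizeInBytes) {
  const problems = [];
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (!SUPPORTED_FORMATS.includes(extension)) {
    problems.push(`unsupported format: "${extension || "none"}"`);
  }
  if (sizeInBytes > MAX_FILE_BYTES) {
    problems.push(`file is ${sizeInBytes} bytes, above the ${MAX_FILE_BYTES}-byte limit`);
  }
  return problems;
}
```

An empty result means the file passes both checks; the server may still reject a file whose contents do not match its extension.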

Method

client.audio.transcriptions.create(params)

Request Parameters

file
File
required
The audio file object (not file name) to transcribe, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm. Maximum file size: 25 MB.
model
string
required
Model ID to use for transcription (e.g., openai/whisper-1).
language
string
The language of the input audio in ISO-639-1 format (e.g., en, es, fr). Supplying the input language will improve accuracy and latency.
prompt
string
Optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.
response_format
string
The format of the transcript output. Options:
  • json - JSON response with text only
  • text - Plain text
  • srt - SubRip subtitle format
  • verbose_json - JSON with timestamps, segments, and metadata
  • vtt - WebVTT subtitle format
Default: json
temperature
number
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
Default: 0
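The parameter rules above (required fields, allowed response_format values, the 0 to 1 temperature range, two-letter ISO-639-1 language codes) can be applied before a request is sent. This is a sketch under stated assumptions: `buildTranscriptionParams` is a hypothetical helper, not an SDK function, and the server remains the authority on what it accepts.

```javascript
// Allowed values as listed under response_format.
const RESPONSE_FORMATS = ["json", "text", "srt", "verbose_json", "vtt"];

// Hypothetical helper: fills in the documented defaults and validates optional parameters.
function buildTranscriptionParams(file, model, options = {}) {
  if (!file) throw new Error("file is required");
  if (!model) throw new Error("model is required");

  // Documented defaults: response_format "json", temperature 0.
  const params = { file, model, response_format: "json", temperature: 0, ...options };

  if (!RESPONSE_FORMATS.includes(params.response_format)) {
    throw new RangeError(`response_format must be one of: ${RESPONSE_FORMATS.join(", ")}`);
  }
  if (typeof params.temperature !== "number" || params.temperature < 0 || params.temperature > 1) {
    throw new RangeError("temperature must be a number between 0 and 1");
  }
  // ISO-639-1 codes are two lowercase letters, e.g. "en".
  if (params.language !== undefined && !/^[a-z]{2}$/.test(params.language)) {
    throw new RangeError("language must be a two-letter ISO-639-1 code");
  }
  return params;
}
```

The returned object has the same shape as the argument passed to client.audio.transcriptions.create in the example on this page.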

Response

text
string
required
The transcribed text.
language
string
The language of the input audio (verbose_json only).
duration
number
The duration of the input audio in seconds (verbose_json only).
words
array
Extracted words and their corresponding timestamps (verbose_json with timestamp granularities).
segments
array
Segments of the transcribed text and their corresponding details (verbose_json only).
usage
object
Usage statistics for the request.
logprobs
array
The log probabilities of the tokens in the transcription. Only returned with models like gpt-4o-transcribe and gpt-4o-mini-transcribe if logprobs is added to the include array.
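When a verbose_json response is already in hand, its segments can be turned into subtitles locally instead of issuing a second request with response_format set to srt. The sketch below assumes that each segment's start and end are in seconds (this page states the unit only for duration); `toSrtTimestamp` and `segmentsToSrt` are hypothetical helper names.

```javascript
// Formats a time in seconds as an SRT timestamp: HH:MM:SS,mmm.
function toSrtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Builds SRT text from the segments array of a verbose_json response.
// Assumption: segment.start and segment.end are expressed in seconds.
function segmentsToSrt(response) {
  // segments is only present for verbose_json, so fall back to an empty list.
  const segments = response.segments ?? [];
  return segments
    .map((segment, index) => {
      const range = `${toSrtTimestamp(segment.start)} --> ${toSrtTimestamp(segment.end)}`;
      return `${index + 1}\n${range}\n${segment.text.trim()}\n`;
    })
    .join("\n");
}
```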

Examples

import fs from "fs";

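// `client` is assumed to be an SDK client initialized elsewhere; its setup is not shown on this page.
const transcription = await client.audio.transcriptions.create({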
  file: fs.createReadStream("audio.mp3"),
  model: "openai/whisper-1",
});

console.log(transcription.text);
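A variation on the example above requests verbose_json so that the response also carries language, duration, and segments. This is a sketch only: `transcribeWithSegments` is a hypothetical wrapper, and the client is passed in as an argument because its setup is not documented on this page.

```javascript
// Hypothetical wrapper around the documented create() call.
// `client` is whatever SDK client the example above uses.
async function transcribeWithSegments(client, file, model) {
  const transcription = await client.audio.transcriptions.create({
    file,
    model,
    response_format: "verbose_json",
  });

  // language, duration, and segments are documented as verbose_json only,
  // so segments falls back to an empty list if the field is absent.
  return {
    text: transcription.text,
    language: transcription.language,
    duration: transcription.duration,
    segments: transcription.segments ?? [],
  };
}
```

It would be called as transcribeWithSegments(client, fs.createReadStream("audio.mp3"), "openai/whisper-1"), reusing the file and model from the example above.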
