POST /v1/audio/translations
Create Translation
Request Example

curl --request POST \
  --url https://api.example.com/v1/audio/translations \
  --form 'file=@<audio-file>' \
  --form 'model=<string>' \
  --form 'prompt=<string>' \
  --form 'response_format=<string>' \
  --form 'temperature=0'
Response Example

{
  "text": "<string>",
  "language": "<string>",
  "duration": 123,
  "segments": [
    {
      "id": 123,
      "text": "<string>",
      "start": 123,
      "end": 123,
      "tokens": [
        {}
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ]
}
Translates audio files in any supported language to English text using OpenAI’s Whisper model. Supports the same audio formats as transcription. Maximum file size is 25 MB.

Method

client.audio.translations.create(params)

Request Parameters

file
File
required
The audio file object (not file name) to translate, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm. Maximum file size: 25 MB.
model
string
required
Model ID to use for translation (e.g., openai/whisper-1).
prompt
string
Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in English.
response_format
string
The format of the translation output. Options:
  • json - JSON response with text only
  • text - Plain text
  • srt - SubRip subtitle format
  • verbose_json - JSON with timestamps, segments, and metadata
  • vtt - WebVTT subtitle format
Default: json
temperature
number
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. Default: 0
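Because the request carries a file, the parameters above travel as multipart form fields rather than a JSON body. A minimal sketch of assembling those fields (the `buildTranslationForm` and `clampTemperature` helpers are illustrative, not part of any client library; they assume Node 18+, where `FormData` and `Blob` are globals):

```javascript
// Illustrative helpers (not part of the API client). They assemble the
// documented request parameters into multipart form fields, as the
// endpoint expects, and keep temperature inside the documented [0, 1] range.
function clampTemperature(value) {
  return Math.min(1, Math.max(0, value));
}

function buildTranslationForm(fileBlob, options = {}) {
  const form = new FormData(); // global in Node 18+
  form.append("file", fileBlob, options.filename ?? "audio.mp3");
  form.append("model", options.model ?? "openai/whisper-1");
  if (options.prompt) form.append("prompt", options.prompt);
  form.append("response_format", options.response_format ?? "json");
  form.append("temperature", String(clampTemperature(options.temperature ?? 0)));
  return form;
}
```

The resulting form can then be posted with `fetch` to the endpoint URL shown above; with `--form`, curl builds the same kind of body.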

Response

text
string
required
The translated text in English.
language
string
The language of the output translation, always english (verbose_json only).
duration
number
The duration of the input audio in seconds (verbose_json only).
segments
array
Segments of the translated text and their corresponding details (verbose_json only).
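When response_format is verbose_json, the segments array can be post-processed client-side, for example to rebuild the full text or measure how much of the audio contained speech. A sketch, assuming the segment shape documented above (`summarizeSegments` is an illustrative helper, not part of any client library):

```javascript
// Illustrative helper: reassemble the translated text and total
// spoken time from a verbose_json response's segments.
function summarizeSegments(response) {
  const text = response.segments.map((s) => s.text.trim()).join(" ");
  const spokenSeconds = response.segments.reduce(
    (sum, s) => sum + (s.end - s.start),
    0
  );
  return { text, spokenSeconds };
}

// Sample verbose_json payload (shape as documented above):
const sample = {
  text: "Hello world. How are you?",
  language: "english",
  duration: 4.2,
  segments: [
    { id: 0, text: " Hello world.", start: 0, end: 2 },
    { id: 1, text: " How are you?", start: 2, end: 3.5 },
  ],
};

console.log(summarizeSegments(sample));
// → { text: 'Hello world. How are you?', spokenSeconds: 3.5 }
```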

Examples

import fs from "fs";

const translation = await client.audio.translations.create({
  file: fs.createReadStream("german-audio.mp3"),
  model: "openai/whisper-1",
});

console.log(translation.text);
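Since the endpoint rejects files over 25 MB, it can be worth failing fast on the client before uploading. A sketch (`isWithinUploadLimit` is an illustrative helper, not part of any client library):

```javascript
import fs from "node:fs";

// Illustrative guard (not part of the API client): the endpoint's
// documented limit is 25 MB, so check the file size before uploading.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

function isWithinUploadLimit(path) {
  return fs.statSync(path).size <= MAX_UPLOAD_BYTES;
}

// Usage before calling client.audio.translations.create:
// if (!isWithinUploadLimit("german-audio.mp3")) {
//   throw new Error("Audio file exceeds the 25 MB limit");
// }
```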
