Create translation

Translates audio into English.
from openai import OpenAI

client = OpenAI()

audio_file = open("german_audio.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file
)

print(translation.text)

Parameters

file
file
required
The audio file object (not file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
model
string
required
ID of the model to use. Only whisper-1 (which is powered by OpenAI’s open source Whisper V2 model) is currently available.
prompt
string
An optional text to guide the model’s style or continue a previous audio segment. The prompt should be in English.
response_format
string
default:"json"
The format of the output, in one of these options: json, text, srt, verbose_json, or vtt.
temperature
float
default:"0"
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
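
Since out-of-range temperature values are rejected server-side, a small client-side guard can fail fast before uploading a large file. This is a sketch, not part of the SDK; the helper name is hypothetical:

```python
def validate_temperature(t):
    """Client-side sketch: the API documents temperature as a float between 0 and 1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError(f"temperature must be between 0 and 1, got {t}")
    return t

print(validate_temperature(0.2))  # 0.2
```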

Response

text
string
The translated text in English.
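
With the default json response format, the response body is a single-field JSON object containing the translated text. A minimal stdlib sketch of parsing such a payload by hand (the sample text is illustrative):

```python
import json

# Illustrative response body for response_format="json"
raw = '{"text": "Hello, how are you today?"}'

payload = json.loads(raw)
print(payload["text"])
```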

Examples

Basic translation

from openai import OpenAI

client = OpenAI()

# Translate Spanish audio to English
audio_file = open("spanish_audio.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file
)

print(translation.text)

Get translation as SRT subtitles

from openai import OpenAI

client = OpenAI()

audio_file = open("french_video.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file,
    response_format="srt"
)

# With response_format="srt", the SDK returns the subtitle
# content as a plain string, so it can be written directly
with open("english_subtitles.srt", "w") as f:
    f.write(translation)

Get translation as VTT

from openai import OpenAI

client = OpenAI()

audio_file = open("mandarin_audio.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file,
    response_format="vtt"
)

# As with SRT, the VTT result is a plain string; save it as a WebVTT file
with open("subtitles.vtt", "w") as f:
    f.write(translation)
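
A well-formed WebVTT file must begin with the WEBVTT signature line, so a quick sanity check on the saved output can be sketched as follows (the helper name is hypothetical):

```python
def looks_like_vtt(text):
    """Per the WebVTT spec, a file must start with the 'WEBVTT' signature
    (optionally preceded by a UTF-8 BOM)."""
    return text.lstrip("\ufeff").startswith("WEBVTT")

print(looks_like_vtt("WEBVTT\n\n00:00:00.000 --> 00:00:02.000\nHello"))  # True
print(looks_like_vtt("1\n00:00:00,000 --> 00:00:02,000\nHello"))         # False (SRT-style cue)
```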

Translation with verbose JSON

from openai import OpenAI

client = OpenAI()

audio_file = open("japanese_audio.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file,
    response_format="verbose_json"
)

# Access detailed information
print(f"Detected source language: {translation.language}")
print(f"Duration: {translation.duration} seconds")
print(f"Text: {translation.text}")

Translation with prompt

from openai import OpenAI

client = OpenAI()

audio_file = open("german_presentation.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file,
    prompt="This is a technical presentation about machine learning algorithms."
)

print(translation.text)

Batch translate multiple files

from openai import OpenAI
from pathlib import Path

client = OpenAI()

audio_files = Path("audio_files").glob("*.mp3")

for audio_path in audio_files:
    with audio_path.open("rb") as audio_file:
        translation = client.audio.translations.create(
            model="whisper-1",
            file=audio_file
        )
        
        # Save translation
        output_path = audio_path.with_suffix(".txt")
        output_path.write_text(translation.text)
        print(f"Translated {audio_path.name}")

Async usage

import asyncio
from openai import AsyncOpenAI

async def translate_audio():
    client = AsyncOpenAI()
    
    audio_file = open("italian_audio.mp3", "rb")
    translation = await client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )
    
    print(translation.text)

asyncio.run(translate_audio())

Supported audio formats

The translation endpoint supports the following audio formats:
  • flac - Free Lossless Audio Codec
  • mp3 - MPEG audio format
  • mp4 - MPEG-4 Part 14
  • mpeg - MPEG audio
  • mpga - MPEG audio
  • m4a - MPEG-4 audio
  • ogg - Ogg Vorbis
  • wav - Waveform audio
  • webm - WebM audio
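
Since unsupported formats are rejected by the endpoint, filtering inputs by extension first avoids wasted uploads. A stdlib-only sketch (the helper and set name below are not part of the SDK):

```python
from pathlib import Path

# Extensions accepted by the translation endpoint
SUPPORTED_FORMATS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga",
                     ".m4a", ".ogg", ".wav", ".webm"}

def is_supported(path):
    """Return True if the file extension is one the endpoint accepts."""
    return Path(path).suffix.lower() in SUPPORTED_FORMATS

print(is_supported("lecture.mp3"))  # True
print(is_supported("notes.txt"))    # False
```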

File uploads

Files are uploaded using multipart/form-data. The file object should be opened in binary mode:
# Correct way to open file
audio_file = open("path/to/file.mp3", "rb")

# Or using pathlib
from pathlib import Path
audio_file = Path("path/to/file.mp3").open("rb")

Translation vs Transcription

The key difference between the /audio/translations and /audio/transcriptions endpoints:
  • Translations - always output English text, regardless of the input language
  • Transcriptions - output text in the same language as the audio input

When to use translations

Use the translations endpoint when you need to:
  • Convert non-English audio into English text
  • Create English subtitles for foreign language videos
  • Translate podcasts or audio content into English
  • Build multilingual applications that standardize on English output

When to use transcriptions

Use the transcriptions endpoint when you need to:
  • Convert audio to text in the same language
  • Create subtitles in the original language
  • Preserve the original language of the content

Example: Multi-language processing

from openai import OpenAI

client = OpenAI()

# First transcribe in original language
audio_file = open("spanish_audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    language="es"
)
print(f"Spanish: {transcription.text}")

# Then translate to English
audio_file = open("spanish_audio.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file
)
print(f"English: {translation.text}")
For more information on audio processing and best practices, see the Speech to text guide.