
Transcribe Audio

Transcribe audio files into text in the language that is spoken, using a speech-to-text model such as gpt-4o-transcribe (used below) or whisper-1.
require "openai"
require "pathname"

client = OpenAI::Client.new

response = client.audio.transcriptions.create(
  file: Pathname("audio.mp3"),
  model: "gpt-4o-transcribe"
)

puts response.text
The file parameter accepts a Pathname, StringIO, IO, or raw file contents.
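To make the accepted input forms concrete, here is a minimal sketch that prepares the same audio file as each of them. The helper name audio_input and its as: option are hypothetical, not part of the SDK. It only builds the value you would pass as file:, and it makes no API call.

```ruby
require "pathname"
require "stringio"

# Hypothetical helper: return the audio at `path` in one of the forms the
# file: parameter accepts. Nothing here talks to the API.
def audio_input(path, as: :pathname)
  case as
  when :pathname  then Pathname(path)                    # the SDK opens the path itself
  when :io        then File.open(path, "rb")             # open binary IO; caller must close it
  when :string_io then StringIO.new(File.binread(path))  # in-memory copy of the bytes
  when :raw       then File.binread(path)                # raw file contents as a String
  else raise ArgumentError, "unknown input kind: #{as.inspect}"
  end
end

# Hypothetical usage with the client from the example above:
# response = client.audio.transcriptions.create(
#   file: audio_input("audio.mp3", as: :string_io),
#   model: "gpt-4o-transcribe"
# )
```

Pathname is usually the simplest choice. If you pass an open IO, you are responsible for closing it after the request.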

Transcription with Options

Customize transcription with optional parameters such as language, prompt, response_format, and temperature. The verbose_json format is only supported by whisper-1, so this example uses that model.
response = client.audio.transcriptions.create(
  file: Pathname("meeting.mp3"),
  model: "whisper-1",
  language: "en", # ISO-639-1 code of the spoken language
  prompt: "This is a meeting about product development",
  response_format: "verbose_json",
  temperature: 0.2
)

# Verbose response includes timestamps and other metadata
puts "Text: #{response.text}"
puts "Language: #{response.language}"
puts "Duration: #{response.duration}"
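A verbose_json transcription also includes timed segments, with start and end times given in seconds. The sketch below shows one way to print them as readable timestamps. It assumes the segments have already been turned into plain hashes with :start, :end, and :text keys. The sample segments are made up, and both helper names are hypothetical.

```ruby
# Format a time in seconds as HH:MM:SS.mmm.
def format_timestamp(seconds)
  total_ms = (seconds * 1000).round
  hours, rem = total_ms.divmod(3_600_000)
  minutes, rem = rem.divmod(60_000)
  secs, ms = rem.divmod(1000)
  format("%02d:%02d:%02d.%03d", hours, minutes, secs, ms)
end

# Render each segment hash (assumed keys :start, :end, :text) as one line.
def format_segments(segments)
  segments.map do |seg|
    "[#{format_timestamp(seg[:start])} --> #{format_timestamp(seg[:end])}] #{seg[:text].strip}"
  end
end

# Hypothetical sample data standing in for real verbose_json segments.
segments = [
  { start: 0.0, end: 3.52, text: " Welcome to the product sync." },
  { start: 3.52, end: 7.1, text: " Let's review the roadmap." }
]
puts format_segments(segments)
```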

Response Formats

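The transcription endpoint supports several response formats: json (the default), text, srt, verbose_json, and vtt. The srt, vtt, and verbose_json formats are only available with whisper-1. The gpt-4o transcription models support a narrower set. With json, the response contains just the transcribed text: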
response = client.audio.transcriptions.create(
  file: Pathname("audio.mp3"),
  model: "gpt-4o-transcribe",
  response_format: "json"
)

# Returns basic transcription text
puts response.text
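Because supported formats depend on the model, it can help to validate the combination before sending a request. This is a minimal sketch. The table treats whisper-1 as accepting all five formats and the gpt-4o transcription models as accepting only json and text. That second part is an assumption, so check the current API reference before relying on it. The constant and method names are hypothetical.

```ruby
# Assumed model -> response_format table; verify against the API reference.
SUPPORTED_RESPONSE_FORMATS = {
  "whisper-1" => %w[json text srt verbose_json vtt],
  "gpt-4o-transcribe" => %w[json text],      # assumption
  "gpt-4o-mini-transcribe" => %w[json text]  # assumption
}.freeze

# Return `format` if the model accepts it, otherwise raise ArgumentError.
def check_response_format!(model, format)
  allowed = SUPPORTED_RESPONSE_FORMATS.fetch(model) do
    raise ArgumentError, "unknown model: #{model}"
  end
  return format if allowed.include?(format)

  raise ArgumentError,
        "#{model} does not support response_format #{format.inspect}; use one of #{allowed.join(', ')}"
end

puts check_response_format!("whisper-1", "srt")
```

Failing fast like this surfaces a model/format mismatch locally instead of as an API error after the upload.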

Translate Audio to English

Translate audio files from any supported language into English.
response = client.audio.translations.create(
  file: Pathname("french_audio.mp3"),
  model: "whisper-1"
)

puts response.text  # English translation

Translation with Prompt

Use prompt to guide the model's style or to continue from a previous audio segment. For translations, write the prompt in English.
response = client.audio.translations.create(
  file: Pathname("spanish_interview.mp3"),
  model: "whisper-1",
  prompt: "This is an interview with a software engineer discussing AI",
  response_format: "verbose_json"
)

puts "Translation: #{response.text}"
puts "Output language: #{response.language}"  # the output language (English), not the source language
Translation always converts audio to English, regardless of the source language. Use transcription if you want to keep the original language.
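When a long recording is split into parts, you can pass the end of the previous part's text as the prompt for the next part. Whisper is described as using only the tail end of a long prompt, so this sketch keeps just the last few words. Word count is used here as a rough stand-in for tokens, which is an assumption and not a real tokenizer. The helper name and the file names in the commented usage are hypothetical.

```ruby
# Hypothetical helper: keep only the last `max_words` words of the previous
# text so the prompt stays short. Words approximate tokens (assumption).
def continuation_prompt(previous_text, max_words: 150)
  previous_text.split.last(max_words).join(" ")
end

# Hypothetical usage with the client from the examples above:
# part1 = client.audio.translations.create(file: Pathname("interview_part1.mp3"), model: "whisper-1")
# part2 = client.audio.translations.create(
#   file: Pathname("interview_part2.mp3"),
#   model: "whisper-1",
#   prompt: continuation_prompt(part1.text)
# )

previous = "We covered the roadmap. Next we discuss hiring plans for the AI team."
puts continuation_prompt(previous, max_words: 5)
```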

Supported Audio Formats

The API supports the following audio file formats:
  • FLAC
  • MP3
  • MP4
  • MPEG
  • MPGA
  • M4A
  • OGG
  • WAV
  • WEBM
# Any supported format works
formats = ["audio.flac", "audio.mp3", "audio.wav", "audio.m4a"]

formats.each do |file|
  response = client.audio.transcriptions.create(
    file: Pathname(file),
    model: "gpt-4o-transcribe"
  )
  puts "#{file}: #{response.text}"
end
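Before uploading a batch of files, you can skip anything whose extension is not in the supported list. This sketch checks only the file name, not the contents, so it is a convenience filter rather than a guarantee that the API will accept the file. The constant and method names are hypothetical.

```ruby
# Extensions from the supported-formats list, lowercased.
SUPPORTED_AUDIO_EXTENSIONS = %w[flac mp3 mp4 mpeg mpga m4a ogg wav webm].freeze

# True if the file name ends in a supported audio extension (case-insensitive).
def supported_audio?(path)
  ext = File.extname(path).delete_prefix(".").downcase
  SUPPORTED_AUDIO_EXTENSIONS.include?(ext)
end

files = ["audio.flac", "notes.txt", "clip.WAV", "voice.aac"]
supported, skipped = files.partition { |f| supported_audio?(f) }

puts "Transcribing: #{supported.join(', ')}"
puts "Skipping: #{skipped.join(', ')}"
```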

Using StringIO

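Transcribe audio that is already in memory (for example, bytes received over the network) by wrapping it in a StringIO, so no file path is needed. Here the bytes are read from a file only for illustration.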
require "stringio"

# Read the audio file into memory as raw binary data
audio_data = File.binread("audio.mp3")
audio_io = StringIO.new(audio_data)

response = client.audio.transcriptions.create(
  file: audio_io,
  model: "gpt-4o-transcribe"
)

puts response.text
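Audio is binary data, so File.binread is a safer way to load it than File.read. File.read opens the file in text mode, which can translate newlines on Windows and tags the string with the default external encoding. File.binread returns the bytes unchanged, tagged as ASCII-8BIT (binary). This sketch writes a few placeholder bytes, which are not a real MP3, to a temporary file and wraps them in a StringIO. The helper name load_audio_io is hypothetical.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: load audio bytes without text-mode decoding
# and wrap them in a StringIO suitable for the file: parameter.
def load_audio_io(path)
  StringIO.new(File.binread(path))
end

sample_bytes = "\xFF\xFB\x90\x00".b # placeholder bytes, not real audio

# Tempfile.create with a block deletes the file afterwards and
# returns the block's value (here, the StringIO).
io = Tempfile.create(["sample", ".mp3"]) do |f|
  f.binmode
  f.write(sample_bytes)
  f.flush
  load_audio_io(f.path)
end

puts "#{io.string.bytesize} bytes, encoding #{io.string.encoding}"
```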
