The Audio API processes audio files in either synchronous or asynchronous mode: upload an audio file, transcribe it with Whisper, and format the transcript as structured meeting notes or a conversation summary.

Process Audio

curl -X POST "http://localhost:8000/api/audio/process" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@audio.mp3" \
  -F "instruction=Format as meeting notes with action items" \
  -F "whisper_model=base" \
  -F "async_mode=0"

{
  "transcript": "Welcome to today's meeting. We discussed the Q1 roadmap...",
  "formatted": "# Meeting Notes\n\n## Summary\nDiscussed Q1 roadmap and priorities.\n\n## Action Items\n- [ ] John to review design specs\n- [ ] Sarah to schedule follow-up"
}

Endpoint

POST /api/audio/process
Upload and process an audio file. Supports both synchronous (blocking) and asynchronous (polling) modes.

Form Parameters (multipart/form-data)

file (file, required): Audio file to transcribe. Common formats such as mp3, m4a, and wav are supported.
instruction (string, default: ""): Formatting instruction for the AI (e.g., “Format as meeting notes with action items”, “Summarize the conversation”).
whisper_model (string, default: "base"): Whisper model to use for transcription (tiny, base, small, medium, large).
async_mode (string, default: "0"): Selects the processing mode:
  • 0 = Synchronous (wait for completion)
  • 1 = Asynchronous (returns job_id immediately with HTTP 202)
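
The form parameters above could be encoded from Python roughly as follows. This is a minimal sketch using only the standard library; the helper name build_multipart, the application/octet-stream content type for the file part, and the client-side validation are assumptions for illustration, not part of the documented API.

```python
import uuid

# Model names taken from the whisper_model parameter description.
ALLOWED_MODELS = {"tiny", "base", "small", "medium", "large"}


def build_multipart(filename, audio_bytes, instruction="",
                    whisper_model="base", async_mode="0"):
    """Encode the documented form fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    if whisper_model not in ALLOWED_MODELS:
        raise ValueError(f"unknown whisper_model: {whisper_model!r}")
    if async_mode not in ("0", "1"):
        raise ValueError("async_mode must be the string '0' or '1'")

    boundary = uuid.uuid4().hex
    parts = []
    # Text fields are sent as plain strings, matching the parameter list.
    fields = (
        ("instruction", instruction),
        ("whisper_model", whisper_model),
        ("async_mode", async_mode),
    )
    for name, value in fields:
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # File part. The content type is an assumption; the filename is
    # assumed to contain no double quotes.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and header value can be passed to urllib.request.Request as data and as the Content-Type header, alongside the Authorization header shown in the curl example.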

Response (Synchronous)

transcript (string): Raw transcription of the audio
formatted (string): AI-formatted output (meeting notes, summary, etc.)

Response (Async)

job_id (string): Unique job identifier for polling status
When async_mode=1, the endpoint returns 202 Accepted with a job_id. Poll GET /api/audio/status/{job_id} for progress.
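
A client may want to branch on the status code to tell the two response shapes apart. The sketch below assumes a successful synchronous call returns HTTP 200 (the source only documents 202 for async mode); interpret_process_response is a hypothetical helper name.

```python
import json


def interpret_process_response(status_code, body_text):
    """Classify a POST /api/audio/process response.

    Returns ("queued", job_id) for 202 Accepted, or
    ("done", {"transcript": ..., "formatted": ...}) for a synchronous result.
    """
    if status_code == 202:
        # Async mode: only a job_id is returned; results come from polling.
        return "queued", json.loads(body_text)["job_id"]
    if status_code == 200:
        # Assumption: synchronous success is reported as HTTP 200.
        payload = json.loads(body_text)
        return "done", {
            "transcript": payload["transcript"],
            "formatted": payload["formatted"],
        }
    raise RuntimeError(f"unexpected status {status_code}: {body_text[:200]}")
```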

Get Job Status

curl -X GET "http://localhost:8000/api/audio/status/a1b2c3d4-e5f6-7890-abcd-ef1234567890"

{
  "stage": "transcribing"
}

Endpoint

GET /api/audio/status/{job_id}
Poll for async audio job progress.

Path Parameters

job_id (string, required): Job ID returned from POST /api/audio/process with async_mode=1

Response

stage (string): Current processing stage, one of:
  • transcribing: Running Whisper transcription
  • formatting: AI is formatting the transcript
  • done: Processing complete
  • error: Job failed
transcript (string): Raw transcript (available after transcription completes)
formatted (string): Formatted output (available when stage=done)
error (string): Error message (present when stage=error)
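
Because transcript, formatted, and error appear only at certain stages, a client would likely treat them as optional. One way to model the status payload, as a sketch (JobStatus and parse_status are hypothetical names, not part of the API):

```python
import json
from dataclasses import dataclass
from typing import Optional

# Stages after which no further progress is expected.
TERMINAL_STAGES = {"done", "error"}


@dataclass
class JobStatus:
    stage: str
    transcript: Optional[str] = None  # set once transcription completes
    formatted: Optional[str] = None   # set when stage is "done"
    error: Optional[str] = None       # set when stage is "error"

    @property
    def finished(self) -> bool:
        return self.stage in TERMINAL_STAGES


def parse_status(body_text: str) -> JobStatus:
    """Decode a status response body, tolerating absent optional fields."""
    data = json.loads(body_text)
    return JobStatus(
        stage=data["stage"],
        transcript=data.get("transcript"),
        formatted=data.get("formatted"),
        error=data.get("error"),
    )
```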

Job Expiration

Jobs expire after 1 hour. Polling an expired job returns 404.
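
A polling client should therefore treat 404 as a distinct outcome, separate from other HTTP failures. A minimal sketch with the standard library, assuming the base URL from the curl examples; JobExpired and fetch_status are hypothetical names, and the urlopen parameter exists only so the function can be exercised without a server.

```python
import json
import urllib.error
import urllib.request


class JobExpired(Exception):
    """Raised when the status endpoint answers 404 for a job ID."""


def fetch_status(job_id, base_url="http://localhost:8000",
                 urlopen=urllib.request.urlopen):
    """Return the decoded status JSON, or raise JobExpired on HTTP 404."""
    url = f"{base_url}/api/audio/status/{job_id}"
    try:
        with urlopen(url) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            # Per the docs, an expired job is reported as 404.
            raise JobExpired(job_id) from exc
        raise  # other HTTP errors propagate unchanged
```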

Async Processing Flow

  1. Upload: POST /api/audio/process with async_mode=1
  2. Receive Job ID: Get 202 Accepted with job_id
  3. Poll Status: GET /api/audio/status/{job_id} until stage=done
  4. Retrieve Results: Access transcript and formatted fields
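
Steps 3 and 4 of the flow above can be sketched as a polling loop. The 2-second interval is an assumption (the docs do not recommend one), and the default timeout simply mirrors the 1-hour job expiration. wait_for_job is a hypothetical helper; fetch_status is any callable that takes a job_id and returns the decoded status JSON as a dict.

```python
import time


def wait_for_job(job_id, fetch_status, interval=2.0, timeout=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches stage "done" or "error".

    Returns the final status dict on success, raises RuntimeError if the
    job reports an error, and TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(job_id)
        if status["stage"] == "done":
            return status  # contains transcript and formatted
        if status["stage"] == "error":
            raise RuntimeError(status.get("error", "job failed"))
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status['stage']} after {timeout}s"
            )
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop easy to test without real delays.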

Model Selection

Larger Whisper models are more accurate but slower:
Model  | Speed   | Accuracy | Use Case
tiny   | Fastest | Lowest   | Quick drafts
base   | Fast    | Good     | Default choice
small  | Medium  | Better   | Important meetings
medium | Slow    | High     | Technical content
large  | Slowest | Highest  | Critical transcription
