Process Audio
Endpoint
Form Parameters (multipart/form-data)
Audio file to transcribe (supports common formats: mp3, m4a, wav, etc.)
Formatting instruction for the AI (e.g., “Format as meeting notes with action items”, “Summarize the conversation”)
Whisper model to use for transcription (
tiny, base, small, medium, large)0= Synchronous (wait for completion)1= Asynchronous (returnsjob_idimmediately with HTTP 202)
Response (Synchronous)
Raw transcription of the audio
AI-formatted output (meeting notes, summary, etc.)
Response (Async)
Unique job identifier for polling status
async_mode=1, the endpoint returns 202 Accepted with a job_id. Poll GET /api/audio/status/{job_id} for progress.
Get Job Status
Endpoint
Path Parameters
Job ID returned from
POST /api/audio/process with async_mode=1Response
Current processing stage:
transcribing— Running Whisper transcriptionformatting— AI is formatting the transcriptdone— Processing completeerror— Job failed
Raw transcript (available after transcription completes)
Formatted output (available when
stage=done)Error message (present when
stage=error)Job Expiration
Jobs expire after 1 hour. Polling an expired job returns404.
Async Processing Flow
- Upload —
POST /api/audio/processwithasync_mode=1 - Receive Job ID — Get
202 Acceptedwithjob_id - Poll Status —
GET /api/audio/status/{job_id}untilstage=done - Retrieve Results — Access
transcriptandformattedfields
Model Selection
Whisper models trade accuracy for speed:| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
tiny | Fastest | Lowest | Quick drafts |
base | Fast | Good | Default choice |
small | Medium | Better | Important meetings |
medium | Slow | High | Technical content |
large | Slowest | Highest | Critical transcription |