Audio API

The Audio API processes audio files: upload a file, transcribe it with Whisper, and format the transcript as structured meeting notes or a conversation summary. Requests run synchronously by default; an asynchronous mode returns a job ID that you poll for results.

Process Audio

curl -X POST "http://localhost:8000/api/audio/process" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@YOUR_AUDIO_FILE.mp3" \
  -F "instruction=Format as meeting notes with action items" \
  -F "whisper_model=base" \
  -F "async_mode=0"

{
  "transcript": "Welcome to today's meeting. We discussed the Q1 roadmap...",
  "formatted": "# Meeting Notes\n\n## Summary\nDiscussed Q1 roadmap and priorities.\n\n## Action Items\n- [ ] John to review design specs\n- [ ] Sarah to schedule follow-up"
}

Endpoint

POST /api/audio/process

Upload and process an audio file. Supports both synchronous (blocking) and asynchronous (polling) modes.

Form Parameters (multipart/form-data)

file

required

Audio file to transcribe. Common formats such as mp3, m4a, and wav are supported.

instruction

string

default:""

Formatting instruction for the AI (e.g., “Format as meeting notes with action items”, “Summarize the conversation”)

whisper_model

string

default:"base"

Whisper model to use for transcription (tiny, base, small, medium, large)

async_mode

string

default:"0"

0 = Synchronous (wait for completion)
1 = Asynchronous (returns job_id immediately with HTTP 202)
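
The non-file form fields can be assembled and checked before the upload. The following is a minimal sketch, not part of the API itself: the helper name `build_process_form` is hypothetical, and the field names and allowed values are taken from the parameter list above.

```python
# Sketch only: build_process_form is a hypothetical helper, not part of the API.
# Allowed model names come from the whisper_model parameter description.
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def build_process_form(
    instruction: str = "",
    whisper_model: str = "base",
    async_mode: bool = False,
) -> dict[str, str]:
    """Return the non-file form fields for POST /api/audio/process."""
    if whisper_model not in WHISPER_MODELS:
        raise ValueError(f"unknown whisper_model: {whisper_model!r}")
    return {
        "instruction": instruction,
        "whisper_model": whisper_model,
        # The API documents async_mode as the string "0" or "1".
        "async_mode": "1" if async_mode else "0",
    }


fields = build_process_form(
    "Format as meeting notes with action items", async_mode=True
)
print(fields)
```

The resulting dictionary would be sent alongside the `file` part using whatever multipart/form-data support your HTTP client offers.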

Response (Synchronous)

transcript

string

Raw transcription of the audio

formatted

string

AI-formatted output (meeting notes, summary, etc.)

Response (Async)

job_id

string

Unique job identifier for polling status

When async_mode=1, the endpoint returns 202 Accepted with a job_id. Poll GET /api/audio/status/{job_id} for progress.
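
A client can tell the two response shapes apart by status code. The sketch below assumes that a successful synchronous request returns 200 (the page only documents 202 for async mode); `parse_process_response` is a hypothetical helper name.

```python
import json


def parse_process_response(status_code: int, body: str) -> dict:
    """Normalize the response of POST /api/audio/process.

    Assumption: synchronous success is HTTP 200. Only the 202 status
    for async mode is stated in the documentation.
    """
    data = json.loads(body)
    if status_code == 202:
        job_id = data["job_id"]
        return {
            "mode": "async",
            "job_id": job_id,
            # Path to poll, as documented for the status endpoint.
            "status_path": f"/api/audio/status/{job_id}",
        }
    if status_code == 200:
        return {
            "mode": "sync",
            "transcript": data["transcript"],
            "formatted": data["formatted"],
        }
    raise RuntimeError(f"unexpected status code: {status_code}")


accepted = parse_process_response(
    202, '{"job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"}'
)
print(accepted["status_path"])
```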

Get Job Status

curl -X GET "http://localhost:8000/api/audio/status/a1b2c3d4-e5f6-7890-abcd-ef1234567890"

{
  "stage": "transcribing"
}

Endpoint

GET /api/audio/status/{job_id}

Poll for async audio job progress.

Path Parameters

job_id

string

required

Job ID returned from POST /api/audio/process with async_mode=1

Response

stage

string

Current processing stage:

transcribing — Running Whisper transcription
formatting — AI is formatting the transcript
done — Processing complete
error — Job failed

transcript

string

Raw transcript (available after transcription completes)

formatted

string

Formatted output (available when stage=done)

error

string

Error message (present when stage=error)
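
Because `transcript`, `formatted`, and `error` appear only at certain stages, a client may want to treat them as optional. The following sketch models the status payload under that assumption; `JobStatus` and `parse_job_status` are hypothetical names, not part of the API.

```python
from dataclasses import dataclass
from typing import Optional

# Stage names as listed in the response description above.
KNOWN_STAGES = ("transcribing", "formatting", "done", "error")


@dataclass
class JobStatus:
    stage: str
    transcript: Optional[str] = None  # set once transcription completes
    formatted: Optional[str] = None   # set when stage is "done"
    error: Optional[str] = None       # set when stage is "error"

    @property
    def is_terminal(self) -> bool:
        """True when no further polling is useful."""
        return self.stage in ("done", "error")


def parse_job_status(payload: dict) -> JobStatus:
    """Build a JobStatus from a decoded status response body."""
    stage = payload.get("stage")
    if stage not in KNOWN_STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return JobStatus(
        stage=stage,
        transcript=payload.get("transcript"),
        formatted=payload.get("formatted"),
        error=payload.get("error"),
    )


status = parse_job_status({"stage": "transcribing"})
print(status.stage, status.is_terminal)
```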

Job Expiration

Jobs expire after 1 hour. Polling an expired job returns 404.

Async Processing Flow

Upload — POST /api/audio/process with async_mode=1
Receive Job ID — Get 202 Accepted with job_id
Poll Status — GET /api/audio/status/{job_id} until stage=done
Retrieve Results — Access transcript and formatted fields
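
The polling part of this flow can be sketched as a loop around any function that fetches the job status. This is an illustration under stated assumptions, not a reference client: `wait_for_job`, `JobExpired`, and `JobFailed` are hypothetical names, the 2-second interval is arbitrary, and `fetch_status` stands in for a real call to GET /api/audio/status/{job_id} that returns the HTTP status code and the decoded JSON body.

```python
import time
from typing import Callable, Dict, Tuple


class JobExpired(Exception):
    """The status endpoint answered 404 (job expired or unknown)."""


class JobFailed(Exception):
    """The job reached stage=error."""


def wait_for_job(
    fetch_status: Callable[[str], Tuple[int, Dict[str, str]]],
    job_id: str,
    poll_interval: float = 2.0,   # arbitrary choice, not from the API docs
    max_wait: float = 3600.0,     # jobs expire after 1 hour
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll until stage is "done" and return the final payload."""
    deadline = clock() + max_wait
    while True:
        status_code, payload = fetch_status(job_id)
        if status_code == 404:
            raise JobExpired(job_id)
        stage = payload.get("stage")
        if stage == "done":
            return payload
        if stage == "error":
            raise JobFailed(payload.get("error", "unknown error"))
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {max_wait} seconds")
        sleep(poll_interval)


# Demo with a scripted fake instead of real HTTP calls.
scripted = iter([
    (200, {"stage": "transcribing"}),
    (200, {"stage": "formatting", "transcript": "Welcome to today's meeting."}),
    (200, {"stage": "done", "transcript": "Welcome to today's meeting.",
           "formatted": "# Meeting Notes"}),
])
result = wait_for_job(
    lambda job_id: next(scripted), "demo-job", sleep=lambda seconds: None
)
print(result["formatted"])
```

Passing `fetch_status`, `sleep`, and `clock` as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise without a running server.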

Model Selection

Larger Whisper models are more accurate but slower:

Model	Speed	Accuracy	Use Case
`tiny`	Fastest	Lowest	Quick drafts
`base`	Fast	Good	Default choice
`small`	Medium	Better	Important meetings
`medium`	Slow	High	Technical content
`large`	Slowest	Highest	Critical transcription
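
If an application picks the model programmatically, the table can be encoded as a lookup. This is only a sketch: the use-case keys and the `choose_model` helper are hypothetical labels derived from the "Use Case" column, and unknown use cases fall back to the documented default, `base`.

```python
# Hypothetical use-case labels mapped to the models in the table above.
MODEL_BY_USE_CASE = {
    "quick draft": "tiny",
    "default": "base",
    "important meeting": "small",
    "technical content": "medium",
    "critical transcription": "large",
}


def choose_model(use_case: str) -> str:
    """Return a whisper_model value, falling back to the default "base"."""
    return MODEL_BY_USE_CASE.get(use_case.strip().lower(), "base")


print(choose_model("Technical content"))
```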
