Audio Streaming
Get Generation Audio
Retrieve the generated audio file for a specific generation.
Parameter: The unique identifier of the generation
Response:
- Content-Type: audio/wav
- Cache-Control: private, max-age=3600
- Body: Audio file binary data
Status codes:
- 200 - Success, audio returned
- 401 - Unauthorized (no valid session)
- 404 - Generation not found or doesn’t belong to your organization
- 409 - Audio not yet available (generation still processing)
- 502 - Failed to fetch audio from R2 storage
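A client usually needs to branch on these status codes. The following is a minimal sketch of one way to do that. The names `interpretAudioStatus` and `AudioOutcome` are illustrative and not part of the API, and treating 409 as retryable is an assumption based on the "still processing" description.

```typescript
// Hypothetical client-side helper: maps the documented status codes
// to an outcome the caller can act on.
type AudioOutcome =
  | { kind: "ok" }
  | { kind: "retry"; reason: string }
  | { kind: "fail"; reason: string };

function interpretAudioStatus(status: number): AudioOutcome {
  switch (status) {
    case 200:
      return { kind: "ok" };
    case 409:
      // Assumption: the generation is still processing, so polling again later is reasonable.
      return { kind: "retry", reason: "audio not yet available" };
    case 401:
      return { kind: "fail", reason: "unauthorized" };
    case 404:
      return { kind: "fail", reason: "generation not found" };
    case 502:
      return { kind: "fail", reason: "storage fetch failed" };
    default:
      return { kind: "fail", reason: `unexpected status ${status}` };
  }
}
```

A caller might read the response body only when `kind` is `"ok"` and schedule another request after a short delay when it is `"retry"`.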
Get Voice Audio
Retrieve the reference audio file for a specific voice (system or custom).
Parameter: The unique identifier of the voice
Response:
- Content-Type: Varies (e.g., audio/wav, audio/mpeg)
- Cache-Control:
  - System voices: public, max-age=86400 (24 hours)
  - Custom voices: private, max-age=3600 (1 hour)
- Body: Audio file binary data
Status codes:
- 200 - Success, audio returned
- 401 - Unauthorized (no valid session)
- 404 - Voice not found or custom voice doesn’t belong to your organization
- 409 - Voice audio not yet available (upload still processing)
- 502 - Failed to fetch audio from R2 storage
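The two caching policies differ only by voice type. A minimal sketch of that rule follows. The function names and the `VoiceKind` type are hypothetical; only the header values come from this page.

```typescript
// Hypothetical type distinguishing the two kinds of voice described above.
type VoiceKind = "system" | "custom";

// Returns the documented Cache-Control value for each kind of voice.
function cacheControlFor(kind: VoiceKind): string {
  return kind === "system"
    ? "public, max-age=86400" // 24 hours, shareable across users
    : "private, max-age=3600"; // 1 hour, per-user caches only
}

// Extracts max-age in seconds from a Cache-Control value, or null if absent.
function maxAgeSeconds(cacheControl: string): number | null {
  const match = /max-age=(\d+)/.exec(cacheControl);
  return match ? Number(match[1]) : null;
}
```

The `private` directive keeps organization-specific audio out of shared caches, which is consistent with custom voices being scoped to one organization.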
Voice Upload
Create Custom Voice
Upload an audio file to create a custom voice clone.
Parameters:
- Display name for the voice (minimum 1 character)
- Voice category. Must be one of: AUDIOBOOK, CONVERSATIONAL, CUSTOMER_SERVICE, GENERAL, NARRATIVE, CHARACTERS, MEDITATION, MOTIVATIONAL, PODCAST, ADVERTISING, VOICEOVER, CORPORATE
- BCP 47 language code (e.g., en-US, es-ES, fr-FR)
- Optional description of the voice
Content-Type: The MIME type of the audio file (e.g., audio/wav, audio/mpeg, audio/mp3, audio/ogg)
- Maximum file size: 20 MB
- Minimum duration: 10 seconds
- Subscription required: Voice creation requires an active Polar subscription
The name of the created voice
Success message
Status codes:
- 201 - Voice created successfully
- 400 - Invalid input (missing parameters, invalid category, missing Content-Type, or empty file)
- 401 - Unauthorized (no valid session)
- 403 - Subscription required (SUBSCRIPTION_REQUIRED error)
- 413 - File too large (exceeds 20 MB)
- 422 - Invalid audio file or duration too short
- 500 - Server error during voice creation
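Some of these rejections can be anticipated before the upload is sent. The sketch below checks the inputs that map to 400 and 413. The `UploadRequest` field names and the `precheckUpload` function are hypothetical, and reading "20 MB" as 20 * 1024 * 1024 bytes is an assumption.

```typescript
// Categories as listed on this page.
const VOICE_CATEGORIES: readonly string[] = [
  "AUDIOBOOK", "CONVERSATIONAL", "CUSTOMER_SERVICE", "GENERAL",
  "NARRATIVE", "CHARACTERS", "MEDITATION", "MOTIVATIONAL",
  "PODCAST", "ADVERTISING", "VOICEOVER", "CORPORATE",
];

// Assumption: "20 MB" is read as 20 * 1024 * 1024 bytes.
const MAX_FILE_BYTES = 20 * 1024 * 1024;

// Hypothetical request shape; these field names are not taken from the API.
interface UploadRequest {
  name: string;
  category: string;
  language: string;
  contentType: string | null;
  sizeBytes: number;
}

interface Rejection {
  status: 400 | 413;
  reason: string;
}

// Returns the rejection the server would likely produce, or null if the
// request passes these checks. Duration and audio validity are not checked here.
function precheckUpload(req: UploadRequest): Rejection | null {
  if (req.name.length < 1) return { status: 400, reason: "name is required" };
  if (VOICE_CATEGORIES.indexOf(req.category) === -1) {
    return { status: 400, reason: "invalid category" };
  }
  if (req.language.length < 1) return { status: 400, reason: "language is required" };
  if (!req.contentType) return { status: 400, reason: "missing Content-Type" };
  if (req.sizeBytes === 0) return { status: 400, reason: "empty file" };
  if (req.sizeBytes > MAX_FILE_BYTES) return { status: 413, reason: "file too large" };
  return null;
}
```

A pre-check like this only saves a round trip; the server's own validation remains authoritative.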
src/app/api/voices/create/route.ts:10-18):
Uploaded audio is checked with music-metadata to:
- Confirm the file is valid audio
- Extract duration metadata
- Ensure duration meets the 10-second minimum
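The three checks above can be sketched as a single function over the parser's result. This is not the route's actual code. `ParsedAudio` is a minimal stand-in for the metadata object a parser such as music-metadata returns (its `format.duration` field is in seconds), and passing `null` to represent a failed parse is a convention chosen for this example.

```typescript
const MIN_DURATION_SECONDS = 10;

// Minimal stand-in for parsed audio metadata; only the field used here is modeled.
interface ParsedAudio {
  format: { duration?: number };
}

type DurationCheck =
  | { ok: true; durationSeconds: number }
  | { ok: false; status: 422; reason: string };

// `parsed` is null when the parser could not read the file as audio
// (an assumption of this sketch, not documented behavior).
function checkDuration(parsed: ParsedAudio | null): DurationCheck {
  if (parsed === null) {
    return { ok: false, status: 422, reason: "invalid audio file" };
  }
  const duration = parsed.format.duration;
  if (duration === undefined || !Number.isFinite(duration)) {
    return { ok: false, status: 422, reason: "duration unavailable" };
  }
  if (duration < MIN_DURATION_SECONDS) {
    return { ok: false, status: 422, reason: "duration too short" };
  }
  return { ok: true, durationSeconds: duration };
}
```

With music-metadata, `parsed` could come from awaiting `parseBuffer` on the uploaded bytes, with a thrown parse error mapped to `null`.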
Storage key pattern: voices/orgs/{orgId}/{voiceId}
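A small helper can keep this key pattern in one place. The sketch below is illustrative. `customVoiceKey` is a hypothetical name, and rejecting empty segments or segments containing "/" is a defensive choice made for this example rather than documented behavior.

```typescript
// Builds a key of the form voices/orgs/{orgId}/{voiceId}.
function customVoiceKey(orgId: string, voiceId: string): string {
  for (const part of [orgId, voiceId]) {
    // Guard against keys that would escape the organization's prefix.
    if (part.length === 0 || part.indexOf("/") !== -1) {
      throw new Error("orgId and voiceId must be non-empty and must not contain '/'");
    }
  }
  return `voices/orgs/${orgId}/${voiceId}`;
}
```

Placing the organization ID in the key prefix means every object for one organization shares a common prefix, which fits the organization scoping described in the Security section.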
Usage Metering:
Voice creation events are tracked in Polar with the event name voice_creation.
Error Handling
All endpoints return JSON error responses with a common structure. For /api/voices/create, the response also includes detailed issues.
Security
All endpoints require authentication via Clerk session cookies. Custom voices and generations are scoped to the authenticated user’s organization (orgId), ensuring data isolation in the multi-tenant architecture.
System voices (built-in) are publicly accessible to all authenticated users.
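The scoping rules above, combined with the documented 401 and 404 responses, can be summarized as a small decision function. This is a sketch under assumed types. `Session`, `Voice`, and `voiceAccessStatus` are hypothetical names and do not reflect the actual Clerk or database shapes.

```typescript
// Hypothetical shapes for an authenticated session and a stored voice.
interface Session {
  orgId: string;
}

type Voice =
  | { kind: "system" }
  | { kind: "custom"; orgId: string };

// Returns the status code the documented rules imply for a voice request.
function voiceAccessStatus(
  session: Session | null,
  voice: Voice | undefined,
): 200 | 401 | 404 {
  if (session === null) return 401; // no valid session
  if (voice === undefined) return 404; // voice does not exist
  if (voice.kind === "custom" && voice.orgId !== session.orgId) {
    // Another organization's voice is reported as not found.
    return 404;
  }
  return 200; // system voice, or custom voice owned by the caller's organization
}
```

Answering 404 rather than 403 for another organization's voice avoids confirming that the voice exists, which is a common choice in multi-tenant APIs.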
Related Documentation
- tRPC Routers: Voice and generation management via tRPC
- Voice Cloning: Learn how to clone voices
- Cloudflare R2: Configure audio storage
- Clerk Auth: Set up authentication