Overview
Resonance’s text-to-speech engine converts written text into natural-sounding audio using AI-powered voice synthesis. The generation system supports both system and custom voices with fine-tuned control over speech characteristics.
Generation Parameters
The TTS engine accepts several parameters to control the output audio quality and characteristics:
The input text to convert to speech
- Maximum length: 5,000 characters
- Minimum length: 1 character
- Text is automatically trimmed before processing
The unique identifier of the voice to use
- Can be a system voice or custom voice ID
- Must belong to your organization (for custom voices)
temperature: Controls creativity and expressiveness
- Range: 0.0 - 2.0
- Lower values (0-0.5): More consistent and predictable
- Higher values (1.5-2.0): More expressive and varied
- UI Label: “Creativity” (Consistent ↔ Expressive)
topP: Controls voice variety and stability
- Range: 0.0 - 1.0
- Step: 0.05
- Lower values: More stable output
- Higher values: More dynamic variation
- UI Label: “Voice Variety” (Stable ↔ Dynamic)
topK: Controls expression range
- Range: 1 - 10,000
- Step: 100
- Lower values: Subtle expressions
- Higher values: Dramatic delivery
- UI Label: “Expression Range” (Subtle ↔ Dramatic)
repetitionPenalty: Controls natural flow and rhythm
- Range: 1.0 - 2.0
- Step: 0.1
- Lower values: More rhythmic patterns
- Higher values: More varied delivery
- UI Label: “Natural Flow” (Rhythmic ↔ Varied)
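The ranges listed above can be checked before a request is sent. The following is a minimal sketch, not the engine's actual validation: the `GenerationParams` shape and the `text` field name are hypothetical, it assumes the length limits apply to the trimmed text, and it does not enforce the slider step sizes.

```typescript
// Hypothetical request shape. Only temperature, topP, topK and
// repetitionPenalty are names that appear in this document.
interface GenerationParams {
  text: string;
  temperature: number;
  topP: number;
  topK: number;
  repetitionPenalty: number;
}

// True when v lies in [min, max]. NaN fails both comparisons, so it is rejected.
function inRange(v: number, min: number, max: number): boolean {
  return v >= min && v <= max;
}

// Returns a list of problems; an empty list means the parameters look valid.
function validateParams(p: GenerationParams): string[] {
  const errors: string[] = [];
  // Assumption: the 1 to 5,000 character limits apply after trimming.
  const trimmed = p.text.trim();
  if (trimmed.length < 1 || trimmed.length > 5000) {
    errors.push("text must be 1 to 5,000 characters");
  }
  if (!inRange(p.temperature, 0, 2)) {
    errors.push("temperature must be between 0.0 and 2.0");
  }
  if (!inRange(p.topP, 0, 1)) {
    errors.push("topP must be between 0.0 and 1.0");
  }
  if (!inRange(p.topK, 1, 10000)) {
    errors.push("topK must be between 1 and 10,000");
  }
  if (!inRange(p.repetitionPenalty, 1, 2)) {
    errors.push("repetitionPenalty must be between 1.0 and 2.0");
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem lets a form show every out-of-range slider at once.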
Generation Workflow
Adjust parameters (optional)
Fine-tune the generation settings using the parameter sliders:
- Creativity (temperature)
- Voice Variety (topP)
- Expression Range (topK)
- Natural Flow (repetitionPenalty)
Generate audio
Click the “Generate” button. The system will:
- Validate your subscription status
- Verify the selected voice exists and is accessible
- Send the request to the Chatterbox TTS engine
- Store the generated audio in cloud storage
- Record the generation in your history
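The five steps above can be sketched as a single function. This is an illustration under stated assumptions, not the actual server code: every name in `PipelineDeps` is hypothetical, the calls are kept synchronous so the control flow is easy to follow (a real implementation would be asynchronous), and the point at which the missing-audio check happens is a guess based on the error codes documented on this page.

```typescript
type TtsErrorCode =
  | "SUBSCRIPTION_REQUIRED"
  | "NOT_FOUND"
  | "PRECONDITION_FAILED"
  | "INTERNAL_SERVER_ERROR";

type GenerationResult =
  | { ok: true; generationId: string }
  | { ok: false; code: TtsErrorCode };

// Hypothetical voice record; r2ObjectKey is the field named under Error Handling.
interface VoiceRecord {
  r2ObjectKey: string | null;
}

// Hypothetical dependencies, injected so the flow can be exercised with fakes.
interface PipelineDeps {
  hasActiveSubscription(orgId: string): boolean;
  findVoice(orgId: string, voiceId: string): VoiceRecord | null;
  synthesize(text: string, voice: VoiceRecord): Uint8Array; // may throw
  storeAudio(orgId: string, generationId: string, audio: Uint8Array): void; // may throw
  recordGeneration(orgId: string, generationId: string): void;
  cleanUp(orgId: string, generationId: string): void;
}

function runGeneration(
  deps: PipelineDeps,
  orgId: string,
  voiceId: string,
  text: string,
  generationId: string
): GenerationResult {
  // 1. Validate subscription status.
  if (!deps.hasActiveSubscription(orgId)) {
    return { ok: false, code: "SUBSCRIPTION_REQUIRED" };
  }
  // 2. Verify the voice exists and is accessible.
  const voice = deps.findVoice(orgId, voiceId);
  if (voice === null) {
    return { ok: false, code: "NOT_FOUND" };
  }
  if (voice.r2ObjectKey === null) {
    return { ok: false, code: "PRECONDITION_FAILED" };
  }
  try {
    // 3. Send the request to the TTS engine.
    const audio = deps.synthesize(text.trim(), voice);
    // 4. Store the generated audio.
    deps.storeAudio(orgId, generationId, audio);
    // 5. Record the generation in history.
    deps.recordGeneration(orgId, generationId);
    return { ok: true, generationId };
  } catch {
    // Remove partial data if any step after validation failed.
    deps.cleanUp(orgId, generationId);
    return { ok: false, code: "INTERNAL_SERVER_ERROR" };
  }
}
```

The cheap checks run first so that no engine call is made for a request that cannot succeed.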
API Usage
Generate audio programmatically using the tRPC API.
Subscription Requirements
The system checks for active subscriptions before processing generation requests.
Usage Tracking
Each generation is tracked for billing purposes:
- Event name: tts_generation
- Metadata: Character count of input text
- Cost: $0.0003 per unit
- Usage events are sent to Polar for metering
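To make the billing arithmetic concrete, here is a small sketch of a usage event and its estimated cost. It assumes one billing unit equals one character of trimmed input, which the list above suggests but does not state outright. The `characters` metadata key is hypothetical, and the sketch does not call the Polar API. Amounts are kept in micro-dollars (millionths of a dollar) so the math stays in integers.

```typescript
// $0.0003 per unit, expressed as 300 micro-dollars to avoid floating-point drift.
const MICRO_DOLLARS_PER_UNIT = 300;

interface UsageEvent {
  name: "tts_generation";
  // Hypothetical key name; the document only says the metadata holds a character count.
  metadata: { characters: number };
}

// Builds the event for one generation. Length is counted in UTF-16 code units,
// which is what String.prototype.length reports.
function buildUsageEvent(text: string): UsageEvent {
  return { name: "tts_generation", metadata: { characters: text.trim().length } };
}

// Assumption: one unit is one character.
function estimateCostMicroDollars(event: UsageEvent): number {
  return event.metadata.characters * MICRO_DOLLARS_PER_UNIT;
}
```

Under these assumptions, a maximum-length request of 5,000 characters comes to 1,500,000 micro-dollars, or $1.50.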
Response Format
Successful generation returns:
- Detail page: /text-to-speech/{id}
- Audio URL: /api/audio/{id}
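A client can derive both paths from the generation ID. The helper below is a hypothetical convenience, not part of the documented API. It assumes the ID is a plain string and percent-encodes it defensively.

```typescript
interface GenerationLinks {
  detailPage: string;
  audioUrl: string;
}

// Builds the two paths listed above from a generation ID.
function generationLinks(id: string): GenerationLinks {
  const safeId = encodeURIComponent(id);
  return {
    detailPage: `/text-to-speech/${safeId}`,
    audioUrl: `/api/audio/${safeId}`,
  };
}
```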
Error Handling
SUBSCRIPTION_REQUIRED
The user does not have an active subscription. Show a checkout prompt.
NOT_FOUND
The selected voice ID does not exist or is not accessible by your organization.
PRECONDITION_FAILED
The voice exists, but its audio file is not available (missing r2ObjectKey).
INTERNAL_SERVER_ERROR
Generation failed or audio storage failed. The system automatically cleans up partial data.
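On the client, each of the four codes above can be mapped to a distinct reaction. The sketch below is one possible mapping. The `UiAction` names are hypothetical, and how the code string reaches the client (for example through the tRPC error object) is not shown in this document, so the function simply takes the code as a string.

```typescript
// Hypothetical UI reactions, one per documented error code.
type UiAction =
  | "show_checkout"
  | "reselect_voice"
  | "voice_unavailable"
  | "retry_later";

// Maps an error code to a UI reaction. Unknown codes fall back to a generic retry.
function actionForError(code: string): UiAction {
  switch (code) {
    case "SUBSCRIPTION_REQUIRED":
      return "show_checkout"; // no active subscription
    case "NOT_FOUND":
      return "reselect_voice"; // voice missing or not accessible
    case "PRECONDITION_FAILED":
      return "voice_unavailable"; // voice has no audio file
    case "INTERNAL_SERVER_ERROR":
      return "retry_later"; // generation or storage failed
    default:
      return "retry_later";
  }
}
```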
Audio Processing
The TTS engine applies several post-processing steps:
- Loudness normalization: All audio is normalized using norm_loudness: true
- Format: Audio is generated in WAV format
- Storage: Audio files are stored in R2 with the key pattern: generations/orgs/{orgId}/{generationId}
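The storage key pattern can be expressed as a small helper. This is a sketch with a hypothetical function name. Rejecting empty IDs and IDs that contain a slash is an added safeguard so a malformed ID cannot alter the key's path structure; the document does not describe such a check.

```typescript
// Builds the storage key generations/orgs/{orgId}/{generationId}.
function generationObjectKey(orgId: string, generationId: string): string {
  const parts = [orgId, generationId];
  for (const part of parts) {
    // Safeguard (assumption): IDs must be non-empty and must not contain "/".
    if (part.length === 0 || part.indexOf("/") !== -1) {
      throw new Error("invalid ID for storage key: " + JSON.stringify(part));
    }
  }
  return `generations/orgs/${orgId}/${generationId}`;
}
```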
Generated audio is stored permanently until manually deleted. Each generation preserves the original parameters and voice name for reproducibility.