Text-to-Speech Generation

Overview

Resonance’s text-to-speech engine converts written text into natural-sounding audio using AI-powered voice synthesis. The generation system supports both system and custom voices, with fine-grained control over speech characteristics.

Generation Parameters

The TTS engine accepts several parameters to control the output audio quality and characteristics:

text (string, required)

The input text to convert to speech

Maximum length: 5,000 characters
Minimum length: 1 character
Leading and trailing whitespace is trimmed automatically before processing
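The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's actual validation: `prepareText` and `TextCheck` are hypothetical names, the length check is assumed to run after trimming, and characters are assumed to be counted as JavaScript string length.

```typescript
// Limits taken from the parameter description above.
const MIN_TEXT_LENGTH = 1;
const MAX_TEXT_LENGTH = 5000;

type TextCheck =
  | { ok: true; text: string }
  | { ok: false; reason: string };

// Hypothetical helper: trims the input, then applies the documented limits.
// Assumption: the limits apply to the trimmed text.
function prepareText(input: string): TextCheck {
  const text = input.trim();
  if (text.length < MIN_TEXT_LENGTH) {
    return { ok: false, reason: "Text must contain at least 1 character." };
  }
  if (text.length > MAX_TEXT_LENGTH) {
    return { ok: false, reason: `Text exceeds ${MAX_TEXT_LENGTH} characters.` };
  }
  return { ok: true, text };
}
```

Running a check like this first avoids a round trip for input that would be rejected anyway.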

voiceId (string, required)

The unique identifier of the voice to use

Can be a system voice or custom voice ID
Custom voices must belong to your organization

temperature (number, default: 0.8)

Controls creativity and expressiveness

Range: 0.0 - 2.0
Lower values (0-0.5): More consistent and predictable
Higher values (1.5-2.0): More expressive and varied
UI Label: “Creativity” (Consistent ↔ Expressive)

topP (number, default: 0.95)

Controls voice variety and stability

Range: 0.0 - 1.0
Step: 0.05
Lower values: More stable output
Higher values: More dynamic variation
UI Label: “Voice Variety” (Stable ↔ Dynamic)

topK (number, default: 1000)

Controls expression range

Range: 1 - 10,000
Step: 100
Lower values: Subtle expressions
Higher values: Dramatic delivery
UI Label: “Expression Range” (Subtle ↔ Dramatic)

repetitionPenalty (number, default: 1.2)

Controls natural flow and rhythm

Range: 1.0 - 2.0
Step: 0.1
Lower values: More rhythmic patterns
Higher values: More varied delivery
UI Label: “Natural Flow” (Rhythmic ↔ Varied)
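The defaults and ranges listed above can be collected into one place. The sketch below merges caller overrides with the documented defaults and rejects values outside the documented ranges. `SamplingParams`, `DEFAULTS`, `RANGES`, and `resolveParams` are hypothetical names, and rejecting out-of-range values (rather than clamping them) is an assumption about the desired behavior, not a description of the server.

```typescript
interface SamplingParams {
  temperature: number;
  topP: number;
  topK: number;
  repetitionPenalty: number;
}

// Defaults as documented above.
const DEFAULTS: SamplingParams = {
  temperature: 0.8,
  topP: 0.95,
  topK: 1000,
  repetitionPenalty: 1.2,
};

// Inclusive [min, max] ranges as documented above.
const RANGES: Record<keyof SamplingParams, [number, number]> = {
  temperature: [0.0, 2.0],
  topP: [0.0, 1.0],
  topK: [1, 10000],
  repetitionPenalty: [1.0, 2.0],
};

// Hypothetical helper: fills in defaults and throws on out-of-range values.
function resolveParams(overrides: Partial<SamplingParams> = {}): SamplingParams {
  const merged = { ...DEFAULTS, ...overrides };
  for (const key of Object.keys(RANGES) as (keyof SamplingParams)[]) {
    const [min, max] = RANGES[key];
    const value = merged[key];
    if (!Number.isFinite(value) || value < min || value > max) {
      throw new RangeError(`${key} must be between ${min} and ${max}, got ${value}`);
    }
  }
  return merged;
}
```

For example, `resolveParams({ temperature: 0.3 })` keeps the other three parameters at their defaults.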

Generation Workflow

Enter text

Input your text in the text editor. The text must be between 1 and 5,000 characters.

Select a voice

Choose from available system voices or your custom cloned voices.

Adjust parameters (optional)

Fine-tune the generation settings using the parameter sliders:

Creativity (temperature)
Voice Variety (topP)
Expression Range (topK)
Natural Flow (repetitionPenalty)

Generate audio

Click the “Generate” button. The system will:

Validate your subscription status
Verify the selected voice exists and is accessible
Send the request to the Chatterbox TTS engine
Store the generated audio in cloud storage
Record the generation in your history

Listen and download

Once generation completes, you’ll be redirected to the detail view where you can:

Play the audio with the WaveSurfer player
Download the audio file
Regenerate with different settings

API Usage

Generate audio programmatically using the tRPC API:

import { trpc } from '@/trpc/client';

const generation = await trpc.generations.create.mutate({
  text: "Welcome to Resonance, the future of voice synthesis.",
  voiceId: "clx1234567890",
  temperature: 0.8,
  topP: 0.95,
  topK: 1000,
  repetitionPenalty: 1.2,
});

console.log('Generation ID:', generation.id);
// Navigate to: /text-to-speech/{generation.id}

Subscription Requirements

Text-to-speech generation requires an active subscription. Users without a subscription will receive a SUBSCRIPTION_REQUIRED error.

The system checks for active subscriptions before processing generation requests:

const customerState = await polar.customers.getStateExternal({
  externalId: orgId,
});
const hasActiveSubscription = (customerState.activeSubscriptions ?? []).length > 0;

Usage Tracking

Each generation is tracked for billing purposes:

Event name: tts_generation
Metadata: Character count of input text
Cost: $0.0003 per unit
Usage events are sent to Polar for metering
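For a rough cost estimate, the per-unit price can be multiplied by the input size. This sketch assumes that one billing unit corresponds to one character of input text, which the list above suggests but does not state outright. `estimateCostUsd` is a hypothetical helper, and metered billing may round differently.

```typescript
// Price per unit, as listed above.
const COST_PER_UNIT_USD = 0.0003;

// Assumption: one billing unit equals one character of input text.
function estimateCostUsd(text: string): number {
  return text.length * COST_PER_UNIT_USD;
}
```

Under that assumption, a 1,000-character input would cost about $0.30.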

Response Format

Successful generation returns:

{
  id: string; // Unique generation ID
}

The audio can be accessed via:

Detail page: /text-to-speech/{id}
Audio URL: /api/audio/{id}
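Both paths can be derived from the returned ID. A small sketch, where `generationLinks` is a hypothetical helper and URL-encoding the ID is a defensive assumption rather than a documented requirement:

```typescript
// Hypothetical helper: builds the two documented paths from a generation ID.
function generationLinks(id: string): { detailPage: string; audioUrl: string } {
  const safeId = encodeURIComponent(id);
  return {
    detailPage: `/text-to-speech/${safeId}`,
    audioUrl: `/api/audio/${safeId}`,
  };
}
```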

Error Handling

SUBSCRIPTION_REQUIRED

The user does not have an active subscription. Show a checkout prompt.

NOT_FOUND

The selected voice ID does not exist or is not accessible by your organization.

PRECONDITION_FAILED

The voice exists, but its audio file is not available (missing r2ObjectKey).

INTERNAL_SERVER_ERROR

Generation failed or audio storage failed. The system automatically cleans up partial data.
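On the client, these codes can be mapped to a recovery action. The sketch below is illustrative only: `RecoveryAction` and `recoveryFor` are hypothetical names, and every action other than the checkout prompt is a suggestion rather than documented behavior. How the code string is read from a thrown error depends on your client setup and is not shown.

```typescript
type RecoveryAction = "show_checkout" | "choose_another_voice" | "retry_later";

// Hypothetical mapping from the documented error codes to a UI action.
function recoveryFor(code: string): RecoveryAction {
  switch (code) {
    case "SUBSCRIPTION_REQUIRED":
      return "show_checkout"; // documented: show a checkout prompt
    case "NOT_FOUND":
    case "PRECONDITION_FAILED":
      return "choose_another_voice"; // assumption: the voice is unusable
    case "INTERNAL_SERVER_ERROR":
    default:
      return "retry_later"; // assumption: treat unknown codes as transient
  }
}
```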

Audio Processing

Generated audio is processed and stored as follows:

Loudness normalization: All audio is normalized using norm_loudness: true
Format: Audio is generated in WAV format
Storage: Audio files are stored in R2 with the key pattern: generations/orgs/{orgId}/{generationId}
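The storage key pattern above can be expressed as a small helper. `generationObjectKey` is a hypothetical name; only the pattern itself comes from this page.

```typescript
// Hypothetical helper: builds the R2 object key for a generation,
// following the pattern generations/orgs/{orgId}/{generationId}.
function generationObjectKey(orgId: string, generationId: string): string {
  return `generations/orgs/${orgId}/${generationId}`;
}
```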

Generated audio is kept until it is manually deleted. Each generation preserves the original parameters and voice name for reproducibility.
