System Overview
Tech Stack
Frontend
Next.js 16
Server-side rendering, App Router, and React Server Components for optimal performance. The app uses the (dashboard) route group for authenticated pages.
React 19
Latest React with concurrent features, automatic batching, and improved streaming SSR.
Tailwind CSS 4
Utility-first CSS with JIT compilation. Custom components built with shadcn/ui and Radix UI primitives.
WaveSurfer.js
Audio waveform visualization with seek, play/pause, and download capabilities.
Backend
tRPC 11
End-to-end typesafe APIs without code generation. Three main routers: voices, generations, and billing.
Prisma 7
Type-safe database ORM with PostgreSQL adapter. Supports connection pooling and edge deployments.
Clerk
Authentication and multi-tenant organization management with full data isolation per org.
Cloudflare R2
S3-compatible object storage for voice samples and generated audio. Generous free tier (10GB).
AI Infrastructure
Chatterbox TTS
Open-source zero-shot voice cloning model by Resemble AI. Supports emotional tags like [chuckle], [sigh], etc.
Modal
Serverless GPU infrastructure (NVIDIA A10G). Pay-per-second billing with automatic scaling and cold start optimization.
Database Schema
Resonance uses PostgreSQL with Prisma ORM. The schema centers on two models: Voice and Generation.
Voice Model
Voice Categories
- AUDIOBOOK - Narrative voices for long-form content
- CONVERSATIONAL - Natural dialogue voices
- CUSTOMER_SERVICE - Professional support voices
- GENERAL - All-purpose voices
- NARRATIVE - Storytelling voices
- CHARACTERS - Character and role-play voices
- MEDITATION - Calm, soothing voices
- MOTIVATIONAL - Energetic, inspiring voices
- PODCAST - Casual, engaging voices
- ADVERTISING - Promotional voices
- VOICEOVER - Professional VO work
- CORPORATE - Business and training voices
Generation Model
The voiceName field is denormalized to preserve generation history even if the source voice is deleted. The voice relation uses onDelete: SetNull to prevent cascade deletion.
API Layer (tRPC)
Resonance uses tRPC for type-safe API communication. All routers are composed in src/trpc/routers/_app.ts:
src/trpc/routers/_app.ts
Key Procedures
generations.create - Generate TTS Audio
The most complex procedure. It:
- Validates the user has an active Polar subscription
- Fetches the voice from the database (system or custom)
- Calls the Chatterbox API on Modal with synthesis parameters
- Receives the generated audio as an ArrayBuffer
- Uploads the audio to R2 with an org-scoped key: generations/orgs/{orgId}/{generationId}
- Saves generation metadata to the database
- Ingests a usage event to Polar for billing (fire-and-forget)
- Returns the generation ID to the client
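The steps above can be sketched as a single function with its external services injected. This is a dependency-free illustration, not the project's actual router code: the GenerationDeps interface, its method names, and the VOICE_NOT_FOUND error code are all hypothetical. Only the step order, the SUBSCRIPTION_REQUIRED error, and the R2 key format come from this page.

```typescript
// Hypothetical service interfaces. In the real project these would be backed
// by Polar, Prisma, the Modal endpoint, and R2 respectively.
interface GenerationDeps {
  hasActiveSubscription(orgId: string): Promise<boolean>;
  findVoice(
    orgId: string,
    voiceId: string
  ): Promise<{ id: string; name: string; r2ObjectKey: string | null } | null>;
  synthesize(voiceKey: string, text: string): Promise<ArrayBuffer>;
  uploadAudio(key: string, audio: ArrayBuffer): Promise<void>;
  saveGeneration(record: GenerationRecord): Promise<void>;
  ingestUsage(orgId: string, characters: number): Promise<void>;
  newId(): string;
}

interface GenerationRecord {
  id: string;
  orgId: string;
  voiceId: string;
  voiceName: string; // denormalized copy of the voice name
  text: string;
  r2ObjectKey: string;
}

// Org-scoped key format taken from the step list above.
function generationKey(orgId: string, generationId: string): string {
  return `generations/orgs/${orgId}/${generationId}`;
}

async function createGeneration(
  deps: GenerationDeps,
  input: { orgId: string; voiceId: string; text: string }
): Promise<string> {
  // 1. Subscription gate.
  if (!(await deps.hasActiveSubscription(input.orgId))) {
    throw new Error("SUBSCRIPTION_REQUIRED");
  }
  // 2. Voice lookup (system or custom) with a usable reference sample.
  const voice = await deps.findVoice(input.orgId, input.voiceId);
  if (!voice || !voice.r2ObjectKey) {
    throw new Error("VOICE_NOT_FOUND"); // hypothetical error code
  }
  // 3-4. Synthesize and receive the audio bytes.
  const audio = await deps.synthesize(voice.r2ObjectKey, input.text);
  // 5-6. Store the audio, then the metadata.
  const id = deps.newId();
  const key = generationKey(input.orgId, id);
  await deps.uploadAudio(key, audio);
  await deps.saveGeneration({
    id,
    orgId: input.orgId,
    voiceId: voice.id,
    voiceName: voice.name,
    text: input.text,
    r2ObjectKey: key,
  });
  // 7. Fire-and-forget: a billing failure is logged, not surfaced to the user.
  void deps
    .ingestUsage(input.orgId, input.text.length)
    .catch((err: unknown) => console.error("usage ingest failed", err));
  // 8. Return the generation ID.
  return id;
}
```

Injecting the services keeps the ordering visible and makes each step easy to stub in tests.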
src/trpc/routers/generations.ts (excerpt)
voices.create - Clone a Voice
Handles voice cloning:
- Validates the uploaded audio (minimum 10 seconds)
- Generates a unique voice ID with cuid()
- Uploads the audio sample to R2: voices/custom/{orgId}/{voiceId}
- Creates the voice record in the database with variant: CUSTOM
- Returns the new voice to the client
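As a rough sketch of the validation and key-building parts of this procedure, assuming the sample duration has already been measured elsewhere. The helper names and the NewVoice shape are hypothetical, and the voice ID is passed in rather than generated so the example has no dependencies.

```typescript
// Minimum sample length stated in the step list above.
const MIN_SAMPLE_SECONDS = 10;

// Hypothetical record shape; the real Prisma model may carry more fields.
interface NewVoice {
  id: string;
  orgId: string;
  name: string;
  variant: "CUSTOM";
  r2ObjectKey: string;
}

// Key format for custom voice samples, as listed above.
function customVoiceKey(orgId: string, voiceId: string): string {
  return `voices/custom/${orgId}/${voiceId}`;
}

// Builds the record that would be written once the sample has been uploaded.
// Throws if the sample is shorter than the minimum.
function buildCustomVoice(input: {
  orgId: string;
  voiceId: string; // generated with cuid() in the real procedure
  name: string;
  durationSeconds: number;
}): NewVoice {
  if (!(input.durationSeconds >= MIN_SAMPLE_SECONDS)) {
    throw new RangeError(
      `Voice sample must be at least ${MIN_SAMPLE_SECONDS} seconds long`
    );
  }
  return {
    id: input.voiceId,
    orgId: input.orgId,
    name: input.name,
    variant: "CUSTOM",
    r2ObjectKey: customVoiceKey(input.orgId, input.voiceId),
  };
}
```

The negated comparison also rejects NaN, which a plain `durationSeconds < MIN_SAMPLE_SECONDS` check would let through.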
voices.getAll - List Available Voices
Returns all voices accessible to the organization:
- All SYSTEM voices (shared across all orgs)
- CUSTOM voices where orgId matches the current org
The query uses an OR clause for efficient querying.
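A sketch of what such a filter might look like. The object returned by accessibleVoicesWhere follows the shape Prisma accepts for a where clause with OR, but the field names (variant, orgId) are inferred from this page rather than copied from the schema. isAccessible is an in-memory equivalent, included only to make the semantics explicit.

```typescript
type VoiceVariant = "SYSTEM" | "CUSTOM";

interface Voice {
  id: string;
  variant: VoiceVariant;
  orgId: string | null; // assumed to be null for SYSTEM voices
}

// Prisma-style filter: a voice matches if ANY entry in OR matches.
function accessibleVoicesWhere(orgId: string) {
  return {
    OR: [
      { variant: "SYSTEM" as const },
      { variant: "CUSTOM" as const, orgId },
    ],
  };
}

// The same rule expressed as a predicate over in-memory records.
function isAccessible(voice: Voice, orgId: string): boolean {
  return (
    voice.variant === "SYSTEM" ||
    (voice.variant === "CUSTOM" && voice.orgId === orgId)
  );
}
```

With Prisma, the filter object would typically be passed as the where argument of a findMany call, so both kinds of voice come back from one query.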
billing.getUsage - Fetch Usage Metrics
Queries Polar’s usage API to get character consumption for the current billing period. Used to display usage in the dashboard.
Object Storage (R2)
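Cloudflare R2 stores all audio files with a consistent key structure, for example voices/custom/{orgId}/{voiceId} for custom voice samples and generations/orgs/{orgId}/{generationId} for generated audio.
R2 Client Implementation
The R2 client is a thin wrapper around the AWS S3 SDK:
src/lib/r2.ts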
R2 is S3-compatible, so you can easily swap it for AWS S3, Backblaze B2, or MinIO by changing the endpoint configuration.
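To illustrate the endpoint swap, here is a minimal sketch of two configuration builders. The field names mirror the options the AWS SDK v3 S3 client accepts (region, endpoint, credentials, forcePathStyle), but the S3CompatibleConfig interface and both helper functions are hypothetical and are not taken from src/lib/r2.ts.

```typescript
interface Credentials {
  accessKeyId: string;
  secretAccessKey: string;
}

// Plain-object stand-in for the S3 client options used here.
interface S3CompatibleConfig {
  region: string;
  endpoint: string;
  credentials: Credentials;
  forcePathStyle?: boolean;
}

// R2 exposes its S3 API per account and uses the region "auto".
function r2Config(accountId: string, credentials: Credentials): S3CompatibleConfig {
  return {
    region: "auto",
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
    credentials,
  };
}

// Self-hosted MinIO is usually addressed path-style at its own URL.
// The region value is an assumption; MinIO's default is us-east-1.
function minioConfig(endpoint: string, credentials: Credentials): S3CompatibleConfig {
  return {
    region: "us-east-1",
    endpoint,
    credentials,
    forcePathStyle: true,
  };
}
```

Because only the configuration object differs, the upload and download helpers built on top of the client should not need to change when the provider does.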
TTS Generation Flow
Here’s the complete flow when a user generates speech:
User submits TTS form
The client calls trpc.generations.create.mutate() with text, voice ID, and synthesis parameters.
Subscription check
The procedure queries Polar to verify the organization has an active subscription. Throws a SUBSCRIPTION_REQUIRED error if not.
Voice lookup
Fetches the voice from the database, ensuring it’s either a system voice or a custom voice owned by the org. Validates that the r2ObjectKey exists.
Call Chatterbox API
Makes a POST request to the Modal endpoint.
The Modal function:
- Mounts the R2 bucket read-only
- Reads the voice reference audio from R2
- Loads the Chatterbox model (cached after first run)
- Generates speech using the zero-shot voice cloning
- Returns WAV audio as bytes
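On the calling side, the POST request to the Modal endpoint might look roughly like the sketch below. The payload field names (text, voiceKey) are hypothetical, since the real request body is not shown on this page. The x-api-key header is the one described under API Security. The HTTP function is injected so the example stays self-contained.

```typescript
// Hypothetical payload; the real endpoint likely accepts more synthesis parameters.
interface SynthesisPayload {
  text: string;
  voiceKey: string; // R2 key of the reference audio
}

interface HttpRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Minimal subset of the fetch contract that this sketch relies on.
type FetchLike = (
  url: string,
  init: HttpRequest
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

function buildSynthesisRequest(apiKey: string, payload: SynthesisPayload): HttpRequest {
  return {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey, // server-side secret, never sent to the browser
    },
    body: JSON.stringify(payload),
  };
}

// Sends the request and returns the WAV bytes, or throws on a non-2xx status.
async function synthesize(
  fetchFn: FetchLike,
  endpoint: string,
  apiKey: string,
  payload: SynthesisPayload
): Promise<ArrayBuffer> {
  const res = await fetchFn(endpoint, buildSynthesisRequest(apiKey, payload));
  if (!res.ok) {
    throw new Error(`Chatterbox request failed with status ${res.status}`);
  }
  return res.arrayBuffer();
}
```

In a Node 18+ runtime, the global fetch should satisfy FetchLike and can be passed in directly.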
Store generation
- Creates a database record with all parameters
- Uploads the audio buffer to R2 with key generations/orgs/{orgId}/{generationId}
- Updates the database record with the R2 key
Multi-Tenancy with Clerk
Resonance uses Clerk Organizations for multi-tenancy:
- Each user belongs to one or more organizations
- All tRPC procedures use the orgProcedure helper, which injects ctx.orgId
- Database queries are automatically scoped to the current org
- R2 keys include org IDs for data isolation
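The idea behind an org-scoped procedure can be modeled without any framework. The sketch below is not tRPC's middleware API and not the project's orgProcedure: withOrg, requireOrg, the context shapes, and the error codes are hypothetical stand-ins that show where the org check sits relative to the handler.

```typescript
// Context as it might arrive from the auth layer: either field may be absent.
interface AuthContext {
  userId: string | null;
  orgId: string | null;
}

// Context a handler sees after the org check: both fields are guaranteed.
interface OrgContext {
  userId: string;
  orgId: string;
}

function requireOrg(ctx: AuthContext): OrgContext {
  if (!ctx.userId) throw new Error("UNAUTHORIZED"); // hypothetical code
  if (!ctx.orgId) throw new Error("FORBIDDEN"); // hypothetical code
  return { userId: ctx.userId, orgId: ctx.orgId };
}

// Wraps a handler so it only ever runs with a verified org in its context.
function withOrg<I, O>(handler: (ctx: OrgContext, input: I) => O) {
  return (ctx: AuthContext, input: I): O => handler(requireOrg(ctx), input);
}

// Example handler: every lookup it performs can be scoped by ctx.orgId.
const listVoiceKeyPrefix = withOrg(
  (ctx: OrgContext, _input: null) => `voices/custom/${ctx.orgId}/`
);
```

Narrowing the context type at the wrapper means a handler cannot forget the org check, because the unchecked context never reaches it.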
src/trpc/init.ts (simplified)
Billing with Polar
Resonance uses Polar for usage-based billing:
- Products - Define pricing tiers (e.g., $0.10 per 1000 characters)
- Meters - Track usage events (tts_generation with character count)
- Subscriptions - Link customers to products
- Invoices - Generated automatically based on metered usage
The billing router provides a getUsage procedure that queries Polar’s API to display current period consumption in the dashboard.
Polar supports both sandbox and production modes. Use sandbox for development with test cards.
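To make the metering arithmetic concrete, here is a small sketch using the example rate of $0.10 per 1000 characters. The UsageEvent shape is a hypothetical internal record, not Polar's ingestion payload. Only the tts_generation meter name and the example rate come from this page.

```typescript
// Example rate from the Products bullet above, expressed in cents.
const PRICE_CENTS_PER_1000_CHARS = 10;

// Hypothetical internal event shape; map it to the billing provider's format on ingest.
interface UsageEvent {
  name: "tts_generation";
  orgId: string;
  characters: number;
}

function buildUsageEvent(orgId: string, text: string): UsageEvent {
  // text.length counts UTF-16 code units, which can exceed the visible
  // character count for emoji and some scripts.
  return { name: "tts_generation", orgId, characters: text.length };
}

// Cost in cents for a character count at a given per-1000 rate.
function usageCostCents(
  characters: number,
  rateCents: number = PRICE_CENTS_PER_1000_CHARS
): number {
  if (!Number.isInteger(characters) || characters < 0) {
    throw new RangeError("characters must be a non-negative integer");
  }
  return (characters * rateCents) / 1000;
}
```

For example, 12,500 characters at this rate come to 125 cents, or $1.25.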
Performance Considerations
Cold Starts
Modal GPU containers have 10-15 second cold starts. After the first request, containers stay warm for ~10 minutes. Consider implementing a keep-alive ping for production.
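A keep-alive ping could be sketched as a timer that fires well inside the warm window. The ping function is supplied by the caller, since this page does not describe a health-check route. The 10-minute constant is the approximate figure quoted above, not a guaranteed value.

```typescript
// Approximate warm window from the paragraph above.
const WARM_WINDOW_MS = 10 * 60 * 1000;

// Starts a periodic ping and returns a function that stops it.
// The default interval is half the warm window, leaving a margin for jitter.
function startKeepAlive(
  ping: () => Promise<void>,
  intervalMs: number = WARM_WINDOW_MS / 2
): () => void {
  if (!(intervalMs > 0 && intervalMs < WARM_WINDOW_MS)) {
    throw new RangeError("intervalMs must be positive and shorter than the warm window");
  }
  const timer = setInterval(() => {
    // A failed ping is logged and retried on the next tick.
    ping().catch((err: unknown) => console.warn("keep-alive ping failed", err));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Keeping a container warm trades idle GPU time for latency, so under pay-per-second billing it is worth measuring whether the pings cost more than the cold starts they avoid.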
Database Connection Pooling
Use Prisma Accelerate or PgBouncer for connection pooling in serverless environments. The @prisma/adapter-pg package supports this out of the box.
R2 Bandwidth
R2 has no egress fees, making it ideal for serving audio files. Use signed URLs with 1-hour expiry to prevent hotlinking.
Audio Streaming
The current implementation loads full audio files. For long-form content (>5 minutes), consider implementing range-request streaming in the audio proxy route.
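If range-request support is added, the core of it is parsing the Range header and building the matching response headers. The sketch below handles only single byte ranges. The function names are hypothetical and nothing here is taken from the existing proxy route.

```typescript
// Inclusive byte range within a file of known size.
interface ByteRange {
  start: number;
  end: number;
}

// Parses "bytes=start-end", "bytes=start-" and "bytes=-suffix".
// Returns null for missing, malformed, multi-part, or unsatisfiable ranges;
// the caller decides whether that means a full 200 response or a 416.
function parseRange(header: string | null, size: number): ByteRange | null {
  if (!header || size <= 0) return null;
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match) return null;
  const rawStart = match[1];
  const rawEnd = match[2];
  if (rawStart === "" && rawEnd === "") return null;
  if (rawStart === "") {
    // Suffix form: the last N bytes of the file.
    const suffix = Number(rawEnd);
    if (suffix === 0) return null;
    return { start: Math.max(size - suffix, 0), end: size - 1 };
  }
  const start = Number(rawStart);
  const end = rawEnd === "" ? size - 1 : Math.min(Number(rawEnd), size - 1);
  if (start >= size || start > end) return null;
  return { start, end };
}

// Headers for a 206 Partial Content response covering the given range.
function partialContentHeaders(range: ByteRange, size: number): Record<string, string> {
  return {
    "Accept-Ranges": "bytes",
    "Content-Range": `bytes ${range.start}-${range.end}/${size}`,
    "Content-Length": String(range.end - range.start + 1),
  };
}
```

The parsed range can usually be forwarded to the object store as its own Range parameter, so the proxy streams only the requested bytes instead of buffering the whole file.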
Project Structure
Security
Authentication
Clerk handles all auth with industry-standard security. Sessions are validated on every request.
Authorization
All database queries include org ID checks. The audio proxy validates ownership before serving files.
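One way an ownership check could work is by comparing the requested object key against the org-scoped prefixes used elsewhere on this page. This is an assumption about the approach, not a description of the actual proxy, which may check the database instead. Shared system voices are deliberately left out of the sketch because their key format is not documented here.

```typescript
// Key prefixes an org owns, based on the key formats shown on this page.
function orgKeyPrefixes(orgId: string): string[] {
  return [`generations/orgs/${orgId}/`, `voices/custom/${orgId}/`];
}

// Returns true only if the key sits under one of the org's own prefixes.
function canAccessKey(orgId: string, key: string): boolean {
  // Reject IDs that could alter the path structure.
  if (orgId === "" || orgId.includes("/")) return false;
  return orgKeyPrefixes(orgId).some(
    (prefix) => key.startsWith(prefix) && key.length > prefix.length
  );
}
```

Including the trailing slash in each prefix matters: without it, org_a would match keys that belong to org_ab.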
API Security
The Modal endpoint requires an API key via the x-api-key header. Never expose this key to the client.
Data Isolation
Each org’s data is stored in separate R2 paths and filtered by orgId in all queries.
Next Steps
Self-Hosting Guide
Deploy Resonance to production
API Reference
Explore all tRPC endpoints
Configuration
Customize environment variables and settings
Project Structure
Understand the codebase organization