Overview
TikTok Miner uses PostgreSQL with Prisma ORM for data management and TimescaleDB for time-series analytics. This guide covers database setup, schema overview, migrations, and seeding.
Prerequisites
PostgreSQL 14 or higher
Node.js 20+ or Bun 1.0+
Access to database with superuser privileges (for TimescaleDB)
Quick Start
1. Set Environment Variables
Configure database connection in app/.env:
PASSWORD = your-database-password
DATABASE_URL = "postgresql://tiktok_miner_user:${ PASSWORD }@localhost:5432/tiktok_miner?pgbouncer=true&connection_limit=1"
DIRECT_URL = "postgresql://tiktok_miner_user:${ PASSWORD }@localhost:5432/tiktok_miner"
2. Install Dependencies
3. Generate Prisma Client
4. Run Migrations
bunx prisma migrate deploy
5. Enable TimescaleDB (Optional)
bunx prisma db execute --file ./prisma/migrations/enable_timescaledb.sql
6. Seed Database (Optional)
bunx ts-node --transpile-only prisma/seed-creators.ts
Database Schema Overview
The schema is organized into several functional areas:
Core Models
CreatorProfile
Central model for creator information across all platforms:
model CreatorProfile {
id String @id @default ( uuid ())
// Basic Information
name String
email String ?
bio String ? @db.Text
profileImageUrl String ?
category String ? // tech, lifestyle, gaming, etc.
tags String [] @default ( [] )
isVerified Boolean @default ( false )
// Platform Identifiers
platformIdentifiers Json // JSON object with platform-specific IDs
// Calculated Metrics
compositeEngagementScore Float ?
totalReach Int @default ( 0 )
averageEngagementRate Float ?
contentFrequency Float ?
audienceQualityScore Float ?
// Status
lastSync DateTime @default ( now ())
syncStatus SyncStatus @default ( PENDING )
// Relations
youtubeMetrics YoutubeMetrics ?
twitterMetrics TwitterMetrics ?
instagramMetrics InstagramMetrics ?
tiktokMetrics TiktokMetrics ?
metricsHistory CreatorMetricsHistory []
engagementAnalytics EngagementAnalytics []
}
Key Features:
Unified creator identity across platforms
Composite metrics aggregated from all platforms
Flexible JSON storage for platform-specific identifiers
Quality scoring for audience authenticity
TikTokMetrics
model TiktokMetrics {
id String @id @default ( uuid ())
creatorProfileId String @unique
userId String @unique
// Basic Metrics
followerCount Int @default ( 0 )
followingCount Int @default ( 0 )
videoCount Int @default ( 0 )
heartCount BigInt @default ( 0 )
// Engagement Metrics
averageViews Int @default ( 0 )
averageLikes Int @default ( 0 )
averageComments Int @default ( 0 )
averageShares Int @default ( 0 )
engagementRate Float @default ( 0 )
// Profile Details
username String @unique
nickname String ?
profileUrl String
bio String ? @db.Text
isVerified Boolean @default ( false )
creatorProfile CreatorProfile @relation ( fields : [ creatorProfileId ], references : [ id ] )
}
YouTubeMetrics
Tracks YouTube channel statistics:
model YoutubeMetrics {
id String @id @default ( uuid ())
creatorProfileId String @unique
channelId String @unique
subscriberCount Int @default ( 0 )
videoCount Int @default ( 0 )
viewCount BigInt @default ( 0 )
averageViews Int @default ( 0 )
averageLikes Int @default ( 0 )
averageComments Int @default ( 0 )
engagementRate Float @default ( 0 )
channelName String
channelUrl String
description String ? @db.Text
country String ?
publishedAt DateTime ?
}
InstagramMetrics
model InstagramMetrics {
id String @id @default ( uuid ())
creatorProfileId String @unique
accountId String @unique
followerCount Int @default ( 0 )
followingCount Int @default ( 0 )
mediaCount Int @default ( 0 )
averageLikes Float @default ( 0 )
averageComments Float @default ( 0 )
engagementRate Float @default ( 0 )
username String @unique
fullName String ?
bio String ? @db.Text
isVerified Boolean @default ( false )
isBusinessAccount Boolean @default ( false )
// Business Insights
reach Int ? @default ( 0 )
impressions Int ? @default ( 0 )
profileViews Int ? @default ( 0 )
}
model TwitterMetrics {
id String @id @default ( uuid ())
creatorProfileId String @unique
userId String @unique
followerCount Int @default ( 0 )
followingCount Int @default ( 0 )
tweetCount Int @default ( 0 )
averageLikes Float @default ( 0 )
averageRetweets Float @default ( 0 )
averageReplies Float @default ( 0 )
engagementRate Float @default ( 0 )
username String @unique
displayName String
bio String ? @db.Text
isVerified Boolean @default ( false )
}
Time-Series Models (TimescaleDB)
CreatorMetricsHistory
Historical snapshots of creator metrics:
model CreatorMetricsHistory {
id String @id @default ( uuid ())
creatorProfileId String
platform String // youtube, twitter, instagram, tiktok
timestamp DateTime @default ( now ())
// Snapshot metrics
followerCount Int @default ( 0 )
engagementRate Float @default ( 0 )
totalPosts Int @default ( 0 )
avgLikes Float @default ( 0 )
avgComments Float @default ( 0 )
avgShares Float @default ( 0 )
avgViews Float @default ( 0 )
// Growth metrics
followerGrowth Int @default ( 0 )
engagementGrowth Float @default ( 0 )
platformMetrics Json ? // Platform-specific data
}
TimescaleDB Features:
Automatic partitioning by time
7-day chunk intervals
Compression after 30 days
1-year retention policy
EngagementAnalytics
Hourly engagement pattern tracking:
model EngagementAnalytics {
id String @id @default ( uuid ())
creatorProfileId String
timestamp DateTime @default ( now ())
// Hourly aggregates
hourlyLikes Int @default ( 0 )
hourlyComments Int @default ( 0 )
hourlyShares Int @default ( 0 )
hourlyViews Int @default ( 0 )
hourlyPosts Int @default ( 0 )
// Peak times
peakEngagementHour Int ?
peakEngagementDay Int ?
// Content performance
topPerformingContentIds String [] @default ( [] )
avgContentScore Float @default ( 0 )
audienceActivityPattern Json ?
}
API & Cost Management
ApiUsage
Tracks all API calls and costs:
model ApiUsage {
id String @id @default ( uuid ())
platform String // OpenAI, Anthropic, etc.
model String ? // gpt-4, claude-3, etc.
endpoint String
timestamp DateTime @default ( now ())
tokensUsed Int ? @default ( 0 )
cost Float @default ( 0 )
userId String ?
requestId String ? @unique
responseTime Int ? // milliseconds
statusCode Int ?
error String ? @db.Text
metadata Json ?
}
Budget Management
model Budget {
id String @id @default ( uuid ())
name String
budgetType BudgetType // MONTHLY, QUARTERLY, YEARLY
totalAmount Float
spentAmount Float @default ( 0 )
remainingAmount Float @default ( 0 )
startDate DateTime
endDate DateTime
status BudgetStatus @default ( ACTIVE )
budgetAllocations BudgetAllocation []
budgetAlerts BudgetAlert []
costAllocations CostAllocation []
}
Discovery Pipeline Models
InstagramProfile & TiktokProfile
Stores discovered creators with 30-day metrics:
model TiktokProfile {
username String @id
posts30d Int
likesTotal Int
commentsTotal Int
viewsTotal BigInt
sharesTotal Int
nickName String ?
avatar String ?
signature String ? @db.Text
verified Boolean @default ( false )
followerCount Int @default ( 0 )
// Calculated metrics
avgLikesPerPost Float @default ( 0 )
avgCommentsPerPost Float @default ( 0 )
avgViewsPerPost Float @default ( 0 )
engagementRate Float @default ( 0 )
category String ? // Discovery keyword
lastUpdated DateTime @default ( now ()) @updatedAt
}
Prisma Configuration
Schema Location
app/prisma/schema.prisma
Generator Configuration
generator client {
provider = "prisma-client-js"
}
datasource db {
provider = "postgresql"
url = env ( "DATABASE_URL" )
directUrl = env ( "DIRECT_URL" )
}
Key Points:
DATABASE_URL: Used for connection pooling (Supabase pooler, PgBouncer)
DIRECT_URL: Direct connection for migrations
Database Migrations
Creating Migrations
During development:
# Create migration from schema changes
bunx prisma migrate dev --name description_of_changes
Applying Migrations
In production:
# Apply all pending migrations
bunx prisma migrate deploy
Available Migrations
The project includes these migrations:
add_creator_models.sql - Core creator profile tables
add_api_usage_tracking.sql - API monitoring tables
add_webhook_event_model.sql - Webhook processing
add_performance_indexes.sql - Query optimization indexes
enable_timescaledb.sql - TimescaleDB setup and hypertables
Migration Best Practices
Always backup your database before running migrations in production!
# Backup before migration
pg_dump -U tiktok_miner_user tiktok_miner > backup_ $( date +%Y%m%d ) .sql
# Apply migrations
bunx prisma migrate deploy
# Verify schema
bunx prisma db pull --print
TimescaleDB Setup
Benefits
Automatic partitioning : Data automatically partitioned by time
Compression : Older data compressed to save space
Continuous aggregates : Pre-computed rollups for fast queries
Retention policies : Automatic data cleanup
Installation
TimescaleDB must be enabled on your PostgreSQL instance:
CREATE EXTENSION IF NOT EXISTS timescaledb;
Hypertable Conversion
Run the TimescaleDB setup:
bunx prisma db execute --file ./prisma/migrations/enable_timescaledb.sql
This script:
Enables TimescaleDB extension
Converts tables to hypertables:
CreatorMetricsHistory (7-day chunks)
EngagementAnalytics (1-day chunks)
Sets up compression policies
Creates continuous aggregates
Configures retention policies
Continuous Aggregates
Daily Creator Metrics
CREATE MATERIALIZED VIEW creator_metrics_daily
WITH ( timescaledb . continuous ) AS
SELECT
creatorProfileId,
platform,
time_bucket( '1 day' , timestamp ) AS day ,
AVG (followerCount) AS avg_followers,
AVG (engagementRate) AS avg_engagement_rate,
SUM (followerGrowth) AS total_follower_growth
FROM "CreatorMetricsHistory"
GROUP BY creatorProfileId, platform, day ;
Hourly Engagement Patterns
CREATE MATERIALIZED VIEW engagement_patterns_hourly
WITH ( timescaledb . continuous ) AS
SELECT
creatorProfileId,
time_bucket( '1 hour' , timestamp ) AS hour ,
SUM (hourlyLikes) AS total_likes,
SUM (hourlyComments) AS total_comments,
AVG (avgContentScore) AS avg_content_score
FROM "EngagementAnalytics"
GROUP BY creatorProfileId, hour ;
Custom Analytics Functions
Calculate Growth Rate
SELECT calculate_growth_rate(
'creator-id' ,
'tiktok' ,
'followerCount' ,
INTERVAL '7 days'
);
Get Trending Creators
SELECT * FROM get_trending_creators(
p_platform : = 'tiktok' ,
p_days : = 7 ,
p_limit : = 10
);
Seeding the Database
Seed Script
The seed script generates realistic test data:
bunx ts-node --transpile-only prisma/seed-creators.ts
What Gets Created
50 creator profiles with varying popularity levels
Multiple platform presences per creator (2-4 platforms)
Platform-specific metrics for each active platform
30 days of historical data for trending analysis
Engagement analytics with realistic patterns
Customizing Seed Data
Edit prisma/seed-creators.ts:
// Change number of creators
const NUM_CREATORS = 100 ;
// Modify categories
const CATEGORIES = [ 'tech' , 'lifestyle' , 'gaming' , ... ];
// Adjust follower ranges
followers : Math . floor (
faker . number . int ({ min: 10000 , max: 1000000 }) * popularityFactor
)
Database Maintenance
Vacuum and Analyze
Regular maintenance for optimal performance:
-- Vacuum all tables
VACUUM ANALYZE;
-- Specific table
VACUUM ANALYZE "CreatorProfile" ;
Reindex
-- Reindex all tables
REINDEX DATABASE tiktok_miner;
-- Specific table
REINDEX TABLE "CreatorProfile" ;
Monitor Table Sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC ;
Check Index Usage
SELECT
schemaname,
tablename,
indexname,
idx_scan as index_scans
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC ;
Automated Setup Script
Use the provided setup script:
cd app
chmod +x scripts/setup-creator-database.sh
./scripts/setup-creator-database.sh
This script:
Checks dependencies
Generates Prisma client
Creates and applies migrations
Sets up TimescaleDB (if available)
Optionally seeds the database
Verifies the setup
Backup and Restore
Full Database Backup
pg_dump -U tiktok_miner_user -F c tiktok_miner > backup.dump
Schema-Only Backup
pg_dump -U tiktok_miner_user -s tiktok_miner > schema.sql
Data-Only Backup
pg_dump -U tiktok_miner_user -a tiktok_miner > data.sql
Restore Database
pg_restore -U tiktok_miner_user -d tiktok_miner backup.dump
Docker Backup
# Backup
docker-compose exec postgres pg_dump -U tiktok_miner_user tiktok_miner > backup.sql
# Restore
docker-compose exec -T postgres psql -U tiktok_miner_user tiktok_miner < backup.sql
Troubleshooting
Migration Errors
Migration already applied
Check migration status: bunx prisma migrate status
Reset if needed (WARNING: deletes all data): bunx prisma migrate reset
Connection pool exhausted
Using connection pooling? Ensure connection_limit=1 in DATABASE_URL: DATABASE_URL = "postgresql://user:pass@host:5432/db?pgbouncer=true&connection_limit=1"
TimescaleDB extension not found
Install TimescaleDB on your PostgreSQL server: # Ubuntu/Debian
sudo apt-get install timescaledb-2-postgresql-14
# Or use Timescale Cloud / Supabase with TimescaleDB enabled
Enable query logging in PostgreSQL: ALTER DATABASE tiktok_miner SET log_statement = 'all' ;
ALTER DATABASE tiktok_miner SET log_min_duration_statement = 100 ; -- ms
Check slow queries: SELECT * FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10 ;
Verify indexes are being used: EXPLAIN ANALYZE
SELECT * FROM "CreatorProfile" WHERE category = 'tech' ;
Add missing indexes if needed.
Next Steps
Environment Variables Configure database connection strings
Docker Deployment Deploy with Docker and PostgreSQL container