System Overview
Jefftube is a full-stack web application built as a monorepo with three independent components that work together to create a video streaming platform.
Component Architecture
1. Frontend (web/)
The frontend is a modern React application built with performance and user experience in mind.
Tech Stack
| Technology | Version | Purpose |
|---|---|---|
| React | 19.2.0 | UI framework |
| React Router | 7.13.0 | Client-side routing |
| Vite | 7.2.4 | Build tool and dev server |
| TailwindCSS | 4.1.18 | Utility-first styling |
| TanStack Query | 5.90.20 | Server state management |
| PostHog | 1.336.4 | Analytics and feature flags |
| TypeScript | 5.9.3 | Type safety |
Application Structure
Routing Structure
Defined in App.tsx:58-64:
State Management
The frontend uses a hybrid state management approach:
- Server State
- UI State
- Local State
TanStack Query manages all server-side data:
- Video metadata fetching
- Comments and replies
- User interactions (likes, comments)
- Automatic caching and refetching
main.tsx:13-20:
Provider Hierarchy
From main.tsx:27-44:
2. Backend (server/)
The backend is a lightweight, fast API server built with Hono and Bun.
Tech Stack
| Technology | Version | Purpose |
|---|---|---|
| Bun | Latest | JavaScript runtime |
| Hono | 4.11.7 | Web framework |
| Drizzle ORM | 0.38.0 | Type-safe ORM |
| PostgreSQL | 16 | Relational database |
| Pino | 10.3.0 | Structured logging |
| postgres | 3.4.5 | PostgreSQL client |
Server Architecture
Middleware Stack
From index.ts:14-23:
API Routes
All routes are prefixed with /api and defined in index.ts:32-35:
| Route | File | Endpoints |
|---|---|---|
| /api/users/* | routes/users.ts | User registration, profile |
| /api/videos/* | routes/videos.ts | Video list, metadata |
| /api/comments/* | routes/comments.ts | CRUD for comments |
| /api/videoLikes/* | routes/videoLikes.ts | Like/unlike videos |
Database Layer
Drizzle ORM provides type-safe database access:
drizzle.config.ts:3-10:
Error Handling
Centralized error handler in index.ts:42-45:
3. Scraper (scrapper/)
The scraper is a Playwright-based automation tool that extracts video metadata from government websites.
Tech Stack
| Technology | Version | Purpose |
|---|---|---|
| Playwright | 1.58.1 | Browser automation |
| Bun | Latest | Runtime and file I/O |
How It Works
Concurrent Scraping
The scraper uses configurable concurrency (default: 5 pages in parallel) to efficiently scrape thousands of pages.
Features:
- Automatic robot check handling
- Progress tracking with resume capability
- Failed page retry mechanism
- Staggered request timing to avoid rate limits
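The behavior described above (bounded parallelism, staggered starts, and a record of failed pages for later retry) can be sketched without any browser code. This is a minimal illustration, not the code in scraper.ts: `scrapeWithLimit`, its parameters, and the `task` callback are hypothetical names, and the real scraper would call Playwright inside `task`.

```typescript
// Hypothetical helper: runs `task` over `pages` with at most `limit` in flight.
async function scrapeWithLimit<T>(
  pages: number[],
  limit: number,
  task: (page: number) => Promise<T>,
  staggerMs = 0,
): Promise<{ done: Map<number, T>; failed: number[] }> {
  const done = new Map<number, T>();
  const failed: number[] = [];
  let next = 0; // index of the next unclaimed page

  async function worker(workerIndex: number): Promise<void> {
    // Stagger worker start-up so the first requests do not fire all at once.
    if (staggerMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, workerIndex * staggerMs));
    }
    while (next < pages.length) {
      const page = pages[next++]; // claim a page; safe because JS is single-threaded
      try {
        done.set(page, await task(page));
      } catch {
        failed.push(page); // remembered so a later run can retry it
      }
    }
  }

  const workerCount = Math.min(limit, pages.length);
  await Promise.all(Array.from({ length: workerCount }, (_, i) => worker(i)));
  return { done, failed };
}
```

Each worker pulls the next page only after finishing its current one, so the number of concurrent tasks never exceeds `limit`. Persisting `done` and `failed` to disk between runs would be one way to support resuming, though the source does not describe how progress is actually stored.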
Command Line Interface
The scraper supports extensive CLI options (from scraper.ts:44-88):
| Flag | Description |
|---|---|
| --dataset <id> | Scrape specific dataset (9, 10, or 11) |
| --visible | Run browser in visible mode |
| --max-pages <n> | Limit number of pages to scrape |
| --concurrency <n> | Number of parallel pages (default 5) |
| --retry-failed | Retry previously failed pages |
| --clear-progress | Start fresh, clearing progress |
| --sequential | Disable parallel scraping |
| --debug | Enable debug logging |
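To make the flag semantics concrete, here is a dependency-free sketch of how these options could be parsed. It is not the parser in scraper.ts: `ScraperOptions` and `parseScraperArgs` are hypothetical names, and treating `--sequential` as "concurrency 1" is an assumption about what "disable parallel scraping" means.

```typescript
interface ScraperOptions {
  dataset?: number;
  visible: boolean;
  maxPages?: number;
  concurrency: number;
  retryFailed: boolean;
  clearProgress: boolean;
  sequential: boolean;
  debug: boolean;
}

function parseScraperArgs(argv: string[]): ScraperOptions {
  const opts: ScraperOptions = {
    visible: false,
    concurrency: 5, // documented default
    retryFailed: false,
    clearProgress: false,
    sequential: false,
    debug: false,
  };

  // Reads the positive integer that follows the flag at index i.
  const intAfter = (i: number): number => {
    const n = Number(argv[i + 1]);
    if (!Number.isInteger(n) || n <= 0) {
      throw new Error(`${argv[i]} expects a positive integer`);
    }
    return n;
  };

  for (let i = 0; i < argv.length; i++) {
    switch (argv[i]) {
      case "--dataset": {
        const id = intAfter(i++); // i++ also skips the value on the next loop
        if (![9, 10, 11].includes(id)) throw new Error("--dataset must be 9, 10, or 11");
        opts.dataset = id;
        break;
      }
      case "--max-pages": opts.maxPages = intAfter(i++); break;
      case "--concurrency": opts.concurrency = intAfter(i++); break;
      case "--visible": opts.visible = true; break;
      case "--retry-failed": opts.retryFailed = true; break;
      case "--clear-progress": opts.clearProgress = true; break;
      case "--sequential": opts.sequential = true; break;
      case "--debug": opts.debug = true; break;
      default: throw new Error(`Unknown flag: ${argv[i]}`);
    }
  }

  // Assumption: sequential mode is modeled as a single worker.
  if (opts.sequential) opts.concurrency = 1;
  return opts;
}
```

For example, `parseScraperArgs(["--dataset", "10", "--max-pages", "200"])` yields an options object with `dataset` set to 10, `maxPages` set to 200, and every other field at its default.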
Data Flow
1. Data Ingestion Flow
2. API Request Flow
3. Comment Submission Flow
Performance Optimizations
Frontend
- Code Splitting: Automatic route-based splitting via React Router
- Query Caching: 5-minute stale time for TanStack Query
- Image Optimization: Lazy loading for video thumbnails
- CSS: Utility-first TailwindCSS with minimal runtime
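The effect of a 5-minute stale time can be illustrated with a small cache that reuses a value until it is older than the threshold. This is a simplified sketch of the idea only, not TanStack Query's implementation (the library does considerably more, such as refetching in the background); `StaleCache` and `STALE_TIME_MS` are hypothetical names.

```typescript
const STALE_TIME_MS = 5 * 60 * 1000; // 5 minutes, as stated above

class StaleCache<T> {
  private readonly entries = new Map<string, { value: T; fetchedAt: number }>();
  private readonly staleTimeMs: number;
  private readonly now: () => number;

  // `now` is injectable so the cache can be exercised with a fake clock.
  constructor(staleTimeMs: number, now: () => number = Date.now) {
    this.staleTimeMs = staleTimeMs;
    this.now = now;
  }

  async get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    // Fresh entry: serve from cache and skip the network entirely.
    if (hit && this.now() - hit.fetchedAt < this.staleTimeMs) return hit.value;
    // Missing or stale entry: fetch again and record the fetch time.
    const value = await fetcher();
    this.entries.set(key, { value, fetchedAt: this.now() });
    return value;
  }
}
```

With this model, navigating back to a video page within five minutes of loading it would not trigger a second request for its metadata.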
Backend
- Bun Runtime: Up to ~3x faster than Node.js for I/O operations, depending on workload
- Connection Pooling: PostgreSQL connection reuse
- Rate Limiting: Prevents API abuse (50 req/min per IP)
- Indexed Queries: Database indexes on frequently queried fields
Database
- Composite Indexes: Optimized multi-column queries
- Foreign Key Cascades: Automatic cleanup of related data
- Unique Constraints: Prevent duplicate entries at DB level
Security
Rate Limiting
50 requests per minute per IP address on all API endpoints
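The source does not say which algorithm or library enforces this limit. As one possible reading, a fixed-window counter keyed by IP would behave as follows; `FixedWindowLimiter` is a hypothetical name, and the in-memory `Map` is an assumption that only suits a single server process.

```typescript
class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 50, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` (e.g. an IP) may proceed.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    // First request from this key, or the previous window has expired.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false; // over budget for this window
    entry.count++;
    return true;
  }
}
```

A middleware built on this would typically answer rejected requests with HTTP 429. Fixed windows allow short bursts at window boundaries; a sliding window or token bucket trades a little bookkeeping for smoother limits.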
CORS
Configured to allow cross-origin requests from the frontend
reCAPTCHA
Google reCAPTCHA v3 on comment submissions to prevent spam
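Server-side, reCAPTCHA v3 tokens are checked by posting them to Google's `siteverify` endpoint, which returns a `success` flag and a `score` between 0 and 1. The sketch below shows that check in isolation; `isHuman`, `FetchLike`, and the 0.5 threshold are assumptions for illustration, and the source does not state which threshold Jefftube uses.

```typescript
// Narrow fetch type so a fake can be injected in tests (hypothetical).
type FetchLike = (
  url: string,
  init: { method: string; body: URLSearchParams },
) => Promise<{ json(): Promise<unknown> }>;

interface SiteVerifyResponse {
  success: boolean;
  score?: number; // 0.0 (likely bot) to 1.0 (likely human)
}

async function isHuman(
  token: string,
  secret: string,
  minScore = 0.5, // assumed threshold
  fetchImpl: FetchLike = fetch,
): Promise<boolean> {
  const res = await fetchImpl("https://www.google.com/recaptcha/api/siteverify", {
    method: "POST",
    // Sent as application/x-www-form-urlencoded.
    body: new URLSearchParams({ secret, response: token }),
  });
  const data = (await res.json()) as SiteVerifyResponse;
  return data.success === true && (data.score ?? 0) >= minScore;
}
```

A comment endpoint would call this before writing to the database and reject the submission when it returns `false`.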
SQL Injection Prevention
Drizzle ORM provides parameterized queries
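What "parameterized" means in practice is that user input travels separately from the SQL text, so it is never parsed as SQL. The toy tagged template below demonstrates the principle; it is not Drizzle's API, and the `sql` helper, the `comments` table, and the `video_id` column are hypothetical.

```typescript
interface Query {
  text: string; // SQL with $1, $2, ... placeholders
  values: unknown[]; // values sent to the database separately
}

// Toy tagged template: interpolated values become numbered placeholders.
function sql(strings: TemplateStringsArray, ...values: unknown[]): Query {
  let text = strings[0];
  values.forEach((_, i) => {
    text += `$${i + 1}${strings[i + 1]}`;
  });
  return { text, values };
}

const userInput = "1; DROP TABLE comments";
const query = sql`SELECT * FROM comments WHERE video_id = ${userInput}`;
// query.text contains only the placeholder; the malicious string stays in query.values.
```

Because the database receives the statement and the values through separate channels, the injected `DROP TABLE` is treated as an ordinary string to compare against, not as a command.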
IP Hashing
User IP addresses are hashed (not stored in plain text)
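The source does not specify the hashing scheme. One reasonable approach, shown here as an assumption, is a keyed hash (HMAC-SHA-256) with a server-side secret, which makes it impractical to recover addresses by hashing the whole IPv4 space. `hashIp` and the `secret` parameter are hypothetical names.

```typescript
import { createHmac } from "node:crypto";

// Returns a stable, non-reversible identifier for an IP address.
// `secret` is an assumed server-side key that must stay out of the database.
function hashIp(ip: string, secret: string): string {
  return createHmac("sha256", secret).update(ip).digest("hex");
}
```

The same IP and secret always produce the same 64-character hex digest, so the hash can still back features such as per-user like tracking without storing the address itself.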
Cascading Deletes
Foreign key cascades ensure data integrity
Deployment Architecture
This section describes a typical production deployment. Adjust based on your infrastructure.
- Frontend: Static files served via CDN (Vercel, Cloudflare Pages, etc.)
- Backend: Bun server on a VPS or container platform
- Database: Managed PostgreSQL (AWS RDS, Supabase, Railway, etc.)
- Storage: Object storage for video files (S3, R2, etc.)
Next Steps
Setup Guide
Get your development environment running
Database Schema
Explore the database structure in detail