Architecture - Jefftube

System Overview

Jefftube is a full-stack video streaming application organized as a monorepo with three components: a React frontend (web/), a Bun and Hono API server (server/), and a Playwright scraper (scrapper/) that collects the video data.

Component Architecture

1. Frontend (web/)

The frontend is a React single-page application written in TypeScript and built with Vite.

Tech Stack

Technology	Version	Purpose
React	19.2.0	UI framework
React Router	7.13.0	Client-side routing
Vite	7.2.4	Build tool and dev server
TailwindCSS	4.1.18	Utility-first styling
TanStack Query	5.90.20	Server state management
PostHog	1.336.4	Analytics and feature flags
TypeScript	5.9.3	Type safety

Application Structure

web/src/
├── App.tsx                 # Root component with routing
├── main.tsx               # Application entry point
├── components/
│   ├── layout/           # Header, Sidebar
│   ├── video/            # Video player and related components
│   ├── comments/         # Comment section and components
│   ├── ui/              # Reusable UI components (Avatar, Dialog)
│   └── icons/           # Icon components
├── pages/
│   ├── channel/         # Main channel page
│   ├── shorts/          # Shorts feed page
│   └── playlist/        # Playlist viewer page
├── hooks/               # Custom React hooks
├── contexts/            # React Context providers
│   ├── ThemeContext    # Dark/light mode
│   ├── DataContext     # Video data management
│   └── CaptchaContext  # reCAPTCHA integration
└── utils/              # Utility functions

Routing Structure

Defined in App.tsx:58-64:

<Routes>
  <Route path="/" element={<ChannelPage />} />
  <Route path="/watch/:videoId" element={<VideoPage />} />
  <Route path="/shorts" element={<ShortsPage />} />
  <Route path="/playlist/:playlistId/:videoId" element={<PlaylistVideoPage />} />
  <Route path="*" element={<NotFoundPage />} />
</Routes>
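Links elsewhere in the app must produce paths that match these patterns. A minimal sketch of typed path builders is shown below; the `paths` object and its method names are hypothetical, not taken from the codebase. Each dynamic segment is passed through `encodeURIComponent` so an ID containing `/` or spaces cannot break the route match.

```typescript
// Hypothetical helpers that build URLs matching the <Route> patterns in App.tsx.
// Dynamic segments are percent-encoded so an ID can never introduce an extra "/".
const paths = {
  channel: (): string => "/",
  watch: (videoId: string): string => `/watch/${encodeURIComponent(videoId)}`,
  shorts: (): string => "/shorts",
  playlistVideo: (playlistId: string, videoId: string): string =>
    `/playlist/${encodeURIComponent(playlistId)}/${encodeURIComponent(videoId)}`,
};

// Usage in a component: <Link to={paths.watch(video.id)}>Watch</Link>
console.log(paths.watch("abc 123")); // /watch/abc%20123
console.log(paths.playlistVideo("p1", "v/2")); // /playlist/p1/v%2F2
```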

State Management

The frontend splits state into three layers: server state, UI state, and local component state.

Server state is managed by TanStack Query, which handles:

Video metadata fetching
Comments and replies
User interactions (likes, comments)
Automatic caching and refetching

Configuration in main.tsx:13-20:

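import { QueryClient } from "@tanstack/react-query";

const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      staleTime: 1000 * 60 * 5, // data stays fresh for 5 minutes
      refetchOnWindowFocus: false, // no refetch when the tab regains focus
    },
  },
});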
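With `staleTime` set to five minutes, cached results count as fresh for that long after they were fetched, so a component that mounts with the same query key reuses the cache instead of hitting the API. The sketch below models only that freshness rule with an injectable clock; `FreshnessCache` is a hypothetical name and this is not TanStack Query's implementation, which additionally deduplicates requests, garbage-collects unused entries, and shows stale data while refetching in the background.

```typescript
// Simplified model of the staleTime rule: cached data younger than staleTimeMs is
// returned as-is; missing or older data is fetched again and re-stamped.
// NOT TanStack Query's implementation (no dedup, no GC, no background refetch).
type CacheEntry<T> = { data: T; fetchedAt: number };

class FreshnessCache {
  private readonly staleTimeMs: number;
  private readonly now: () => number;
  private readonly entries = new Map<string, CacheEntry<unknown>>();

  constructor(staleTimeMs: number, now: () => number = Date.now) {
    this.staleTimeMs = staleTimeMs;
    this.now = now;
  }

  async get<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
    const entry = this.entries.get(key) as CacheEntry<T> | undefined;
    if (entry !== undefined && this.now() - entry.fetchedAt < this.staleTimeMs) {
      return entry.data; // fresh: no request
    }
    const data = await fetcher(); // missing or stale: fetch and store
    this.entries.set(key, { data, fetchedAt: this.now() });
    return data;
  }
}

// Usage with a fake clock and the 5-minute staleTime from main.tsx.
let clock = 0;
let fetches = 0;
const cache = new FreshnessCache(1000 * 60 * 5, () => clock);
const fetchVideos = async (): Promise<string[]> => {
  fetches += 1;
  return ["video-1"];
};

(async () => {
  await cache.get("videos", fetchVideos); // first use: fetch
  clock = 4 * 60 * 1000;
  await cache.get("videos", fetchVideos); // 4 min old: served from cache
  clock = 6 * 60 * 1000;
  await cache.get("videos", fetchVideos); // 6 min old: stale, fetched again
  console.log(fetches); // 2
})();
```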

React Context manages UI state:

Theme (dark/light mode) - ThemeContext
Video data - DataContext
reCAPTCHA tokens - CaptchaContext

Provider Hierarchy

From main.tsx:27-44:

<StrictMode>
  <HelmetProvider>              {/* SEO/meta tags */}
    <PostHogProvider>           {/* Analytics */}
      <QueryClientProvider>     {/* Server state */}
        <CaptchaProvider>       {/* reCAPTCHA */}
          <DataProvider>        {/* Video data */}
            <ThemeProvider>     {/* Theme switching */}
              <BrowserRouter>   {/* Routing */}
                <App />
              </BrowserRouter>
            </ThemeProvider>
          </DataProvider>
        </CaptchaProvider>
      </QueryClientProvider>
    </PostHogProvider>
  </HelmetProvider>
</StrictMode>

2. Backend (server/)

The backend is a lightweight API server built with the Hono web framework on the Bun runtime.

Tech Stack

Technology	Version	Purpose
Bun	Latest	JavaScript runtime
Hono	4.11.7	Web framework
Drizzle ORM	0.38.0	Type-safe ORM
PostgreSQL	16	Relational database
Pino	10.3.0	Structured logging
postgres	3.4.5	PostgreSQL client

Server Architecture

server/src/
├── index.ts              # Main server entry point
├── logger.ts             # Pino logger configuration
├── db/
│   ├── index.ts         # Database connection
│   ├── schema.ts        # Drizzle schema definitions
│   └── seed.ts          # Database seeding script
├── routes/
│   ├── users.ts         # User endpoints
│   ├── videos.ts        # Video endpoints
│   ├── comments.ts      # Comment endpoints
│   └── videoLikes.ts    # Video like endpoints
└── scripts/
    └── query.ts         # Database query utilities

Middleware Stack

From index.ts:14-23:

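import { logger as honoLogger } from "hono/logger";
import { cors } from "hono/cors";
import { rateLimiter } from "hono-rate-limiter";

// 1. Logging middleware
app.use("*", honoLogger());

// 2. CORS middleware (default options)
app.use("*", cors());

// 3. Rate limiting (API routes only)
app.use("/api/*", rateLimiter({
  windowMs: 60 * 1000,     // 1 minute window
  limit: 50,               // 50 requests per window
  // Client IP as reported by the X-Forwarded-For header
  keyGenerator: (c) => c.req.header("x-forwarded-for") ?? "",
}));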
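The `windowMs` and `limit` options describe a per-key budget: each key gets 50 requests, and the count resets once a minute has passed since its window opened. The fixed-window counter below illustrates those semantics; it is a hypothetical `FixedWindowLimiter`, not hono-rate-limiter's source, whose store and window handling may differ. Because the key comes from `x-forwarded-for`, the limit is only per-IP when a trusted proxy sets that header; requests without it all fall back to the `""` key and share one budget.

```typescript
// Fixed-window rate limiter: each key may make `limit` requests per `windowMs`.
// Illustrative only: in-memory, single process, no eviction of old keys.
type WindowState = { count: number; windowStart: number };

class FixedWindowLimiter {
  private readonly windowMs: number;
  private readonly limit: number;
  private readonly windows = new Map<string, WindowState>();

  constructor(windowMs: number, limit: number) {
    this.windowMs = windowMs;
    this.limit = limit;
  }

  /** Returns true if the request is allowed, false if it should get HTTP 429. */
  hit(key: string, now: number): boolean {
    const state = this.windows.get(key);
    if (!state || now - state.windowStart >= this.windowMs) {
      this.windows.set(key, { count: 1, windowStart: now }); // open a new window
      return true;
    }
    state.count += 1;
    return state.count <= this.limit;
  }
}

// Usage: 60 requests from one IP inside a single minute, then one a minute later.
const limiter = new FixedWindowLimiter(60 * 1000, 50);
let allowed = 0;
for (let i = 0; i < 60; i++) {
  if (limiter.hit("203.0.113.7", 1_000)) allowed++;
}
console.log(allowed); // 50
console.log(limiter.hit("203.0.113.7", 61_000)); // true (a new window has started)
```

A fixed window is the simplest scheme; its known weakness is that a client can send up to twice the limit across a window boundary, which sliding-window variants avoid at the cost of more bookkeeping.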

API Routes

All routes are prefixed with /api and defined in index.ts:32-35:

Route	File	Endpoints
`/api/users/*`	`routes/users.ts`	User registration, profile
`/api/videos/*`	`routes/videos.ts`	Video list, metadata
`/api/comments/*`	`routes/comments.ts`	CRUD for comments
`/api/videoLikes/*`	`routes/videoLikes.ts`	Like/unlike videos

Database Layer

Drizzle ORM provides type-safe database access:

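import { desc } from "drizzle-orm";
import { db } from "./db";            // server/src/db/index.ts (assumed export name)
import { videos } from "./db/schema"; // server/src/db/schema.ts

// Example: Fetch all videos sorted by views, most viewed first
const allVideos = await db
  .select()
  .from(videos)
  .orderBy(desc(videos.views));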

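The Drizzle Kit configuration in drizzle.config.ts:3-10 tells the drizzle-kit CLI where the schema lives, where to write migrations, and which database to connect to:

import { defineConfig } from "drizzle-kit";

export default defineConfig({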
  schema: "./src/db/schema.ts",
  out: "./drizzle",
  dialect: "postgresql",
  dbCredentials: {
    url: process.env.DATABASE_URL || 
         "postgres://jtube:jtube@localhost:5432/jtube",
  },
});
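The fallback connection string packs user, password, host, port, and database name into a single URL. Since `postgres://` URLs follow standard URL syntax, the WHATWG `URL` class built into Bun and Node.js can split it apart, which helps when checking which database a process will actually use. `describeDatabaseUrl` is a hypothetical helper, not part of the codebase; it leaves out the password so its output is safe to log.

```typescript
// Hypothetical helper: summarize a Postgres connection URL without exposing the password.
function describeDatabaseUrl(raw: string): string {
  const url = new URL(raw);
  const database = url.pathname.replace(/^\//, ""); // "/jtube" -> "jtube"
  const port = url.port || "5432"; // Postgres default port when none is given
  return `${url.username}@${url.hostname}:${port}/${database}`;
}

console.log(describeDatabaseUrl("postgres://jtube:jtube@localhost:5432/jtube"));
// jtube@localhost:5432/jtube
```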

Error Handling

Centralized error handler in index.ts:42-45:

app.onError((err, c) => {
  logger.error({ err }, "unhandled error");
  return c.json({ error: "Internal server error" }, 500);
});
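Because every unhandled error reaches clients as the same `{ error: "Internal server error" }` body with status 500, client code can surface a JSON `error` field as the failure message. A minimal sketch of a response check the frontend could run inside a TanStack Query query function follows; `ApiError` and `readApiResponse` are hypothetical names, and the assumption that other failing routes also use an `{ error: string }` body is not confirmed by the source, so the helper falls back to a generic message.

```typescript
// Hypothetical client-side helper: pass 2xx bodies through and turn failures into
// thrown errors, so TanStack Query's `error` state gets a readable message.
class ApiError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.name = "ApiError";
    this.status = status;
  }
}

function hasErrorField(body: unknown): body is { error: string } {
  return (
    typeof body === "object" &&
    body !== null &&
    typeof (body as { error?: unknown }).error === "string"
  );
}

function readApiResponse<T>(status: number, body: unknown): T {
  if (status >= 200 && status < 300) return body as T;
  const message = hasErrorField(body) ? body.error : `Request failed with status ${status}`;
  throw new ApiError(status, message);
}

// In a query function: return readApiResponse(res.status, await res.json());
try {
  readApiResponse(500, { error: "Internal server error" });
} catch (err) {
  const apiErr = err as ApiError;
  console.log(apiErr.status, apiErr.message); // 500 Internal server error
}
```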

3. Scraper (scrapper/)

The scraper is a Playwright-based automation tool that collects MP4 video links from U.S. Department of Justice (DOJ) disclosure dataset pages.

Tech Stack

Technology	Version	Purpose
Playwright	1.58.1	Browser automation
Bun	Latest	Runtime and file I/O

How It Works

Target Selection

The scraper targets three DOJ disclosure datasets, defined in scraper.ts:4-8:

const DATASETS = [
  { id: 9, baseUrl: "...", estimatedPages: 10002 },
  { id: 10, baseUrl: "...", estimatedPages: 10000 },
  { id: 11, baseUrl: "...", estimatedPages: 1000 },
];

Concurrent Scraping

The scraper processes pages with configurable concurrency (5 pages in parallel by default) so that thousands of pages can be scraped in reasonable time. Features:

Automatic robot check handling
Progress tracking with resume capability
Failed page retry mechanism
Staggered request timing to avoid rate limits
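Concurrency and staggering can be pictured as a small worker pool: a fixed number of workers pull page numbers from a shared queue, and each pauses briefly before starting a task. The sketch below is a generic TypeScript illustration under those assumptions, not the scraper's code; `runPool`, `staggerMs`, and the task signature are hypothetical, and robot-check handling is left out.

```typescript
// Generic worker pool: at most `concurrency` tasks are in flight at once, and each
// task start is preceded by a `staggerMs` pause so requests are spread out.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runPool<T, R>(
  items: T[],
  concurrency: number,
  staggerMs: number,
  task: (item: T) => Promise<R>,
): Promise<{ results: Map<T, R>; failed: T[] }> {
  const results = new Map<T, R>();
  const failed: T[] = [];
  let next = 0; // shared cursor; safe because JS runs one callback at a time

  async function worker(): Promise<void> {
    while (next < items.length) {
      const item = items[next++];
      await sleep(staggerMs);
      try {
        results.set(item, await task(item));
      } catch {
        failed.push(item); // kept for a later retry pass
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { results, failed };
}

// Usage: 20 fake pages, 5 in flight, page 7 fails (for example on a robot check).
const pages = Array.from({ length: 20 }, (_, i) => i + 1);
runPool(pages, 5, 10, async (page) => {
  if (page === 7) throw new Error("robot check");
  return `page-${page}`;
}).then(({ results, failed }) => console.log(results.size, failed.join(","))); // 19 7
```

Recording failures instead of aborting keeps one bad page from stopping a multi-thousand-page run, and the failed list is exactly what a retry pass needs as input.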

Data Extraction

Extracts MP4 file links from each page:

interface Mp4File {
  filename: string;
  url: string;
  sourcePageUrl?: string;
}
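Given the `Mp4File` shape, extraction reduces to filtering a page's links down to `.mp4` targets, resolving them against the page URL, and taking the filename from the path. The sketch assumes the hrefs have already been collected from the page (for example with Playwright's `page.$$eval` or `locator.evaluateAll`); `extractMp4Files` is a hypothetical helper rather than the scraper's actual function, and the example.com URLs are placeholders.

```typescript
interface Mp4File {
  filename: string;
  url: string;
  sourcePageUrl?: string;
}

// Hypothetical helper: given the hrefs found on one listing page, keep links whose
// path ends in .mp4, resolve relative links against the page URL, and de-duplicate
// by absolute URL.
function extractMp4Files(hrefs: string[], pageUrl: string): Mp4File[] {
  const byUrl = new Map<string, Mp4File>();
  for (const href of hrefs) {
    let url: URL;
    try {
      url = new URL(href, pageUrl);
    } catch {
      continue; // malformed href: skip it
    }
    if (!url.pathname.toLowerCase().endsWith(".mp4")) continue;

    const rawName = url.pathname.split("/").pop() ?? "";
    let filename = rawName;
    try {
      filename = decodeURIComponent(rawName); // "clip%20one.mp4" -> "clip one.mp4"
    } catch {
      // invalid percent-encoding: keep the raw name
    }
    byUrl.set(url.href, { filename, url: url.href, sourcePageUrl: pageUrl });
  }
  return Array.from(byUrl.values());
}

const found = extractMp4Files(
  ["/files/clip%20one.mp4", "https://example.com/files/clip%20one.mp4", "/about.html"],
  "https://example.com/listing?page=2",
);
console.log(found.length, found[0].filename); // 1 clip one.mp4
```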

Output

Results are saved to JSON files, which the seeding script (server/src/db/seed.ts) can import into PostgreSQL.

Command Line Interface

The scraper supports extensive CLI options (from scraper.ts:44-88):

Flag	Description
`--dataset <id>`	Scrape specific dataset (9, 10, or 11)
`--visible`	Run browser in visible mode
`--max-pages <n>`	Limit number of pages to scrape
`--concurrency <n>`	Number of parallel pages (default 5)
`--retry-failed`	Retry previously failed pages
`--clear-progress`	Start fresh, clearing progress
`--sequential`	Disable parallel scraping
`--debug`	Enable debug logging
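Several of these flags interact when deciding which pages a run visits: `--clear-progress` discards saved state, `--retry-failed` revisits pages that failed earlier, and `--max-pages` caps the work. One plausible way to combine them is sketched below. The `Progress` shape, the `pagesToScrape` function, 0-based page indexes, and the rule that `--retry-failed` limits a run to previously failed pages are all assumptions for illustration; the scraper's real progress format and precedence rules may differ.

```typescript
// Assumed shape of the saved progress for one dataset (not the scraper's real format).
interface Progress {
  completed: number[];
  failed: number[];
}

interface RunOptions {
  estimatedPages: number;  // e.g. 10000 for dataset 10
  maxPages?: number;       // --max-pages <n>
  retryFailed?: boolean;   // --retry-failed
  clearProgress?: boolean; // --clear-progress
}

// Hypothetical planner: which page indexes (0-based here, an assumption) to visit.
function pagesToScrape(progress: Progress, opts: RunOptions): number[] {
  // --clear-progress: ignore everything saved by earlier runs.
  const saved: Progress = opts.clearProgress ? { completed: [], failed: [] } : progress;

  let pages: number[];
  if (opts.retryFailed) {
    // --retry-failed: only revisit pages that failed before.
    pages = saved.failed.slice().sort((a, b) => a - b);
  } else {
    // Default: resume by skipping pages already completed.
    const done = new Set(saved.completed);
    pages = [];
    for (let p = 0; p < opts.estimatedPages; p++) {
      if (!done.has(p)) pages.push(p);
    }
  }
  // --max-pages: cap the amount of work in this run.
  return opts.maxPages === undefined ? pages : pages.slice(0, opts.maxPages);
}

const progress: Progress = { completed: [0, 1, 2], failed: [5, 3] };
console.log(pagesToScrape(progress, { estimatedPages: 10, maxPages: 4 }).join(","));       // 3,4,5,6
console.log(pagesToScrape(progress, { estimatedPages: 10, retryFailed: true }).join(",")); // 3,5
console.log(pagesToScrape(progress, { estimatedPages: 3, clearProgress: true }).join(",")); // 0,1,2
```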

Data Flow

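1. Data Ingestion Flow

The scraper writes the MP4 links it finds to JSON files; the seeding script (server/src/db/seed.ts) loads those files into PostgreSQL, from which the API serves video metadata to the frontend.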

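2. API Request Flow

Each request passes through the middleware stack in order (logging, CORS, and, for /api/* paths, rate limiting), then reaches the matching route module in server/src/routes/, which queries PostgreSQL through Drizzle ORM and returns JSON. Unhandled errors fall through to app.onError, which logs them and returns a generic 500 response.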

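3. Comment Submission Flow

The frontend obtains a reCAPTCHA v3 token through CaptchaContext and sends it with the comment to the /api/comments/* endpoints (routes/comments.ts), which are covered by the same rate limiter as the rest of the API; the comment is then stored in PostgreSQL.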

Performance Optimizations

Frontend

Code Splitting: Route-based splitting (page components loaded with dynamic import(), which Vite emits as separate chunks)
Query Caching: 5-minute stale time for TanStack Query
Image Optimization: Lazy loading for video thumbnails
CSS: Utility-first TailwindCSS, generated at build time with no runtime styling cost

Backend

Bun Runtime: Fast startup and I/O compared with Node.js (actual gains depend on the workload)
Connection Pooling: PostgreSQL connection reuse
Rate Limiting: Prevents API abuse (50 req/min per IP)
Indexed Queries: Database indexes on frequently queried fields

Database

Composite Indexes: Optimized multi-column queries
Foreign Key Cascades: Automatic cleanup of related data
Unique Constraints: Prevent duplicate entries at DB level

Security

Rate Limiting

50 requests per minute on all /api/* endpoints, keyed on the client IP from the X-Forwarded-For header

CORS

cors() is registered with default options, which in Hono allow cross-origin requests from any origin, including the frontend
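To limit CORS to specific frontend origins, Hono's `cors` middleware accepts an `origin` option that can be a string, an array, or a function returning the origin to allow. Below is a sketch of such a function as a pure, testable helper. The allowlisted origins are placeholders, and wiring it up as `cors({ origin: allowOrigin })` is an assumption about how the server could be configured, not its current setup.

```typescript
// Placeholder origins: replace with the real frontend URLs.
const ALLOWED_ORIGINS = new Set<string>([
  "http://localhost:5173", // Vite dev server default port
  "https://jefftube.example.com",
]);

// Returns the origin to echo back in Access-Control-Allow-Origin, or null so the
// header is omitted and the browser refuses to expose the response to the page.
function allowOrigin(origin: string): string | null {
  return ALLOWED_ORIGINS.has(origin) ? origin : null;
}

console.log(allowOrigin("http://localhost:5173")); // http://localhost:5173
console.log(allowOrigin("https://evil.example"));  // null
```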

reCAPTCHA

Google reCAPTCHA v3 on comment submissions to prevent spam

SQL Injection Prevention

Drizzle ORM provides parameterized queries

IP Hashing

User IP addresses are hashed (not stored in plain text)
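The section does not specify the hashing scheme. One common approach is a keyed hash (HMAC-SHA-256) with a server-side secret: a plain unsalted SHA-256 of an IPv4 address can be reversed by hashing all 2^32 possible addresses, whereas an HMAC cannot be brute-forced without the secret. The sketch uses `createHmac` from `node:crypto` (also available in Bun); `hashIp` and the placeholder secret are hypothetical.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: derive a stable, non-reversible identifier for an IP address.
// The same IP and secret always give the same hash, so duplicate checks keyed on
// the hash still work without storing the raw address.
function hashIp(ip: string, secret: string): string {
  return createHmac("sha256", secret).update(ip).digest("hex");
}

const secret = "load-me-from-an-env-var"; // placeholder: never hard-code in production
const hashed = hashIp("203.0.113.7", secret);
console.log(hashed.length); // 64
console.log(hashed === hashIp("203.0.113.7", secret)); // true
```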

Cascading Deletes

Foreign key cascades ensure data integrity

Deployment Architecture

This section describes a typical production deployment. Adjust based on your infrastructure.

Frontend: Static files served via CDN (Vercel, Cloudflare Pages, etc.)
Backend: Bun server on a VPS or container platform
Database: Managed PostgreSQL (AWS RDS, Supabase, Railway, etc.)
Storage: Object storage for video files (S3, R2, etc.)

Next Steps

Setup Guide: Get your development environment running

Database Schema: Explore the database structure in detail
