
System Overview

Jefftube is a full-stack web application built as a monorepo with three independent components that work together to create a video streaming platform.

Component Architecture

1. Frontend (web/)

The frontend is a modern React application built with performance and user experience in mind.

Tech Stack

| Technology | Version | Purpose |
| --- | --- | --- |
| React | 19.2.0 | UI framework |
| React Router | 7.13.0 | Client-side routing |
| Vite | 7.2.4 | Build tool and dev server |
| TailwindCSS | 4.1.18 | Utility-first styling |
| TanStack Query | 5.90.20 | Server state management |
| PostHog | 1.336.4 | Analytics and feature flags |
| TypeScript | 5.9.3 | Type safety |

Application Structure

web/src/
├── App.tsx                 # Root component with routing
├── main.tsx               # Application entry point
├── components/
│   ├── layout/           # Header, Sidebar
│   ├── video/            # Video player and related components
│   ├── comments/         # Comment section and components
│   ├── ui/              # Reusable UI components (Avatar, Dialog)
│   └── icons/           # Icon components
├── pages/
│   ├── channel/         # Main channel page
│   ├── shorts/          # Shorts feed page
│   └── playlist/        # Playlist viewer page
├── hooks/               # Custom React hooks
├── contexts/            # React Context providers
│   ├── ThemeContext    # Dark/light mode
│   ├── DataContext     # Video data management
│   └── CaptchaContext  # reCAPTCHA integration
└── utils/              # Utility functions

Routing Structure

Defined in App.tsx:58-64:
<Routes>
  <Route path="/" element={<ChannelPage />} />
  <Route path="/watch/:videoId" element={<VideoPage />} />
  <Route path="/shorts" element={<ShortsPage />} />
  <Route path="/playlist/:playlistId/:videoId" element={<PlaylistVideoPage />} />
  <Route path="*" element={<NotFoundPage />} />
</Routes>
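
To illustrate how a pattern such as /watch/:videoId turns a URL into route parameters, here is a minimal, dependency-free sketch. It is not React Router's matching algorithm, and the helper names matchPattern and resolveRoute are hypothetical; only the path patterns come from the route table above.

```typescript
// Simplified illustration of dynamic-segment matching. React Router's real
// matcher also handles route ranking, optional segments, and splats.
type Params = Record<string, string>;

function matchPattern(pattern: string, pathname: string): Params | null {
  const patternParts = pattern.split("/").filter(Boolean);
  const pathParts = pathname.split("/").filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;

  const params: Params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const expected = patternParts[i];
    const actual = pathParts[i];
    if (expected.startsWith(":")) {
      params[expected.slice(1)] = actual; // capture the dynamic segment
    } else if (expected !== actual) {
      return null; // static segment mismatch
    }
  }
  return params;
}

// Patterns copied from the route table, in declaration order.
const patterns = ["/", "/watch/:videoId", "/shorts", "/playlist/:playlistId/:videoId"];

function resolveRoute(pathname: string): { pattern: string; params: Params } {
  for (const pattern of patterns) {
    const params = matchPattern(pattern, pathname);
    if (params) return { pattern, params };
  }
  return { pattern: "*", params: {} }; // catch-all, i.e. NotFoundPage
}
```

For example, resolveRoute("/playlist/p1/v2") yields the playlist pattern with playlistId "p1" and videoId "v2", which is the shape of data a page component would read from its route parameters.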

State Management

The frontend uses a hybrid state management approach: React Context providers cover app-wide concerns such as theme and reCAPTCHA, while TanStack Query manages all server-side data:
  • Video metadata fetching
  • Comments and replies
  • User interactions (likes, comments)
  • Automatic caching and refetching
Configuration in main.tsx:13-20:
const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      staleTime: 1000 * 60 * 5, // 5 minutes
      refetchOnWindowFocus: false,
    },
  },
});
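
The effect of the 5-minute staleTime can be sketched as a simple freshness check. This is an illustration of the concept rather than TanStack Query's internals; CacheEntry and isStale are hypothetical names introduced here.

```typescript
// Same value as the staleTime in the configuration above.
const STALE_TIME_MS = 1000 * 60 * 5;

interface CacheEntry<T> {
  data: T;
  updatedAt: number; // epoch milliseconds of the last successful fetch
}

// An entry counts as stale once it is at least `staleTime` old.
// Fresh entries are served from cache without a network request.
function isStale<T>(
  entry: CacheEntry<T>,
  now: number,
  staleTime: number = STALE_TIME_MS,
): boolean {
  return now - entry.updatedAt >= staleTime;
}

const entry: CacheEntry<string[]> = { data: ["video-1"], updatedAt: 0 };
console.log(isStale(entry, 4 * 60 * 1000)); // false: still fresh after 4 minutes
console.log(isStale(entry, 5 * 60 * 1000)); // true: stale at the 5-minute mark
```

Because refetchOnWindowFocus is disabled, stale data is not refreshed just because the user returns to the tab; it is refreshed by whichever other triggers remain enabled, such as a component mounting with a stale query.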

Provider Hierarchy

From main.tsx:27-44:
<StrictMode>
  <HelmetProvider>              {/* SEO/meta tags */}
    <PostHogProvider>           {/* Analytics */}
      <QueryClientProvider>     {/* Server state */}
        <CaptchaProvider>       {/* reCAPTCHA */}
          <DataProvider>        {/* Video data */}
            <ThemeProvider>     {/* Theme switching */}
              <BrowserRouter>   {/* Routing */}
                <App />
              </BrowserRouter>
            </ThemeProvider>
          </DataProvider>
        </CaptchaProvider>
      </QueryClientProvider>
    </PostHogProvider>
  </HelmetProvider>
</StrictMode>

2. Backend (server/)

The backend is a lightweight, fast API server built with Hono and Bun.

Tech Stack

| Technology | Version | Purpose |
| --- | --- | --- |
| Bun | Latest | JavaScript runtime |
| Hono | 4.11.7 | Web framework |
| Drizzle ORM | 0.38.0 | Type-safe ORM |
| PostgreSQL | 16 | Relational database |
| Pino | 10.3.0 | Structured logging |
| postgres | 3.4.5 | PostgreSQL client |

Server Architecture

server/src/
├── index.ts              # Main server entry point
├── logger.ts             # Pino logger configuration
├── db/
│   ├── index.ts         # Database connection
│   ├── schema.ts        # Drizzle schema definitions
│   └── seed.ts          # Database seeding script
├── routes/
│   ├── users.ts         # User endpoints
│   ├── videos.ts        # Video endpoints
│   ├── comments.ts      # Comment endpoints
│   └── videoLikes.ts    # Video like endpoints
└── scripts/
    └── query.ts         # Database query utilities

Middleware Stack

From index.ts:14-23:
// 1. Logging middleware
app.use("*", honoLogger());

// 2. CORS middleware
app.use("*", cors());

// 3. Rate limiting (API routes only)
app.use("/api/*", rateLimiter({
  windowMs: 60 * 1000,     // 1 minute window
  limit: 50,               // 50 requests per window
  keyGenerator: (c) => c.req.header("x-forwarded-for") ?? "",
}));
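
The windowMs and limit settings can be understood through a fixed-window counter, sketched below. This is one common strategy and an assumption on our part; the middleware's actual store may count differently. FixedWindowLimiter is a hypothetical class, not part of Hono or the rate limiter package.

```typescript
interface WindowState {
  count: number;   // requests seen in the current window
  resetAt: number; // epoch milliseconds when the window ends
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly windowMs: number;
  private readonly limit: number;

  constructor(windowMs: number, limit: number) {
    this.windowMs = windowMs;
    this.limit = limit;
  }

  // Returns true if the request identified by `key` is allowed at time `now`.
  allow(key: string, now: number = Date.now()): boolean {
    const state = this.windows.get(key);
    if (!state || now >= state.resetAt) {
      // First request from this key, or the previous window has expired.
      this.windows.set(key, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    if (state.count >= this.limit) return false;
    state.count += 1;
    return true;
  }
}

// Same numbers as the middleware configuration: 50 requests per minute.
const limiter = new FixedWindowLimiter(60 * 1000, 50);
```

Note that the keyGenerator above falls back to an empty string when x-forwarded-for is absent, so under a scheme like this every request without that header would share a single bucket.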

API Routes

All routes are prefixed with /api and defined in index.ts:32-35:
| Route | File | Endpoints |
| --- | --- | --- |
| /api/users/* | routes/users.ts | User registration, profile |
| /api/videos/* | routes/videos.ts | Video list, metadata |
| /api/comments/* | routes/comments.ts | CRUD for comments |
| /api/videoLikes/* | routes/videoLikes.ts | Like/unlike videos |

Database Layer

Drizzle ORM provides type-safe database access:
// Example: Fetch all videos sorted by views
const allVideos = await db
  .select()
  .from(videos)
  .orderBy(desc(videos.views));
Connection configuration in drizzle.config.ts:3-10:
export default defineConfig({
  schema: "./src/db/schema.ts",
  out: "./drizzle",
  dialect: "postgresql",
  dbCredentials: {
    url: process.env.DATABASE_URL || 
         "postgres://jtube:jtube@localhost:5432/jtube",
  },
});

Error Handling

Centralized error handler in index.ts:42-45:
app.onError((err, c) => {
  logger.error({ err }, "unhandled error");
  return c.json({ error: "Internal server error" }, 500);
});

3. Scraper (scrapper/)

The scraper is a Playwright-based automation tool that extracts video metadata from government websites.

Tech Stack

| Technology | Version | Purpose |
| --- | --- | --- |
| Playwright | 1.58.1 | Browser automation |
| Bun | Latest | Runtime and file I/O |

How It Works

Step 1: Target Selection

The scraper targets three DOJ disclosure datasets. From scraper.ts:4-8:
const DATASETS = [
  { id: 9, baseUrl: "...", estimatedPages: 10002 },
  { id: 10, baseUrl: "...", estimatedPages: 10000 },
  { id: 11, baseUrl: "...", estimatedPages: 1000 },
];

Step 2: Concurrent Scraping

Uses configurable concurrency (default 5 pages in parallel) to efficiently scrape thousands of pages. Features:
  • Automatic robot check handling
  • Progress tracking with resume capability
  • Failed page retry mechanism
  • Staggered request timing to avoid rate limits
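
The concurrency limit, staggered timing, and failed-page tracking listed above can be sketched with a small worker pool. This is an assumed design, not the code in scraper.ts; scrapePages, scrapeOne, and staggerMs are hypothetical names, and scrapeOne stands in for the Playwright page visit.

```typescript
interface ScrapeOutcome<T> {
  results: Map<number, T>; // page number -> extracted data
  failed: number[];        // pages to retry on a later run
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function scrapePages<T>(
  pages: number[],
  scrapeOne: (page: number) => Promise<T>,
  concurrency: number = 5,
  staggerMs: number = 0,
): Promise<ScrapeOutcome<T>> {
  const results = new Map<number, T>();
  const failed: number[] = [];
  let next = 0; // index of the next unclaimed page, shared by all workers

  async function worker(workerIndex: number): Promise<void> {
    await sleep(workerIndex * staggerMs); // stagger worker start times
    while (next < pages.length) {
      const page = pages[next++]; // claim a page (safe: JS is single-threaded)
      try {
        results.set(page, await scrapeOne(page));
      } catch {
        failed.push(page); // remember the failure instead of aborting the run
      }
    }
  }

  const workerCount = Math.min(concurrency, pages.length);
  await Promise.all(Array.from({ length: workerCount }, (_, i) => worker(i)));
  return { results, failed };
}
```

Each worker pulls the next page only after finishing its current one, so at most `concurrency` pages are in flight at any time. Persisting the completed page numbers and the `failed` list to disk between runs would be one way to support the resume and retry behavior.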

Step 3: Data Extraction

Extracts MP4 file links from each page:
interface Mp4File {
  filename: string;
  url: string;
  sourcePageUrl?: string;
}
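
As a rough illustration of the extraction step, the sketch below pulls MP4 links out of an HTML string and fills in the Mp4File shape. The real scraper presumably queries the live DOM through Playwright; the regex approach and the extractMp4Files name are assumptions made to keep the example self-contained.

```typescript
interface Mp4File {
  filename: string;
  url: string;
  sourcePageUrl?: string;
}

function extractMp4Files(html: string, sourcePageUrl: string): Mp4File[] {
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  const seen = new Set<string>();
  const files: Mp4File[] = [];

  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    let url: URL;
    try {
      url = new URL(match[1], sourcePageUrl); // resolves relative links
    } catch {
      continue; // skip hrefs that are not valid URLs
    }
    if (!url.pathname.toLowerCase().endsWith(".mp4")) continue;
    if (seen.has(url.href)) continue; // drop duplicate links on the same page
    seen.add(url.href);

    const filename = url.pathname.split("/").pop() ?? "";
    files.push({ filename, url: url.href, sourcePageUrl });
  }
  return files;
}
```

Recording sourcePageUrl alongside each file makes it possible to trace a video back to the page it was found on.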

Step 4: Output

Saves results to JSON files that can be imported by the seeding script.

Command Line Interface

The scraper supports extensive CLI options (from scraper.ts:44-88):
| Flag | Description |
| --- | --- |
| --dataset <id> | Scrape specific dataset (9, 10, or 11) |
| --visible | Run browser in visible mode |
| --max-pages <n> | Limit number of pages to scrape |
| --concurrency <n> | Number of parallel pages (default 5) |
| --retry-failed | Retry previously failed pages |
| --clear-progress | Start fresh, clearing progress |
| --sequential | Disable parallel scraping |
| --debug | Enable debug logging |
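
A parser for these flags might look like the sketch below. It is not the code at scraper.ts:44-88; the ScraperOptions shape and the parseArgs function are hypothetical, with only the flag names, the dataset IDs, and the default concurrency of 5 taken from the table.

```typescript
interface ScraperOptions {
  dataset?: number;
  maxPages?: number;
  concurrency: number;
  visible: boolean;
  retryFailed: boolean;
  clearProgress: boolean;
  sequential: boolean;
  debug: boolean;
}

const DATASET_IDS = [9, 10, 11];

function parseArgs(argv: string[]): ScraperOptions {
  const opts: ScraperOptions = {
    concurrency: 5, // documented default
    visible: false,
    retryFailed: false,
    clearProgress: false,
    sequential: false,
    debug: false,
  };

  // Reads the integer value that follows a value-taking flag.
  const intAt = (index: number): number => {
    const value = Number.parseInt(argv[index] ?? "", 10);
    if (Number.isNaN(value)) throw new Error(`${argv[index - 1]} expects an integer`);
    return value;
  };

  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    switch (flag) {
      case "--dataset": {
        const id = intAt(++i);
        if (DATASET_IDS.indexOf(id) === -1) throw new Error("--dataset must be 9, 10, or 11");
        opts.dataset = id;
        break;
      }
      case "--max-pages": opts.maxPages = intAt(++i); break;
      case "--concurrency": opts.concurrency = intAt(++i); break;
      case "--visible": opts.visible = true; break;
      case "--retry-failed": opts.retryFailed = true; break;
      case "--clear-progress": opts.clearProgress = true; break;
      case "--sequential": opts.sequential = true; break;
      case "--debug": opts.debug = true; break;
      default: throw new Error(`Unknown flag: ${flag}`);
    }
  }
  return opts;
}
```

In a script, this would typically be called as parseArgs(process.argv.slice(2)) so that the runtime and script paths are excluded.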

Data Flow

1. Data Ingestion Flow (diagram)

2. API Request Flow (diagram)

3. Comment Submission Flow (diagram)

Performance Optimizations

Frontend

  • Code Splitting: Automatic route-based splitting via React Router
  • Query Caching: 5-minute stale time for TanStack Query
  • Image Optimization: Lazy loading for video thumbnails
  • CSS: Utility-first TailwindCSS with minimal runtime

Backend

  • Bun Runtime: Roughly 3x faster than Node.js on some I/O benchmarks; actual gains depend on the workload
  • Connection Pooling: PostgreSQL connection reuse
  • Rate Limiting: Prevents API abuse (50 req/min per IP)
  • Indexed Queries: Database indexes on frequently queried fields

Database

  • Composite Indexes: Optimized multi-column queries
  • Foreign Key Cascades: Automatic cleanup of related data
  • Unique Constraints: Prevent duplicate entries at DB level

Security

Rate Limiting

50 requests per minute per IP address on all API endpoints

CORS

Configured to allow cross-origin requests from the frontend

reCAPTCHA

Google reCAPTCHA v3 on comment submissions to prevent spam
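
With reCAPTCHA v3, the server receives a score rather than a pass/fail answer, so it has to decide what counts as acceptable. The sketch below shows one way to evaluate a verification result. The field names follow Google's documented siteverify response; the isLikelyHuman helper, the "submit_comment" action name, and the 0.5 threshold are assumptions, not values taken from this codebase.

```typescript
// Subset of the fields returned by the reCAPTCHA v3 siteverify endpoint.
interface SiteVerifyResponse {
  success: boolean;          // whether the token was valid
  score?: number;            // 0.0 (likely bot) to 1.0 (likely human)
  action?: string;           // action name the token was issued for
  "error-codes"?: string[];
}

function isLikelyHuman(
  result: SiteVerifyResponse,
  expectedAction: string,
  minScore: number = 0.5, // assumed threshold; tune against real traffic
): boolean {
  if (!result.success) return false;
  // Reject tokens minted for a different action than the one being performed.
  if (result.action !== expectedAction) return false;
  return (result.score ?? 0) >= minScore;
}

const verdict = isLikelyHuman(
  { success: true, score: 0.9, action: "submit_comment" },
  "submit_comment",
);
console.log(verdict); // true
```

Checking the action as well as the score guards against a token obtained on one form being replayed against another.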

SQL Injection Prevention

Drizzle ORM provides parameterized queries

IP Hashing

User IP addresses are hashed (not stored in plain text)
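
One way to hash IP addresses is a keyed hash (HMAC), sketched below. The source does not say which algorithm or secret the server uses, so hashIp, the HMAC-SHA-256 choice, and the secret parameter are all assumptions for illustration.

```typescript
import { createHmac } from "node:crypto";

// Derives a stable, non-reversible identifier from an IP address.
// The secret keeps the small IPv4 space from being brute-forced by
// anyone who only has access to the stored hashes.
function hashIp(ip: string, secret: string): string {
  return createHmac("sha256", secret).update(ip).digest("hex");
}

const id = hashIp("203.0.113.7", "example-secret");
console.log(id.length); // 64 hex characters
```

The same IP always maps to the same value under a given secret, which is enough to tie likes or comments to a visitor without keeping the address itself.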

Cascading Deletes

Foreign key cascades ensure data integrity

Deployment Architecture

This section describes a typical production deployment. Adjust based on your infrastructure.
  • Frontend: Static files served via CDN (Vercel, Cloudflare Pages, etc.)
  • Backend: Bun server on a VPS or container platform
  • Database: Managed PostgreSQL (AWS RDS, Supabase, Railway, etc.)
  • Storage: Object storage for video files (S3, R2, etc.)

Next Steps

Setup Guide

Get your development environment running

Database Schema

Explore the database structure in detail
