System Architecture

Overview

Mizen is a Next.js application that parses recipes from URLs or images using a hybrid approach combining structured data extraction and AI processing. The architecture prioritizes performance through a multi-layer parsing strategy that minimizes AI token usage while maintaining accuracy.

Core Architecture Pattern

Three-Layer Parsing System

Mizen uses a three-layer approach to recipe extraction:

┌─────────────────────────────────────────┐
│         Layer 1: JSON-LD                │
│  Fast, free, reliable when available    │
│  Extracts structured recipe data        │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│     Layer 2: AI Parsing (Groq)          │
│  Slower, uses tokens, handles any page  │
│  Structured ingredients + enrichment    │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│   Layer 3: Hybrid (JSON-LD + AI)        │
│  Uses JSON-LD for facts, AI for         │
│  groupings, cuisine, and metadata       │
└─────────────────────────────────────────┘

Layer 1 — JSON-LD Extraction (src/utils/aiRecipeParser.ts:292)

Extracts from <script type="application/ld+json"> tags
Returns raw ingredient strings (no amount/unit split)
All ingredients in a single “Main” group
No cuisine, storage, or plating metadata
Limitations: Author filtering heuristic can be overly aggressive
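
The Layer 1 behaviour listed above can be sketched as follows. This is a minimal illustration, not the code in aiRecipeParser.ts: the names `extractJsonLdRecipe`, `JsonLdRecipe`, and `IngredientGroup` are hypothetical, and a regex scan stands in for real HTML parsing so the example stays dependency-free.

```typescript
// Hypothetical shapes for illustration; the real types live in the project.
interface IngredientGroup {
  groupName: string;
  ingredients: string[]; // raw strings, no amount/unit split
}

interface JsonLdRecipe {
  title: string;
  ingredients: IngredientGroup[];
}

// True when a JSON-LD node declares itself a schema.org Recipe.
function isRecipeNode(node: unknown): node is Record<string, unknown> {
  if (typeof node !== "object" || node === null) return false;
  const type = (node as Record<string, unknown>)["@type"];
  return Array.isArray(type) ? type.indexOf("Recipe") !== -1 : type === "Recipe";
}

// A script may hold one node, an array of nodes, or an @graph wrapper.
function candidateNodes(parsed: unknown): unknown[] {
  if (Array.isArray(parsed)) return parsed;
  if (typeof parsed === "object" && parsed !== null) {
    const graph = (parsed as Record<string, unknown>)["@graph"];
    if (Array.isArray(graph)) return graph;
  }
  return [parsed];
}

// Scans every <script type="application/ld+json"> tag and returns the first Recipe found.
function extractJsonLdRecipe(html: string): JsonLdRecipe | null {
  const scriptPattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptPattern.exec(html)) !== null) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(match[1]);
    } catch {
      continue; // malformed JSON-LD is skipped, not fatal
    }
    for (const node of candidateNodes(parsed)) {
      if (!isRecipeNode(node)) continue;
      const rawIngredients = Array.isArray(node.recipeIngredient)
        ? node.recipeIngredient.filter((item): item is string => typeof item === "string")
        : [];
      return {
        title: typeof node.name === "string" ? node.name : "",
        // Everything lands in a single "Main" group, as described above.
        ingredients: [{ groupName: "Main", ingredients: rawIngredients }],
      };
    }
  }
  return null;
}
```

A `null` result is what would hand control to the AI layer in this sketch.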

Layer 2 — AI Parsing (src/utils/aiRecipeParser.ts:668)

Uses Groq AI API with llama-3.3-70b-versatile model
Sends cleaned HTML (max 15k chars) to AI
Returns structured ingredients with amount/unit/name splits
Generates logical ingredient groupings
Adds AI metadata: descriptions, substitutions, cuisine, storage, plating
Constraints:
- 15k char HTML truncation can cut off recipes on blog-heavy pages
- 4k token output limit can truncate complex recipes mid-JSON
- No deduplication if recipe appears twice in HTML
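
To make the two size limits concrete, here is a rough sketch of how a request could be assembled and how a cut-off reply could be detected. It assumes an OpenAI-style chat completions request body; `buildRecipeRequest`, `parseAiJson`, the placeholder prompt, and the exact values 15000 and 4000 are assumptions chosen for illustration, not values confirmed by the source.

```typescript
const MAX_HTML_CHARS = 15000; // assumed exact value for the "15k chars" limit
const MAX_OUTPUT_TOKENS = 4000; // assumed exact value for the "4k token" limit

interface ChatRequest {
  model: string;
  max_tokens: number;
  messages: { role: "system" | "user"; content: string }[];
}

// Builds the request body; anything past the character limit is silently lost.
function buildRecipeRequest(cleanedHtml: string): ChatRequest {
  return {
    model: "llama-3.3-70b-versatile",
    max_tokens: MAX_OUTPUT_TOKENS,
    messages: [
      { role: "system", content: "Extract the recipe as JSON." }, // placeholder prompt
      { role: "user", content: cleanedHtml.slice(0, MAX_HTML_CHARS) },
    ],
  };
}

// Output that stops at the token limit usually fails to parse as JSON.
function parseAiJson(text: string): { ok: true; value: unknown } | { ok: false } {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}
```

Checking the parse result explicitly lets a caller report a parse failure instead of surfacing a raw JSON syntax error.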

Layer 3 — Hybrid Approach (Most Common)

Uses JSON-LD for title/author/servings/times (ground truth)
Calls AI to enrich with groupings, cuisine, and metadata
Best of both worlds: accurate facts + rich enhancements
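
One way to picture the hybrid merge is a function where JSON-LD values win for factual fields and the AI result supplies everything else. The sketch below is an assumption about how such a merge could look; `mergeHybrid`, `JsonLdFacts`, and `AiResult` are hypothetical names, and falling back to the AI value when JSON-LD lacks a field is a choice made here, not documented behaviour.

```typescript
// Hypothetical shapes; field names follow the response example in the Unified Parsing API section.
interface JsonLdFacts {
  title?: string;
  author?: string;
  servings?: number;
}

interface AiResult {
  title?: string;
  author?: string;
  servings?: number;
  cuisine?: string[];
  storageGuide?: string;
}

interface HybridRecipe extends AiResult {
  method: "json-ld+ai";
}

// JSON-LD is treated as ground truth; AI values only fill gaps and add enrichment.
function mergeHybrid(facts: JsonLdFacts, ai: AiResult): HybridRecipe {
  return {
    ...ai,
    title: facts.title ?? ai.title,
    author: facts.author ?? ai.author,
    servings: facts.servings ?? ai.servings,
    method: "json-ld+ai",
  };
}
```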

Entry Points

// src/utils/aiRecipeParser.ts

// Parse from raw HTML string
parseRecipe(rawHtml: string): Promise<ParserResult>

// Fetch URL then parse
parseRecipeFromUrl(url: string): Promise<ParserResult>

// Vision model extraction from image
parseRecipeFromImage(base64Image: string): Promise<ParserResult>

API Architecture

Next.js App Router Structure

Mizen uses Next.js 15+ App Router with API routes in /src/app/api/:

src/app/api/
├── parseRecipe/              # Unified recipe parsing endpoint (primary)
├── parseRecipeFromImage/     # Image-based recipe extraction
├── fetchHtml/                # URL fetching with validation
├── urlValidator/             # URL validation and checking
├── parseIngredients/         # Legacy: AI ingredient parsing
├── parseInstructions/        # Legacy: AI instruction parsing
├── generateSubstitutions/    # Generate ingredient substitutions
├── generatePlatingGuidance/  # Generate plating suggestions
├── extractImages/            # Extract images from recipe pages
├── feedback/                 # User feedback submission
└── admin/
    └── debug-parse/          # Admin debugging tools

Unified Parsing API

The /api/parseRecipe endpoint (src/app/api/parseRecipe/route.ts) is the primary interface.

Request:

{
  "url": "https://example.com/recipe"
}

Success Response:

{
  "success": true,
  "title": "Recipe Title",
  "ingredients": [
    {
      "groupName": "Main",
      "ingredients": [
        {
          "amount": "1",
          "units": "cup",
          "ingredient": "flour",
          "description": "Provides structure",
          "substitutions": ["almond flour", "oat flour"]
        }
      ]
    }
  ],
  "instructions": [
    {
      "title": "Mix ingredients",
      "detail": "In a large bowl, mix...",
      "timeMinutes": 5
    }
  ],
  "method": "json-ld+ai", // One of "json-ld", "ai", or "json-ld+ai"
  "author": "Chef Name",
  "servings": 4,
  "cuisine": ["Italian", "Mediterranean"],
  "storageGuide": "Store in airtight container...",
  "shelfLife": {"fridge": 3, "freezer": 30}
}

Error Response:

{
  "success": false,
  "error": {
    "code": "ERR_NO_RECIPE_FOUND",
    "message": "No recipe found on this page",
    "retryAfter": 1234567890 // Optional: for rate limits
  }
}
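
Because both response shapes carry a `success` flag, a caller can model them as a discriminated union. The sketch below is illustrative only: `ParseSuccess`, `ParseFailure`, and `describeResponse` are hypothetical names, and the success type lists just a few of the fields shown above.

```typescript
interface ParseSuccess {
  success: true;
  title: string;
  method: "json-ld" | "ai" | "json-ld+ai";
}

interface ParseFailure {
  success: false;
  error: { code: string; message: string; retryAfter?: number };
}

type ParseResponse = ParseSuccess | ParseFailure;

// Narrowing on `success` gives type-safe access to either branch.
function describeResponse(response: ParseResponse): string {
  if (response.success) {
    return `Parsed "${response.title}" via ${response.method}`;
  }
  const { code, message, retryAfter } = response.error;
  return retryAfter === undefined
    ? `${code}: ${message}`
    : `${code}: ${message} (retry after ${retryAfter})`;
}
```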

Data Flow Architecture

Recipe Parsing Flow

1. User Input (URL or Image)
   ↓
2. URL Validation (/api/urlValidator)
   ↓
3. HTML Fetching (/api/fetchHtml)
   ↓
4. HTML Cleaning (htmlCleaner.ts)
   ├── Extract JSON-LD scripts first
   ├── Remove ads, nav, scripts, styles
   ├── Prioritize ingredient/instruction sections
   └── Return optimized HTML (max 15k chars)
   ↓
5. Recipe Parsing (aiRecipeParser.ts)
   ├── Try JSON-LD extraction (Layer 1)
   │   ├── Success → Return immediately
   │   └── Partial/Fail → Continue to AI
   ├── AI Parsing with Groq (Layer 2)
   │   └── Send cleaned HTML to llama-3.3-70b-versatile
   └── Return parsed recipe
   ↓
6. Store in localStorage (storage.ts)
   ├── Recent recipes (max 10)
   └── Bookmarked recipes (unlimited)
   ↓
7. Display to User

HTML Cleaning Strategy

Priority Order (src/utils/htmlCleaner.ts:29):

1. Extract JSON-LD first: preserve structured data before any cleaning
2. Identify recipe sections: find ingredients and instructions by
   - Class/ID selectors ([class*="ingredient"], [id*="instruction"])
   - Schema.org attributes ([itemprop="recipeIngredient"])
   - Heading-based detection (“Ingredients”, “Instructions” headers)
3. Remove non-essential elements: scripts, styles, navigation, ads, comments, social widgets
4. Build optimized HTML: structured format emphasizing recipe content:

<script type="application/ld+json">...</script>
<h1 class="recipe-title-parsed">Title</h1>
<section class="recipe-ingredients-parsed">
  <h2>INGREDIENTS</h2>
  ...
</section>
<section class="recipe-instructions-parsed">
  <h2>INSTRUCTIONS</h2>
  ...
</section>

Client-Side Architecture

Component Structure

src/
├── app/
│   ├── page.tsx                 # Homepage with search
│   ├── parsed-recipe-page/      # Recipe detail view
│   ├── profile/                 # User profile
│   └── layout.tsx               # Root layout
├── components/
│   ├── layout/                  # AppShell, Navbar, Sidebar
│   ├── search/                  # Search form and input
│   ├── ingredients/             # Ingredient cards and groups
│   ├── recipe/                  # Recipe display components
│   │   └── CookMode/            # Step-by-step cooking mode
│   ├── ui/                      # Radix UI + shadcn components
│   └── homepage/                # Landing page components
├── hooks/
│   └── useRecipeErrorHandler.ts # Error message handling
├── contexts/                    # React contexts (if any)
└── lib/
    ├── storage.ts               # localStorage recipe management
    └── utils.tsx                # Utility functions

State Management

Local Storage Strategy (src/lib/storage.ts):

Recent Recipes (recentRecipes key) — Last 10 parsed recipes, auto-sorted by access time
Bookmarked Recipes (bookmarkedRecipes key) — User-saved recipes (unlimited)
Recipe Order (recipeOrder key) — Custom ordering for drag-and-drop
Error Logs (parse-n-plate-error-logs key) — Last 50 errors for debugging

Key operations:

// src/lib/storage.ts

getRecentRecipes(): ParsedRecipe[]           // Fetch recent recipes
addRecentRecipe(recipe): void                // Add new recipe (auto-dedup by URL)
getBookmarkedRecipes(): ParsedRecipe[]       // Fetch bookmarks
addBookmark(id): void                        // Bookmark a recipe
removeBookmark(id): void                     // Remove bookmark
pinRecipe(id): void                          // Pin to top
touchRecipeAccess(id): void                  // Update lastAccessedAt

Data Normalization:

Legacy string instructions converted to {title, detail} objects on read
Bookmark migration from ID-only to full recipe objects
Consistent sorting by lastAccessedAt or parsedAt
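
The recent-recipes rules (dedup by URL, sort by access time, cap at 10) can be expressed as a pure function over the stored list. This is a sketch under assumptions: `addToRecent` and `StoredRecipe` are hypothetical, `lastAccessedAt` is taken to be a numeric timestamp, and the real code reads and writes localStorage rather than taking the list as an argument.

```typescript
interface StoredRecipe {
  id: string;
  url: string;
  title: string;
  lastAccessedAt: number; // assumed to be epoch milliseconds
}

const MAX_RECENT = 10;

// Returns a new list: the duplicate URL is replaced, newest first, capped at MAX_RECENT.
function addToRecent(existing: StoredRecipe[], recipe: StoredRecipe): StoredRecipe[] {
  const withoutDuplicate = existing.filter((item) => item.url !== recipe.url);
  return [recipe, ...withoutDuplicate]
    .sort((a, b) => b.lastAccessedAt - a.lastAccessedAt)
    .slice(0, MAX_RECENT);
}
```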

React Patterns

Functional Components Only: No class components (enforced by AGENTS.md)

Hooks Usage:

Custom hooks for error handling (useRecipeErrorHandler)
Standard React hooks (useState, useEffect, etc.)

Error Boundaries:

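Used for error handling in the UI; note that React error boundaries catch errors thrown during rendering, not errors in event handlers or asynchronous callbacks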
Graceful degradation on failures

Design Patterns

Error Handling Pattern

Consistent error response structure across all API endpoints:

// src/utils/formatError.ts

export interface ErrorResponse {
  success: false;
  error: {
    code: string;          // ERR_INVALID_URL, ERR_NO_RECIPE_FOUND, etc.
    message: string;       // User-friendly message
    retryAfter?: number;   // Timestamp for rate limits
  };
}

export const ERROR_CODES = {
  ERR_INVALID_URL: 'ERR_INVALID_URL',
  ERR_UNSUPPORTED_DOMAIN: 'ERR_UNSUPPORTED_DOMAIN',
  ERR_FETCH_FAILED: 'ERR_FETCH_FAILED',
  ERR_NO_RECIPE_FOUND: 'ERR_NO_RECIPE_FOUND',
  ERR_AI_PARSE_FAILED: 'ERR_AI_PARSE_FAILED',
  ERR_TIMEOUT: 'ERR_TIMEOUT',
  ERR_RATE_LIMIT: 'ERR_RATE_LIMIT',
  // ... more codes
};

See Error Handling for details.
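
A helper that produces this structure might look like the sketch below. The two-argument call `formatError(code, message)` appears in the Input Validation section; the optional third `retryAfter` parameter and the function body are assumptions for illustration. The interface is repeated so the example is self-contained.

```typescript
interface ErrorResponse {
  success: false;
  error: {
    code: string;
    message: string;
    retryAfter?: number;
  };
}

// Builds the shared error envelope; retryAfter is only included when provided.
function formatError(code: string, message: string, retryAfter?: number): ErrorResponse {
  const error: ErrorResponse["error"] = { code, message };
  if (retryAfter !== undefined) {
    error.retryAfter = retryAfter;
  }
  return { success: false, error };
}
```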

Repository Pattern

localStorage acts as a repository with clear CRUD operations:

// Create
addRecentRecipe(recipe) → Adds with generated ID

// Read
getRecentRecipes() → Returns all recent recipes
getRecipeById(id) → Returns single recipe

// Update
updateRecipe(id, updates) → Merges partial updates

// Delete
removeRecentRecipe(id) → Removes from storage
restoreRecentRecipe(recipe) → Undo deletion
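
As a rough illustration of the pattern, the operations above can be written against a small storage interface so they work with `localStorage` in the browser and with an in-memory stand-in in tests. Everything here is a sketch: `createRecipeRepository`, `KeyValueStore`, `RepoRecipe`, and the ID scheme are hypothetical, and only the method names and the `recentRecipes` key come from this document.

```typescript
// Minimal subset of the Web Storage interface, so the sketch runs without a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface RepoRecipe {
  id: string;
  url: string;
  title: string;
}

const RECENT_KEY = "recentRecipes";

function createRecipeRepository(store: KeyValueStore) {
  const readAll = (): RepoRecipe[] => {
    try {
      const parsed: unknown = JSON.parse(store.getItem(RECENT_KEY) ?? "[]");
      return Array.isArray(parsed) ? (parsed as RepoRecipe[]) : [];
    } catch {
      return []; // corrupted storage is treated as empty
    }
  };
  const writeAll = (recipes: RepoRecipe[]): void => {
    store.setItem(RECENT_KEY, JSON.stringify(recipes));
  };

  return {
    // Create: the ID scheme is an assumption, not the project's.
    addRecentRecipe(recipe: Omit<RepoRecipe, "id">): RepoRecipe {
      const created: RepoRecipe = {
        ...recipe,
        id: `${Date.now()}-${Math.random().toString(36).slice(2)}`,
      };
      writeAll([created, ...readAll()]);
      return created;
    },
    // Read
    getRecentRecipes: readAll,
    getRecipeById(id: string): RepoRecipe | undefined {
      return readAll().find((recipe) => recipe.id === id);
    },
    // Update: shallow-merges the partial update into the stored recipe.
    updateRecipe(id: string, updates: Partial<Omit<RepoRecipe, "id">>): void {
      writeAll(readAll().map((recipe) => (recipe.id === id ? { ...recipe, ...updates } : recipe)));
    },
    // Delete and undo
    removeRecentRecipe(id: string): void {
      writeAll(readAll().filter((recipe) => recipe.id !== id));
    },
    restoreRecentRecipe(recipe: RepoRecipe): void {
      writeAll([recipe, ...readAll()]);
    },
  };
}
```

In a browser, `createRecipeRepository(window.localStorage)` would satisfy the same interface.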

Normalizer Pattern

Data normalization functions ensure consistency:

// Instruction normalization (storage.ts:117)
normalizeInstructions(instructions?: Array<string | InstructionStep>)
  → InstructionStep[]

// Cuisine normalization (aiRecipeParser.ts:254)
normalizeCuisineField(raw: unknown) → string[] | undefined

// HTML entity decoding (aiRecipeParser.ts:219)
decodeHtmlEntities(text: string) → string
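
The signatures above suggest behaviour along the following lines. The bodies are guesses written for illustration: the "Step N" title for legacy string instructions, comma-splitting of cuisine strings, and the small named-entity table are all assumptions rather than the project's actual rules.

```typescript
interface InstructionStep {
  title: string;
  detail: string;
  timeMinutes?: number;
}

// Legacy string steps become objects; existing objects pass through unchanged.
function normalizeInstructions(instructions?: Array<string | InstructionStep>): InstructionStep[] {
  return (instructions ?? []).map((step, index) =>
    typeof step === "string" ? { title: `Step ${index + 1}`, detail: step } : step
  );
}

// Accepts a string or an array and returns trimmed, non-empty names, or undefined if none remain.
function normalizeCuisineField(raw: unknown): string[] | undefined {
  const values: unknown[] = typeof raw === "string" ? raw.split(",") : Array.isArray(raw) ? raw : [];
  const cleaned = values
    .filter((value): value is string => typeof value === "string")
    .map((value) => value.trim())
    .filter((value) => value.length > 0);
  return cleaned.length > 0 ? cleaned : undefined;
}

// Decodes numeric entities and a handful of named ones in a single pass.
function decodeHtmlEntities(text: string): string {
  const named: Record<string, string> = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00a0" };
  return text.replace(/&(#x?[0-9a-f]+|[a-z]+);/gi, (whole: string, body: string) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return Number.isNaN(code) || code > 0x10ffff ? whole : String.fromCodePoint(code);
    }
    const replacement = named[body.toLowerCase()];
    return replacement === undefined ? whole : replacement;
  });
}
```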

Performance Optimizations

Token Usage Minimization

JSON-LD First — Avoid AI calls when structured data exists
HTML Truncation — Limit to 15k chars before sending to AI
Smart Cleaning — Remove ads/nav/comments before AI processing
Output Limits — Cap AI response to 4k tokens

Caching Strategy

localStorage caching — Parsed recipes stored client-side
No duplicate parsing — Same URL detection prevents re-parsing
Recent recipes cap — Max 10 recent recipes to prevent bloat

Client-Side Optimization

React 19 — Latest performance improvements
CSS Modules + Tailwind — Optimized styling
Tree shaking — Unused code elimination via Next.js
Lazy loading — Components loaded on demand

Known Limitations

Accuracy Issues

From aiRecipeParser.ts comments:

Author-name heuristic (line 381) — False positives on short instructions like “Season well”
AI format consistency — Sometimes returns string instructions instead of {title, detail} objects
HTML truncation — 15k char limit can cut off recipes on blog-heavy pages
Output truncation — 4k token limit can truncate complex recipes mid-JSON
No deduplication — Recipe appearing twice in HTML (jump-to-recipe + inline) not handled
Instruction length filter — 10-char minimum silently drops short valid steps
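
The instruction length filter is easy to reproduce in isolation. The sketch below assumes the comparison is "fewer than 10 characters after trimming", which the source does not confirm; `partitionByLength` is a hypothetical helper that returns the dropped steps instead of discarding them, purely to make the effect visible.

```typescript
const MIN_INSTRUCTION_LENGTH = 10; // from the limitation above; the exact comparison is assumed

// Splits steps into those that pass the length filter and those it would drop.
function partitionByLength(steps: string[]): { kept: string[]; dropped: string[] } {
  const kept: string[] = [];
  const dropped: string[] = [];
  for (const step of steps) {
    const target = step.trim().length >= MIN_INSTRUCTION_LENGTH ? kept : dropped;
    target.push(step);
  }
  return { kept, dropped };
}
```

With the input `["Preheat the oven to 180C.", "Mix well.", "Serve."]`, the last two steps fall under the minimum and end up in `dropped`, even though both are valid instructions.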

Cuisine Detection Issues

From AI prompt (line 1164):

Pad Thai incorrectly mapped to ["Chinese"] instead of ["Thai"]
Thai cuisine not in SUPPORTED_CUISINES list
Prompt teaches wrong associations to the model

Security Considerations

API Key Management

Environment variables only — GROQ_API_KEY never exposed to client
Server-side API calls — All AI requests from Next.js API routes

Input Validation

// URL validation before fetching
try {
  new URL(url); // Throws if invalid
} catch {
  return formatError(ERROR_CODES.ERR_INVALID_URL, 'Invalid URL format');
}
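
Note that `new URL()` only checks syntax: it accepts schemes such as `javascript:` and `file:`. A slightly stricter check could also restrict the protocol, as in this sketch. `validateRecipeUrl` and its return shape are hypothetical, and whether the project applies a protocol check is not stated in the source.

```typescript
type UrlCheck = { valid: true; url: URL } | { valid: false; reason: string };

// Rejects malformed input and any scheme other than http or https.
function validateRecipeUrl(input: string): UrlCheck {
  let parsed: URL;
  try {
    parsed = new URL(input); // throws on malformed input
  } catch {
    return { valid: false, reason: "Invalid URL format" };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { valid: false, reason: "Only http and https URLs are supported" };
  }
  return { valid: true, url: parsed };
}
```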

XSS Prevention

Cheerio HTML parsing — Safe DOM manipulation
React auto-escaping — XSS protection by default
No dangerouslySetInnerHTML — Avoid direct HTML injection

Testing Strategy

While test files aren’t in the source, the architecture supports:

Unit tests — Pure functions (formatError, normalizers)
Integration tests — API route handlers
E2E tests — Full parsing flow from URL to display

Deployment Architecture

Designed for Vercel deployment:

Next.js serverless functions for API routes
Edge runtime support (if enabled)
Environment variables via Vercel dashboard
Analytics via @vercel/analytics and @vercel/speed-insights

Future Architecture Considerations

From TECHNICAL_SUMMARY.md:

Sentry integration — Real-time error tracking in production
Automatic retries — Retry failed operations automatically
Fallback strategies — Multiple parsing strategies for robustness
User preferences — Remember user’s preferred error handling
Accessibility improvements — Screen reader support, keyboard navigation
