Overview
Mizen is a Next.js application that parses recipes from URLs or images using a hybrid approach that combines structured data extraction with AI processing. The architecture prioritizes performance through a multi-layer parsing strategy that minimizes AI token usage while maintaining accuracy.
Core Architecture Pattern
Three-Layer Parsing System
Mizen uses a three-layer approach to recipe extraction:
Layer 1: JSON-LD extraction (src/utils/aiRecipeParser.ts:292)
- Extracts from <script type="application/ld+json"> tags
- Returns raw ingredient strings (no amount/unit split)
- All ingredients in a single “Main” group
- No cuisine, storage, or plating metadata
- Limitations: Author filtering heuristic can be overly aggressive
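The JSON-LD layer described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in aiRecipeParser.ts: the names extractJsonLdRecipe, collectNodes, isRecipeNode, and JsonLdRecipe are hypothetical, and a regex scan stands in for whatever HTML parser the project uses. It keeps ingredients as raw strings in a single "Main" group, as the list above states.

```typescript
// Hypothetical result shape: only the fields this sketch fills in.
interface JsonLdRecipe {
  title: string;
  ingredients: { groupName: string; items: string[] }[];
}

type JsonNode = Record<string, unknown>;

// Flatten a parsed JSON-LD payload: it may be one object, an array, or an @graph wrapper.
function collectNodes(value: unknown): JsonNode[] {
  if (Array.isArray(value)) {
    return value.reduce<JsonNode[]>((acc, item) => acc.concat(collectNodes(item)), []);
  }
  if (typeof value === "object" && value !== null) {
    const node = value as JsonNode;
    const graph = node["@graph"];
    return Array.isArray(graph) ? collectNodes(graph) : [node];
  }
  return [];
}

// schema.org allows @type to be a string or an array of strings.
function isRecipeNode(node: JsonNode): boolean {
  const type = node["@type"];
  return type === "Recipe" || (Array.isArray(type) && type.indexOf("Recipe") !== -1);
}

// Return the first Recipe found in any ld+json script tag, or null if there is none.
function extractJsonLdRecipe(html: string): JsonLdRecipe | null {
  const scriptPattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptPattern.exec(html)) !== null) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(match[1]);
    } catch (_err) {
      continue; // Skip malformed blocks and try the next script tag.
    }
    const recipe = collectNodes(parsed).filter(isRecipeNode)[0];
    if (recipe === undefined) continue;
    const name = recipe["name"];
    const raw = recipe["recipeIngredient"];
    const items: string[] = Array.isArray(raw)
      ? raw.filter((x): x is string => typeof x === "string")
      : [];
    return {
      title: typeof name === "string" ? name : "",
      // Raw strings only: no amount/unit split, one "Main" group.
      ingredients: [{ groupName: "Main", items }],
    };
  }
  return null;
}
```

A null result is the signal to fall through to the AI layer.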
Layer 2: AI parsing (src/utils/aiRecipeParser.ts:668)
- Uses Groq AI API with llama-3.3-70b-versatile model
- Sends cleaned HTML (max 15k chars) to AI
- Returns structured ingredients with amount/unit/name splits
- Generates logical ingredient groupings
- Adds AI metadata: descriptions, substitutions, cuisine, storage, plating
- Constraints:
  - 15k char HTML truncation can cut off recipes on blog-heavy pages
  - 4k token output limit can truncate complex recipes mid-JSON
  - No deduplication if recipe appears twice in HTML
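The two truncation constraints above can be illustrated with a short sketch. It does not call the Groq API; it only shows the input budget and one way to tell a cut-off response from other invalid output. The names truncateForModel, parseModelOutput, and ParseOutcome are hypothetical, and the bracket-counting check is a rough heuristic chosen for this example (it also counts brackets inside string values).

```typescript
const MAX_HTML_CHARS = 15000; // The 15k character input budget named above.

// Cut cleaned HTML to the budget and report whether anything was lost.
function truncateForModel(
  cleanedHtml: string,
  limit: number = MAX_HTML_CHARS
): { html: string; truncated: boolean } {
  if (cleanedHtml.length <= limit) {
    return { html: cleanedHtml, truncated: false };
  }
  return { html: cleanedHtml.slice(0, limit), truncated: true };
}

type ParseOutcome =
  | { ok: true; recipe: unknown }
  | { ok: false; reason: "truncated" | "invalid" };

// Parse the model's JSON reply; on failure, guess whether it was cut off mid-JSON.
function parseModelOutput(raw: string): ParseOutcome {
  try {
    return { ok: true, recipe: JSON.parse(raw) };
  } catch (_err) {
    // More opening than closing brackets suggests the output limit was hit.
    const opens = (raw.match(/[{[]/g) || []).length;
    const closes = (raw.match(/[}\]]/g) || []).length;
    return { ok: false, reason: opens > closes ? "truncated" : "invalid" };
  }
}
```

Surfacing the truncated flag and the failure reason lets a caller report a specific error instead of a generic parse failure.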
Layer 3: Hybrid (JSON-LD plus AI)
- Uses JSON-LD for title/author/servings/times (ground truth)
- Calls AI to enrich with groupings, cuisine, and metadata
- Best of both worlds: accurate facts + rich enhancements
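A minimal sketch of the hybrid merge, assuming hypothetical types (StructuredFacts, AiResult, HybridRecipe) and a hypothetical function name mergeLayers. Field names are illustrative and are not taken from the project. The point it shows is the precedence rule: facts from JSON-LD win, and the AI result fills in everything else.

```typescript
// Hypothetical shapes; field names are illustrative, not the project's types.
interface StructuredFacts {
  title: string;
  author?: string;
  servings?: string;
  totalTime?: string;
}

interface AiResult {
  title?: string;
  author?: string;
  servings?: string;
  totalTime?: string;
  ingredientGroups: { name: string; items: string[] }[];
  cuisine: string[];
}

type HybridRecipe = AiResult & { title: string; source: "hybrid" | "ai-only" };

// JSON-LD facts override AI guesses; AI supplies groupings and metadata.
function mergeLayers(facts: StructuredFacts | null, ai: AiResult): HybridRecipe {
  if (facts === null) {
    // No structured data on the page: fall back to the AI result alone.
    return { ...ai, title: ai.title ?? "Untitled", source: "ai-only" };
  }
  return {
    ...ai,
    title: facts.title,
    author: facts.author ?? ai.author,
    servings: facts.servings ?? ai.servings,
    totalTime: facts.totalTime ?? ai.totalTime,
    source: "hybrid",
  };
}
```

Keeping the merge in one pure function makes the precedence easy to unit test without calling the AI.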
Entry Points
API Architecture
Next.js App Router Structure
Mizen uses Next.js 15+ App Router with API routes in /src/app/api/.
Unified Parsing API
The /api/parseRecipe endpoint (src/app/api/parseRecipe/route.ts) is the primary interface:
Request:
Data Flow Architecture
Recipe Parsing Flow
HTML Cleaning Strategy
Priority Order (src/utils/htmlCleaner.ts:29):
- Extract JSON-LD first: Before any cleaning, preserve structured data
- Identify recipe sections: Find ingredients and instructions by:
  - Class/ID selectors ([class*="ingredient"], [id*="instruction"])
  - Schema.org attributes ([itemprop="recipeIngredient"])
  - Heading-based detection (“Ingredients”, “Instructions” headers)
- Remove non-essential elements: Scripts, styles, navigation, ads, comments, social widgets
- Build optimized HTML: Structured format emphasizing recipe content
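The removal step in the list above can be sketched without a DOM library. This is an assumption-heavy illustration: the project parses HTML with Cheerio, whereas this example uses regular expressions so it runs with no dependencies, and it breaks on nested elements of the same tag. The name stripNonEssential and the tag list are hypothetical. Section identification by selector is not covered here. Because script tags are removed, JSON-LD has to be extracted before this function runs, which matches the priority order above.

```typescript
// Tags treated as non-essential in this sketch; the list is an assumption.
const NON_ESSENTIAL_TAGS = ["script", "style", "noscript", "nav", "footer", "aside"];

// Remove HTML comment nodes and whole non-essential elements, then collapse whitespace.
function stripNonEssential(html: string): string {
  let cleaned = html.replace(/<!--[\s\S]*?-->/g, "");
  NON_ESSENTIAL_TAGS.forEach((tag) => {
    // Matches an opening tag, its content, and the nearest closing tag.
    const block = new RegExp("<" + tag + "\\b[^>]*>[\\s\\S]*?</" + tag + ">", "gi");
    cleaned = cleaned.replace(block, "");
  });
  return cleaned.replace(/\s+/g, " ").trim();
}
```

Stripping before truncation means more of the character budget is spent on recipe content.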
Client-Side Architecture
Component Structure
State Management
Local Storage Strategy (src/lib/storage.ts):
- Recent Recipes (recentRecipes key): Last 10 parsed recipes, auto-sorted by access time
- Bookmarked Recipes (bookmarkedRecipes key): User-saved recipes (unlimited)
- Recipe Order (recipeOrder key): Custom ordering for drag-and-drop
- Error Logs (parse-n-plate-error-logs key): Last 50 errors for debugging
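The recent-recipes rule above (cap of 10, sorted by access time) might look like the sketch below. Only the recentRecipes key and the lastAccessedAt field come from the text; the KeyValueStore interface, the StoredRecipe fields, and the function names are hypothetical. The store is passed in so the example runs outside a browser; in the browser, window.localStorage provides the same two methods.

```typescript
// Minimal subset of the Web Storage API used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical record shape; timestamps are assumed to be epoch milliseconds.
interface StoredRecipe {
  id: string;
  title: string;
  lastAccessedAt: number;
}

const RECENT_KEY = "recentRecipes";
const MAX_RECENT = 10;

// Read the list, tolerating a missing or corrupted entry.
function readRecent(store: KeyValueStore): StoredRecipe[] {
  const raw = store.getItem(RECENT_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as StoredRecipe[]) : [];
  } catch (_err) {
    return [];
  }
}

// Insert or refresh one recipe, keep newest first, and enforce the cap.
function touchRecent(store: KeyValueStore, recipe: StoredRecipe): StoredRecipe[] {
  const others = readRecent(store).filter((r) => r.id !== recipe.id);
  const next = [recipe]
    .concat(others)
    .sort((a, b) => b.lastAccessedAt - a.lastAccessedAt)
    .slice(0, MAX_RECENT);
  store.setItem(RECENT_KEY, JSON.stringify(next));
  return next;
}

// In-memory stand-in for localStorage, useful in tests.
function createMemoryStore(): KeyValueStore {
  const data: Record<string, string> = {};
  return {
    getItem: (key) => (key in data ? data[key] : null),
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}
```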
- Legacy string instructions converted to {title, detail} objects on read
- Bookmark migration from ID-only to full recipe objects
- Consistent sorting by lastAccessedAt or parsedAt
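The read-time conversion and the sort rule above can be sketched as two small pure functions. The {title, detail} shape and the lastAccessedAt and parsedAt field names come from the text; everything else is an assumption, including the function names, the "Step N" placeholder title for legacy strings, and numeric timestamps.

```typescript
interface InstructionStep {
  title: string;
  detail: string;
}

// Stored data may still contain the legacy plain-string form.
type StoredInstruction = string | InstructionStep;

// Convert legacy strings to objects; pass objects through unchanged.
function normalizeInstructions(instructions: StoredInstruction[]): InstructionStep[] {
  return instructions.map((step, index) =>
    typeof step === "string"
      ? { title: "Step " + (index + 1), detail: step.trim() } // Placeholder title is an assumption.
      : step
  );
}

interface Timestamped {
  parsedAt: number;
  lastAccessedAt?: number;
}

// Newest first, preferring lastAccessedAt and falling back to parsedAt.
function byMostRecent(a: Timestamped, b: Timestamped): number {
  return (b.lastAccessedAt ?? b.parsedAt) - (a.lastAccessedAt ?? a.parsedAt);
}
```

Normalizing on read keeps old saved recipes usable without a one-off migration step.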
React Patterns
Functional Components Only: No class components (enforced by AGENTS.md)
Hooks Usage:
- Custom hooks for error handling (useRecipeErrorHandler)
- Standard React hooks (useState, useEffect, etc.)
- Used for async and error handling in UI
- Graceful degradation on failures
Design Patterns
Error Handling Pattern
Consistent error response structure across all API endpoints.
Repository Pattern
LocalStorage acts as a repository with clear CRUD operations.
Normalizer Pattern
Data normalization functions ensure consistency.
Performance Optimizations
Token Usage Minimization
- JSON-LD First — Avoid AI calls when structured data exists
- HTML Truncation — Limit to 15k chars before sending to AI
- Smart Cleaning — Remove ads/nav/comments before AI processing
- Output Limits — Cap AI response to 4k tokens
Caching Strategy
- localStorage caching — Parsed recipes stored client-side
- No duplicate parsing — Same URL detection prevents re-parsing
- Recent recipes cap — Max 10 recent recipes to prevent bloat
Client-Side Optimization
- React 19 — Latest performance improvements
- CSS Modules + Tailwind — Optimized styling
- Tree shaking — Unused code elimination via Next.js
- Lazy loading — Components loaded on demand
Known Limitations
Accuracy Issues
From aiRecipeParser.ts comments:
- Author-name heuristic (line 381): False positives on short instructions like “Season well”
- AI format consistency: Sometimes returns string instructions instead of {title, detail} objects
- HTML truncation: 15k char limit can cut off recipes on blog-heavy pages
- Output truncation: 4k token limit can truncate complex recipes mid-JSON
- No deduplication: Recipe appearing twice in HTML (jump-to-recipe + inline) not handled
- Instruction length filter: 10-char minimum silently drops short valid steps
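To make the instruction length filter concrete, here is a small sketch of how a 10-character minimum behaves. It is not the project's code: the name partitionByLength is hypothetical, and it assumes steps shorter than 10 characters after trimming are the ones dropped. Unlike a silent filter, it returns the dropped steps so they can be logged or reviewed.

```typescript
const MIN_INSTRUCTION_LENGTH = 10; // The 10-character minimum named above.

// Split steps into those that pass the length check and those that do not.
function partitionByLength(
  steps: string[],
  minLength: number = MIN_INSTRUCTION_LENGTH
): { kept: string[]; dropped: string[] } {
  const kept: string[] = [];
  const dropped: string[] = [];
  steps.forEach((step) => {
    const text = step.trim();
    if (text.length >= minLength) {
      kept.push(text);
    } else {
      dropped.push(text);
    }
  });
  return { kept, dropped };
}

const result = partitionByLength(["Stir.", "Season well", "Bake for 20 minutes."]);
// result.dropped holds "Stir." (5 characters), a valid step that would be lost.
```

Note that “Season well” is 11 characters and passes this check, so its false positive comes from the author-name heuristic rather than from the length filter.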
Cuisine Detection Issues
From AI prompt (line 1164):
- Pad Thai incorrectly mapped to [“Chinese”] instead of [“Thai”]
- Thai cuisine not in SUPPORTED_CUISINES list
- Prompt teaches wrong associations to the model
Security Considerations
API Key Management
- Environment variables only: GROQ_API_KEY never exposed to client
- Server-side API calls: All AI requests from Next.js API routes
Input Validation
XSS Prevention
- Cheerio HTML parsing — Safe DOM manipulation
- React auto-escaping — XSS protection by default
- No dangerouslySetInnerHTML — Avoid direct HTML injection
Testing Strategy
While test files aren’t in the source, the architecture supports:
- Unit tests: Pure functions (formatError, normalizers)
- Integration tests: API route handlers
- E2E tests: Full parsing flow from URL to display
Deployment Architecture
Designed for Vercel deployment:
- Next.js serverless functions for API routes
- Edge runtime support (if enabled)
- Environment variables via Vercel dashboard
- Analytics via @vercel/analytics and @vercel/speed-insights
Future Architecture Considerations
From TECHNICAL_SUMMARY.md:
- Sentry integration: Real-time error tracking in production
- Automatic retries: Retry failed operations automatically
- Fallback strategies: Multiple parsing strategies for robustness
- User preferences: Remember user’s preferred error handling
- Accessibility improvements: Screen reader support, keyboard navigation