System Architecture
i18n Doctor is a web-based localization health scanner that analyzes public GitHub repositories for translation issues and provides automated fixes via Lingo.dev integration.
Core Architecture
The application follows a modern full-stack architecture:
Frontend
Next.js 15 App Router with React 19, providing server-side rendering and client-side interactivity
Backend
Next.js API routes handling GitHub integration, scanning logic, and database operations
Database
PostgreSQL via Drizzle ORM for storing scan reports, user data, and leaderboard metrics
External APIs
GitHub REST API for repository access and Lingo.dev SDK for automated translations
Core Components
The system is organized into several key components that work together to deliver the scanning and fixing functionality:
1. GitHub Integration Layer
Handles all interactions with the GitHub API:
- Supports both authenticated and unauthenticated requests
- Optional GITHUB_TOKEN for higher rate limits (60 → 5,000 req/h)
- Handles public repositories without authentication
- Error handling for 404s, rate limits, and API failures
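The optional-token behavior described above could be sketched as a small header builder. This is only an illustration: `buildGitHubHeaders` is a hypothetical name, not a function taken from the project, and the token is passed in as a parameter rather than read from the environment.

```typescript
// Hypothetical helper: builds request headers for the GitHub REST API.
// The token is optional because public repositories can be read without it.
function buildGitHubHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28",
  };
  const trimmed = (token ?? "").trim();
  if (trimmed !== "") {
    // Authenticated requests get the higher rate limit.
    headers["Authorization"] = `Bearer ${trimmed}`;
  }
  return headers;
}
```

In a Next.js API route the argument would typically be `process.env.GITHUB_TOKEN`, which is simply undefined when no token is configured.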
2. Locale Detection Engine
Locates translation files in a repository by:
- Scanning common i18n directory patterns (locales/, i18n/, public/locales/, lang/, etc.)
- Supporting multiple file formats (JSON, YAML, .po, .properties)
- Inferring locale codes from file names or directory structure
- Identifying the source locale (typically en or en-US)
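As a rough illustration of the last two points, locale inference can be approximated with a pattern check on the file name and its parent directory. The helper names and the regular expression below are assumptions made for this sketch. It is a heuristic, and a real detector would probably validate candidates against a list of known locale codes.

```typescript
// Matches codes such as "en", "fr", "en-US", "pt_BR", "zh-Hans".
const LOCALE_CODE = /^[a-z]{2,3}(?:[-_][A-Za-z]{2,4})?$/;

// Hypothetical helper: infer a locale code from a file path. The file name is
// checked first (locales/fr.json), then the immediate parent directory
// (public/locales/en-US/common.json).
function inferLocale(path: string): string | null {
  const segments = path.split("/").filter((s) => s !== "");
  const fileName = segments.pop() ?? "";
  const baseName = fileName.replace(/\.[^.]+$/, "");
  if (LOCALE_CODE.test(baseName)) return baseName;
  const parent = segments[segments.length - 1] ?? "";
  return LOCALE_CODE.test(parent) ? parent : null;
}

// Hypothetical helper: prefer "en", then "en-US", else fall back to the first
// detected locale.
function pickSourceLocale(locales: string[]): string | null {
  for (const candidate of ["en", "en-US"]) {
    if (locales.indexOf(candidate) !== -1) return candidate;
  }
  return locales[0] ?? null;
}
```

Short, non-locale names such as `app` or `src` would also match the pattern, which is why this sketch only makes sense after the directory-pattern filter has narrowed the candidates.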
3. Locale Parser
- Parses JSON, YAML, and gettext .po files
- Flattens nested translation keys into dot-notation paths
- Normalizes different translation file structures
- Returns a KeyMap (flat key-value pairs) for comparison
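The flattening step can be illustrated with a short recursive function. The `KeyMap` shape below (a plain string-to-string record) and the function name are assumptions for this sketch; the project's actual type may differ. Arrays and other non-string leaves are stored as JSON text here purely as a simplification.

```typescript
// Assumed shape of the flat key-value structure described above.
type KeyMap = Record<string, string>;

// Recursively flattens nested objects into dot-notation keys,
// e.g. { nav: { home: "Home" } } becomes { "nav.home": "Home" }.
function flattenKeys(node: unknown, prefix = "", out: KeyMap = {}): KeyMap {
  if (node !== null && typeof node === "object" && !Array.isArray(node)) {
    const record = node as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      flattenKeys(record[key], prefix ? `${prefix}.${key}` : key, out);
    }
  } else if (prefix !== "") {
    if (typeof node === "string") {
      out[prefix] = node;
    } else if (node === null || node === undefined) {
      out[prefix] = ""; // treated as an empty (untranslated) value
    } else {
      out[prefix] = JSON.stringify(node); // numbers, booleans, arrays
    }
  }
  return out;
}
```

Because every format ends up in the same flat shape, the comparison step does not need to know whether a locale came from JSON, YAML, or a .po file.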
4. Diff Engine
- Missing Keys: Keys present in source but absent in target
- Untranslated Keys: Keys that exist but have empty values
- Orphan Keys: Keys in target but not in source (unused/leftover)
- Coverage Calculation: Percentage of properly translated keys
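A minimal sketch of this comparison, assuming both locales have already been flattened into string-to-string maps. The names are hypothetical, and the coverage formula (translated source keys divided by total source keys) is one plausible reading of "percentage of properly translated keys", not a confirmed detail of the project.

```typescript
type KeyMap = Record<string, string>;

interface LocaleDiff {
  missing: string[];      // in source, absent from target
  untranslated: string[]; // in target, but empty
  orphan: string[];       // in target, absent from source
  coverage: number;       // percentage, 0 to 100
}

function hasKey(map: KeyMap, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(map, key);
}

function diffLocale(source: KeyMap, target: KeyMap): LocaleDiff {
  const sourceKeys = Object.keys(source);
  const missing = sourceKeys.filter((k) => !hasKey(target, k));
  const untranslated = sourceKeys.filter(
    (k) => hasKey(target, k) && (target[k] ?? "").trim() === "",
  );
  const orphan = Object.keys(target).filter((k) => !hasKey(source, k));
  const translated = sourceKeys.length - missing.length - untranslated.length;
  // Assumption: an empty source locale counts as fully covered.
  const coverage =
    sourceKeys.length === 0 ? 100 : (translated / sourceKeys.length) * 100;
  return { missing, untranslated, orphan, coverage };
}
```

For a source with keys a, b, c, d and a target of `{ a: "X", b: "", e: "E" }`, this reports c and d as missing, b as untranslated, e as an orphan, and a coverage of 25.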
5. Authentication System
Implemented with better-auth and GitHub OAuth:
- Users authenticate via GitHub to access premium features
- Enables saving scan history and accessing the dashboard
- Required for future “Open PR” functionality
- Session management and user profile storage
6. Database Layer
Using Drizzle ORM with PostgreSQL:
- reports table: Stores scan results with RLS (Row-Level Security) policies
- User profiles and authentication data
- Leaderboard metrics for benchmarking i18n tooling projects
Data Flow: Scan to Report
Repository Input
User provides a GitHub repository URL (e.g., https://github.com/owner/repo) or short form (owner/repo)
Repository Validation
System fetches repo metadata via GitHub API to confirm it exists and is accessible
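Before the metadata request is made, both accepted input forms (full URL and owner/repo short form) need to be normalized into an owner and a repository name. A sketch under the assumption that anything else is rejected; `parseRepoInput` and `RepoRef` are hypothetical names.

```typescript
interface RepoRef {
  owner: string;
  repo: string;
}

// Hypothetical helper: accepts "https://github.com/owner/repo" or "owner/repo"
// and returns null for anything else (including URLs with extra path parts).
function parseRepoInput(input: string): RepoRef | null {
  const cleaned = input.trim().replace(/\/+$/, "").replace(/\.git$/, "");
  const match =
    /^https?:\/\/github\.com\/([^/\s]+)\/([^/\s]+)$/.exec(cleaned) ??
    /^([^/\s]+)\/([^/\s]+)$/.exec(cleaned);
  if (!match) return null;
  const owner = match[1];
  const repo = match[2];
  if (!owner || !repo) return null;
  return { owner, repo };
}
```

Rejecting deeper URLs such as `.../tree/main` is a simplification of this sketch; a fuller parser might extract the branch from them instead.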
Locale Detection
Scan the file tree for translation files in common patterns:
- locales/**/*.json
- i18n/**/*.yaml
- public/locales/**/*
- Custom patterns based on popular i18n libraries
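One way to approximate these patterns without a glob library is to check each path in the file tree for a known locale directory and a supported extension. The directory list, the extension list (including `.yml`, which is an addition of this sketch), and the function names are all assumptions.

```typescript
// Assumed directory names, taken from the common patterns listed above.
const LOCALE_DIRS = ["locales", "i18n", "lang"];
// Assumed extensions for the supported formats; ".yml" is an addition here.
const LOCALE_FILE = /\.(json|ya?ml|po|properties)$/i;

// Hypothetical helper: true when the path sits under a known locale directory
// and has a supported translation-file extension.
function isLocaleFile(path: string): boolean {
  const segments = path.split("/");
  const fileName = segments.pop() ?? "";
  const inLocaleDir = segments.some((s) => LOCALE_DIRS.indexOf(s) !== -1);
  return inLocaleDir && LOCALE_FILE.test(fileName);
}

// Filters a flat list of tree paths down to candidate translation files.
function findLocaleFiles(paths: string[]): string[] {
  return paths.filter(isLocaleFile);
}
```

This is stricter than `public/locales/**/*`, which would accept any file type, so treat it as a starting point rather than an exact equivalent.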
Report Generation
Generate comprehensive health report with:
- Per-locale coverage percentages
- Lists of missing, untranslated, and orphan keys
- Overall summary statistics
- Visual progress bars and charts
API Routes
The application exposes several API endpoints:
Scan Endpoint Flow
GitHub API Integration
The system uses the GitHub REST API v3:
API Endpoints Used
- GET /repos/:owner/:repo - Repository metadata
- GET /repos/:owner/:repo/git/trees/:branch?recursive=1 - Full file tree
- https://raw.githubusercontent.com/:owner/:repo/:branch/:path - Raw file content
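The three URLs above can be assembled with small builder functions. The function names are hypothetical; the URL shapes come directly from the list. Path-like values are encoded segment by segment so that slashes are preserved.

```typescript
const API_BASE = "https://api.github.com";
const RAW_BASE = "https://raw.githubusercontent.com";

// Encodes each path segment separately so "/" separators are kept.
function encodePath(value: string): string {
  return value.split("/").map((segment) => encodeURIComponent(segment)).join("/");
}

function repoUrl(owner: string, repo: string): string {
  return `${API_BASE}/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

function treeUrl(owner: string, repo: string, branch: string): string {
  return `${repoUrl(owner, repo)}/git/trees/${encodePath(branch)}?recursive=1`;
}

function rawFileUrl(owner: string, repo: string, branch: string, path: string): string {
  return `${RAW_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}/${encodePath(branch)}/${encodePath(path)}`;
}
```

The branch passed to `treeUrl` would normally be the `default_branch` value returned by the repository metadata request.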
Rate Limits: Without authentication: 60 requests/hour. With GITHUB_TOKEN: 5,000 requests/hour. The app works without a token for public repos but benefits significantly from authenticated requests.
Lingo.dev Integration
The Lingo.dev integration provides automated translation capabilities:
SDK Integration
Runtime translation of missing keys on-demand
CLI Usage
Server-side batch processing of locale files
Compiler
The app itself uses Lingo.dev for multilingual support (dogfooding)
CI/CD
GitHub Action for automatic translation on push
One-Click Fix Flow (In Progress)
- User clicks “Fix with Lingo.dev” button on report page
- Backend sends missing/untranslated keys to Lingo.dev API
- Lingo.dev returns translated content for all target locales
- System generates before/after diff preview
- User can download fixed files as ZIP or open a GitHub PR
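Since this flow is still in progress, the following is only a sketch of the data handling around the translation call, not of the Lingo.dev API itself. It assumes flat string-to-string maps per locale; both helper names are hypothetical, and the actual translation request is deliberately left out.

```typescript
type KeyMap = Record<string, string>;

// Hypothetical helper: pick the source strings that still need a translation
// for one target locale (the payload for the translation step).
function collectStringsToTranslate(
  source: KeyMap,
  missing: string[],
  untranslated: string[],
): KeyMap {
  const pending: KeyMap = {};
  for (const key of missing.concat(untranslated)) {
    if (Object.prototype.hasOwnProperty.call(source, key)) {
      pending[key] = source[key] ?? "";
    }
  }
  return pending;
}

// Hypothetical helper: merge translated strings into a copy of the target.
// The original target is left untouched so it can serve as the "before" side
// of the diff preview.
function applyTranslations(target: KeyMap, translated: KeyMap): KeyMap {
  return { ...target, ...translated };
}
```

Keeping the original and the merged map side by side is what makes the before/after preview cheap to produce.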
Security Considerations
- Public Repos Only: The scanner only works with public GitHub repositories
- No Code Execution: Files are parsed statically, never executed
- Rate Limiting: GitHub API rate limits prevent abuse
- Row-Level Security: Supabase RLS policies ensure users only access their own data
- Environment Variables: Sensitive tokens stored in .env.local files
Performance Optimizations
- Turbopack: Fast development builds and hot module replacement
- Parallel Processing: Concurrent file downloads and parsing where possible
- Caching: GitHub API responses can be cached for frequently scanned repos
- Incremental Parsing: Only parse detected locale files, skip unrelated files
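The parallel-processing point can be illustrated with a bounded-concurrency map, which keeps several downloads in flight without firing every request at once. This is a generic sketch, not the project's implementation; `mapWithConcurrency` is a hypothetical name and the worker is whatever async task (for example, fetching one raw file) the caller supplies.

```typescript
// Runs `worker` over all items with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let i = 0; i < runnerCount; i++) runners.push(run());
  await Promise.all(runners);
  return results;
}
```

A small limit is a reasonable default for unauthenticated scans, where the hourly request budget is tight.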
Error Handling
The system gracefully handles common failure scenarios:
- Repository not found (404)
- GitHub API rate limit exceeded (403 or 429)
- Malformed locale files (parse errors)
- Large repositories (truncated trees > 100k entries)
- Network timeouts and connection failures
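The HTTP-level cases in this list could be distinguished with a small classifier. The error names are hypothetical. The check on the `x-ratelimit-remaining` header reflects how GitHub signals an exhausted primary rate limit, so that an ordinary 403 is not mistaken for one.

```typescript
type GitHubErrorKind = "not_found" | "rate_limited" | "forbidden" | "api_failure";

// Hypothetical helper: maps a GitHub response status (plus the value of the
// x-ratelimit-remaining header, if present) to an error category.
// Returns null for successful responses.
function classifyGitHubStatus(
  status: number,
  rateLimitRemaining: string | null,
): GitHubErrorKind | null {
  if (status >= 200 && status < 300) return null;
  if (status === 404) return "not_found";
  if (status === 429 || (status === 403 && rateLimitRemaining === "0")) {
    return "rate_limited";
  }
  if (status === 403) return "forbidden";
  return "api_failure";
}
```

Parse errors, truncated trees, and network failures do not arrive as status codes, so they would need separate handling (for example, checking the `truncated` field of the tree response).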
Deployment Architecture
- Hosting: Vercel (serverless Next.js deployment)
- Database: Supabase (managed PostgreSQL)
- CDN: Vercel Edge Network for global distribution
- Analytics: Vercel Analytics for monitoring
Next Steps
Monorepo Structure
Learn about the pnpm workspaces + Turborepo setup
Tech Stack
Explore the complete technology stack in detail