
Overview

The locale-parser module extracts translation keys and values from locale files in multiple formats (JSON, YAML, PO) and converts them into flat key-value maps for easy comparison.

Location: apps/www/lib/locale-parser.ts

Features

  • Parse JSON, YAML, and PO (gettext) files
  • Flatten nested JSON structures with dot notation
  • Handle multi-line values and escape sequences
  • Preserve empty values (untranslated strings)
  • Support arrays and primitive types

Functions

parseLocaleFile

Parse a locale file’s content into a flat key-value map. The format is automatically detected from the file extension.
function parseLocaleFile(content: string, filePath: string): KeyMap
Parameters:
  • content (string, required): The raw text content of the locale file
  • filePath (string, required): The file path, used to determine the format from the extension (.json, .yaml, .yml, or .po)
Returns:
  • KeyMap (Record<string, string>): A flat map of translation keys to values. Nested keys are flattened with dot notation (e.g., "user.name"). Empty strings indicate untranslated keys.

Supported Formats

JSON

.json — Nested JSON objects

YAML

.yaml, .yml — YAML with one-level nesting

PO

.po — GNU gettext PO files
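
The extension-based detection described above could look roughly like the following sketch. The names detectFormat and LocaleFormat are hypothetical and are not part of the module's documented API; only the list of extensions and the error message for unsupported formats come from this page. Case-insensitive matching is an assumption.

```typescript
// Hypothetical helper for illustration; not the module's actual implementation.
type LocaleFormat = "json" | "yaml" | "po"

function detectFormat(filePath: string): LocaleFormat {
  // Take the text after the last dot; a path without a dot has no extension.
  const dot = filePath.lastIndexOf(".")
  // Assumption: extensions are matched case-insensitively.
  const ext = dot === -1 ? "" : filePath.slice(dot + 1).toLowerCase()

  switch (ext) {
    case "json":
      return "json"
    case "yaml":
    case "yml":
      return "yaml"
    case "po":
      return "po"
    default:
      // Message text as listed on this page for unsupported formats.
      throw new Error(`Unsupported file format: ${filePath}`)
  }
}
```

A dispatcher built on this would call the JSON, YAML, or PO parser depending on the returned value.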

Example: JSON

locales/en.json
{
  "welcome": "Welcome",
  "user": {
    "name": "Name",
    "email": "Email"
  },
  "errors": {
    "required": "This field is required",
    "invalid": "Invalid input"
  }
}
import { parseLocaleFile } from "@/lib/locale-parser"
import { getFileContent } from "@/lib/github"

const content = await getFileContent("owner", "repo", "main", "locales/en.json")
const keys = parseLocaleFile(content, "locales/en.json")

console.log(keys)
// {
//   "welcome": "Welcome",
//   "user.name": "Name",
//   "user.email": "Email",
//   "errors.required": "This field is required",
//   "errors.invalid": "Invalid input"
// }

Example: YAML

locales/en.yaml
welcome: Welcome
user:
  name: Name
  email: Email
errors:
  required: This field is required
  invalid: Invalid input
const content = await getFileContent("owner", "repo", "main", "locales/en.yaml")
const keys = parseLocaleFile(content, "locales/en.yaml")

console.log(keys)
// {
//   "welcome": "Welcome",
//   "user.name": "Name",
//   "user.email": "Email",
//   "errors.required": "This field is required",
//   "errors.invalid": "Invalid input"
// }
The YAML parser is lightweight and supports flat and one-level nested structures. For complex YAML files with deep nesting, consider using a full YAML parser library.

Example: PO (gettext)

locale/en.po
msgid ""
msgstr ""

msgid "welcome"
msgstr "Welcome"

msgid "user.name"
msgstr "Name"

msgid "user.email"
msgstr "Email"
const content = await getFileContent("owner", "repo", "main", "locale/en.po")
const keys = parseLocaleFile(content, "locale/en.po")

console.log(keys)
// {
//   "welcome": "Welcome",
//   "user.name": "Name",
//   "user.email": "Email"
// }

Errors

try {
  const keys = parseLocaleFile(content, filePath)
} catch (error) {
  // Thrown errors:
  //   JSON: SyntaxError if the content is not valid JSON
  //   Unsupported format: error message "Unsupported file format: {filePath}"
  // Not thrown (these never reach this block):
  //   YAML: malformed lines are skipped silently
  //   PO: malformed entries are skipped silently
}
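
Because malformed YAML and PO input is skipped silently rather than thrown, a caller may want to flag results that look suspicious. The wrapper below is a minimal sketch of that idea. tryParse and ParseResult are hypothetical names, the parser is passed in as an argument so the block is self-contained, and "non-empty content that yields zero keys" is only one possible heuristic.

```typescript
type KeyMap = Record<string, string>

// Hypothetical result type: either a key map (possibly flagged) or an error message.
type ParseResult =
  | { ok: true; keys: KeyMap; suspicious: boolean }
  | { ok: false; error: string }

function tryParse(
  parse: (content: string, filePath: string) => KeyMap,
  content: string,
  filePath: string
): ParseResult {
  try {
    const keys = parse(content, filePath)
    // Heuristic (assumption): non-empty input that produced no keys may have failed silently.
    const suspicious = content.trim().length > 0 && Object.keys(keys).length === 0
    return { ok: true, keys, suspicious }
  } catch (err) {
    // Thrown errors (invalid JSON, unsupported format) end up here.
    return { ok: false, error: err instanceof Error ? err.message : String(err) }
  }
}
```

In application code the first argument would be parseLocaleFile.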

Usage in Scan API

From app/api/scan/route.ts:52-56:
const fetchPromises = Object.entries(group.files).flatMap(
  ([locale, filePaths]) =>
    filePaths.map(async (filePath) => {
      try {
        const content = await getFileContent(owner, repo, branch, filePath)
        const keys = parseLocaleFile(content, filePath)
        // Merge into locale's key map (supports multiple files per locale)
        if (!keyMaps[locale]) keyMaps[locale] = {}
        Object.assign(keyMaps[locale], keys)
      } catch (err) {
        console.warn(`Failed to parse ${filePath}:`, err)
      }
    })
)
await Promise.all(fetchPromises)

Types

KeyMap

Flat map of translation keys to values.
type KeyMap = Record<string, string>
  • Keys: Dot-notation path (e.g., "user.name", "errors.required")
  • Values: Translated string (empty string "" indicates untranslated key)

Example

const keyMap: KeyMap = {
  "welcome": "Welcome",
  "user.name": "Name",
  "user.email": "Email",
  "errors.required": "", // ← Untranslated
}
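
Since an empty string marks an untranslated key, listing the untranslated keys of a KeyMap is a simple filter. The helper below is a small sketch; findUntranslated is a hypothetical name, not a function exported by the module.

```typescript
type KeyMap = Record<string, string>

// Hypothetical helper: returns the keys whose value is the empty string, sorted.
function findUntranslated(keyMap: KeyMap): string[] {
  return Object.entries(keyMap)
    .filter(([, value]) => value === "")
    .map(([key]) => key)
    .sort()
}
```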

Flattening Behavior

Nested Objects

Nested objects are flattened with dot notation:
{
  "a": {
    "b": {
      "c": "value"
    }
  }
}
// Result:
{
  "a.b.c": "value"
}

Arrays

Arrays are flattened with numeric indices:
{
  "items": ["first", "second", "third"]
}
// Result:
{
  "items.0": "first",
  "items.1": "second",
  "items.2": "third"
}

Primitive Types

Primitive types (string, number, boolean) are converted to strings:
{
  "name": "John",
  "age": 30,
  "active": true
}
// Result:
{
  "name": "John",
  "age": "30",
  "active": "true"
}

Empty Values

Empty strings are preserved (an empty string indicates an untranslated key):
{
  "welcome": "Welcome",
  "goodbye": ""
}
// Result:
{
  "welcome": "Welcome",
  "goodbye": "" // ← Untranslated
}
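
The four rules above can be expressed as one short recursive function. This is a sketch under assumptions, not the module's flattenJson: the name flattenSketch is hypothetical, and skipping null values and bare top-level primitives is a guess, since this page does not say how those are handled.

```typescript
type KeyMap = Record<string, string>

function flattenSketch(value: unknown, prefix = "", out: KeyMap = {}): KeyMap {
  // Assumption: null and undefined are skipped.
  if (value === null || value === undefined) return out

  if (typeof value === "object") {
    // Object.entries yields "0", "1", ... for arrays, which produces the numeric-index keys.
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      flattenSketch(child, prefix ? `${prefix}.${key}` : key, out)
    }
    return out
  }

  // Strings, numbers, and booleans are stored as strings; empty strings are kept.
  // Assumption: a bare top-level primitive has no key and is ignored.
  if (prefix !== "") out[prefix] = String(value)
  return out
}
```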

Format-Specific Details

JSON Parser

Uses JSON.parse() with recursive flattening.
function parseJson(content: string): KeyMap {
  const data = JSON.parse(content)
  return flattenJson(data)
}
Features:
  • Standard JSON support (via native JSON.parse(); JSON5 extensions are not accepted)
  • Deep nesting (unlimited depth)
  • Array indexing
  • Type coercion (numbers/booleans → strings)
Limitations:
  • Strict JSON syntax (no comments or trailing commas)
  • Large files may impact performance

YAML Parser

Lightweight line-based parser for flat and one-level nested structures.
function parseYaml(content: string): KeyMap
Features:
  • Flat keys: key: value
  • One-level nesting: parent:\n child: value
  • Comment support: # comment
  • Quote handling: 'value', "value"
Limitations:
  • Maximum one level of nesting (deeper nesting is not supported)
  • No support for arrays, anchors, or advanced YAML features
  • Indentation-sensitive
For complex YAML files with deep nesting or advanced features, consider using a full YAML parser library like js-yaml.
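
To make the listed features and limits concrete, here is a minimal sketch of a line-based parser with the same scope: flat keys, one level of nesting, full-line comments, and quoted values. It is not the module's parseYaml. The names parseYamlSketch and stripQuotes are hypothetical, and inline comments after a value are not handled.

```typescript
type KeyMap = Record<string, string>

// Remove one pair of matching single or double quotes around a value.
function stripQuotes(value: string): string {
  const quoted = /^(["'])(.*)\1$/.exec(value)
  return quoted ? quoted[2] : value
}

function parseYamlSketch(content: string): KeyMap {
  const result: KeyMap = {}
  let parent: string | null = null

  for (const rawLine of content.split(/\r?\n/)) {
    const trimmed = rawLine.trim()
    // Skip blank lines and full-line comments.
    if (trimmed === "" || trimmed.startsWith("#")) continue

    // indent, key, and everything after the first colon
    const match = /^(\s*)([^:#\s][^:]*):\s*(.*)$/.exec(rawLine)
    if (!match) continue // malformed line: skipped silently

    const [, indent, rawKey, rawValue] = match
    const key = rawKey.trim()
    const value = rawValue.trim()

    if (indent.length === 0) {
      if (value === "") {
        // "parent:" with nothing after the colon opens a nested block.
        parent = key
      } else {
        parent = null
        result[key] = stripQuotes(value)
      }
    } else if (parent !== null) {
      // Any indented line is treated as a child of the most recent parent.
      result[`${parent}.${key}`] = stripQuotes(value)
    }
  }
  return result
}
```

Note that in this sketch an unindented key with nothing after the colon is read as a parent, so an untranslated top-level key has to be written with explicit quotes (key: "").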

PO (gettext) Parser

Parses GNU gettext PO files.
function parsePo(content: string): KeyMap
Features:
  • msgid / msgstr extraction
  • Multi-line string concatenation
  • Comment skipping (# lines)
  • Empty header entry handling
Limitations:
  • No support for plural forms (msgid_plural, msgstr[0])
  • No context handling (msgctxt)
  • Basic escape sequence support
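
The sketch below shows one way the listed behavior could be implemented: msgid / msgstr extraction, concatenation of continuation lines, comment skipping, and dropping the header entry whose msgid is empty. It is an illustration, not the module's parsePo. The names parsePoSketch and unescapePo are hypothetical, and the handled escapes (\n, \t, \", \\) are an assumption about what "basic" covers.

```typescript
type KeyMap = Record<string, string>

// Assumption: only \n, \t, \" and \\ are treated as escape sequences.
function unescapePo(text: string): string {
  return text.replace(/\\(["\\nt])/g, (_match, ch: string) =>
    ch === "n" ? "\n" : ch === "t" ? "\t" : ch
  )
}

function parsePoSketch(content: string): KeyMap {
  const result: KeyMap = {}
  const entry: { msgid: string | null; msgstr: string | null } = { msgid: null, msgstr: null }
  let target: "msgid" | "msgstr" | null = null

  // Store the finished entry (unless it is the header) and reset the state.
  const flush = (): void => {
    if (entry.msgid !== null && entry.msgid !== "" && entry.msgstr !== null) {
      result[entry.msgid] = entry.msgstr
    }
    entry.msgid = null
    entry.msgstr = null
    target = null
  }

  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim()
    if (line.startsWith("#")) continue // comment

    const keyword = /^(msgid|msgstr)\s+"(.*)"$/.exec(line)
    const continuation = /^"(.*)"$/.exec(line)

    if (keyword) {
      if (keyword[1] === "msgid") {
        flush() // a new msgid starts a new entry
        entry.msgid = unescapePo(keyword[2])
        target = "msgid"
      } else {
        entry.msgstr = unescapePo(keyword[2])
        target = "msgstr"
      }
    } else if (continuation && target !== null) {
      // A bare quoted line continues the previous msgid or msgstr.
      entry[target] = (entry[target] ?? "") + unescapePo(continuation[1])
    } else if (line !== "") {
      // msgctxt, msgid_plural, msgstr[0], and anything else is ignored.
      target = null
    }
  }
  flush()
  return result
}
```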

Merging Multiple Files

When a locale has multiple files (e.g., en/common.json, en/errors.json), merge them into a single KeyMap:
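import { getFileContent } from "@/lib/github"
import { parseLocaleFile, type KeyMap } from "@/lib/locale-parser"

// Placeholder repository coordinates, as in the earlier examples
const [owner, repo, branch] = ["owner", "repo", "main"]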

const files = [
  "locales/en/common.json",
  "locales/en/errors.json",
]

const mergedKeys: KeyMap = {}

for (const filePath of files) {
  const content = await getFileContent(owner, repo, branch, filePath)
  const keys = parseLocaleFile(content, filePath)
  Object.assign(mergedKeys, keys)
}

console.log(mergedKeys)
// {
//   "welcome": "Welcome",        // from common.json
//   "user.name": "Name",         // from common.json
//   "error.required": "Required", // from errors.json
//   "error.invalid": "Invalid"   // from errors.json
// }
If there are duplicate keys across files, later files overwrite earlier ones. Use a unique namespace per file to avoid conflicts.
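
Because Object.assign overwrites silently, it can help to report which keys collide. The following sketch merges several key maps in order and records every key that was overwritten with a different value. mergeWithConflicts is a hypothetical helper and is not part of the module.

```typescript
type KeyMap = Record<string, string>

function mergeWithConflicts(maps: KeyMap[]): { merged: KeyMap; conflicts: string[] } {
  const merged: KeyMap = {}
  const conflicts = new Set<string>()

  for (const keys of maps) {
    for (const [key, value] of Object.entries(keys)) {
      const seen = Object.prototype.hasOwnProperty.call(merged, key)
      // A conflict is a key seen before with a different value.
      if (seen && merged[key] !== value) conflicts.add(key)
      // Later files win, matching the Object.assign behavior described above.
      merged[key] = value
    }
  }
  return { merged, conflicts: Array.from(conflicts).sort() }
}
```

An empty conflicts array means the files did not overwrite each other's values.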

Complete Example

import { getFileContent } from "@/lib/github"
import { parseLocaleFile, type KeyMap } from "@/lib/locale-parser"

async function parseAllLocales(
  owner: string,
  repo: string,
  branch: string,
  localeFiles: Record<string, string[]>
): Promise<Record<string, KeyMap>> {
  const keyMaps: Record<string, KeyMap> = {}

  const promises = Object.entries(localeFiles).flatMap(
    ([locale, filePaths]) =>
      filePaths.map(async (filePath) => {
        try {
          // 1. Fetch file content
          const content = await getFileContent(owner, repo, branch, filePath)
          
          // 2. Parse to flat key map
          const keys = parseLocaleFile(content, filePath)
          
          // 3. Merge into locale's key map
          if (!keyMaps[locale]) keyMaps[locale] = {}
          Object.assign(keyMaps[locale], keys)
          
          console.log(`✅ Parsed ${filePath} (${Object.keys(keys).length} keys)`)
        } catch (err) {
          console.error(`❌ Failed to parse ${filePath}:`, err)
        }
      })
  )

  await Promise.all(promises)
  return keyMaps
}

// Usage
const localeFiles = {
  "en": ["locales/en/common.json", "locales/en/errors.json"],
  "fr": ["locales/fr/common.json", "locales/fr/errors.json"],
  "es": ["locales/es/common.json", "locales/es/errors.json"],
}

const keyMaps = await parseAllLocales("owner", "repo", "main", localeFiles)

console.log(Object.keys(keyMaps["en"]).length) // Total keys in English
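
Once every locale has a flat KeyMap, comparing two locales reduces to comparing their key sets. The sketch below lists keys that a target locale lacks relative to a base locale, and keys it has in addition. diffKeys is a hypothetical helper for illustration and is not part of the module.

```typescript
type KeyMap = Record<string, string>

interface KeyDiff {
  missing: string[] // in base but not in target
  extra: string[]   // in target but not in base
}

function diffKeys(base: KeyMap, target: KeyMap): KeyDiff {
  const has = (map: KeyMap, key: string): boolean =>
    Object.prototype.hasOwnProperty.call(map, key)

  return {
    missing: Object.keys(base).filter((key) => !has(target, key)).sort(),
    extra: Object.keys(target).filter((key) => !has(base, key)).sort(),
  }
}
```

With the result of parseAllLocales, a call such as diffKeys(keyMaps["en"], keyMaps["fr"]) would list the keys the French files still lack.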

Performance Considerations

JSON Files

  • Fast: Uses native JSON.parse()
  • Memory: Entire file loaded into memory
  • Recommendation: Suitable for files up to several MB

YAML Files

  • Moderate: Line-by-line parsing
  • Memory: Low memory footprint
  • Recommendation: Best for simple, flat structures

PO Files

  • Moderate: Entry-based parsing
  • Memory: Low memory footprint
  • Recommendation: Suitable for standard gettext catalogs
For very large locale files (>10 MB), consider streaming parsers or chunked processing.
