Earley Parser

earleyRecognize

Recognize if tokens can be parsed by a CFG grammar using Earley algorithm.

function earleyRecognize(
  tokens: string[],
  grammar: CfgGrammar,
  options?: { startSymbol?: string }
): boolean

Parameters

tokens

string[]

required

Array of tokens to recognize

grammar

CfgGrammar

required

Parsed CFG grammar from parseCfgGrammar

options.startSymbol

string

Override start symbol from grammar

Returns

true if tokens can be parsed by the grammar, false otherwise.

Example

import { parseCfgGrammar, earleyRecognize } from "bun_nltk";

const grammar = parseCfgGrammar(`
S -> NP VP
NP -> 'the' NN
VP -> VB NP
NN -> 'dog'
VB -> 'saw'
`);

console.log(earleyRecognize(["the", "dog", "saw", "the", "dog"], grammar));
// true

console.log(earleyRecognize(["dog", "the"], grammar));
// false

Algorithm

Earley parsing is a top-down chart parsing algorithm:

Prediction: Add states for rules starting with expected nonterminal
Scanning: Advance dot past matching terminal
Completion: Advance parent states when rule is complete

Advantages over CYK:

Works with any CFG (no CNF conversion)
More efficient for certain grammars
Natural left-to-right processing

earleyParse

Parse tokens using Earley recognition and chart parsing.

function earleyParse(
  tokens: string[],
  grammar: CfgGrammar,
  options?: {
    maxTrees?: number;
    startSymbol?: string;
  }
): ParseTree[]

Parameters

tokens

string[]

required

Array of tokens to parse

grammar

CfgGrammar

required

Parsed CFG grammar from parseCfgGrammar

options.maxTrees

number

default:8

Maximum number of parse trees to return

options.startSymbol

string

Override start symbol from grammar

Returns

Array of parse trees (up to maxTrees). Returns empty array if input cannot be parsed. Each tree has:

label: string - Node label (nonterminal or terminal)
children: Array of ParseTree or string - Child nodes or terminal strings

Example

import { parseCfgGrammar, earleyParse } from "bun_nltk";

const grammar = parseCfgGrammar(`
S -> NP VP
NP -> 'she' | 'he'
VP -> VB | VB NP
VB -> 'runs' | 'likes'
`);

const trees = earleyParse(["she", "runs"], grammar);
console.log(trees[0]);
// {
//   label: "S",
//   children: [
//     { label: "NP", children: ["she"] },
//     { label: "VP", children: [{ label: "VB", children: ["runs"] }] }
//   ]
// }

Implementation

Uses earleyRecognize to check if input is parsable
If recognized, uses chartParse to build parse trees
Returns empty array for unparsable input

parseTextWithEarley

Parse natural language text using Earley algorithm.

function parseTextWithEarley(
  text: string,
  grammar: CfgGrammar | string,
  options?: {
    maxTrees?: number;
    startSymbol?: string;
    normalizeTokens?: boolean;
  }
): ParseTree[]

Parameters

text

string

required

Natural language text to parse

grammar

CfgGrammar | string

required

Parsed grammar or grammar text string

options.maxTrees

number

default:8

Maximum parse trees to return

options.startSymbol

string

Override grammar start symbol

options.normalizeTokens

boolean

default:true

Convert tokens to lowercase before parsing

Returns

Array of parse trees. See earleyParse for tree structure.

Example

import { parseTextWithEarley } from "bun_nltk";

const grammar = `
S -> NP VP
NP -> 'i' | 'you'
VP -> VB NP | VB
VB -> 'like' | 'see'
`;

const trees = parseTextWithEarley("I like you", grammar);
console.log(trees[0]);
// {
//   label: "S",
//   children: [
//     { label: "NP", children: ["i"] },
//     { label: "VP", children: [
//       { label: "VB", children: ["like"] },
//       { label: "NP", children: ["you"] }
//     ]}
//   ]
// }

Processing

Tokenizes text using word tokenizer
Filters to alphanumeric tokens
Normalizes to lowercase (if enabled)
Parses with earleyParse

When to Use Earley

Prefer Earley parser when:

Grammar has many unary or epsilon rules
Left-recursive grammars (after transformation)
Real-time parsing requirements
Grammar is frequently modified

Prefer CYK parser (parseTextWithCfg) when:

Grammar is fixed and can be optimized
Need all parse trees efficiently
Grammar naturally fits CNF

Tokenization

Text Processing

Tagging & Analysis

Language Models

Parsing

Classification

WordNet

Corpus

WASM Runtime

Native APIs

earleyRecognize

Parameters

Returns

Example

Algorithm

earleyParse

Parameters

Returns

Example

Implementation

parseTextWithEarley

Parameters

Returns

Example

Processing

When to Use Earley

Build docs developers (and LLMs) love

Tokenization

Text Processing

Tagging & Analysis

Language Models

Parsing

Classification

WordNet

Corpus

WASM Runtime

Native APIs

​earleyRecognize

​Parameters

​Returns

​Example

​Algorithm

​earleyParse

​Parameters

​Returns

​Example

​Implementation

​parseTextWithEarley

​Parameters

​Returns

​Example

​Processing

​When to Use Earley

Build docs developers (and LLMs) love

earleyRecognize

Parameters

Returns

Example

Algorithm

earleyParse

Parameters

Returns

Example

Implementation

parseTextWithEarley

Parameters

Returns

Example

Processing

When to Use Earley