CFG Parser

parseCfgGrammar

Parses a context-free grammar (CFG) from its text representation.

function parseCfgGrammar(
  grammarText: string,
  options?: { startSymbol?: string }
): CfgGrammar

Parameters

grammarText

string

required

Grammar rules in text format. Each rule has the form:

Nonterminal -> RHS | RHS | ...

Left-hand side: Single nonterminal (uppercase, e.g., S, NP, VP)
Right-hand side: Space-separated symbols
- Nonterminals: Uppercase identifiers
- Terminals: Quoted strings ('word' or "word")
Alternatives for the same left-hand side are separated by |
Comments start with #
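To make the rule format concrete, here is a minimal sketch of how a single rule line could be split into productions. The helper name parseRuleLine and the Production type are hypothetical and not part of bun_nltk; the sketch also assumes that quoted terminals contain no spaces, | or # characters, which the real parser may well handle.

```typescript
// Hypothetical helper for illustration only; not the library's implementation.
type Production = { lhs: string; rhs: string[] };

function parseRuleLine(line: string): Production[] {
  // Drop everything after a '#' comment marker, then trim whitespace.
  const hash = line.indexOf("#");
  const body = (hash >= 0 ? line.slice(0, hash) : line).trim();
  if (body === "") return [];

  const arrow = body.indexOf("->");
  if (arrow < 0) throw new Error(`Missing '->' in rule: ${line}`);

  const lhs = body.slice(0, arrow).trim();
  const alternatives = body.slice(arrow + 2).split("|");

  // Each alternative becomes its own production with the same left-hand side.
  return alternatives.map((alt) => ({
    lhs,
    rhs: alt
      .trim()
      .split(/\s+/)
      .filter((sym) => sym.length > 0)
      // Strip matching single or double quotes from terminals.
      .map((sym) => sym.replace(/^(['"])(.*)\1$/, "$2")),
  }));
}
```

Under these assumptions, parseRuleLine("NP -> DT NN | 'she'") yields two productions for NP: one with rhs ["DT", "NN"] and one with rhs ["she"].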

options.startSymbol

string

Start symbol for parsing. Defaults to the left-hand side of the first rule.

Returns

Parsed grammar object with:

startSymbol: string - Grammar start symbol
productions: CfgProduction[] - Array of productions, each with:
- lhs: string - Left-hand side nonterminal
- rhs: string[] - Right-hand side symbols
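The returned shape can be sketched as plain TypeScript interfaces. The declarations below are inferred from the field list above and are an assumption; the library's actual type declarations may carry additional fields. The groupByLhs helper is hypothetical and only shows one way to consume the productions array.

```typescript
// Shapes inferred from the documented fields; treat them as an approximation.
interface CfgProduction {
  lhs: string;
  rhs: string[];
}

interface CfgGrammar {
  startSymbol: string;
  productions: CfgProduction[];
}

// Hypothetical helper: index the alternatives of each nonterminal by its name.
function groupByLhs(grammar: CfgGrammar): Map<string, string[][]> {
  const index = new Map<string, string[][]>();
  for (const { lhs, rhs } of grammar.productions) {
    const alternatives = index.get(lhs) ?? [];
    alternatives.push(rhs);
    index.set(lhs, alternatives);
  }
  return index;
}
```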

Example

import { parseCfgGrammar } from "bun_nltk";

const grammarText = `
# Simple grammar
S -> NP VP
NP -> DT NN | 'she'
VP -> VB NP | VB
DT -> 'the' | 'a'
NN -> 'dog' | 'cat'
VB -> 'saw' | 'walked'
`;

const grammar = parseCfgGrammar(grammarText);
// {
//   startSymbol: "S",
//   productions: [
//     { lhs: "S", rhs: ["NP", "VP"] },
//     { lhs: "NP", rhs: ["DT", "NN"] },
//     { lhs: "NP", rhs: ["she"] },
//     ...
//   ]
// }

chartParse

Parses a token sequence using the CYK chart parsing algorithm.

function chartParse(
  tokens: string[],
  grammar: CfgGrammar,
  options?: {
    maxTrees?: number;
    startSymbol?: string;
  }
): ParseTree[]

Parameters

tokens

string[]

required

Array of tokens to parse

grammar

CfgGrammar

required

Parsed grammar returned by parseCfgGrammar

options.maxTrees

number

default: 8

Maximum number of parse trees to return. Must be at least 1.

options.startSymbol

string

Overrides the grammar's start symbol

Returns

Array of parse trees (up to maxTrees), where each tree has:

label: string - Node label (nonterminal or terminal)
children: Array of ParseTree or string - Child nodes or terminal strings

Trees are sorted by node count (simplest first).
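The ordering by node count can be illustrated with a short sketch. The ParseTree type mirrors the fields listed above; countNodes and sortBySize are hypothetical names, and counting each terminal string as one node is an assumption about how the library measures tree size.

```typescript
// Tree shape as documented: a label plus child trees or terminal strings.
type ParseTree = { label: string; children: Array<ParseTree | string> };

// Count every labeled node plus every terminal leaf (an assumed convention).
function countNodes(tree: ParseTree | string): number {
  if (typeof tree === "string") return 1;
  return 1 + tree.children.reduce((sum, child) => sum + countNodes(child), 0);
}

// Return a new array ordered from the smallest tree to the largest.
function sortBySize(trees: ParseTree[]): ParseTree[] {
  return [...trees].sort((a, b) => countNodes(a) - countNodes(b));
}
```

With this convention, the tree for "she runs" shown in the example below has five nodes: S, NP, VP and the two terminals.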

Example

import { parseCfgGrammar, chartParse } from "bun_nltk";

const grammar = parseCfgGrammar(`
S -> NP VP
NP -> 'she'
VP -> 'runs'
`);

const tokens = ["she", "runs"];
const trees = chartParse(tokens, grammar);

console.log(trees[0]);
// {
//   label: "S",
//   children: [
//     { label: "NP", children: ["she"] },
//     { label: "VP", children: ["runs"] }
//   ]
// }

Algorithm

Uses CYK (Cocke-Younger-Kasami) chart parsing:

Converts the grammar to Chomsky Normal Form (CNF)
Builds the parse chart bottom-up
Handles unary chains and binary rules
Uses a native optimization for the recognition step
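The bottom-up chart construction can be sketched as a textbook CYK recognizer. This is not the library's implementation: it assumes the grammar is already in strict CNF (only A -> B C and A -> 'word' rules), it leaves out unary chains and tree reconstruction, and the names cykRecognize, BinaryRule and LexicalRule are hypothetical.

```typescript
type BinaryRule = { lhs: string; left: string; right: string };
type LexicalRule = { lhs: string; word: string };

function cykRecognize(
  tokens: string[],
  lexical: LexicalRule[],
  binary: BinaryRule[],
  startSymbol: string,
): boolean {
  const n = tokens.length;
  if (n === 0) return false;

  // cell(i, len) holds the nonterminals that derive tokens[i .. i + len - 1].
  const chart = new Map<string, Set<string>>();
  const cell = (i: number, len: number): Set<string> => {
    const key = `${i}:${len}`;
    let set = chart.get(key);
    if (!set) {
      set = new Set<string>();
      chart.set(key, set);
    }
    return set;
  };

  // Spans of length 1 are filled from the lexical rules.
  tokens.forEach((token, i) => {
    for (const rule of lexical) {
      if (rule.word === token) cell(i, 1).add(rule.lhs);
    }
  });

  // Longer spans combine two adjacent shorter spans through a binary rule.
  for (let len = 2; len <= n; len++) {
    for (let i = 0; i + len <= n; i++) {
      for (let split = 1; split < len; split++) {
        for (const rule of binary) {
          if (
            cell(i, split).has(rule.left) &&
            cell(i + split, len - split).has(rule.right)
          ) {
            cell(i, len).add(rule.lhs);
          }
        }
      }
    }
  }

  // The input is accepted if the start symbol spans the whole sequence.
  return cell(0, n).has(startSymbol);
}
```

The sketch only answers whether the tokens are derivable. Returning trees would additionally require storing back-pointers in each chart cell.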

parseTextWithCfg

Parses natural-language text using a context-free grammar.

function parseTextWithCfg(
  text: string,
  grammar: CfgGrammar | string,
  options?: {
    maxTrees?: number;
    startSymbol?: string;
    normalizeTokens?: boolean;
  }
): ParseTree[]

Parameters

text

string

required

Natural language text to parse

grammar

CfgGrammar | string

required

Parsed grammar or grammar text string

options.maxTrees

number

default: 8

Maximum number of parse trees to return

options.startSymbol

string

Overrides the grammar's start symbol

options.normalizeTokens

boolean

default: true

Converts tokens to lowercase before parsing

Returns

Array of parse trees. See chartParse for tree structure.

Example

import { parseTextWithCfg } from "bun_nltk";

const grammar = `
S -> NP VP
NP -> 'the' NN | 'she'
VP -> VB | VB NP
NN -> 'dog' | 'cat'
VB -> 'runs' | 'sees'
`;

const trees = parseTextWithCfg("She runs", grammar);
console.log(trees[0]);
// {
//   label: "S",
//   children: [
//     { label: "NP", children: ["she"] },
//     { label: "VP", children: [{ label: "VB", children: ["runs"] }] }
//   ]
// }

Processing

Tokenizes the text using the word tokenizer
Keeps only alphanumeric tokens
Lowercases the tokens (if normalizeTokens is enabled)
Parses the resulting tokens with chartParse
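The preprocessing steps above can be approximated with a small stand-in. The prepareTokens function is hypothetical, and its ASCII-only regular expression is an assumption made for the sketch; the library's word tokenizer is likely to behave differently on contractions, Unicode text and punctuation.

```typescript
// Hypothetical stand-in for the tokenize, filter and normalize steps.
function prepareTokens(text: string, normalizeTokens = true): string[] {
  // Split into alphanumeric runs and single non-space punctuation marks.
  const raw: string[] = text.match(/[A-Za-z0-9]+|[^\sA-Za-z0-9]/g) ?? [];
  // Keep only tokens made entirely of letters and digits.
  const alnum = raw.filter((tok) => /^[A-Za-z0-9]+$/.test(tok));
  // Lowercase the tokens unless normalization is switched off.
  return normalizeTokens ? alnum.map((tok) => tok.toLowerCase()) : alnum;
}
```

Under these assumptions, prepareTokens("She runs.") returns ["she", "runs"], which is why a grammar with lowercase terminals such as 'she' matches capitalized input when normalizeTokens is left enabled.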

Grammar Format

See parseCfgGrammar for grammar syntax details.
