parsePcfgGrammar
Parse a probabilistic context-free grammar (PCFG) from text format.
Parameters
Grammar rules with optional probabilities. Each rule has the form:
- Left-hand side: Single nonterminal (uppercase)
- Right-hand side: Space-separated symbols
- Nonterminals: Uppercase identifiers
- Terminals: Quoted strings ('word' or "word")
- Probability: Optional [0.5] after RHS
- Multiple alternatives separated by |
- Comments start with #
- If no probabilities given: uniform distribution
- If some probabilities given: remaining mass distributed uniformly
- All probabilities normalized to sum to 1.0 per nonterminal
Start symbol for parsing. Defaults to the left-hand side of the first rule.
Returns
Parsed probabilistic grammar with:
- startSymbol: string - Grammar start symbol
- productions: PcfgProduction[] - Array of weighted productions, each with:
  - lhs: string - Left-hand side nonterminal
  - rhs: string[] - Right-hand side symbols
  - prob: number - Production probability (0.0 to 1.0)
Example
Partial Probabilities
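The partial-probability rules listed under Parameters can be illustrated with a minimal sketch. It is not the library's implementation: the `Alternative` type and `fillProbabilities` function are hypothetical names, and the code assumes that unannotated alternatives split the remaining mass evenly before the final normalization.

```typescript
// Hypothetical shape for one alternative of a single nonterminal;
// `prob` is undefined when the rule text carried no [p] annotation.
interface Alternative {
  rhs: string[];
  prob?: number;
}

// Assumed behavior: explicit probabilities are kept, the remaining mass
// (1 - sum of explicit values) is split evenly among unannotated
// alternatives, and the result is normalized to sum to 1.0.
function fillProbabilities(alts: Alternative[]): number[] {
  const given = alts.reduce((sum, a) => sum + (a.prob ?? 0), 0);
  const missing = alts.filter((a) => a.prob === undefined).length;
  const share = missing > 0 ? Math.max(0, 1 - given) / missing : 0;
  const filled = alts.map((a) => a.prob ?? share);
  const total = filled.reduce((sum, p) => sum + p, 0);
  return total > 0 ? filled.map((p) => p / total) : filled;
}

// NP has three alternatives; only the first is annotated with 0.5.
const npProbs = fillProbabilities([
  { rhs: ["Det", "N"], prob: 0.5 },
  { rhs: ["N"] },
  { rhs: ["NP", "PP"] },
]);
// npProbs is [0.5, 0.25, 0.25]
```

With no annotations at all, the same function yields a uniform distribution, which matches the first rule in the list.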
probabilisticChartParse
Find the best parse using the probabilistic CYK algorithm.
Parameters
Array of tokens to parse
Parsed PCFG grammar from parsePcfgGrammar
Override start symbol from grammar
Returns
Best parse with probability, or null if unparsable:
- tree: ParseTree - Best parse tree structure
  - label: string - Node label
  - children: Array of ParseTree or string - Child nodes
- logProb: number - Log probability of parse
- prob: number - Probability of parse (0.0 to 1.0)
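The return structure can be modelled with a couple of types. This is a sketch under assumptions: the `ParseResult` name and the `toBracketed` helper are hypothetical, and `logProb` is assumed to be a natural logarithm so that `prob` equals `Math.exp(logProb)`; the source does not state the base.

```typescript
// Field names follow the list above; the `ParseResult` name is assumed.
interface ParseTree {
  label: string;
  children: Array<ParseTree | string>;
}

interface ParseResult {
  tree: ParseTree;
  logProb: number;
  prob: number;
}

// Hypothetical helper: render a tree in bracketed notation.
function toBracketed(node: ParseTree | string): string {
  if (typeof node === "string") return node;
  return `(${node.label} ${node.children.map((c) => toBracketed(c)).join(" ")})`;
}

// Hand-built result for illustration only; values are not library output.
const exampleResult: ParseResult = {
  tree: {
    label: "S",
    children: [
      { label: "NP", children: ["dogs"] },
      { label: "VP", children: ["bark"] },
    ],
  },
  logProb: Math.log(0.25), // natural log, by assumption
  prob: 0.25,
};

const bracketed = toBracketed(exampleResult.tree);
// bracketed is "(S (NP dogs) (VP bark))"
```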
Example
Algorithm
Uses probabilistic CYK (Viterbi variant):
- Converts grammar to weighted CNF
- Builds chart with best parse per cell
- Uses log probabilities to avoid underflow
- Returns single highest-probability parse
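The chart-filling steps above can be sketched as a small Viterbi CYK routine. This is an illustrative reimplementation, not the library's code: it assumes the grammar is already in Chomsky normal form (the conversion step is skipped), and `LexRule`, `BinRule`, `Cell`, and `viterbiCyk` are hypothetical names.

```typescript
// Hypothetical CNF rule shapes; the grammar is assumed to be in CNF already.
interface LexRule { lhs: string; word: string; prob: number }
interface BinRule { lhs: string; left: string; right: string; prob: number }
type Tree = { label: string; children: Array<Tree | string> };
interface Cell { logProb: number; tree: Tree }

function viterbiCyk(
  tokens: string[],
  lex: LexRule[],
  bin: BinRule[],
  start: string,
): Cell | null {
  const n = tokens.length;
  if (n === 0) return null;
  // chart[i][len] maps a nonterminal to its best analysis of tokens[i .. i+len).
  const chart: Map<string, Cell>[][] = [];
  for (let i = 0; i < n; i++) {
    const row: Map<string, Cell>[] = [];
    for (let len = 0; len <= n - i; len++) row.push(new Map<string, Cell>());
    chart.push(row);
  }
  // Keep only the highest-scoring candidate per nonterminal and cell.
  const update = (cell: Map<string, Cell>, lhs: string, cand: Cell): void => {
    const best = cell.get(lhs);
    if (best === undefined || cand.logProb > best.logProb) cell.set(lhs, cand);
  };
  // Width-1 spans come from lexical rules.
  for (let i = 0; i < n; i++) {
    for (const r of lex) {
      if (r.word !== tokens[i]) continue;
      update(chart[i][1], r.lhs, {
        logProb: Math.log(r.prob),
        tree: { label: r.lhs, children: [tokens[i]] },
      });
    }
  }
  // Wider spans combine two adjacent sub-spans through a binary rule.
  // Log probabilities are added instead of multiplying raw probabilities.
  for (let len = 2; len <= n; len++) {
    for (let i = 0; i + len <= n; i++) {
      for (let split = 1; split < len; split++) {
        for (const r of bin) {
          const l = chart[i][split].get(r.left);
          const rt = chart[i + split][len - split].get(r.right);
          if (l === undefined || rt === undefined) continue;
          update(chart[i][len], r.lhs, {
            logProb: Math.log(r.prob) + l.logProb + rt.logProb,
            tree: { label: r.lhs, children: [l.tree, rt.tree] },
          });
        }
      }
    }
  }
  return chart[0][n].get(start) ?? null;
}

// Toy grammar for illustration only.
const lexRules: LexRule[] = [
  { lhs: "Det", word: "the", prob: 1.0 },
  { lhs: "N", word: "dog", prob: 0.5 },
  { lhs: "N", word: "cat", prob: 0.5 },
  { lhs: "V", word: "sees", prob: 1.0 },
];
const binRules: BinRule[] = [
  { lhs: "S", left: "NP", right: "VP", prob: 1.0 },
  { lhs: "NP", left: "Det", right: "N", prob: 1.0 },
  { lhs: "VP", left: "V", right: "NP", prob: 1.0 },
];
const best = viterbiCyk(["the", "dog", "sees", "the", "cat"], lexRules, binRules, "S");
const unparsable = viterbiCyk(["dog", "the"], lexRules, binRules, "S");
```

In this toy grammar the only derivation uses each `N` rule once, so `Math.exp(best.logProb)` is approximately 0.25, while `unparsable` is `null` because no rule covers the span "dog the".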
parseTextWithPcfg
Parse natural language text using a PCFG.
Parameters
Natural language text to parse
Parsed grammar or grammar text string
Override grammar start symbol
Convert tokens to lowercase before parsing
Returns
Best parse with probability, or null if unparsable. See probabilisticChartParse for structure.
Example
Ambiguous Sentences
Processing
- Tokenizes text using word tokenizer
- Filters to alphanumeric tokens
- Normalizes to lowercase (if enabled)
- Finds best parse with probabilisticChartParse
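The first three processing steps can be approximated with a short helper. This is a sketch only: `prepareTokens` is a hypothetical name, the regular-expression tokenizer stands in for whatever word tokenizer the library actually uses, and it handles ASCII letters and digits only.

```typescript
// Hypothetical helper mirroring the preprocessing steps; the real word
// tokenizer may split text differently.
function prepareTokens(text: string, lowercase = true): string[] {
  // Step 1: crude tokenization that separates punctuation from words.
  const rawTokens: string[] = text.match(/[A-Za-z0-9]+|[^\sA-Za-z0-9]/g) ?? [];
  // Step 2: keep only tokens made entirely of ASCII letters and digits.
  const words = rawTokens.filter((t) => /^[A-Za-z0-9]+$/.test(t));
  // Step 3: optional case normalization.
  return lowercase ? words.map((t) => t.toLowerCase()) : words;
}

const preparedTokens = prepareTokens("The dog saw the cat!");
// preparedTokens is ["the", "dog", "saw", "the", "cat"]
```

The resulting token array is the kind of input the chart parser expects, so terminals in the grammar would need to match the lowercased forms when normalization is enabled.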
Use Cases
- Disambiguating syntactic structure
- Learning grammars from treebanks
- Language modeling
- Machine translation (MT) and parsing systems
- Preferring common constructions