regexpChunkParse
Parse POS-tagged tokens into chunks using regular expression patterns.
Parameters
Array of tokens with POS tags. Each token has:
- token: string - The word/token
- tag: string - POS tag (e.g., “NN”, “VB”, “JJ”)
Chunking grammar rules. Each rule follows the format:
Tag Patterns:
- <NN.*> - Matches any tag starting with NN (nouns)
- <JJ> - Matches exactly JJ (adjectives)
- <VB|MD> - Matches VB or MD (verbs or modals)
Quantifiers:
- ? - Zero or one occurrence
- * - Zero or more occurrences
- + - One or more occurrences
- (none) - Exactly one occurrence
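To make the tag pattern and quantifier semantics concrete, here is a minimal sketch of one way such patterns can be matched. It is not the library's implementation: `tagPatternToRegExp` and `findChunkSpans` are hypothetical helpers, and the sketch assumes that POS tags never contain `<` or `>`. The tag sequence is encoded as a string such as `<DT><JJ><NN>`, and the pattern is translated into an ordinary regular expression over that string.

```typescript
// Hypothetical helpers for illustration only; not part of the documented API.
interface TaggedToken { token: string; tag: string; }

// Translate a tag pattern such as "<DT>?<JJ>*<NN.*>+" into a RegExp that
// runs over a string of bracketed tags like "<DT><JJ><NN>".
function tagPatternToRegExp(pattern: string): RegExp {
  const source = pattern.replace(/<([^<>]+)>/g, (_m, body: string) => {
    // Inside a tag, "." must not match across a tag boundary.
    const safe = body.replace(/\./g, "[^<>]");
    return `(?:<(?:${safe})>)`;
  });
  return new RegExp(source, "g");
}

// Return [start, end) token index ranges matched by the pattern.
function findChunkSpans(tokens: TaggedToken[], pattern: string): Array<[number, number]> {
  const encoded = tokens.map((t) => `<${t.tag}>`).join("");

  // Character offset where each token's tag starts, plus the final end offset.
  const offsets: number[] = [];
  let pos = 0;
  for (const t of tokens) {
    offsets.push(pos);
    pos += t.tag.length + 2; // tag plus the two angle brackets
  }
  offsets.push(pos);

  const spans: Array<[number, number]> = [];
  const re = tagPatternToRegExp(pattern);
  let m: RegExpExecArray | null;
  while ((m = re.exec(encoded)) !== null) {
    if (m[0].length === 0) {
      re.lastIndex++; // skip empty matches from all-optional patterns
      continue;
    }
    spans.push([offsets.indexOf(m.index), offsets.indexOf(m.index + m[0].length)]);
  }
  return spans;
}

const sentence: TaggedToken[] = [
  { token: "the", tag: "DT" },
  { token: "quick", tag: "JJ" },
  { token: "fox", tag: "NN" },
  { token: "jumps", tag: "VBZ" },
  { token: "dogs", tag: "NNS" },
];
console.log(JSON.stringify(findChunkSpans(sentence, "<DT>?<JJ>*<NN.*>+")));
// prints [[0,3],[4,5]]
```

Here `<NN.*>` matches both NN and NNS, the optional `<DT>?` and repeated `<JJ>*` are absorbed into the first span, and VBZ is left outside any span.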
Returns
Array of chunk elements, where each element is either:
- TaggedToken - Unchunked token with token and tag
- ChunkNode - Chunked phrase with:
  - kind: "chunk"
  - label: string - Chunk type (e.g., “NP”, “VP”)
  - tokens: TaggedToken[] - Tokens in the chunk
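The return shape can be written down as TypeScript types. The sketch below transcribes the fields listed above. The alias `ChunkElement`, the guard `isChunkNode`, and the helper `describeElement` are hypothetical names introduced for illustration, and the sample array is hand-written rather than actual library output.

```typescript
// Field names follow the description above; the declarations themselves
// are an assumption, not copied from the library's type definitions.
interface TaggedToken { token: string; tag: string; }
interface ChunkNode { kind: "chunk"; label: string; tokens: TaggedToken[]; }
type ChunkElement = TaggedToken | ChunkNode; // hypothetical alias

// Type guard: only chunk nodes carry a `kind` field.
function isChunkNode(el: ChunkElement): el is ChunkNode {
  return "kind" in el && el.kind === "chunk";
}

// Render one element as text, e.g. "NP[the fox]" or "jumps/VBZ".
function describeElement(el: ChunkElement): string {
  if (isChunkNode(el)) {
    return `${el.label}[${el.tokens.map((t) => t.token).join(" ")}]`;
  }
  return `${el.token}/${el.tag}`;
}

// Hand-written sample in the documented shape.
const sample: ChunkElement[] = [
  {
    kind: "chunk",
    label: "NP",
    tokens: [{ token: "the", tag: "DT" }, { token: "fox", tag: "NN" }],
  },
  { token: "jumps", tag: "VBZ" },
];
console.log(sample.map(describeElement));
```

Checking for `kind` before reading `label` or `tokens` lets the compiler narrow the union, so unchunked tokens and chunked phrases can be handled in a single pass.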
Example
Grammar Rules
Define chunk patterns with labels and tag sequences:
#. Rules can span multiple lines.
chunkTreeToIob
Convert chunk tree structure to IOB (Inside-Outside-Begin) format.
Parameters
Chunk tree from regexpChunkParse
Returns
Array of IOB-tagged rows with:
- token: string - The word/token
- tag: string - POS tag
- iob: string - IOB tag:
  - "O" - Outside any chunk
  - "B-Label" - Beginning of chunk with label
  - "I-Label" - Inside chunk with label
Example
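A minimal sketch of the conversion described above, written against the documented shapes rather than the library itself. `toIobRows` and `IobRow` are hypothetical names, and the input tree is hand-written. The first token of each chunk receives `B-Label`, later tokens in the same chunk receive `I-Label`, and unchunked tokens receive `O`.

```typescript
// Hypothetical re-implementation for illustration; not the library's code.
interface TaggedToken { token: string; tag: string; }
interface ChunkNode { kind: "chunk"; label: string; tokens: TaggedToken[]; }
interface IobRow { token: string; tag: string; iob: string; }

function toIobRows(tree: Array<TaggedToken | ChunkNode>): IobRow[] {
  const rows: IobRow[] = [];
  for (const el of tree) {
    if ("kind" in el) {
      // Chunked phrase: the first token begins the chunk, the rest are inside it.
      el.tokens.forEach((t, i) => {
        const prefix = i === 0 ? "B" : "I";
        rows.push({ token: t.token, tag: t.tag, iob: `${prefix}-${el.label}` });
      });
    } else {
      // Unchunked token: outside any chunk.
      rows.push({ token: el.token, tag: el.tag, iob: "O" });
    }
  }
  return rows;
}

const tree: Array<TaggedToken | ChunkNode> = [
  {
    kind: "chunk",
    label: "NP",
    tokens: [{ token: "the", tag: "DT" }, { token: "fox", tag: "NN" }],
  },
  { token: "jumps", tag: "VBZ" },
];
console.log(toIobRows(tree).map((r) => `${r.token} ${r.tag} ${r.iob}`).join("\n"));
// prints:
// the DT B-NP
// fox NN I-NP
// jumps VBZ O
```

Because every chunk starts with a `B-` tag, two adjacent chunks with the same label remain distinguishable in the flat output.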
Use Cases
- Training sequence labeling models
- Named entity recognition data preparation
- Chunk boundary detection
- Converting between chunk representations