
Overview

The parser generates a typed Abstract Syntax Tree (AST) with two main categories of nodes: block nodes (structural elements such as headings and blockquotes) and inline nodes (text-level formatting). All node types follow the CommonMark specification.
Both categories are exported as TypeScript types: BlockNode for structural elements and InlineNode for inline content.

Node hierarchy

Block nodes form the document structure. Leaf blocks hold inline nodes as children, while container blocks hold further block nodes:
BlockNode[]                    ← Document root (array of blocks)
  ├─ BlockNode                 ← Structural elements
  │   └─ InlineNode[]          ← Text-level formatting
  │       └─ InlineNode        ← Nested formatting
  └─ BlockNode (containers)
      └─ BlockNode[]           ← Nested blocks
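To make the hierarchy concrete, here is a minimal sketch of an AST value written by hand. The type declarations are simplified local stand-ins that model only the heading, blockquote, and text shapes documented on this page; they are assumptions for illustration, not the library's actual definitions, and the value is not captured parser output.

```typescript
// Simplified local stand-ins for the exported types (assumption: only the
// node shapes documented on this page are modeled here).
type TextNode = { type: "text"; text: string };
type InlineNode = TextNode;
type HeadingNode = {
  type: "heading";
  level: 1 | 2 | 3 | 4 | 5 | 6;
  children: InlineNode[];
};
type BlockquoteNode = { type: "blockquote"; children: BlockNode[] };
type BlockNode = HeadingNode | BlockquoteNode;

// Hand-written tree: a level-1 heading followed by a blockquote that
// contains a level-2 heading.
const ast: BlockNode[] = [
  { type: "heading", level: 1, children: [{ type: "text", text: "Title" }] },
  {
    type: "blockquote",
    children: [
      { type: "heading", level: 2, children: [{ type: "text", text: "Quoted" }] },
    ],
  },
];
```

The root is a plain array of blocks, the heading (a leaf block) carries inline children, and the blockquote (a container) carries further blocks.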

Block nodes

Block nodes represent document structure and layout. They are defined in markdown-parser.ts:840-849.

Leaf blocks

Leaf blocks cannot contain other block elements.
Heading: ATX-style (1-6 # characters) or Setext-style (underlined with = or -):
{
  type: "heading",
  level: 1 | 2 | 3 | 4 | 5 | 6,
  children: InlineNode[]
}
Internal representation (markdown-parser.ts:753-759):
type HeadingNode_internal = {
  type: "heading";
  level: 1 | 2 | 3 | 4 | 5 | 6;
  content: string;
  isClosed: true; // Always closed immediately
  parent: RootNode_internal | BlockquoteNode_internal | ListItemNode_internal;
};
Example:
# Top level heading
## Second level
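As a sketch of how heading nodes might be consumed, the following builds an indented outline from a list of headings. The local types and the `outline` helper are hypothetical names introduced for illustration, and the sketch assumes heading children are plain text nodes only.

```typescript
// Local stand-ins (assumption: inline children are text nodes only).
type TextNode = { type: "text"; text: string };
type HeadingNode = {
  type: "heading";
  level: 1 | 2 | 3 | 4 | 5 | 6;
  children: TextNode[];
};

// Produce one outline line per heading, indented two spaces per level below 1.
function outline(headings: HeadingNode[]): string[] {
  return headings.map((h) => {
    let indent = "";
    for (let i = 1; i < h.level; i++) indent += "  ";
    const title = h.children.map((c) => c.text).join("");
    return indent + title;
  });
}

const headings: HeadingNode[] = [
  { type: "heading", level: 1, children: [{ type: "text", text: "Top level heading" }] },
  { type: "heading", level: 2, children: [{ type: "text", text: "Second level" }] },
];
const lines = outline(headings);
// lines is ["Top level heading", "  Second level"]
```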

Container blocks

Container blocks can contain other block elements:
Blockquote: quoted content, possibly nested (lines starting with >):
{
  type: "blockquote",
  children: BlockNode[]
}
Internal representation (markdown-parser.ts:794-799):
type BlockquoteNode_internal = {
  type: "blockquote";
  children: Array<BlockNode_internal>;
  isClosed: boolean;
  parent: RootNode_internal | BlockquoteNode_internal | ListItemNode_internal;
};
Example:
> This is a quote
> that spans lines
>>
>> Nested quote
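Because a blockquote's children are themselves block nodes, nesting can be measured with a short recursion. The sketch below uses simplified local types and a hypothetical `quoteDepth` helper; it ignores lists and any other containers the real BlockNode union may include.

```typescript
// Simplified local stand-ins (assumption: only headings and blockquotes exist).
type HeadingNode = {
  type: "heading";
  level: 1 | 2 | 3 | 4 | 5 | 6;
  children: { type: "text"; text: string }[];
};
type BlockquoteNode = { type: "blockquote"; children: BlockNode[] };
type BlockNode = HeadingNode | BlockquoteNode;

// Return the deepest level of blockquote nesting found in a list of blocks.
function quoteDepth(blocks: BlockNode[]): number {
  let deepest = 0;
  for (const block of blocks) {
    if (block.type === "blockquote") {
      deepest = Math.max(deepest, 1 + quoteDepth(block.children));
    }
  }
  return deepest;
}

// A blockquote that directly contains another blockquote.
const nested: BlockNode[] = [
  { type: "blockquote", children: [{ type: "blockquote", children: [] }] },
];
const depth = quoteDepth(nested); // 2
```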

Inline nodes

Inline nodes represent text-level formatting. They are defined in inline-parser.ts:580-589.
Plain text content:
{
  type: "text",
  text: string
}
Interface (inline-parser.ts:519-522):
interface TextNode {
  type: "text";
  text: string;
}
Adjacent text nodes are merged (inline-parser.ts:645-664):
function mergeAdjacentTextNodes(nodes: Array<InlineNode>): Array<InlineNode> {
  const result: Array<InlineNode> = [];
  for (const node of nodes) {
    if (node.type === "text") {
      const lastNode = result[result.length - 1];
      if (lastNode?.type === "text") {
        // Fold this text into the preceding text node
        lastNode.text += node.text;
      } else {
        result.push(node);
      }
    } else {
      // Non-text nodes pass through unchanged
      result.push(node);
    }
  }
  return result;
}
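To illustrate the merging behavior, the following sketch runs a self-contained version of the function on a small input. The local type declarations and the `"other"` node are stand-ins assumed for illustration, not the library's definitions, and this version passes non-text nodes through to the result.

```typescript
type TextNode = { type: "text"; text: string };
type OtherNode = { type: "other" }; // hypothetical stand-in for any non-text inline node
type InlineNode = TextNode | OtherNode;

function mergeAdjacentTextNodes(nodes: InlineNode[]): InlineNode[] {
  const result: InlineNode[] = [];
  for (const node of nodes) {
    const lastNode = result[result.length - 1];
    if (node.type === "text" && lastNode?.type === "text") {
      // Extend the previous text node instead of adding a new one
      lastNode.text += node.text;
    } else {
      result.push(node);
    }
  }
  return result;
}

const merged = mergeAdjacentTextNodes([
  { type: "text", text: "Hello" },
  { type: "text", text: ", " },
  { type: "other" },
  { type: "text", text: "world" },
  { type: "text", text: "!" },
]);
// merged has three nodes: text "Hello, ", the "other" node, text "world!"
```

Text nodes separated by a non-text node stay separate; only directly adjacent runs are joined.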

Type exports

index.ts:1-5 re-exports both node types along with MarkdownParser:
export type { InlineNode } from "./inline-parser";
export {
  type BlockNode,
  MarkdownParser,
} from "./markdown-parser";

Working with the AST

Recursively walk the tree:
function walkNodes(nodes: BlockNode[], visitor: (node: BlockNode) => void) {
  for (const node of nodes) {
    visitor(node);
    
    if (node.type === "list") {
      // Lists hold their blocks inside each list item
      for (const item of node.items) {
        walkNodes(item.children, visitor);
      }
    } else if (node.type === "blockquote") {
      walkNodes(node.children, visitor);
    }
  }
}
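A possible use of such a walker is counting nodes by type. The sketch below is self-contained: it repeats the walker against simplified local types. The `ListNode` shape is inferred from the `node.items` and `item.children` accesses above, and the remaining declarations are assumptions rather than the library's definitions.

```typescript
type InlineNode = { type: "text"; text: string };
type HeadingNode = {
  type: "heading";
  level: 1 | 2 | 3 | 4 | 5 | 6;
  children: InlineNode[];
};
type BlockquoteNode = { type: "blockquote"; children: BlockNode[] };
type ListNode = { type: "list"; items: { children: BlockNode[] }[] };
type BlockNode = HeadingNode | BlockquoteNode | ListNode;

function walkNodes(nodes: BlockNode[], visitor: (node: BlockNode) => void): void {
  for (const node of nodes) {
    visitor(node);
    if (node.type === "list") {
      for (const item of node.items) walkNodes(item.children, visitor);
    } else if (node.type === "blockquote") {
      walkNodes(node.children, visitor);
    }
  }
}

// Small helper to keep the sample tree readable.
const h = (level: HeadingNode["level"], text: string): HeadingNode => ({
  type: "heading",
  level,
  children: [{ type: "text", text }],
});

const doc: BlockNode[] = [
  h(1, "Intro"),
  { type: "blockquote", children: [h(2, "Quoted")] },
  { type: "list", items: [{ children: [h(3, "In a list item")] }] },
];

// Tally how many block nodes of each type the walker visits.
const counts: Record<string, number> = {};
walkNodes(doc, (node) => {
  counts[node.type] = (counts[node.type] ?? 0) + 1;
});
// counts is { heading: 3, blockquote: 1, list: 1 }
```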
All node types follow the CommonMark 0.31.2 specification, and the parser’s structure closely mirrors that of the reference implementation.
