## Overview

Node parsers split documents into smaller chunks (nodes) for downstream processing. They handle text segmentation, maintain relationships between consecutive chunks, and preserve document metadata.
## NodeParser

Abstract base class for all node parsers.

```typescript
import { NodeParser } from "@llamaindex/core/node-parser";
```
### Properties

- `includeMetadata` — whether to include document metadata in parsed nodes
- `includePrevNextRel` — whether to include previous/next relationships between consecutive chunks
### Methods

`getNodesFromDocuments(documents: TextNode[]): TextNode[] | Promise<TextNode[]>`

Parses documents into nodes. Returns parsed text nodes with relationships and metadata.
## TextSplitter

Abstract base class for text-splitting strategies.

```typescript
import { TextSplitter } from "@llamaindex/core/node-parser";
```
### Methods

`abstract splitText(text: string): string[]`

Splits a single text into chunks.

`splitTexts(texts: string[]): string[]`

Splits multiple texts into chunks.
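A concrete splitter only has to implement `splitText`; `splitTexts` can then delegate to it. A minimal, self-contained sketch (the base class below mirrors the documented interface rather than importing the library, and `ParagraphSplitter` is a hypothetical example class):

```typescript
// Stand-in that mirrors the documented TextSplitter interface.
abstract class TextSplitterBase {
  abstract splitText(text: string): string[];

  // Default splitTexts: split each text and flatten the results.
  splitTexts(texts: string[]): string[] {
    return texts.flatMap((t) => this.splitText(t));
  }
}

// A toy strategy: break text on blank lines (paragraph boundaries).
class ParagraphSplitter extends TextSplitterBase {
  splitText(text: string): string[] {
    return text
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
  }
}

const splitter = new ParagraphSplitter();
splitter.splitText("First paragraph.\n\nSecond paragraph."); // two chunks
```

Any strategy that maps one string to many strings fits this shape, which is why chunk-size-based, sentence-based, and structure-based splitters can all share the same base class.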
## SentenceSplitter

Splits text by sentences with configurable chunk size and overlap.

```typescript
import { SentenceSplitter } from "@llamaindex/core/node-parser";
```
### Constructor Options

- `chunkSize` — maximum number of characters per chunk
- `chunkOverlap` — number of characters to overlap between chunks
- Separator to use when splitting
- Separator for paragraph boundaries
- Secondary separator (e.g., line breaks)
### Example

```typescript
import { SentenceSplitter } from "@llamaindex/core/node-parser";
import { Document } from "@llamaindex/core/schema";

const parser = new SentenceSplitter({
  chunkSize: 512,
  chunkOverlap: 50,
});

const document = new Document({
  text: "Long document text...",
});

const nodes = parser.getNodesFromDocuments([document]);
console.log(nodes.length); // Number of chunks created
```
## MarkdownNodeParser

Splits markdown documents while preserving structure.

```typescript
import { MarkdownNodeParser } from "@llamaindex/core/node-parser";
```
### Constructor Options

- `chunkSize` — maximum characters per chunk
### Example

```typescript
import { MarkdownNodeParser } from "@llamaindex/core/node-parser";
import { Document } from "@llamaindex/core/schema";

const parser = new MarkdownNodeParser({
  chunkSize: 1024,
  chunkOverlap: 100,
});

const document = new Document({
  text: "# Heading\n\nParagraph text...",
  metadata: { format: "markdown" },
});

const nodes = parser.getNodesFromDocuments([document]);
```
## MetadataAwareTextSplitter

Abstract base for splitters that consider metadata when chunking.

```typescript
abstract class MetadataAwareTextSplitter extends TextSplitter {
  abstract splitTextMetadataAware(text: string, metadata: string): string[];
}
```
Useful when metadata should be included in chunk size calculations.
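For intuition, here is a standalone sketch (not the library's implementation): a metadata-aware splitter can subtract the serialized metadata's length from its budget, so that metadata plus chunk text still fits within `chunkSize`:

```typescript
// Sketch: fixed-size splitting whose per-chunk budget shrinks by the
// metadata string's length, so (metadata + chunk) fits in chunkSize.
class SimpleMetadataAwareSplitter {
  constructor(private chunkSize: number) {}

  splitText(text: string): string[] {
    return this.chunkBy(text, this.chunkSize);
  }

  splitTextMetadataAware(text: string, metadata: string): string[] {
    const budget = Math.max(1, this.chunkSize - metadata.length);
    return this.chunkBy(text, budget);
  }

  private chunkBy(text: string, size: number): string[] {
    const chunks: string[] = [];
    for (let i = 0; i < text.length; i += size) {
      chunks.push(text.slice(i, i + size));
    }
    return chunks;
  }
}
```

With a 10-character budget and a 5-character metadata prefix, each chunk carries only 5 characters of text, producing more (smaller) chunks than plain `splitText` would.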
## Node Relationships

Parsed nodes automatically include relationships:

```typescript
const nodes = parser.getNodesFromDocuments([document]);

// First node
console.log(nodes[0].relationships);
// {
//   [NodeRelationship.SOURCE]: { nodeId: "doc-id", ... },
//   [NodeRelationship.NEXT]: { nodeId: "node-1-id", ... }
// }

// Middle node
console.log(nodes[1].relationships);
// {
//   [NodeRelationship.SOURCE]: { nodeId: "doc-id", ... },
//   [NodeRelationship.PREVIOUS]: { nodeId: "node-0-id", ... },
//   [NodeRelationship.NEXT]: { nodeId: "node-2-id", ... }
// }
```
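The wiring behind this can be sketched in a few lines. This standalone snippet (a hypothetical helper, not the library's code) shows how SOURCE, PREVIOUS, and NEXT entries are assigned over a list of chunk ids:

```typescript
type RelatedNodeInfo = { nodeId: string };
type Relationships = {
  SOURCE: RelatedNodeInfo;
  PREVIOUS?: RelatedNodeInfo;
  NEXT?: RelatedNodeInfo;
};

// Every node points at the source document; interior nodes also point
// at their immediate neighbors. First/last nodes omit PREVIOUS/NEXT.
function buildRelationships(docId: string, nodeIds: string[]): Relationships[] {
  return nodeIds.map((_, i) => {
    const rel: Relationships = { SOURCE: { nodeId: docId } };
    if (i > 0) rel.PREVIOUS = { nodeId: nodeIds[i - 1] };
    if (i < nodeIds.length - 1) rel.NEXT = { nodeId: nodeIds[i + 1] };
    return rel;
  });
}
```

This doubly-linked structure is what lets retrieval pipelines expand a hit to its surrounding chunks.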
Nodes inherit metadata from their parent documents:

```typescript
const document = new Document({
  text: "Document text...",
  metadata: {
    title: "My Document",
    author: "John Doe",
  },
});

const nodes = parser.getNodesFromDocuments([document]);

// All nodes inherit the parent's metadata
console.log(nodes[0].metadata);
// { title: "My Document", author: "John Doe" }
```
## Character Positions

Parsers track each node's character offsets in the original document:

```typescript
const nodes = parser.getNodesFromDocuments([document]);

console.log(nodes[0].startCharIdx); // 0
console.log(nodes[0].endCharIdx);   // 512
console.log(nodes[1].startCharIdx); // 462 (512 - 50 overlap)
console.log(nodes[1].endCharIdx);   // 974 (462 + 512)
```
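The offsets follow directly from the chunk parameters: with fixed-size chunking, each chunk starts `chunkSize - chunkOverlap` characters after the previous one. A quick standalone check of that arithmetic (a sketch of the offset math, not the splitter itself):

```typescript
// Expected [start, end) offsets for fixed-size chunks with overlap.
function chunkOffsets(
  textLength: number,
  chunkSize: number,
  chunkOverlap: number,
): Array<[number, number]> {
  const step = chunkSize - chunkOverlap; // stride between chunk starts
  const offsets: Array<[number, number]> = [];
  for (let start = 0; start < textLength; start += step) {
    offsets.push([start, Math.min(start + chunkSize, textLength)]);
    if (start + chunkSize >= textLength) break; // last chunk reached the end
  }
  return offsets;
}

chunkOffsets(2000, 512, 50);
// first chunks: [0, 512], [462, 974], [924, 1436], ...
```

Note that a sentence-based splitter respects sentence boundaries, so real offsets will deviate somewhat from this fixed-stride pattern.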
## Custom Node Parser

Create custom parsers by extending NodeParser:

```typescript
import { NodeParser } from "@llamaindex/core/node-parser";
import { TextNode } from "@llamaindex/core/schema";

class CustomParser extends NodeParser {
  protected parseNodes(documents: TextNode[]): TextNode[] {
    return documents.flatMap((doc) => {
      // Custom splitting logic
      const chunks = this.customSplit(doc.text);
      return chunks.map(
        (chunk) =>
          new TextNode({
            text: chunk,
            metadata: { ...doc.metadata },
          }),
      );
    });
  }

  private customSplit(text: string): string[] {
    // Split on "---" horizontal-rule separators
    return text.split(/\n---\n/);
  }
}
```
## Best Practices

- **Choose an appropriate chunk size**: smaller chunks (256-512) favor precise retrieval; larger chunks (1024-2048) provide more context
- **Use overlap**: 10-20% overlap helps maintain context across chunk boundaries
- **Preserve structure**: use MarkdownNodeParser for markdown to maintain headings and formatting
- **Consider token limits**: account for model context windows when setting chunk sizes
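To sanity-check chunk sizes against a model's context window, a rough characters-per-token heuristic is often enough. The snippet below assumes about 4 characters per token for English text; that ratio is an approximation, not an exact tokenizer:

```typescript
// Rough token estimate: ~4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

function estimateTokens(chunkSizeChars: number): number {
  return Math.ceil(chunkSizeChars / CHARS_PER_TOKEN);
}

// Check that topK retrieved chunks plus the prompt fit in the window.
function fitsContext(
  chunkSizeChars: number,
  topK: number,
  promptTokens: number,
  contextWindow: number,
): boolean {
  return topK * estimateTokens(chunkSizeChars) + promptTokens <= contextWindow;
}
```

For example, 512-character chunks are roughly 128 tokens each, so retrieving the top 5 chunks alongside a 500-token prompt fits comfortably in a 4096-token window; 2048-character chunks at the same `topK` would not.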