LlamaIndex.TS processes data through a series of transformations, from raw documents to indexed embeddings to final query responses. Understanding this flow is crucial for building effective LLM applications.

Overview

The data flow consists of two main pipelines:

Ingestion Pipeline

Document → Nodes → Embeddings → Vector Store

Query Pipeline

Query → Embedding → Retrieval → Synthesis → Response

Ingestion Pipeline

The ingestion pipeline transforms raw documents into searchable vector embeddings.

Step 1: Document Loading

Load documents
import { Document } from "llamaindex";

// Create documents from text
const documents = [
  new Document({ 
    text: "LlamaIndex is a data framework for LLM applications.",
    metadata: { source: "docs", page: 1 }
  }),
  new Document({ 
    text: "It provides tools for data ingestion, indexing, and querying.",
    metadata: { source: "docs", page: 2 }
  }),
];

// Or load from files
import { SimpleDirectoryReader } from "llamaindex";
const reader = new SimpleDirectoryReader();
const docs = await reader.loadData({ directoryPath: "./documents" });
Document Structure:
From @llamaindex/core/schema/node.ts
export class Document extends TextNode {
  id_: string;           // Unique document ID
  text: string;          // Document content
  metadata: Metadata;    // Arbitrary metadata
  embedding?: number[];  // Optional embedding
  relationships: {...};  // Links to other nodes
}

Step 2: Node Parsing (Chunking)

Documents are split into smaller chunks called Nodes:
Node parsing
import { SentenceSplitter, Settings } from "llamaindex";

// Configure the node parser
Settings.nodeParser = new SentenceSplitter({
  chunkSize: 1024,      // Max tokens per chunk
  chunkOverlap: 200,    // Overlap between chunks
});

// Parse documents into nodes
const nodes = await Settings.nodeParser(documents);
Why chunk documents?
LLMs have maximum context windows (e.g., 4K, 8K, 128K tokens). Chunking ensures content fits within these limits.
Smaller chunks often represent more coherent semantic units, improving retrieval accuracy.
Fine-grained chunks allow more precise retrieval of relevant information.
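The chunkSize and chunkOverlap parameters can be illustrated with a simplified character-based splitter. This is a sketch only; the actual SentenceSplitter is token-aware and respects sentence boundaries:

```typescript
// Simplified illustration of chunkSize / chunkOverlap semantics.
// The real SentenceSplitter counts tokens and splits on sentence
// boundaries; this sketch splits on raw characters.
// Assumes chunkOverlap < chunkSize.
function splitWithOverlap(
  text: string,
  chunkSize: number,
  chunkOverlap: number,
): string[] {
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // how far the window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// 2500 characters with chunkSize 1024 and overlap 200 yield three
// chunks; consecutive chunks share 200 characters of context.
const chunks = splitWithOverlap("a".repeat(2500), 1024, 200);
```

The overlap ensures that a sentence falling on a chunk boundary still appears whole in at least one chunk.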
Node Structure:
BaseNode from @llamaindex/core/schema
abstract class BaseNode<T extends Metadata = Metadata> {
  id_: string;                    // Unique node ID
  embedding?: number[];           // Vector embedding
  metadata: T;                    // Inherited + additional metadata
  excludedEmbedMetadataKeys: string[];  // Keys to exclude from embedding
  excludedLlmMetadataKeys: string[];    // Keys to exclude from LLM
  relationships: Record<NodeRelationship, RelatedNodeType>;
  hash: string;                   // Content hash for deduplication
  
  abstract getContent(metadataMode: MetadataMode): string;
}

Step 3: Embedding Generation

Each node is converted to a vector embedding:
Generate embeddings
import { Settings, VectorStoreIndex } from "llamaindex";
import { OpenAIEmbedding } from "@llamaindex/openai";

Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-large",
  dimensions: 1024,
});

// Embeddings are generated automatically during indexing
const index = await VectorStoreIndex.fromDocuments(documents);
How it works:
From packages/llamaindex/src/indices/vectorStore/index.ts
async getNodeEmbeddingResults(
  nodes: BaseNode[],
  options?: { logProgress?: boolean },
): Promise<BaseNode[]> {
  const nodeMap = splitNodesByType(nodes);
  for (const type in nodeMap) {
    const nodes = nodeMap[type as ModalityType];
    const embedModel = this.vectorStores[type]?.embedModel ?? this.embedModel;
    if (embedModel && nodes) {
      await embedModel(nodes, {
        logProgress: options?.logProgress,
      });
    }
  }
  return nodes;
}
Embeddings are vector representations of text that capture semantic meaning. Similar texts have similar embeddings.
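"Similar embeddings" is typically measured with cosine similarity. A minimal sketch of the comparison the vector store performs at scale:

```typescript
// Cosine similarity: 1 means identical direction, 0 means unrelated,
// -1 means opposite. Vector stores use this (or dot product /
// Euclidean distance) to rank stored node embeddings against the
// query embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]); // → 1 (identical)
cosineSimilarity([1, 0], [0, 1]); // → 0 (orthogonal)
```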

Step 4: Vector Store Insertion

Nodes with embeddings are stored in a vector database:
Store in vector database
import { VectorStoreIndex } from "llamaindex";
import { PineconeVectorStore } from "@llamaindex/pinecone";

const vectorStore = new PineconeVectorStore({
  apiKey: process.env.PINECONE_API_KEY,
  indexName: "my-index",
});

const index = await VectorStoreIndex.fromDocuments(documents, {
  vectorStores: { TEXT: vectorStore },
});
What gets stored:
  • Node ID
  • Embedding vector
  • Node content (if storesText: true)
  • Metadata for filtering
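Conceptually, each stored record bundles these fields. The sketch below is illustrative only; every vector store backend defines its own schema and field names:

```typescript
// Illustrative shape of a vector store record (field names are
// hypothetical; each backend defines its own schema).
interface StoredNodeRecord {
  id: string;                        // Node ID
  vector: number[];                  // Embedding
  text?: string;                     // Present only if storesText: true
  metadata: Record<string, unknown>; // Used for filtered retrieval
}

const record: StoredNodeRecord = {
  id: "node-1",
  vector: [0.12, -0.03 /* ...remaining dimensions... */],
  text: "LlamaIndex is a data framework for LLM applications.",
  metadata: { source: "docs", page: 1 },
};
```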

Complete Ingestion Example

Full ingestion pipeline
import { 
  Settings, 
  VectorStoreIndex, 
  Document,
  SentenceSplitter 
} from "llamaindex";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";
import { PineconeVectorStore } from "@llamaindex/pinecone";

// 1. Configure Settings
Settings.llm = new OpenAI({ model: "gpt-4" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-large" });
Settings.nodeParser = new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 200 });

// 2. Create documents
const documents = [
  new Document({ text: "Your content here...", metadata: { source: "doc1" } }),
];

// 3. Create vector store
const vectorStore = new PineconeVectorStore({
  apiKey: process.env.PINECONE_API_KEY!,
  indexName: "llamaindex-demo",
});

// 4. Build index (automatically handles steps 2-4)
const index = await VectorStoreIndex.fromDocuments(documents, {
  vectorStores: { TEXT: vectorStore },
  logProgress: true,  // Show progress in console
});

console.log("Ingestion complete!");

Advanced: IngestionPipeline

For more control, use the IngestionPipeline:
Custom ingestion pipeline
import {
  IngestionPipeline,
  SentenceSplitter,
  TitleExtractor,
  SummaryExtractor,
} from "llamaindex";

const pipeline = new IngestionPipeline({
  transformations: [
    new SentenceSplitter({ chunkSize: 1024 }),
    new TitleExtractor(),     // Extract titles from content
    new SummaryExtractor(),   // Generate summaries
  ],
  vectorStores: { TEXT: vectorStore },
});

const nodes = await pipeline.run({ documents });
Pipeline features:
const pipeline = new IngestionPipeline({
  transformations: [...],
  vectorStores: { TEXT: vectorStore },
  disableCache: false,  // Enable caching (default)
});

// Cached transformations are reused
await pipeline.run({ documents });
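Caching works by keying transformation output on the input content plus the transformation configuration, so unchanged documents are not re-processed. A minimal sketch of such a key (a hypothetical helper, not the library's actual implementation):

```typescript
import { createHash } from "node:crypto";

// Sketch of a transformation cache key: hash the input text together
// with the transformation config. Same input + same config → same
// key → cache hit; any change invalidates the entry.
function cacheKey(text: string, transformConfig: object): string {
  return createHash("sha256")
    .update(text)
    .update(JSON.stringify(transformConfig))
    .digest("hex");
}

const k1 = cacheKey("hello", { chunkSize: 1024 });
const k2 = cacheKey("hello", { chunkSize: 1024 });
// k1 === k2: re-running the pipeline reuses the cached result
```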

Query/Retrieval Pipeline

The query pipeline retrieves relevant nodes and synthesizes responses.

Step 1: Query Embedding

Query embedding
const query = "What is LlamaIndex?";

// Query is embedded using the same embedding model
const queryEmbedding = await Settings.embedModel.getQueryEmbedding(query);
Always use the same embedding model for indexing and querying to ensure compatibility.
Step 2: Retrieval

The query embedding is compared against the stored node embeddings, and the most similar nodes are returned:
Retrieve similar nodes
const retriever = index.asRetriever({ similarityTopK: 5 });
const nodesWithScores = await retriever.retrieve({ query });

// Returns nodes ranked by similarity
nodesWithScores.forEach(({ node, score }) => {
  console.log(`Score: ${score}, Text: ${node.text}`);
});
From VectorIndexRetriever:
packages/llamaindex/src/indices/vectorStore/index.ts
protected async retrieveQuery(
  query: MessageContent,
  type: ModalityType,
  vectorStore: BaseVectorStore,
): Promise<NodeWithScore[]> {
  // 1. Embed the query
  const embedModel = this.index.embedModel ?? vectorStore.embedModel;
  const queryEmbedding = await embedModel.getQueryEmbedding(query);
  
  // 2. Query vector store
  const result = await vectorStore.query({
queryStr: query,
    queryEmbedding,
    mode: this.queryMode,
    similarityTopK: this.topK[type],
    filters: this.filters,
  });
  
  // 3. Build nodes with scores
  return this.buildNodeListFromQueryResult(result);
}

Step 3: Response Synthesis

Retrieved nodes are sent to the LLM to generate a response:
Query engine
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ 
  query: "What is LlamaIndex?" 
});

console.log(response.toString());
What happens:
  1. Query is embedded
  2. Top-K similar nodes are retrieved
  3. Nodes are formatted into a context
  4. LLM generates response using context + query
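Step 3 above (formatting nodes into a context) amounts to stuffing the retrieved chunks into a prompt. A simplified sketch; the real response synthesizer uses configurable prompt templates and strategies such as compact and refine:

```typescript
// Simplified sketch of context stuffing: retrieved chunks are joined
// into a context block and combined with the user query before being
// sent to the LLM.
function buildContextPrompt(chunks: string[], query: string): string {
  const context = chunks.join("\n---\n");
  return [
    "Context information is below.",
    context,
    "Given the context information, answer the query.",
    `Query: ${query}`,
  ].join("\n");
}

const prompt = buildContextPrompt(
  ["LlamaIndex is a data framework for LLM applications."],
  "What is LlamaIndex?",
);
```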

Step 4: Post-Processing (Optional)

Apply reranking or filtering:
Reranking
import { SimilarityPostprocessor } from "llamaindex/postprocessors";

const queryEngine = index.asQueryEngine({
  nodePostprocessors: [
    new SimilarityPostprocessor({ similarityCutoff: 0.7 })
  ],
});

Complete Query Example

Full query pipeline
import { VectorStoreIndex, Settings } from "llamaindex";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";

// Configure (same as ingestion)
Settings.llm = new OpenAI({ model: "gpt-4" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-large" });

// Load existing index (vectorStore configured as in the ingestion example)
const index = await VectorStoreIndex.fromVectorStore(vectorStore);

// Create query engine
const queryEngine = index.asQueryEngine({
  similarityTopK: 3,  // Retrieve top 3 nodes
});

// Query
const response = await queryEngine.query({
  query: "How does LlamaIndex handle data ingestion?"
});

console.log("Response:", response.toString());
console.log("Source Nodes:", response.sourceNodes?.length);

Chat Engine Flow

Chat engines maintain conversation history:
Chat engine
const chatEngine = index.asChatEngine();

// First message
const response1 = await chatEngine.chat({
  message: "What is LlamaIndex?"
});

// Follow-up (uses history)
const response2 = await chatEngine.chat({
  message: "How do I install it?"
});
Chat flow: on each turn, the engine combines the new message with the conversation history (often by condensing them into a standalone question), retrieves relevant nodes, and generates a context-aware response.

Streaming Responses

Stream responses for better UX:
Streaming
const queryEngine = index.asQueryEngine();

const stream = await queryEngine.query({
  query: "Explain LlamaIndex",
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);
}

Data Flow Optimization

Chunk Size

Smaller chunks = more precise retrieval but more chunks to search
Larger chunks = more context but less precision

Top-K

Higher K = more context but slower and more expensive
Lower K = faster but might miss relevant info

Embedding Model

Better embeddings = better retrieval accuracy
Larger dimensions = more accurate but slower

Reranking

Post-retrieval reranking improves precision
Use similarity cutoff to filter low-quality matches
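Together, similarityTopK and a similarity cutoff amount to a sort, slice, and filter over scored results. A minimal sketch:

```typescript
// Sketch of what similarityTopK + similarityCutoff do to retrieval
// results: keep the K highest-scoring items, then drop any below
// the cutoff.
interface Scored<T> {
  item: T;
  score: number;
}

function topKWithCutoff<T>(
  results: Scored<T>[],
  k: number,
  cutoff: number,
): Scored<T>[] {
  return [...results]
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .filter((r) => r.score >= cutoff);
}

const kept = topKWithCutoff(
  [
    { item: "a", score: 0.91 },
    { item: "b", score: 0.55 },
    { item: "c", score: 0.82 },
  ],
  2,   // similarityTopK
  0.7, // similarityCutoff
);
// kept contains "a" and "c"; "b" never makes the top 2
```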

Best Practices

1. Consistent Embeddings: Use the same embedding model for both indexing and querying
2. Optimal Chunk Size: Test different chunk sizes (512, 1024, 2048) for your use case
3. Metadata Filtering: Use metadata to filter retrieval results (e.g., by date, source, category)
4. Monitor Performance: Track retrieval quality with evaluation metrics
5. Cache Embeddings: Use IngestionPipeline caching to avoid re-embedding unchanged documents

Next Steps

Vector Indices

Deep dive into VectorStoreIndex configuration and usage

Query Engines

Learn about different query engine types and customization

Ingestion Pipeline

Advanced ingestion pipeline patterns and transformations
