Overview
Chat engines enable conversational interactions with your data, maintaining chat history and context across multiple turns.
BaseChatEngine
Abstract base class for all chat engines.
import { BaseChatEngine } from "@llamaindex/core/chat-engine";
Properties
chatHistory
ChatMessage[] | Promise<ChatMessage[]>
The conversation history
Methods
chat
Send a message and get a response.
Non-streaming: chat(params: NonStreamingChatEngineParams): Promise<EngineResponse>
Streaming: chat(params: StreamingChatEngineParams): Promise<AsyncIterable<EngineResponse>>
Parameters
message
string | MessageContentDetail[]
required
The user message (text or multi-modal)
stream
Whether to stream the response
chatHistory
Optional custom chat history or memory
chatOptions
Provider-specific chat options
Response
response
The assistant’s response text
sourceNodes
Retrieved source nodes (context chat engine only)
metadata
Additional response metadata
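As a rough illustration of the two call shapes, the parameters can be written as plain objects. This is only a sketch: the interface below is a simplified stand-in for the library's actual parameter types, not the real definitions.

```typescript
// Simplified stand-in for NonStreamingChatEngineParams / StreamingChatEngineParams
interface ChatParams {
  message: string;
  chatHistory?: { role: string; content: string }[];
  stream?: boolean;
}

// Non-streaming call: stream is omitted, the engine resolves to one response
const nonStreaming: ChatParams = { message: "Hello" };

// Streaming call: stream: true switches the return type to an async iterable
const streaming: ChatParams = { message: "Hello", stream: true };

console.log(nonStreaming.stream ?? false, streaming.stream); // false true
```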
SimpleChatEngine
A basic chat engine with no retrieval; it simply holds a conversation with the LLM.
import { SimpleChatEngine } from "@llamaindex/core/chat-engine";
import { OpenAI } from "@llamaindex/openai";
Example
const llm = new OpenAI({ model: "gpt-4" });
const chatEngine = new SimpleChatEngine({ llm });

const response1 = await chatEngine.chat({
  message: "Hello! My name is Alice.",
});
console.log(response1.response); // "Hello Alice! How can I help you?"

const response2 = await chatEngine.chat({
  message: "What's my name?",
});
console.log(response2.response); // "Your name is Alice."
ContextChatEngine
Chat engine with retrieval: it fetches relevant context for each message and injects it into the prompt.
import { ContextChatEngine } from "@llamaindex/core/chat-engine";
Constructor Options
retriever
Retriever for fetching relevant context
chatModel
Language model (defaults to Settings.llm)
systemPrompt
System prompt for the chat
contextSystemPrompt
Template for injecting retrieved context
Example
import { VectorStoreIndex } from "llamaindex";
import { Document } from "@llamaindex/core/schema";

const documents = [
  new Document({ text: "The company was founded in 2020." }),
  new Document({ text: "Our main product is a data framework." }),
];

const index = await VectorStoreIndex.fromDocuments(documents);
const chatEngine = index.asChatEngine();

const response = await chatEngine.chat({
  message: "When was the company founded?",
});
console.log(response.response); // "The company was founded in 2020."
console.log(response.sourceNodes); // Retrieved context nodes
Streaming Chat
const chatEngine = index.asChatEngine();

const stream = await chatEngine.chat({
  message: "Tell me about the company",
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}
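The same for await loop can also accumulate the streamed text into a single string, which is useful when you need the full answer after streaming finishes. A minimal sketch (the fake generator below stands in for a real chat engine stream; only the { response } chunk shape is taken from the examples above):

```typescript
// Collect the text of a streamed response into one string.
// Works with any AsyncIterable of { response: string } chunks,
// such as the stream returned by chat({ ..., stream: true }).
async function collectStream(
  stream: AsyncIterable<{ response: string }>
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.response;
  }
  return text;
}

// Fake stream standing in for a chat engine's streamed response
async function* fakeStream() {
  yield { response: "The company " };
  yield { response: "was founded in 2020." };
}

collectStream(fakeStream()).then((text) => console.log(text));
```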
Multi-modal Chat
Chat engines support images and other media:
const response = await chatEngine.chat({
  message: [
    { type: "text", text: "What's in this image?" },
    {
      type: "image_url",
      image_url: { url: "data:image/jpeg;base64,..." },
    },
  ],
});
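Building the data URL for an image by hand is a common step before a multi-modal call. A small helper sketch (the ContentDetail union here is a simplified approximation of MessageContentDetail, and imageQuestion is a hypothetical helper, not a library function):

```typescript
// Simplified approximation of the multi-modal content union
type ContentDetail =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Build a text-plus-image message from a question and raw JPEG bytes
function imageQuestion(question: string, jpegBytes: Uint8Array): ContentDetail[] {
  // Encode the raw image bytes as a base64 data URL
  const base64 = Buffer.from(jpegBytes).toString("base64");
  return [
    { type: "text", text: question },
    { type: "image_url", image_url: { url: `data:image/jpeg;base64,${base64}` } },
  ];
}

const msg = imageQuestion("What's in this image?", new Uint8Array([0xff, 0xd8]));
console.log(msg[0].type, msg[1].type); // text image_url
```

The resulting array can be passed directly as the message parameter, as in the example above.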
Custom Chat History
Using ChatMemoryBuffer
import { ChatMemoryBuffer } from "@llamaindex/core/memory";

const memory = new ChatMemoryBuffer({ tokenLimit: 3000 });

const response = await chatEngine.chat({
  message: "Hello",
  chatHistory: memory,
});
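ChatMemoryBuffer keeps the history under tokenLimit by dropping the oldest turns first. A rough sketch of that behavior, assuming a naive whitespace word count rather than the library's actual tokenizer:

```typescript
interface ChatMessage { role: string; content: string }

// Naive token estimate: whitespace-separated words (illustrative only;
// the real buffer uses a proper tokenizer)
const countTokens = (m: ChatMessage) => m.content.split(/\s+/).length;

// Keep the most recent messages whose combined size fits the limit,
// dropping the oldest turns first.
function trimHistory(history: ChatMessage[], tokenLimit: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let total = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i]);
    if (total + cost > tokenLimit) break;
    kept.unshift(history[i]);
    total += cost;
  }
  return kept;
}

const history = [
  { role: "user", content: "one two three four" }, // 4 tokens
  { role: "assistant", content: "five six" },      // 2 tokens
  { role: "user", content: "seven" },              // 1 token
];
console.log(trimHistory(history, 3).length); // 2 (oldest message dropped)
```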
Manual Chat History
import type { ChatMessage } from "llamaindex";

const customHistory: ChatMessage[] = [
  { role: "user", content: "Previous question" },
  { role: "assistant", content: "Previous answer" },
];

const response = await chatEngine.chat({
  message: "Follow-up question",
  chatHistory: customHistory,
});
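When managing history by hand, each completed exchange appends a user/assistant pair. A small helper sketch (recordTurn is hypothetical, and ChatMessage is simplified to role/content here):

```typescript
interface ChatMessage { role: "user" | "assistant"; content: string }

// Append one completed turn to a manually managed history,
// returning a new array so the original is left untouched.
function recordTurn(
  history: ChatMessage[],
  question: string,
  answer: string
): ChatMessage[] {
  return [
    ...history,
    { role: "user", content: question },
    { role: "assistant", content: answer },
  ];
}

let history: ChatMessage[] = [
  { role: "user", content: "Previous question" },
  { role: "assistant", content: "Previous answer" },
];
history = recordTurn(history, "Follow-up question", "Follow-up answer");
console.log(history.length); // 4
```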
System Prompts
Setting System Prompt
const chatEngine = index.asChatEngine({
  systemPrompt: "You are a helpful assistant that always speaks in rhymes.",
});
Custom Context Template
const chatEngine = index.asChatEngine({
  contextSystemPrompt: `
Use the following context to answer the question.
If you don't know, say so.

Context:
{context}

Question: {query}
`,
});
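Conceptually, the engine fills in such a template by substituting the retrieved text for {context} and the user message for {query}. A minimal sketch of that substitution (fillTemplate is illustrative, not the library's internal implementation):

```typescript
// Fill a context template by replacing its {context} and {query} placeholders
function fillTemplate(template: string, context: string, query: string): string {
  return template.replace("{context}", context).replace("{query}", query);
}

const template = "Context:\n{context}\nQuestion: {query}";
const prompt = fillTemplate(
  template,
  "The company was founded in 2020.",
  "When was the company founded?"
);
console.log(prompt.includes("founded in 2020")); // true
```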
Chat History Management
Accessing Chat History
const response = await chatEngine.chat({
  message: "Hello",
});

const history = await chatEngine.chatHistory;
console.log(history);
// [
//   { role: "user", content: "Hello" },
//   { role: "assistant", content: "Hi! How can I help you?" }
// ]
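Once you have the history array, it is plain data you can inspect. For example, pulling out the most recent assistant reply is a simple backwards scan (lastAssistantMessage is a hypothetical helper, not part of the library):

```typescript
interface ChatMessage { role: string; content: string }

// Return the content of the most recent assistant message, if any
function lastAssistantMessage(history: ChatMessage[]): string | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].role === "assistant") return history[i].content;
  }
  return undefined;
}

const history = [
  { role: "user", content: "Hello" },
  { role: "assistant", content: "Hi! How can I help you?" },
];
console.log(lastAssistantMessage(history)); // "Hi! How can I help you?"
```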
Resetting Chat History
import { SimpleChatEngine } from "@llamaindex/core/chat-engine";

const chatEngine = new SimpleChatEngine({ llm });

// Chat history is stored in the engine
await chatEngine.chat({ message: "Message 1" });
await chatEngine.chat({ message: "Message 2" });

// Reset by creating a new engine
const newChatEngine = new SimpleChatEngine({ llm });
Retrieval Configuration
const chatEngine = index.asChatEngine({
  retriever: index.asRetriever({
    similarityTopK: 5,
    mode: "default",
  }),
});
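similarityTopK controls how many of the highest-scoring nodes the retriever returns for each message. The selection amounts to a sort-and-slice over similarity scores, sketched here with plain objects standing in for scored nodes:

```typescript
interface ScoredNode { text: string; score: number }

// Keep the k nodes with the highest similarity scores
function topK(nodes: ScoredNode[], k: number): ScoredNode[] {
  return [...nodes].sort((a, b) => b.score - a.score).slice(0, k);
}

const nodes = [
  { text: "founded in 2020", score: 0.91 },
  { text: "data framework", score: 0.62 },
  { text: "unrelated note", score: 0.18 },
];
console.log(topK(nodes, 2).map((n) => n.text)); // ["founded in 2020", "data framework"]
```

Raising similarityTopK gives the chat engine more context per turn at the cost of a larger prompt.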
Custom Chat Engine
import {
  BaseChatEngine,
  type NonStreamingChatEngineParams,
} from "@llamaindex/core/chat-engine";
import type { ChatMessage } from "llamaindex";
import { EngineResponse } from "@llamaindex/core/schema";

class CustomChatEngine extends BaseChatEngine {
  private history: ChatMessage[] = [];

  async chat(params: NonStreamingChatEngineParams): Promise<EngineResponse> {
    const { message, chatHistory } = params;

    // Use the provided history or fall back to the internal history
    const messages = chatHistory ?? this.history;

    // Add the user message
    messages.push({ role: "user", content: message });

    // Generate a response (custom logic)
    const response = await this.generateResponse(messages);

    // Add the assistant message
    const assistantMessage: ChatMessage = { role: "assistant", content: response };
    messages.push(assistantMessage);

    // Update the internal history
    this.history = messages;

    return {
      response,
      sourceNodes: [],
      metadata: {},
    };
  }

  get chatHistory() {
    return this.history;
  }

  private async generateResponse(messages: ChatMessage[]): Promise<string> {
    // Custom response generation
    return "Response";
  }
}
Best Practices
Use context chat engine for RAG: retrieves relevant information for each turn
Manage token limits: use ChatMemoryBuffer to prevent context overflow
Provide clear system prompts: guide the assistant’s behavior
Stream long responses: better user experience for lengthy answers
Reset history periodically: prevent context from becoming too large or stale
Include source nodes: track which documents informed the response