Overview
Engines provide high-level interfaces for querying and chatting with your indexed data. LlamaIndex provides two main types:

- Query Engines: Question-answering over data
- Chat Engines: Multi-turn conversations with context
Query Engines
RetrieverQueryEngine
Standard query engine combining retrieval with response synthesis.
SubQuestionQueryEngine

Breaks complex questions into sub-questions.
RouterQueryEngine

Routes queries to the most appropriate engine.
Chat Engines

ContextChatEngine
Chat engine with retrieval-augmented generation (RAG).
SimpleChatEngine

Basic chat without retrieval.
Streaming

Both query and chat engines support streaming:

Streaming Queries
Streaming Chat
Response Synthesis
Customize how responses are generated from retrieved context:

Retrieval Configuration
Multi-modal Queries
Engines support multi-modal input:

Custom System Prompts
Memory Management

Control how much conversation history a chat engine retains between turns:
Node Post-processors
Filter or rerank retrieved nodes:
Best Practices

- Use ContextChatEngine for RAG: Automatically retrieves relevant context
- Configure similarity threshold: Filter low-quality retrieval results
- Stream long responses: Better UX for lengthy answers
- Inspect source nodes: Verify response quality
- Use SubQuestionQueryEngine for complex queries: Break down multi-part questions
- Set appropriate top_k: Balance between context and noise