
Overview

Query engines provide interfaces for querying indexed data and generating responses. They combine retrieval with response synthesis to answer questions over your data.
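Conceptually, the retrieve-then-synthesize flow a query engine performs can be sketched with plain stand-in functions. Everything below (`fakeRetrieve`, `fakeSynthesize`, the keyword-overlap scoring) is illustrative only, not part of the LlamaIndex API:

```typescript
// Illustrative sketch of the retrieve-then-synthesize flow a query
// engine performs internally. All names here are stand-ins.
type ScoredChunk = { text: string; score: number };

// Stand-in retriever: rank stored chunks by naive keyword overlap.
function fakeRetrieve(query: string, chunks: string[]): ScoredChunk[] {
  const terms = query.toLowerCase().split(/\s+/);
  return chunks
    .map((text) => ({
      text,
      score: terms.filter((t) => text.toLowerCase().includes(t)).length,
    }))
    .filter((c) => c.score > 0)
    .sort((a, b) => b.score - a.score);
}

// Stand-in synthesizer: a real engine would prompt an LLM with the
// retrieved context; here we just join the top chunks.
function fakeSynthesize(query: string, retrieved: ScoredChunk[]): string {
  return `Answer to "${query}" based on: ${retrieved
    .map((c) => c.text)
    .join(" ")}`;
}

const chunks = [
  "LlamaIndex is a data framework.",
  "Bananas are yellow.",
];
const hits = fakeRetrieve("What is LlamaIndex?", chunks);
const answer = fakeSynthesize("What is LlamaIndex?", hits);
```

A real query engine replaces both stand-ins with embedding-based retrieval and LLM-backed synthesis, but the two-phase shape is the same.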

BaseQueryEngine

Abstract base class for all query engines.

import { BaseQueryEngine } from "@llamaindex/core/query-engine";

Methods

query
method
Query the engine with a streaming or non-streaming response.

Non-streaming:
query(params: NonStreamingQueryParams): Promise<EngineResponse>

Streaming:
query(params: StreamingQueryParams): Promise<AsyncIterable<EngineResponse>>
retrieve
method
Retrieve relevant nodes without generating a response.

retrieve(query: QueryType): Promise<NodeWithScore[]>

QueryBundle

A query object that can carry precomputed embeddings alongside the query content.

type QueryBundle = {
  query: MessageContent;
  customEmbeddings?: string[];
  embeddings?: number[];
};

Usage Examples

Basic Query

import { VectorStoreIndex } from "llamaindex";
import { Document } from "@llamaindex/core/schema";

const documents = [
  new Document({ text: "LlamaIndex is a data framework for LLM applications." }),
  new Document({ text: "It provides tools for ingestion, indexing, and querying." })
];

const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
  query: "What is LlamaIndex?"
});

console.log(response.response);
console.log(response.sourceNodes); // Nodes used to generate response

Streaming Query

const queryEngine = index.asQueryEngine();

const stream = await queryEngine.query({
  query: "What is LlamaIndex?",
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}

Retrieve Only

const nodes = await queryEngine.retrieve("LlamaIndex");

nodes.forEach(nodeWithScore => {
  console.log(`Score: ${nodeWithScore.score}`);
  console.log(`Text: ${nodeWithScore.node.text}`);
});
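Retrieved nodes can be filtered by score before further use. A minimal self-contained sketch over the `NodeWithScore` shape (the `0.7` cutoff is an arbitrary example value, not a library default):

```typescript
// Minimal stand-in for the NodeWithScore shape used above.
type ScoredNode = { node: { text: string }; score: number };

// Keep only nodes at or above a similarity threshold.
function filterByScore(nodes: ScoredNode[], minScore = 0.7): ScoredNode[] {
  return nodes.filter((n) => n.score >= minScore);
}

const retrieved: ScoredNode[] = [
  { node: { text: "highly relevant" }, score: 0.91 },
  { node: { text: "marginal" }, score: 0.42 },
];
const kept = filterByScore(retrieved);
```

Dropping low-scoring nodes before synthesis keeps weak context out of the prompt.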

QueryBundle with Custom Embeddings

import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();
const queryEmbedding = await embedModel.getTextEmbedding("What is LlamaIndex?");

const response = await queryEngine.query({
  query: {
    query: "What is LlamaIndex?",
    embeddings: queryEmbedding
  }
});

Advanced Query Engines

RetrieverQueryEngine

Query engine that uses a retriever and response synthesizer.

import { RetrieverQueryEngine } from "llamaindex";

const queryEngine = new RetrieverQueryEngine({
  retriever: index.asRetriever(),
  responseSynthesizer: responseSynthesizer
});

SubQuestionQueryEngine

Breaks down complex queries into sub-questions.

import { SubQuestionQueryEngine } from "llamaindex";

const queryEngine = new SubQuestionQueryEngine({
  queryEngineTools: [tool1, tool2],
  responseSynthesizer: responseSynthesizer
});

const response = await queryEngine.query({
  query: "Compare feature A and feature B"
});

RouterQueryEngine

Routes queries to appropriate query engines based on content.

import { RouterQueryEngine } from "llamaindex";

const queryEngine = new RouterQueryEngine({
  selector: selector,
  queryEngineTools: [docEngine, codeEngine]
});

Response Synthesis

Query engines use response synthesizers to generate answers:

import { ResponseSynthesizer, CompactAndRefine } from "llamaindex";

const synthesizer = new ResponseSynthesizer({
  responseBuilder: new CompactAndRefine(),
  streaming: true
});

const queryEngine = index.asQueryEngine({
  responseSynthesizer: synthesizer
});

Query Events

Query engines emit events during execution:

import { Settings } from "llamaindex";

Settings.callbackManager.on("query-start", (event) => {
  console.log("Query started:", event.query);
});

Settings.callbackManager.on("query-end", (event) => {
  console.log("Query completed:", event.response);
});

const response = await queryEngine.query({ query: "What is LlamaIndex?" });

Customization

Custom Query Engine

import { BaseQueryEngine } from "@llamaindex/core/query-engine";
import { EngineResponse, type NodeWithScore } from "@llamaindex/core/schema";

class CustomQueryEngine extends BaseQueryEngine {
  async _query(query: string, stream?: boolean): Promise<EngineResponse> {
    // Custom query logic
    const nodes = await this.customRetrieve(query);
    const response = await this.customSynthesize(nodes);
    
    return {
      response: response,
      sourceNodes: nodes,
      metadata: {}
    };
  }
  
  private async customRetrieve(query: string) {
    // Custom retrieval logic
    return [];
  }
  
  private async customSynthesize(nodes: NodeWithScore[]) {
    // Custom synthesis logic
    return "Generated response";
  }
}

Best Practices

  1. Use streaming for long responses: Improves perceived latency
  2. Inspect source nodes: Verify response quality by checking retrieved sources
  3. Configure retrieval parameters: Adjust similarityTopK and the similarity cutoff for better results
  4. Handle errors gracefully: Implement error handling for failed queries
  5. Cache embeddings: Reuse QueryBundle with embeddings for repeated queries
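As a sketch of point 5, query embeddings can be memoized in a plain Map and reused. `computeEmbedding` below is a toy stand-in for a real (normally async) model call such as OpenAIEmbedding's getTextEmbedding; only the caching pattern is the point:

```typescript
// Cache query embeddings so repeated queries skip recomputation.
const embeddingCache = new Map<string, number[]>();
let embeddingCalls = 0;

// Toy deterministic stand-in for a real embedding call.
function computeEmbedding(text: string): number[] {
  embeddingCalls++;
  return [...text].map((c) => c.charCodeAt(0) / 255);
}

function getCachedEmbedding(text: string): number[] {
  const cached = embeddingCache.get(text);
  if (cached) return cached;
  const embedding = computeEmbedding(text);
  embeddingCache.set(text, embedding);
  return embedding;
}

const first = getCachedEmbedding("What is LlamaIndex?");
const second = getCachedEmbedding("What is LlamaIndex?");
// The cached vector can then be passed as QueryBundle.embeddings.
```

The same pattern applies with a real async embedding model; the cache lookup simply happens before the awaited call.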
