Response synthesizers take retrieved nodes and generate a final response to the user’s query. They control how context is presented to the LLM and how the final answer is constructed.

Overview

All synthesizers extend BaseSynthesizer and expose a synthesize() method; subclasses implement the underlying getResponse() logic. They differ in how they combine multiple text chunks into a coherent response.
import { getResponseSynthesizer } from "@llamaindex/core/response-synthesizers";

const synthesizer = getResponseSynthesizer("compact");

const response = await synthesizer.synthesize({
  query: "What is LlamaIndex?",
  nodes: retrievedNodes,
});

Synthesis Modes

LlamaIndex provides four built-in synthesis strategies:

Compact (Default)

Best for: most use cases; balances quality and efficiency. Compacts text chunks to fit within the context window, then refines the response:
import { CompactAndRefine } from "@llamaindex/core/response-synthesizers";

const synthesizer = new CompactAndRefine({
  textQATemplate: customTextQAPrompt,
  refineTemplate: customRefinePrompt,
});

const queryEngine = index.asQueryEngine({
  responseSynthesizer: synthesizer,
});
How it works:
  1. Combines chunks to maximize context window usage
  2. Generates initial response from first compact chunk
  3. Refines response with subsequent chunks
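The packing step above can be sketched in plain TypeScript. This is an illustrative sketch, not the library's implementation: `packChunks`, the character-based size estimate, and the `contextWindow` budget are all assumptions.

```typescript
// Hypothetical sketch of the "compact" packing step: greedily merge
// chunks until adding the next one would exceed the context budget.
function packChunks(chunks: string[], contextWindow: number): string[] {
  const packed: string[] = [];
  let current = "";
  for (const chunk of chunks) {
    const candidate = current === "" ? chunk : current + "\n\n" + chunk;
    if (candidate.length <= contextWindow) {
      current = candidate; // still fits: keep merging
    } else {
      if (current !== "") packed.push(current);
      current = chunk; // start a new packed chunk
    }
  }
  if (current !== "") packed.push(current);
  return packed;
}
```

The refine pass then runs over `packed` instead of the raw chunks, which is why compact usually needs far fewer LLM calls than plain refine.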

Refine

Best for: Comprehensive answers requiring all context. Builds response iteratively, refining with each chunk:
import { Refine } from "@llamaindex/core/response-synthesizers";

const synthesizer = new Refine({
  textQATemplate: myTextQAPrompt,
  refineTemplate: myRefinePrompt,
});
How it works:
  1. Generate initial answer from first chunk
  2. For each subsequent chunk:
    • Present existing answer + new chunk
    • Ask LLM to refine the answer
  3. Return final refined answer
Pros:
  • Most comprehensive, considers all context
  • Good for complex queries
Cons:
  • Requires multiple LLM calls (one per chunk)
  • Slower and more expensive
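The loop above can be sketched as follows; `askLLM` stands in for a real LLM call and the prompt strings are illustrative assumptions, not the library's templates.

```typescript
// Hypothetical sketch of the refine loop: one LLM call per chunk.
type AskLLM = (prompt: string) => string;

function refine(query: string, chunks: string[], askLLM: AskLLM): string {
  // 1. Initial answer from the first chunk
  let answer = askLLM(`Context: ${chunks[0]}\nQuery: ${query}\nAnswer:`);
  // 2. Refine with each subsequent chunk
  for (const chunk of chunks.slice(1)) {
    answer = askLLM(
      `Query: ${query}\nExisting answer: ${answer}\nNew context: ${chunk}\nRefine the answer:`
    );
  }
  return answer; // 3. Final refined answer
}
```

The sequential dependency between calls is why refine cannot be parallelized, unlike tree summarize.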

Tree Summarize

Best for: Summarization tasks, parallel processing. Recursively summarizes chunks in a tree structure:
import { TreeSummarize } from "@llamaindex/core/response-synthesizers";

const synthesizer = new TreeSummarize({
  summaryTemplate: customSummaryPrompt,
});
How it works:
  1. Pack chunks to fit context window
  2. If single chunk: generate answer directly
  3. If multiple chunks:
    • Summarize each chunk in parallel
    • Recursively summarize summaries
    • Return final summary
Pros:
  • Parallelizable (faster for many chunks)
  • Good for summarization
Cons:
  • May lose details in recursive summarization
  • Not ideal for precise Q&A
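The recursion above can be sketched as follows; `summarize` stands in for an LLM summarization call and the `fanout` batch size is an assumption for illustration.

```typescript
// Hypothetical sketch of tree summarization: summarize batches of
// chunks, then recursively summarize the summaries.
type Summarize = (texts: string[]) => string;

function treeSummarize(
  chunks: string[],
  summarize: Summarize,
  fanout = 2
): string {
  // Base case: everything fits in one call — answer directly
  if (chunks.length <= fanout) return summarize(chunks);
  // Otherwise summarize each batch (these calls can run in parallel)
  const summaries: string[] = [];
  for (let i = 0; i < chunks.length; i += fanout) {
    summaries.push(summarize(chunks.slice(i, i + fanout)));
  }
  // Recurse on the summaries until one remains
  return treeSummarize(summaries, summarize, fanout);
}
```

Each level of the tree compresses the input, which is where detail can be lost for precise Q&A.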

Multi-Modal

Best for: multi-modal content. Handles images and other non-text content alongside text:
import { MultiModal } from "@llamaindex/core/response-synthesizers";
import { MetadataMode } from "@llamaindex/core/schema";

const synthesizer = new MultiModal({
  textQATemplate: multiModalPrompt,
  metadataMode: MetadataMode.NONE,
});
How it works:
  1. Preserves multi-modal content (text + images)
  2. Formats prompt with all content types
  3. Sends to multi-modal LLM
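The formatting step above can be illustrated with a small sketch. The types and `buildMultiModalPrompt` here are local assumptions for illustration, not the library's MessageContent shape: the idea is that a multi-modal prompt is a list of typed content blocks rather than a flat string.

```typescript
// Illustrative sketch: interleave text chunks, images, and the query
// as typed blocks that a multi-modal LLM can consume.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image_url"; url: string };

function buildMultiModalPrompt(
  query: string,
  textChunks: string[],
  imageUrls: string[]
): ContentBlock[] {
  const blocks: ContentBlock[] = textChunks.map((t) => ({
    type: "text",
    text: t,
  }));
  for (const url of imageUrls) blocks.push({ type: "image_url", url });
  blocks.push({ type: "text", text: `Query: ${query}` });
  return blocks;
}
```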

Factory Function

Use getResponseSynthesizer() for simple cases:
import { getResponseSynthesizer } from "@llamaindex/core/response-synthesizers";

const synthesizer = getResponseSynthesizer("tree_summarize", {
  summaryTemplate: customPrompt,
  llm: myLLM,
});
Available modes:
  • "compact" - CompactAndRefine
  • "refine" - Refine
  • "tree_summarize" - TreeSummarize
  • "multi_modal" - MultiModal

Streaming Responses

All synthesizers support streaming:
const stream = await synthesizer.synthesize(
  {
    query: "Explain LlamaIndex",
    nodes: retrievedNodes,
  },
  true // Enable streaming
);

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}

Custom Prompts

Customize the prompts used by synthesizers:
import { PromptTemplate } from "@llamaindex/core/prompts";

const textQAPrompt = new PromptTemplate({
  template: `Context information:
{context}

Query: {query}

Provide a detailed answer based only on the context above.`,
});

const refinePrompt = new PromptTemplate({
  template: `Original query: {query}

Existing answer: {existingAnswer}

New context: {context}

Refine the existing answer using the new context.`,
});

const synthesizer = new Refine({
  textQATemplate: textQAPrompt,
  refineTemplate: refinePrompt,
});

Using with Query Engines

Integrate synthesizers into query engines:
import { CompactAndRefine } from "@llamaindex/core/response-synthesizers";

const queryEngine = index.asQueryEngine({
  responseSynthesizer: new CompactAndRefine(),
  retriever: index.asRetriever({ similarityTopK: 5 }),
});

const response = await queryEngine.query({
  query: "What are the key features?",
});

console.log(response.toString());

Custom Synthesizers

Implement custom synthesis logic:
import { BaseSynthesizer } from "@llamaindex/core/response-synthesizers";
import { EngineResponse } from "@llamaindex/core/schema";
import type { MessageContent } from "@llamaindex/core/llms";
import type { NodeWithScore } from "@llamaindex/core/schema";

class BulletPointSynthesizer extends BaseSynthesizer {
  protected async getResponse(
    query: MessageContent,
    nodes: NodeWithScore[],
    stream: boolean
  ): Promise<EngineResponse | AsyncIterable<EngineResponse>> {
    // Combine context from all nodes
    const context = nodes
      .map((n) => n.node.getContent())
      .join("\n\n");

    const prompt = `Based on this context:
${context}

Answer this question with bullet points: ${query}

Answer:`;

    if (stream) {
      const responseStream = await this.llm.complete({
        prompt,
        stream: true,
      });
      
      async function* convert() {
        for await (const chunk of responseStream) {
          yield EngineResponse.fromResponse(chunk.text, true, nodes);
        }
      }
      return convert();
    }

    const response = await this.llm.complete({
      prompt,
      stream: false,
    });

    return EngineResponse.fromResponse(response.text, false, nodes);
  }

  protected _getPrompts() {
    return {};
  }

  protected _getPromptModules() {
    return {};
  }

  protected _updatePrompts() {}
}

// Use the custom synthesizer
const synthesizer = new BulletPointSynthesizer({});

Choosing a Synthesizer

Synthesizer       Speed    Quality  Cost    Best For
Compact           Fast     Good     Low     General Q&A
Refine            Slow     Best     High    Complex queries
Tree Summarize    Medium   Good     Medium  Summarization
Multi-Modal       Fast     Good     Low     Images + text

Best Practices

Prompt Engineering:
  • Customize prompts for your domain
  • Include examples in prompts for better results
  • Test prompts with different synthesizers
Performance:
  • Use compact for most cases (good balance)
  • Use tree_summarize when you have many chunks
  • Avoid refine unless you need maximum quality
Context Management:
  • Retrieve more nodes than strictly needed, then filter down to the best ones before synthesis
  • Use postprocessors before synthesis to filter nodes
  • Monitor token usage to avoid context window issues
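A simple pre-synthesis budget check can catch context-window overruns early. This is a rough sketch: the 4-characters-per-token ratio is a common heuristic, not an exact count, and `trimToBudget` is a hypothetical helper, not a library API.

```typescript
// Rough sketch: keep chunks until an estimated token budget is hit.
function trimToBudget(chunks: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const estimate = Math.ceil(chunk.length / 4); // ~4 chars per token
    if (used + estimate > maxTokens) break; // stop before overflowing
    kept.push(chunk);
    used += estimate;
  }
  return kept;
}
```

For accurate counts, use a real tokenizer for your model rather than a character heuristic.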

Next Steps

Postprocessors

Filter and rerank nodes before synthesis

Evaluation

Measure and improve response quality
