Overview

The LlmResponse class represents the response from an LLM generation request. It contains the generated content, usage metadata, grounding information, error details, and streaming state.

Class Definition

class LlmResponse {
  id?: string;
  text?: string;
  content?: Content;
  groundingMetadata?: GroundingMetadata;
  partial?: boolean;
  turnComplete?: boolean;
  errorCode?: string;
  errorMessage?: string;
  interrupted?: boolean;
  customMetadata?: Record<string, any>;
  cacheMetadata?: CacheMetadata;
  usageMetadata?: GenerateContentResponseUsageMetadata;
  candidateIndex?: number;
  finishReason?: string;
  error?: Error;

  constructor(data?: Partial<LlmResponse>);
  static create(generateContentResponse: GenerateContentResponse): LlmResponse;
  static fromError(error: unknown, options?: { errorCode?: string; model?: string }): LlmResponse;
}

Properties

id
string
Unique identifier for the response.
text
string
Plain text content of the response. Convenient accessor for simple text responses.
console.log(response.text); // "Hello, how can I help?"
content
Content
Structured content with role and parts. Contains the full response including text, function calls, and other parts.
{
  role: "model",
  parts: [
    { text: "The weather is sunny" },
    { functionCall: { name: "get_weather", args: {...} } }
  ]
}
groundingMetadata
GroundingMetadata
Metadata about grounding sources when using grounded generation. Contains citations and search results.
partial
boolean
Indicates if this is a partial response in a streaming context. true means more chunks are expected.
turnComplete
boolean
Indicates if the conversation turn is complete. Used in bidirectional streaming.
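For example, a streaming consumer can use these two flags together; this is a minimal sketch, assuming an llm and request set up as in the streaming example later on this page.
for await (const response of llm.generateContentAsync(request, true)) {
  if (response.partial) {
    // Incremental chunk: more content is expected for this turn
    process.stdout.write(response.text ?? "");
  }
  if (response.turnComplete) {
    // The model has finished this conversation turn
    console.log("\n[turn complete]");
  }
}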
errorCode
string
Error code if the response encountered an error. Common values:
  • "STOP" - Normal completion
  • "MAX_TOKENS" - Hit token limit
  • "SAFETY" - Blocked by safety filters
  • "RECITATION" - Blocked due to recitation
  • "OTHER" - Other error
  • "UNKNOWN_ERROR" - Unknown error
errorMessage
string
Human-readable error message providing details about any error that occurred.
interrupted
boolean
Indicates if the generation was interrupted (e.g., by user or timeout).
customMetadata
Record<string, any>
Custom metadata that implementations can attach to responses.
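For example, an integration might attach tracing fields before logging the response (the field names here are illustrative):
response.customMetadata = { traceId: "trace_123", latencyMs: 840 };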
cacheMetadata
CacheMetadata
Metadata about context caching for this request/response.
usageMetadata
GenerateContentResponseUsageMetadata
Token usage information for billing and monitoring.
console.log(`Used ${response.usageMetadata?.totalTokenCount} tokens`);
candidateIndex
number
Index of the candidate response when multiple candidates are requested.
finishReason
string
Reason why generation finished:
  • "STOP" - Natural completion
  • "MAX_TOKENS" - Reached token limit
  • "SAFETY" - Safety filter triggered
  • "RECITATION" - Recitation detected
  • "OTHER" - Other reason
error
Error
Original Error object if the response resulted from an exception.

Constructor

data
Partial<LlmResponse>
Optional initialization data; any subset of the properties listed above can be provided.
const response = new LlmResponse({
  text: "Hello!",
  content: {
    role: "model",
    parts: [{ text: "Hello!" }],
  },
  finishReason: "STOP",
});

Static Methods

create()

Creates an LlmResponse from a standard GenerateContentResponse object. Handles various response scenarios including successful generations, errors, and prompt feedback.
generateContentResponse
GenerateContentResponse
required
The raw response from the LLM API.
Returns: LlmResponse
const llmResponse = LlmResponse.create(apiResponse);
Behavior:
  • Extracts candidate content and metadata when available
  • Maps finish reasons to error codes when generation failed
  • Captures prompt feedback for safety/policy violations
  • Always preserves usage metadata
  • Returns error response for unknown failures
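As a sketch, a model wrapper can normalize a raw @google/genai response this way; the GoogleGenAI call follows that SDK's API, and the example assumes LlmResponse is exported from @iqai/adk alongside BaseLlm and LlmRequest.
import { GoogleGenAI } from "@google/genai";
import { LlmResponse } from "@iqai/adk";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const apiResponse = await ai.models.generateContent({
  model: "gemini-2.0-flash",
  contents: "Summarize today's weather report.",
});

// Convert the raw SDK response into the normalized LlmResponse shape
const llmResponse = LlmResponse.create(apiResponse);
console.log(llmResponse.text, llmResponse.finishReason);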

fromError()

Creates an error LlmResponse from an exception, useful for consistent error handling across different LLM providers.
error
unknown
required
The error or exception that occurred.
options
object
Optional error configuration.
Returns: LlmResponse
try {
  const response = await callLlm();
} catch (error) {
  const errorResponse = LlmResponse.fromError(error, {
    errorCode: "API_ERROR",
    model: "gpt-4",
  });
  console.error(errorResponse.errorMessage);
}
Response Structure:
  • Sets errorCode and errorMessage
  • Creates error content with error text
  • Sets finishReason to "STOP"
  • Preserves original Error object
  • Formats message as: "LLM call failed for model {model}: {errorMessage}"

Usage Examples

Handling Streaming Responses

import { BaseLlm, LlmRequest } from "@iqai/adk";

const llm = new MyLlm("gpt-4"); // MyLlm is a custom BaseLlm subclass
const request = new LlmRequest({ /* ... */ });

let fullText = "";
for await (const response of llm.generateContentAsync(request, true)) {
  if (response.text) {
    fullText += response.text;
    process.stdout.write(response.text);
  }

  if (response.errorCode) {
    console.error(`Error: ${response.errorMessage}`);
    break;
  }

  // Check if we've hit token limit
  if (response.finishReason === "MAX_TOKENS") {
    console.warn("Response truncated due to token limit");
  }
}

Monitoring Token Usage

const response = await getFirstResponse(llm, request); // helper that returns the first LlmResponse from the stream

if (response.usageMetadata) {
  const { promptTokenCount, candidatesTokenCount, totalTokenCount } =
    response.usageMetadata;

  console.log(`Input tokens: ${promptTokenCount}`);
  console.log(`Output tokens: ${candidatesTokenCount}`);
  console.log(`Total tokens: ${totalTokenCount}`);

  // Calculate cost (example rates)
  const inputCost = (promptTokenCount / 1000000) * 0.50;
  const outputCost = (candidatesTokenCount / 1000000) * 1.50;
  console.log(`Estimated cost: $${(inputCost + outputCost).toFixed(4)}`);
}

Handling Function Calls

for await (const response of llm.generateContentAsync(request)) {
  if (response.content?.parts) {
    for (const part of response.content.parts) {
      if (part.functionCall) {
        console.log(`Calling function: ${part.functionCall.name}`);
        console.log(`Arguments:`, part.functionCall.args);

        // Execute the function
        const tool = request.toolsDict[part.functionCall.name];
        const result = await tool.execute(part.functionCall.args);
        // Send result back to LLM...
      } else if (part.text) {
        console.log(`Text response: ${part.text}`);
      }
    }
  }
}

Error Handling Patterns

const response = await getResponse(llm, request); // helper that returns a single LlmResponse

// Check for errors
if (response.errorCode && response.errorCode !== "STOP") {
  switch (response.errorCode) {
    case "SAFETY":
      console.error("Response blocked by safety filters");
      break;
    case "MAX_TOKENS":
      console.warn("Response truncated - increase maxOutputTokens");
      break;
    case "RECITATION":
      console.error("Response blocked due to recitation");
      break;
    default:
      console.error(`Error: ${response.errorMessage}`);
  }
  return;
}

// Process successful response
if (response.text) {
  console.log(response.text);
}

Working with Grounding

for await (const response of llm.generateContentAsync(request)) {
  if (response.groundingMetadata) {
    console.log("Response includes grounded information:");

    // Access search results and citations
    const metadata = response.groundingMetadata;
    if (metadata.searchEntryPoint) {
      console.log(`Search query: ${metadata.searchEntryPoint.renderedContent}`);
    }

    if (metadata.groundingChunks) {
      console.log(`Sources: ${metadata.groundingChunks.length}`);
    }
  }

  console.log(response.text);
}
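Inside the same loop, the individual sources can be listed from the grounding chunks; this assumes the @google/genai GroundingChunk shape, where web-backed chunks expose a title and uri.
if (response.groundingMetadata?.groundingChunks) {
  for (const chunk of response.groundingMetadata.groundingChunks) {
    // Each web-backed chunk carries the source title and URI
    if (chunk.web) {
      console.log(`- ${chunk.web.title}: ${chunk.web.uri}`);
    }
  }
}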

Response Aggregation for Streaming

When using streaming, aggregate partial responses:
function aggregateStreamingResponses(responses: LlmResponse[]): LlmResponse {
  const aggregated = new LlmResponse({
    content: { role: "model", parts: [] },
    text: "",
  });

  for (const response of responses) {
    // Merge text
    if (response.text) {
      aggregated.text = (aggregated.text || "") + response.text;
    }

    // Merge content parts
    if (response.content?.parts) {
      aggregated.content?.parts?.push(...response.content.parts);
    }

    // Use latest metadata
    if (response.usageMetadata) {
      aggregated.usageMetadata = response.usageMetadata;
    }

    if (response.finishReason) {
      aggregated.finishReason = response.finishReason;
    }
  }

  return aggregated;
}
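For example, the chunks from a streaming call (as in the streaming example above) can be collected and then merged into a single response:
const chunks: LlmResponse[] = [];
for await (const response of llm.generateContentAsync(request, true)) {
  chunks.push(response);
}

const finalResponse = aggregateStreamingResponses(chunks);
console.log(finalResponse.text);
console.log(`Total tokens: ${finalResponse.usageMetadata?.totalTokenCount}`);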

Related

  • BaseLlm - Base class that generates LlmResponse
  • LlmRequest - Request configuration that produces responses
  • CacheMetadata - Cache metadata attached to responses

Type Imports

import type {
  Content,
  GenerateContentResponseUsageMetadata,
  GroundingMetadata,
} from "@google/genai";
import { CacheMetadata } from "@adk/models";

Source Reference

See implementation: /packages/adk/src/models/llm-response.ts
