
Overview

LLM Gateway provides streaming LLM calls through provider harnesses. Each harness implements a simple async generator interface that yields events as tokens arrive from the API.
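The async generator interface can be sketched as follows. The type names and the stub harness here are illustrative assumptions, not the library's actual exports; only the `invoke()` generator shape is taken from the docs.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// A minimal slice of the event union (full list in "Event Types" below).
type HarnessEvent =
  | { type: "harness_start"; runId: string }
  | { type: "text"; runId: string; content: string }
  | { type: "harness_end"; runId: string };

interface ProviderHarness {
  // An async generator: yields events as tokens arrive from the API.
  invoke(params: { model: string; messages: Message[] }): AsyncGenerator<HarnessEvent>;
}

// A stub harness that yields a fixed stream — handy for testing consumers
// without a real provider.
const stubHarness: ProviderHarness = {
  async *invoke() {
    yield { type: "harness_start", runId: "run-1" };
    yield { type: "text", runId: "run-1", content: "hello" };
    yield { type: "harness_end", runId: "run-1" };
  },
};

// Consumers only need the interface, so they work with any provider.
async function collectText(harness: ProviderHarness): Promise<string> {
  let out = "";
  for await (const event of harness.invoke({ model: "stub", messages: [] })) {
    if (event.type === "text") out += event.content;
  }
  return out;
}
```

Because every provider satisfies the same interface, code written against it is portable across providers.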

Quick Start

1. Choose a Provider

Select from available providers: zen (OpenAI-compatible), anthropic, openai, or openrouter.
import { createGeneratorHarness } from "./packages/ai/harness/providers/zen";

const harness = createGeneratorHarness();

2. Invoke the Harness

Call invoke() with your model and messages:
for await (const event of harness.invoke({
  model: "glm-4.7",
  messages: [{ role: "user", content: "What is the sum of the first 10 primes?" }],
})) {
  if (event.type === "text") {
    process.stdout.write(event.content);
  }
}

3. Handle Events

Process different event types as they stream:
for await (const event of harness.invoke(params)) {
  switch (event.type) {
    case "harness_start":
      console.log("Stream started");
      break;
    case "text":
      process.stdout.write(event.content);
      break;
    case "reasoning":
      process.stderr.write(event.content);
      break;
    case "usage":
      console.log(`Tokens: ${event.inputTokens} in, ${event.outputTokens} out`);
      break;
    case "harness_end":
      console.log("\nStream complete");
      break;
  }
}

Event Types

Provider harnesses yield these events:
Event          Description               Fields
harness_start  Stream begins             runId
text           Streamed text token       id, runId, content
reasoning      Streamed reasoning token  id, runId, content
tool_call      Model requested tool      id, runId, name, input
usage          Token usage stats         runId, inputTokens, outputTokens
error          Error occurred            runId, message
harness_end    Stream complete           runId
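One way to model the table above in TypeScript is a discriminated union keyed on type; narrowing on that field then gives typed access to each variant's fields. The field names follow the table, but the exact exported type definitions are an assumption.

```typescript
// Discriminated union modeling the event table (shapes assumed from the table).
type HarnessEvent =
  | { type: "harness_start"; runId: string }
  | { type: "text"; id: string; runId: string; content: string }
  | { type: "reasoning"; id: string; runId: string; content: string }
  | { type: "tool_call"; id: string; runId: string; name: string; input: unknown }
  | { type: "usage"; runId: string; inputTokens: number; outputTokens: number }
  | { type: "error"; runId: string; message: string }
  | { type: "harness_end"; runId: string };

// Switching on `type` narrows `event`, so each branch sees only its fields.
function describe(event: HarnessEvent): string {
  switch (event.type) {
    case "text":
    case "reasoning":
      return `${event.type}: ${event.content}`;
    case "usage":
      return `usage: ${event.inputTokens}/${event.outputTokens}`;
    case "error":
      return `error: ${event.message}`;
    default:
      return event.type;
  }
}
```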

Provider-Specific Configuration

The Zen provider works with OpenAI-compatible APIs and supports reasoning content:
import { createGeneratorHarness } from "./packages/ai/harness/providers/zen";

const harness = createGeneratorHarness({
  apiKey: process.env.ZEN_API_KEY,
  baseUrl: process.env.ZEN_BASE_URL,
});

for await (const event of harness.invoke({
  model: "glm-4.7",
  messages: [{ role: "user", content: "Explain quantum computing" }],
})) {
  if (event.type === "reasoning") {
    // Model's internal thinking
    console.error("[thinking]", event.content);
  }
  if (event.type === "text") {
    // Final output
    process.stdout.write(event.content);
  }
}

Reasoning vs Text

Models that support extended thinking (such as DeepSeek's reasoner models and OpenAI's o1) emit two separate streams:
  • reasoning events: Internal model thinking process (not part of final answer)
  • text events: Final output tokens
for await (const event of harness.invoke({
  model: "deepseek-reasoner",
  messages: [{ role: "user", content: "Solve this: 2x + 5 = 15" }],
})) {
  if (event.type === "reasoning") {
    // "Let me work through this step by step..."
    console.error("💭", event.content);
  }
  if (event.type === "text") {
    // "x = 5"
    console.log("✨", event.content);
  }
}

Message History

Build multi-turn conversations by accumulating messages:
const messages = [
  { role: "user" as const, content: "What is TypeScript?" }
];

let response = "";

for await (const event of harness.invoke({ model: "glm-4.7", messages })) {
  if (event.type === "text") {
    response += event.content;
    process.stdout.write(event.content);
  }
}

// Add assistant response to history
messages.push({ role: "assistant", content: response });

// Follow-up question
messages.push({ role: "user", content: "How does it compare to JavaScript?" });

for await (const event of harness.invoke({ model: "glm-4.7", messages })) {
  if (event.type === "text") process.stdout.write(event.content);
}
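The accumulate-then-push pattern above can be factored into a small helper that drains a stream and returns the full reply. This is a sketch, not a library export; it accepts any async iterable of events, so it works with whatever `harness.invoke()` returns.

```typescript
// Minimal event shape this helper needs (assumed; see "Event Types").
type StreamEvent = { type: string; content?: string };

// Drain a harness event stream, concatenating "text" tokens into one reply.
async function collectReply(stream: AsyncIterable<StreamEvent>): Promise<string> {
  let reply = "";
  for await (const event of stream) {
    if (event.type === "text" && event.content) reply += event.content;
  }
  return reply;
}

// Stub stream standing in for harness.invoke(...) in this example.
async function* stubStream(): AsyncGenerator<StreamEvent> {
  yield { type: "harness_start" };
  yield { type: "text", content: "Type" };
  yield { type: "text", content: "Script" };
  yield { type: "harness_end" };
}
```

With the helper, each turn becomes `messages.push({ role: "assistant", content: await collectReply(harness.invoke({ model, messages })) })`.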

System Prompts

Include a system message to steer the model's behavior:
for await (const event of harness.invoke({
  model: "glm-4.7",
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant that responds in haiku format."
    },
    { role: "user", content: "Explain recursion" }
  ],
})) {
  if (event.type === "text") process.stdout.write(event.content);
}

Error Handling

Errors surface in two ways: as error events yielded mid-stream, and as exceptions thrown by the generator. Handle both:
try {
  for await (const event of harness.invoke({
    model: "glm-4.7",
    messages: [{ role: "user", content: "Hello" }],
  })) {
    if (event.type === "error") {
      console.error("Stream error:", event.message);
      break;
    }
    if (event.type === "text") {
      process.stdout.write(event.content);
    }
  }
} catch (error) {
  console.error("Fatal error:", error);
}

Tracking Token Usage

Accumulate token counts across the stream:
let totalInputTokens = 0;
let totalOutputTokens = 0;

for await (const event of harness.invoke({
  model: "glm-4.7",
  messages: [{ role: "user", content: "Write a story" }],
})) {
  if (event.type === "usage") {
    totalInputTokens += event.inputTokens || 0;
    totalOutputTokens += event.outputTokens || 0;
  }
  if (event.type === "text") {
    process.stdout.write(event.content);
  }
}

console.log(`\nTotal tokens: ${totalInputTokens} in, ${totalOutputTokens} out`);

Run IDs and Provenance

Every event carries a runId that identifies the LLM invocation:
for await (const event of harness.invoke(params)) {
  console.log(`[${event.runId}] ${event.type}`);

  if (event.parentId) {
    console.log(`  ↳ spawned by ${event.parentId}`);
  }
}
This becomes important when composing harnesses — each nested call gets its own runId, and child runs include parentId to preserve the call graph.
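Given a flat log of events, the call graph can be rebuilt from these two fields. A sketch, assuming only that every event carries runId and that nested runs carry parentId as described above:

```typescript
// Minimal provenance fields this sketch relies on (assumed shape).
type ProvenanceEvent = { type: string; runId: string; parentId?: string };

// Map each parent run to the set of child runs it spawned.
function childRuns(events: ProvenanceEvent[]): Map<string, Set<string>> {
  const children = new Map<string, Set<string>>();
  for (const event of events) {
    if (!event.parentId) continue;
    const set = children.get(event.parentId) ?? new Set<string>();
    set.add(event.runId);
    children.set(event.parentId, set);
  }
  return children;
}
```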

Composition

Provider harnesses compose with other harnesses. See the Tool Calling and Multi-Agent guides for wrapping providers with agentic capabilities.
import { createAgentHarness } from "./packages/ai/harness/agent";
import { createGeneratorHarness } from "./packages/ai/harness/providers/zen";

const provider = createGeneratorHarness();
const agent = createAgentHarness({ harness: provider });

// agent.invoke() now includes tool execution

Next Steps

Tool Calling

Add tools to let the model execute actions

Multi-Agent

Orchestrate multiple concurrent agents
