Overview
The RLM (Recursive Language Model) harness treats LLM input as a variable inside a REPL environment rather than as direct context. The model writes JavaScript to examine, chunk, and recursively process arbitrarily long inputs. It also provides exec() for running shell commands, making RLM a general-purpose “model writes code to solve problems” harness.
Import
import { createRlmHarness } from "@llm-gateway/ai/rlm/harness";
Function Signature
function createRlmHarness(
options: RlmHarnessOptions
): GeneratorHarnessModule
Parameters
- options (RlmHarnessOptions, required): Configuration for the RLM harness
- options.rootHarness (GeneratorHarnessModule, required): Provider harness for root LLM calls
- options.subHarness (GeneratorHarnessModule): Provider harness for sub-LLM calls via llm_query(). Defaults to rootHarness
- options.config: RLM configuration object
- config.maxIterations: Maximum number of REPL execution loops
- config.maxStdoutLength: Maximum stdout length before truncation
- config.metadataPrefixLength: Length of the context prefix shown in the system prompt
- config.execTimeout: Default timeout for exec() calls in seconds (default: 10)
- config.execCwd: Working directory for exec() calls
- config.subModel: Model to use for llm_query() calls
- Character limit for llm_query() prompts (default: 10000)
- config.maxDepth: Maximum recursion depth for nested RLM calls (default: 2)
Returns
A harness module with invoke() and supportedModels() methods
How It Works
- Extract the user prompt from messages and create a REPL with the prompt as context
- Build a system prompt with metadata only (length, prefix); the model never sees the full input
- Each iteration:
  - Stream the LLM response (yields text/reasoning/usage)
  - Extract code from the fenced block (exactly one per turn)
  - Execute it in the REPL
  - Yield repl_input/repl_progress/repl_output events
  - Append stdout/error to the message history
  - If FINAL() was called, yield the final text event and break
- Yield harness_end
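The control flow above can be sketched in simplified form. This is an illustration, not the real implementation: the model is mocked as a list of pre-scripted turns, Node's vm module stands in for the sandboxed REPL, and event streaming is omitted.

```typescript
import vm from "node:vm";

// Extract exactly one fenced code block from a model turn, mirroring the
// "exactly one per turn" rule described above.
function extractFencedCode(turn: string): string {
  const blocks = [...turn.matchAll(/```(?:js|javascript)?\n([\s\S]*?)```/g)];
  if (blocks.length !== 1) throw new Error("expected exactly one fenced code block");
  return blocks[0][1];
}

// Simplified iteration loop: execute each turn's code in a persistent sandbox
// and stop when FINAL() is called or maxIterations is reached.
function runRlmLoop(modelTurns: string[], context: string, maxIterations = 10): string {
  let finalAnswer: string | undefined;
  // Persistent sandbox: variables survive across iterations, like the RLM REPL
  const sandbox = vm.createContext({
    context,
    FINAL: (answer: string) => { finalAnswer = answer; },
  });
  for (let i = 0; i < Math.min(maxIterations, modelTurns.length); i++) {
    vm.runInContext(extractFencedCode(modelTurns[i]), sandbox);
    if (finalAnswer !== undefined) break; // FINAL() ends the loop
  }
  return finalAnswer ?? "(reached maxIterations without FINAL)";
}
```

Because the sandbox is created once and reused, a variable assigned in one turn is visible in later turns, which is the "persistent scope across iterations" property the harness relies on.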
Basic Example
import fs from "node:fs/promises";
import { createRlmHarness } from "@llm-gateway/ai/rlm/harness";
import { createGeneratorHarness } from "@llm-gateway/ai/harness/providers/zen";
const rlm = createRlmHarness({
rootHarness: createGeneratorHarness(),
config: {
maxIterations: 10,
maxStdoutLength: 4000,
metadataPrefixLength: 200,
},
});
// Process a long document
const longDocument = await fs.readFile("large-file.txt", "utf-8");
for await (const event of rlm.invoke({
model: "claude-sonnet-4-20250514",
context: longDocument, // Document as context, not in messages
messages: [{ role: "user", content: "Summarize the key points" }],
})) {
if (event.type === "repl_progress") {
console.log(event.chunk);
}
if (event.type === "text") {
console.log("Final answer:", event.content);
}
}
REPL Functions
The model can use these functions in its code:
context
The input data as a string variable:
// Model writes:
console.log(context.length);
const lines = context.split("\n");
FINAL(answer)
Signals completion and returns the final answer:
// Model writes:
FINAL("The document contains 3 main themes...");
llm_query(prompt, context?)
Make a sub-LLM call to process data:
// Model writes:
const chunk = context.slice(0, 10000);
const summary = await llm_query(
"Summarize the key points",
chunk
);
console.log(summary);
exec(command, timeout?)
Run shell commands:
// Model writes:
const result = await exec("grep 'error' logfile.txt");
console.log(result.stdout);
Events Yielded
harness_start
Loop begins:
{
type: "harness_start",
runId: string,
depth?: number, // Recursion depth if > 0
maxIterations?: number,
}
harness_end
Loop completes:
{
type: "harness_end",
runId: string,
reason?: "final" | "max_iterations",
iterations?: number,
totalUsage?: { inputTokens: number, outputTokens: number },
}
repl_input
Code about to execute:
{
type: "repl_input",
runId: string,
id: string,
code: string,
iteration?: number, // Zero-based loop index
}
repl_progress
Live output during execution:
{
type: "repl_progress",
runId: string,
id: string,
chunk: string,
stream: "stdout" | "stderr",
}
repl_output
Execution result:
{
type: "repl_output",
runId: string,
id: string,
stdout: string,
error?: string,
done: boolean, // True if FINAL() was called
iteration?: number,
durationMs?: number,
truncated?: boolean, // True if stdout was truncated
}
text, reasoning, usage
Passed through from provider harness:
{
type: "text",
runId: string,
id: string,
content: string,
}
relay (permission)
Permission required for exec():
{
type: "relay",
kind: "permission",
runId: string,
id: string,
toolCallId: string,
tool: "exec",
params: { command: string },
respond: (response: PermissionResponse) => void,
}
Permission Control
Pass an allowlist to pre-approve matching exec() commands; anything not covered surfaces as a relay permission event you must respond to:
const permissions = {
allowlist: [
{ tool: "exec", params: { command: "ls*" } },
{ tool: "exec", params: { command: "cat*" } },
],
};
for await (const event of rlm.invoke({
model: "claude-sonnet-4-20250514",
context: largeDataset,
messages: [{ role: "user", content: "Analyze the data" }],
permissions,
})) {
if (event.type === "relay" && event.kind === "permission") {
const approved = await getUserApproval(event.params.command);
event.respond({ approved });
}
}
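If you prefer to approve events programmatically rather than prompting the user, one option is to match the incoming command against your rules yourself. This sketch assumes a trailing * acts as a prefix wildcard; the harness's actual pattern semantics may differ.

```typescript
// Minimal allowlist matcher (a sketch, not the harness's implementation).
// A trailing "*" is treated as "command starts with this prefix".
type AllowRule = { tool: string; params: { command: string } };

function isAllowed(rules: AllowRule[], tool: string, command: string): boolean {
  return rules.some((rule) => {
    if (rule.tool !== tool) return false;
    const pattern = rule.params.command;
    return pattern.endsWith("*")
      ? command.startsWith(pattern.slice(0, -1))
      : command === pattern;
  });
}
```

With this in place, the relay handler can call event.respond({ approved: isAllowed(rules, event.tool, event.params.command) }) and only fall back to a user prompt when nothing matches.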
Recursive RLM
When maxDepth > 0, llm_query() can spawn child RLM sessions, each running one depth level deeper:
const rlm = createRlmHarness({
rootHarness: createGeneratorHarness(),
config: {
maxIterations: 10,
maxStdoutLength: 4000,
metadataPrefixLength: 200,
maxDepth: 2, // Enable 2 levels of recursion
},
});
// Model can write:
// const summary = await llm_query("Summarize", hugeChunk);
// This spawns a child RLM session with its own REPL
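The depth cap can be pictured like this (the helper names are assumptions, not the library's API): below maxDepth, llm_query() starts a full child session; once the budget is exhausted, it degrades to a single plain sub-LLM call.

```typescript
// Sketch of depth-limited recursion. runChildRlm and plainLlmCall are
// hypothetical stand-ins: a nested RLM session with its own REPL vs. a
// single sub-LLM completion.
type SubCall = (prompt: string, ctx: string) => Promise<string>;

function makeLlmQuery(
  depth: number,
  maxDepth: number,
  runChildRlm: SubCall,
  plainLlmCall: SubCall
): SubCall {
  return (prompt, ctx) =>
    depth < maxDepth
      ? runChildRlm(prompt, ctx) // nested session at depth + 1
      : plainLlmCall(prompt, ctx); // recursion budget exhausted
}
```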
Processing Chunks
for await (const event of rlm.invoke({
model: "claude-sonnet-4-20250514",
context: megabyteDocument,
messages: [{ role: "user", content: "Extract all dates mentioned" }],
})) {
if (event.type === "repl_progress") {
// Model might write:
// const chunks = [];
// for (let i = 0; i < context.length; i += 10000) {
// const chunk = context.slice(i, i + 10000);
// const dates = await llm_query("Extract dates", chunk);
// chunks.push(dates);
// }
console.log(event.chunk);
}
}
With Shell Commands
const rlm = createRlmHarness({
rootHarness: createGeneratorHarness(),
config: {
maxIterations: 10,
maxStdoutLength: 4000,
metadataPrefixLength: 200,
execTimeout: 30, // 30 second timeout
execCwd: "/path/to/project",
},
});
for await (const event of rlm.invoke({
model: "claude-sonnet-4-20250514",
messages: [{ role: "user", content: "Find all TypeScript files with TODO comments" }],
})) {
// Model might write:
// const result = await exec("find . -name '*.ts' -exec grep -l 'TODO' {} \\;");
// const files = result.stdout.split("\n").filter(Boolean);
// FINAL(`Found ${files.length} files with TODOs`);
if (event.type === "repl_output") {
console.log("Command output:", event.stdout);
}
}
Two-Harness Pattern
Use a stronger model for the root loop and a faster, cheaper model for sub-calls:
const rlm = createRlmHarness({
rootHarness: createGeneratorHarness({ model: "claude-sonnet-4-20250514" }),
subHarness: createGeneratorHarness({ model: "claude-3-5-haiku-20241022" }), // Faster, cheaper for sub-calls
config: {
maxIterations: 10,
maxStdoutLength: 4000,
metadataPrefixLength: 200,
subModel: "claude-3-5-haiku-20241022",
},
});
Architecture
RLM wraps any provider harness and runs an inference loop:
- Model receives metadata about input (length, prefix)
- Model writes code to explore data through sandboxed REPL
- Iterates until FINAL() or maxIterations
Key features:
- Model never sees full input in context
- Arbitrary length processing through chunking
- Recursive sub-queries for complex tasks
- Shell command execution for system integration
- Persistent scope across iterations