Overview
The RLM (Recursive Language Model) harness treats LLM input as a variable inside a REPL environment rather than as direct context. The model writes JavaScript to examine, chunk, and recursively process arbitrarily long inputs. It also provides exec() for running shell commands, making RLM a general-purpose “model writes code to solve problems” harness.
Import
import { createRlmHarness } from "@llm-gateway/ai/rlm/harness";
Function Signature
function createRlmHarness(
options: RlmHarnessOptions
): GeneratorHarnessModule
Parameters
- options (RlmHarnessOptions, required): Configuration for the RLM harness
- options.rootHarness (GeneratorHarnessModule, required): Provider harness for root LLM calls
- options.subHarness (GeneratorHarnessModule): Provider harness for sub-LLM calls via llm_query(). Defaults to rootHarness
- options.config: RLM configuration object
- config.maxIterations: Maximum number of REPL execution loops
- config.maxStdoutLength: Maximum stdout length before truncation
- config.metadataPrefixLength: Length of the context prefix shown in the system prompt
- config.execTimeout: Default timeout for exec() calls in seconds (default: 10)
- config.execCwd: Working directory for exec() calls
- config.subModel: Model to use for llm_query() calls
- Character limit for llm_query() prompts (default: 10000)
- config.maxDepth: Maximum recursion depth for nested RLM calls (default: 2)
Returns
A harness module with invoke() and supportedModels() methods
How It Works
- Extract the user prompt from messages and create a REPL with the prompt as context
- Build a system prompt with metadata only (length, prefix); the model never sees the full input
- Each iteration:
  - Stream the LLM response (yields text/reasoning/usage)
  - Extract code from the fenced block (exactly one per turn)
  - Execute it in the REPL
  - Yield repl_input/repl_progress/repl_output events
  - Append stdout/error to the message history
  - If FINAL() was called, yield the final text event and break
- Yield harness_end
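The control flow above can be sketched in simplified form. This is an illustration, not the real implementation: the model is mocked as a list of pre-scripted turns, Node's vm module stands in for the sandboxed REPL, and event streaming is omitted.

```typescript
import vm from "node:vm";

// Extract exactly one fenced code block from a model turn, mirroring the
// "exactly one per turn" rule described above.
function extractFencedCode(turn: string): string {
  const blocks = [...turn.matchAll(/```(?:js|javascript)?\n([\s\S]*?)```/g)];
  if (blocks.length !== 1) throw new Error("expected exactly one fenced code block");
  return blocks[0][1];
}

// Simplified iteration loop: execute each turn's code in a persistent sandbox
// and stop when FINAL() is called or maxIterations is reached.
function runRlmLoop(modelTurns: string[], context: string, maxIterations = 10): string {
  let finalAnswer: string | undefined;
  // Persistent sandbox: variables survive across iterations, like the RLM REPL
  const sandbox = vm.createContext({
    context,
    FINAL: (answer: string) => { finalAnswer = answer; },
  });
  for (let i = 0; i < Math.min(maxIterations, modelTurns.length); i++) {
    vm.runInContext(extractFencedCode(modelTurns[i]), sandbox);
    if (finalAnswer !== undefined) break; // FINAL() ends the loop
  }
  return finalAnswer ?? "(reached maxIterations without FINAL)";
}
```

Because the sandbox is created once and reused, a variable assigned in one turn is visible in later turns, which is the "persistent scope across iterations" property the harness relies on.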
Basic Example
import fs from "node:fs/promises";
import { createRlmHarness } from "@llm-gateway/ai/rlm/harness";
import { createGeneratorHarness } from "@llm-gateway/ai/harness/providers/zen";
const rlm = createRlmHarness({
rootHarness: createGeneratorHarness(),
config: {
maxIterations: 10,
maxStdoutLength: 4000,
metadataPrefixLength: 200,
},
});
// Process a long document
const longDocument = await fs.readFile("large-file.txt", "utf-8");
for await (const event of rlm.invoke({
model: "claude-sonnet-4-20250514",
context: longDocument, // Document as context, not in messages
messages: [{ role: "user", content: "Summarize the key points" }],
})) {
if (event.type === "repl_progress") {
console.log(event.chunk);
}
if (event.type === "text") {
console.log("Final answer:", event.content);
}
}
REPL Functions
The model can use these functions in its code:
context
The input data as a string variable:
// Model writes:
console.log(context.length);
const lines = context.split("\n");
FINAL(answer)
Signals completion and returns the final answer:
// Model writes:
FINAL("The document contains 3 main themes...");
llm_query(prompt, context?)
Make a sub-LLM call to process data:
// Model writes:
const chunk = context.slice(0, 10000);
const summary = await llm_query(
"Summarize the key points",
chunk
);
console.log(summary);
exec(command, timeout?)
Run shell commands:
// Model writes:
const result = await exec("grep 'error' logfile.txt");
console.log(result.stdout);
Events Yielded
harness_start
Loop begins:
{
type: "harness_start",
runId: string,
depth?: number, // Recursion depth if > 0
maxIterations?: number,
}
harness_end
Loop completes:
{
type: "harness_end",
runId: string,
reason?: "final" | "max_iterations",
iterations?: number,
totalUsage?: { inputTokens: number, outputTokens: number },
}
repl_input
Code about to execute:
{
type: "repl_input",
runId: string,
id: string,
code: string,
iteration?: number, // Zero-based loop index
}
repl_progress
Live output during execution:
{
type: "repl_progress",
runId: string,
id: string,
chunk: string,
stream: "stdout" | "stderr",
}
repl_output
Execution result:
{
type: "repl_output",
runId: string,
id: string,
stdout: string,
error?: string,
done: boolean, // True if FINAL() was called
iteration?: number,
durationMs?: number,
truncated?: boolean, // True if stdout was truncated
}
text, reasoning, usage
Passed through from provider harness:
{
type: "text",
runId: string,
id: string,
content: string,
}
relay (permission)
Permission required for exec():
{
type: "relay",
kind: "permission",
runId: string,
id: string,
toolCallId: string,
tool: "exec",
params: { command: string },
respond: (response: PermissionResponse) => void,
}
Permission Control
Pass an allowlist to pre-approve matching exec() commands; anything not covered surfaces as a relay permission event you must respond to:
const permissions = {
allowlist: [
{ tool: "exec", params: { command: "ls*" } },
{ tool: "exec", params: { command: "cat*" } },
],
};
for await (const event of rlm.invoke({
model: "claude-sonnet-4-20250514",
context: largeDataset,
messages: [{ role: "user", content: "Analyze the data" }],
permissions,
})) {
if (event.type === "relay" && event.kind === "permission") {
const approved = await getUserApproval(event.params.command);
event.respond({ approved });
}
}
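If you prefer to approve events programmatically rather than prompting the user, one option is to match the incoming command against your rules yourself. This sketch assumes a trailing * acts as a prefix wildcard; the harness's actual pattern semantics may differ.

```typescript
// Minimal allowlist matcher (a sketch, not the harness's implementation).
// A trailing "*" is treated as "command starts with this prefix".
type AllowRule = { tool: string; params: { command: string } };

function isAllowed(rules: AllowRule[], tool: string, command: string): boolean {
  return rules.some((rule) => {
    if (rule.tool !== tool) return false;
    const pattern = rule.params.command;
    return pattern.endsWith("*")
      ? command.startsWith(pattern.slice(0, -1))
      : command === pattern;
  });
}
```

With this in place, the relay handler can call event.respond({ approved: isAllowed(rules, event.tool, event.params.command) }) and only fall back to a user prompt when nothing matches.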
Recursive RLM
When maxDepth > 0, llm_query() can spawn child RLM sessions, each running one depth level deeper:
const rlm = createRlmHarness({
rootHarness: createGeneratorHarness(),
config: {
maxIterations: 10,
maxStdoutLength: 4000,
metadataPrefixLength: 200,
maxDepth: 2, // Enable 2 levels of recursion
},
});
// Model can write:
// const summary = await llm_query("Summarize", hugeChunk);
// This spawns a child RLM session with its own REPL
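The depth cap can be pictured like this (the helper names are assumptions, not the library's API): below maxDepth, llm_query() starts a full child session; once the budget is exhausted, it degrades to a single plain sub-LLM call.

```typescript
// Sketch of depth-limited recursion. runChildRlm and plainLlmCall are
// hypothetical stand-ins: a nested RLM session with its own REPL vs. a
// single sub-LLM completion.
type SubCall = (prompt: string, ctx: string) => Promise<string>;

function makeLlmQuery(
  depth: number,
  maxDepth: number,
  runChildRlm: SubCall,
  plainLlmCall: SubCall
): SubCall {
  return (prompt, ctx) =>
    depth < maxDepth
      ? runChildRlm(prompt, ctx) // nested session at depth + 1
      : plainLlmCall(prompt, ctx); // recursion budget exhausted
}
```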
Processing Chunks
for await (const event of rlm.invoke({
model: "claude-sonnet-4-20250514",
context: megabyteDocument,
messages: [{ role: "user", content: "Extract all dates mentioned" }],
})) {
if (event.type === "repl_progress") {
// Model might write:
// const chunks = [];
// for (let i = 0; i < context.length; i += 10000) {
// const chunk = context.slice(i, i + 10000);
// const dates = await llm_query("Extract dates", chunk);
// chunks.push(dates);
// }
console.log(event.chunk);
}
}
With Shell Commands
const rlm = createRlmHarness({
rootHarness: createGeneratorHarness(),
config: {
maxIterations: 10,
maxStdoutLength: 4000,
metadataPrefixLength: 200,
execTimeout: 30, // 30 second timeout
execCwd: "/path/to/project",
},
});
for await (const event of rlm.invoke({
model: "claude-sonnet-4-20250514",
messages: [{ role: "user", content: "Find all TypeScript files with TODO comments" }],
})) {
// Model might write:
// const result = await exec("find . -name '*.ts' -exec grep -l 'TODO' {} \\;");
// const files = result.stdout.split("\n").filter(Boolean);
// FINAL(`Found ${files.length} files with TODOs`);
if (event.type === "repl_output") {
console.log("Command output:", event.stdout);
}
}
Two-Harness Pattern
Use a stronger model for the root loop and a faster, cheaper model for sub-calls:
const rlm = createRlmHarness({
rootHarness: createGeneratorHarness({ model: "claude-sonnet-4-20250514" }),
subHarness: createGeneratorHarness({ model: "claude-3-5-haiku-20241022" }), // Faster, cheaper for sub-calls
config: {
maxIterations: 10,
maxStdoutLength: 4000,
metadataPrefixLength: 200,
subModel: "claude-3-5-haiku-20241022",
},
});
Architecture
RLM wraps any provider harness and runs an inference loop:
- Model receives metadata about input (length, prefix)
- Model writes code to explore data through sandboxed REPL
- Iterates until FINAL() or maxIterations
Key features:
- Model never sees full input in context
- Arbitrary length processing through chunking
- Recursive sub-queries for complex tasks
- Shell command execution for system integration
- Persistent scope across iterations