
Overview

The LLM Gateway server is highly configurable through the createApp factory function. This guide covers advanced configuration options for harnesses, tools, models, and skills.

Application Configuration

The createApp function accepts an optional AppConfig object:
interface AppConfig {
  harness?: GeneratorHarnessModule;
  providerHarness?: GeneratorHarnessModule;
  tools?: ToolDefinition[];
  defaultModel?: string;
  skillDirs?: string[];
}

Harness Configuration

Provider Harnesses

Provider harnesses handle communication with AI providers. The gateway includes built-in support for multiple providers:
import { createAgentHarness } from "./packages/ai/harness/agent";
import { createGeneratorHarness } from "./packages/ai/harness/providers/zen";

const app = await createApp({
  harness: createAgentHarness({
    harness: createGeneratorHarness(),
  }),
});
Requires ZEN_API_KEY in the environment.

Agent Harness

The agent harness wraps a provider harness with tool-calling capabilities:
import { createAgentHarness } from "./packages/ai/harness/agent";
import { createGeneratorHarness } from "./packages/ai/harness/providers/zen";

const agent = createAgentHarness({
  harness: createGeneratorHarness(),
  // Optional: additional agent configuration
});

const app = await createApp({ harness: agent });
The agent harness handles tool execution, permissions, and the agentic loop automatically.
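That loop can be sketched roughly as follows. This is an illustrative model of the flow, not the gateway's actual internals; the `Turn`/`ToolCall` types and the step limit are assumptions for the sketch:

```typescript
// Hypothetical sketch of the agentic loop the agent harness automates:
// generate a turn, execute any tool calls, feed results back, repeat.
type ToolCall = { name: string; params: Record<string, unknown> };
type Turn = { text: string; toolCalls: ToolCall[] };

async function agentLoop(
  generate: (history: string[]) => Promise<Turn>,
  runTool: (call: ToolCall) => Promise<string>,
  history: string[],
  maxSteps = 10
): Promise<string> {
  for (let step = 0; step < maxSteps; step++) {
    const turn = await generate(history);
    if (turn.toolCalls.length === 0) {
      return turn.text; // no more tool calls: this is the final answer
    }
    for (const call of turn.toolCalls) {
      // In the real harness, permission checks gate each call here.
      history.push(`tool:${call.name} -> ${await runTool(call)}`);
    }
  }
  throw new Error("agent loop exceeded maxSteps");
}
```

The key design point is that tool results are appended to the history and the model is re-invoked until it produces a turn with no tool calls.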

Recursive Language Model (RLM)

For processing long inputs, use the RLM harness:
import { createRlmHarness } from "./packages/ai/rlm/harness";
import { createGeneratorHarness } from "./packages/ai/harness/providers/zen";

const rlm = createRlmHarness({
  rootHarness: createGeneratorHarness(),
  subHarness: createGeneratorHarness(), // Can use cheaper model
  config: {
    maxIterations: 10,
    maxStdoutLength: 4000,
    metadataPrefixLength: 200,
    maxDepth: 2,
  },
});

const app = await createApp({ harness: rlm });
Clients must specify mode: "rlm" in chat requests to use RLM mode.
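For instance, a request body opting into RLM mode might look like this (field names follow the ChatRequest shape described later in this guide; the `context` string stands in for the long document):

```typescript
// Sketch of an RLM-mode chat request body; mode: "rlm" routes the request
// through the RLM harness instead of the default agent flow.
const rlmRequest = {
  model: "glm-4.7",
  messages: [{ role: "user", content: "Summarize the attached report" }],
  context: "...long-form document text...", // content the RLM iterates over
  mode: "rlm",
  maxIterations: 10, // optional override, mirrors the harness config above
};

const body = JSON.stringify(rlmRequest);
```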

Tool Configuration

Built-in Tools

The server includes four default tools:
  • bash: Execute shell commands with permission controls
  • agent: Spawn subagents for delegated tasks
  • read: Read file contents from the filesystem
  • patch: Apply code patches to files

Custom Tools

Define custom tools by implementing the ToolDefinition interface:
import type { ToolDefinition } from "./packages/ai/types";

const myCustomTool: ToolDefinition = {
  name: "my_tool",
  description: "Does something useful",
  schema: {
    type: "object",
    properties: {
      input: { type: "string", description: "Input parameter" },
    },
    required: ["input"],
  },
  execute: async (params) => {
    // Tool implementation
    return { success: true, output: "result" };
  },
  derivePermission: (params) => {
    // Return permission pattern for "always allow"
    return { tool: "my_tool", args: { input: params.input } };
  },
};

const app = await createApp({
  // Pass the built-in tools alongside your custom tool to keep them available.
  tools: [myCustomTool, bashTool, agentTool, readTool, patchTool],
});
Always implement derivePermission to enable “always allow” functionality for your tool.

Tool Permissions

Clients can control tool execution through permission configurations:
// Client request example
await fetch('/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'glm-4.7',
    messages: [...],
    permissions: {
      allowlist: [
        { tool: 'bash', args: { command: 'ls *' } },
        { tool: 'read' },
      ],
      deny: [
        { tool: 'bash', args: { command: 'rm *' } },
      ],
    },
  }),
});
Permission patterns use glob matching (via picomatch):
  • allowlist: Auto-approve matching tools
  • allowOnce: One-time approval, consumed on use
  • deny: Immediately reject matching tools
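As a rough sketch of how such a pattern check behaves, the matcher below stands in for picomatch (the server uses picomatch itself, and its exact matching rules may differ):

```typescript
// Minimal glob matcher standing in for picomatch: "*" matches any run of characters.
const globToRegExp = (glob: string): RegExp =>
  new RegExp(
    "^" +
      glob
        .split("*")
        .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*") +
      "$"
  );

// A call matches a pattern when the tools agree and every arg pattern globs its value.
// A pattern with no args (e.g. { tool: "read" }) matches any call to that tool.
const matchesPattern = (
  pattern: { tool: string; args?: Record<string, string> },
  call: { tool: string; args?: Record<string, string> }
): boolean =>
  pattern.tool === call.tool &&
  Object.entries(pattern.args ?? {}).every(([key, glob]) =>
    globToRegExp(glob).test(call.args?.[key] ?? "")
  );

matchesPattern(
  { tool: "bash", args: { command: "ls *" } },
  { tool: "bash", args: { command: "ls -la" } }
); // → true: "ls -la" matches the "ls *" glob

matchesPattern(
  { tool: "bash", args: { command: "rm *" } },
  { tool: "bash", args: { command: "ls -la" } }
); // → false
```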

Model Configuration

Default Model

Specify a default model for requests that don’t include a model:
const app = await createApp({
  defaultModel: process.env.DEFAULT_MODEL || "glm-4.7",
});
The /models endpoint returns the default model if it’s supported by the configured provider:
{
  "models": ["glm-4.7", "kimi-k2.5", "..."],
  "defaultModel": "glm-4.7"
}

Model Validation

The server validates models against the provider’s supported models:
const models = await harness.supportedModels();
const validDefault = defaultModel && models.includes(defaultModel) 
  ? defaultModel 
  : undefined;

Skills Configuration

Skills extend agent capabilities with specialized instructions and workflows:
const app = await createApp({
  skillDirs: [
    "./skills",
    "/etc/llm-gateway/skills",
    process.env.CUSTOM_SKILLS_DIR,
  ].filter(Boolean),
});

Skills Discovery

The server automatically discovers skills in configured directories:
  1. Searches each skillDir for valid skill definitions
  2. Formats skills into a system prompt
  3. Prepends skills prompt to agent messages
Skills are discovered at server startup. Restart the server to load new skills.

Request Configuration

Clients can configure individual chat requests:

Standard Mode

interface ChatRequest {
  model: string;              // Required: model identifier
  messages: Message[];        // Required: conversation history
  context?: string;           // Optional: additional context
  permissions?: Permissions;  // Optional: tool permissions
  mode?: "agent";            // Optional: defaults to agent
}

RLM Mode

interface ChatRequest {
  model: string;
  messages: Message[];
  context?: string;           // Long-form content to process
  mode: "rlm";               // Required for RLM
  maxIterations?: number;     // Default: 10
  maxDepth?: number;          // Default: 2
}
For example, a standard-mode request that pre-approves the bash tool:
{
  "model": "glm-4.7",
  "messages": [
    { "role": "user", "content": "List files in current directory" }
  ],
  "permissions": {
    "allowlist": [{ "tool": "bash" }]
  }
}

Server Options

Bun server options can be configured in the export:
export default {
  port: Number(process.env.PORT) || 4000,
  fetch: app.fetch,
  idleTimeout: 255,  // Seconds before idle connections close
  // Additional Bun server options:
  // maxRequestBodySize: 1024 * 1024 * 10, // 10MB
  // development: process.env.NODE_ENV !== 'production',
};
See Bun server documentation for all available options.

Multi-Provider Setup

Support multiple providers by creating separate harness configurations:
import { createGeneratorHarness as createZen } from "./packages/ai/harness/providers/zen";
import { createAnthropicHarness } from "./packages/ai/harness/providers/anthropic";

// Route requests based on model prefix or header
const getHarness = (model: string) => {
  if (model.startsWith('claude-')) {
    return createAgentHarness({ harness: createAnthropicHarness() });
  }
  return createAgentHarness({ harness: createZen() });
};

// This requires modifying the server to support dynamic harness selection
The default server uses a single harness. Multi-provider support requires custom server modifications.

Performance Tuning

Connection Limits

Manage concurrent connections based on your infrastructure:
// Implement connection limiting middleware
const connectionLimit = 100;
let activeConnections = 0;

app.use(async (c, next) => {
  if (activeConnections >= connectionLimit) {
    return c.json({ error: 'Server at capacity' }, 503);
  }
  activeConnections++;
  try {
    await next();
  } finally {
    activeConnections--;
  }
});

Memory Management

Orchestrators are automatically cleaned up, but monitor memory for long-running sessions:
# Monitor memory usage
bun --inspect server/index.ts

# Set Node.js memory limits if needed
NODE_OPTIONS="--max-old-space-size=4096" bun run server/index.ts

Configuration Best Practices

  1. Environment-based Configuration: Use environment variables for deployment-specific settings
  2. Sensible Defaults: Provide reasonable defaults for optional configuration
  3. Validation: Validate configuration at startup to fail fast
  4. Documentation: Document custom tools and skills for your team
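The validation practice can be as simple as a fail-fast helper run before createApp. The helper below is hypothetical, not part of the gateway:

```typescript
type Env = Record<string, string | undefined>;

// Fail fast: a missing required setting aborts startup with a clear message.
const requireEnv = (env: Env, name: string): string => {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
};

// Resolve every setting up front; optional ones fall back to sensible defaults.
const loadConfig = (env: Env) => ({
  apiKey: requireEnv(env, "ZEN_API_KEY"),
  defaultModel: env.DEFAULT_MODEL ?? "glm-4.7",
  port: Number(env.PORT ?? "4000"),
});
```

Call loadConfig(process.env) once at startup, before createApp, so a misconfigured deployment fails immediately instead of at the first request.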

Next Steps

Environment Variables

Complete reference for all environment variables

HTTP API Reference

Detailed endpoint documentation
