
Overview

The LLM Gateway server is highly configurable through the createApp factory function. This guide covers advanced configuration options for harnesses, tools, models, and skills.

Application Configuration

The createApp function accepts an optional AppConfig object:
interface AppConfig {
  harness?: GeneratorHarnessModule;
  providerHarness?: GeneratorHarnessModule;
  tools?: ToolDefinition[];
  defaultModel?: string;
  skillDirs?: string[];
}

Harness Configuration

Provider Harnesses

Provider harnesses handle communication with AI providers. The gateway includes built-in support for multiple providers:
import { createAgentHarness } from "./packages/ai/harness/agent";
import { createGeneratorHarness } from "./packages/ai/harness/providers/zen";

const app = await createApp({
  harness: createAgentHarness({
    harness: createGeneratorHarness(),
  }),
});
Requires ZEN_API_KEY in the environment.

Agent Harness

The agent harness wraps a provider harness with tool-calling capabilities:
import { createAgentHarness } from "./packages/ai/harness/agent";
import { createGeneratorHarness } from "./packages/ai/harness/providers/zen";

const agent = createAgentHarness({
  harness: createGeneratorHarness(),
  // Optional: additional agent configuration
});

const app = await createApp({ harness: agent });
The agent harness handles tool execution, permissions, and the agentic loop automatically.
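That loop can be sketched roughly as follows. This is an illustrative model of the flow, not the gateway's actual internals; the `Turn`/`ToolCall` types and the step limit are assumptions for the sketch:

```typescript
// Hypothetical sketch of the agentic loop the agent harness automates:
// generate a turn, execute any tool calls, feed results back, repeat.
type ToolCall = { name: string; params: Record<string, unknown> };
type Turn = { text: string; toolCalls: ToolCall[] };

async function agentLoop(
  generate: (history: string[]) => Promise<Turn>,
  runTool: (call: ToolCall) => Promise<string>,
  history: string[],
  maxSteps = 10
): Promise<string> {
  for (let step = 0; step < maxSteps; step++) {
    const turn = await generate(history);
    if (turn.toolCalls.length === 0) {
      return turn.text; // no more tool calls: this is the final answer
    }
    for (const call of turn.toolCalls) {
      // In the real harness, permission checks gate each call here.
      history.push(`tool:${call.name} -> ${await runTool(call)}`);
    }
  }
  throw new Error("agent loop exceeded maxSteps");
}
```

The key design point is that tool results are appended to the history and the model is re-invoked until it produces a turn with no tool calls.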

Recursive Language Model (RLM)

For processing long inputs, use the RLM harness:
import { createRlmHarness } from "./packages/ai/rlm/harness";
import { createGeneratorHarness } from "./packages/ai/harness/providers/zen";

const rlm = createRlmHarness({
  rootHarness: createGeneratorHarness(),
  subHarness: createGeneratorHarness(), // Can use cheaper model
  config: {
    maxIterations: 10,
    maxStdoutLength: 4000,
    metadataPrefixLength: 200,
    maxDepth: 2,
  },
});

const app = await createApp({ harness: rlm });
Clients must specify mode: "rlm" in chat requests to use RLM mode.
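For instance, a request body opting into RLM mode might look like this (field names follow the ChatRequest shape described later in this guide; the `context` string stands in for the long document):

```typescript
// Sketch of an RLM-mode chat request body; mode: "rlm" routes the request
// through the RLM harness instead of the default agent flow.
const rlmRequest = {
  model: "glm-4.7",
  messages: [{ role: "user", content: "Summarize the attached report" }],
  context: "...long-form document text...", // content the RLM iterates over
  mode: "rlm",
  maxIterations: 10, // optional override, mirrors the harness config above
};

const body = JSON.stringify(rlmRequest);
```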

Tool Configuration

Built-in Tools

The server includes four default tools:
  • bash: Execute shell commands with permission controls
  • agent: Spawn subagents for delegated tasks
  • read: Read file contents from the filesystem
  • patch: Apply code patches to files

Custom Tools

Define custom tools by implementing the ToolDefinition interface:
import type { ToolDefinition } from "./packages/ai/types";

const myCustomTool: ToolDefinition = {
  name: "my_tool",
  description: "Does something useful",
  schema: {
    type: "object",
    properties: {
      input: { type: "string", description: "Input parameter" },
    },
    required: ["input"],
  },
  execute: async (params) => {
    // Tool implementation
    return { success: true, output: "result" };
  },
  derivePermission: (params) => {
    // Return permission pattern for "always allow"
    return { tool: "my_tool", args: { input: params.input } };
  },
};

const app = await createApp({
  // Pass the built-in tools alongside your custom tool to keep them available.
  tools: [myCustomTool, bashTool, agentTool, readTool, patchTool],
});
Always implement derivePermission to enable “always allow” functionality for your tool.

Tool Permissions

Clients can control tool execution through permission configurations:
// Client request example
await fetch('/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'glm-4.7',
    messages: [...],
    permissions: {
      allowlist: [
        { tool: 'bash', args: { command: 'ls *' } },
        { tool: 'read' },
      ],
      deny: [
        { tool: 'bash', args: { command: 'rm *' } },
      ],
    },
  }),
});
Permission patterns use glob matching (via picomatch):
  • allowlist: Auto-approve matching tools
  • allowOnce: One-time approval, consumed on use
  • deny: Immediately reject matching tools
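As a rough sketch of how such a pattern check behaves, the matcher below stands in for picomatch (the server uses picomatch itself, and its exact matching rules may differ):

```typescript
// Minimal glob matcher standing in for picomatch: "*" matches any run of characters.
const globToRegExp = (glob: string): RegExp =>
  new RegExp(
    "^" +
      glob
        .split("*")
        .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*") +
      "$"
  );

// A call matches a pattern when the tools agree and every arg pattern globs its value.
// A pattern with no args (e.g. { tool: "read" }) matches any call to that tool.
const matchesPattern = (
  pattern: { tool: string; args?: Record<string, string> },
  call: { tool: string; args?: Record<string, string> }
): boolean =>
  pattern.tool === call.tool &&
  Object.entries(pattern.args ?? {}).every(([key, glob]) =>
    globToRegExp(glob).test(call.args?.[key] ?? "")
  );

matchesPattern(
  { tool: "bash", args: { command: "ls *" } },
  { tool: "bash", args: { command: "ls -la" } }
); // → true: "ls -la" matches the "ls *" glob

matchesPattern(
  { tool: "bash", args: { command: "rm *" } },
  { tool: "bash", args: { command: "ls -la" } }
); // → false
```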

Model Configuration

Default Model

Specify a default model for requests that don’t include a model:
const app = await createApp({
  defaultModel: process.env.DEFAULT_MODEL || "glm-4.7",
});
The /models endpoint returns the default model if it’s supported by the configured provider:
{
  "models": ["glm-4.7", "kimi-k2.5", "..."],
  "defaultModel": "glm-4.7"
}

Model Validation

The server validates models against the provider’s supported models:
const models = await harness.supportedModels();
const validDefault = defaultModel && models.includes(defaultModel) 
  ? defaultModel 
  : undefined;

Skills Configuration

Skills extend agent capabilities with specialized instructions and workflows:
const app = await createApp({
  skillDirs: [
    "./skills",
    "/etc/llm-gateway/skills",
    process.env.CUSTOM_SKILLS_DIR,
  ].filter(Boolean),
});

Skills Discovery

The server automatically discovers skills in configured directories:
  1. Searches each skillDir for valid skill definitions
  2. Formats skills into a system prompt
  3. Prepends skills prompt to agent messages
Skills are discovered at server startup. Restart the server to load new skills.

Request Configuration

Clients can configure individual chat requests:

Standard Mode

interface ChatRequest {
  model: string;              // Required: model identifier
  messages: Message[];        // Required: conversation history
  context?: string;           // Optional: additional context
  permissions?: Permissions;  // Optional: tool permissions
  mode?: "agent";            // Optional: defaults to agent
}

RLM Mode

interface ChatRequest {
  model: string;
  messages: Message[];
  context?: string;           // Long-form content to process
  mode: "rlm";               // Required for RLM
  maxIterations?: number;     // Default: 10
  maxDepth?: number;          // Default: 2
}
For example, a standard-mode request that pre-approves the bash tool:
{
  "model": "glm-4.7",
  "messages": [
    { "role": "user", "content": "List files in current directory" }
  ],
  "permissions": {
    "allowlist": [{ "tool": "bash" }]
  }
}

Server Options

Bun server options can be configured in the export:
export default {
  port: Number(process.env.PORT) || 4000,
  fetch: app.fetch,
  idleTimeout: 255,  // Seconds before idle connections close
  // Additional Bun server options:
  // maxRequestBodySize: 1024 * 1024 * 10, // 10MB
  // development: process.env.NODE_ENV !== 'production',
};
See Bun server documentation for all available options.

Multi-Provider Setup

Support multiple providers by creating separate harness configurations:
import { createGeneratorHarness as createZen } from "./packages/ai/harness/providers/zen";
import { createAnthropicHarness } from "./packages/ai/harness/providers/anthropic";

// Route requests based on model prefix or header
const getHarness = (model: string) => {
  if (model.startsWith('claude-')) {
    return createAgentHarness({ harness: createAnthropicHarness() });
  }
  return createAgentHarness({ harness: createZen() });
};

// This requires modifying the server to support dynamic harness selection
The default server uses a single harness. Multi-provider support requires custom server modifications.

Performance Tuning

Connection Limits

Manage concurrent connections based on your infrastructure:
// Implement connection limiting middleware
const connectionLimit = 100;
let activeConnections = 0;

app.use(async (c, next) => {
  if (activeConnections >= connectionLimit) {
    return c.json({ error: 'Server at capacity' }, 503);
  }
  activeConnections++;
  try {
    await next();
  } finally {
    activeConnections--;
  }
});

Memory Management

Orchestrators are automatically cleaned up, but monitor memory for long-running sessions:
# Monitor memory usage
bun --inspect server/index.ts

# Set Node.js memory limits if needed
NODE_OPTIONS="--max-old-space-size=4096" bun run server/index.ts

Configuration Best Practices

  1. Environment-based Configuration: Use environment variables for deployment-specific settings
  2. Sensible Defaults: Provide reasonable defaults for optional configuration
  3. Validation: Validate configuration at startup to fail fast
  4. Documentation: Document custom tools and skills for your team
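The validation practice can be as simple as a fail-fast helper run before createApp. The helper below is hypothetical, not part of the gateway:

```typescript
type Env = Record<string, string | undefined>;

// Fail fast: a missing required setting aborts startup with a clear message.
const requireEnv = (env: Env, name: string): string => {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
};

// Resolve every setting up front; optional ones fall back to sensible defaults.
const loadConfig = (env: Env) => ({
  apiKey: requireEnv(env, "ZEN_API_KEY"),
  defaultModel: env.DEFAULT_MODEL ?? "glm-4.7",
  port: Number(env.PORT ?? "4000"),
});
```

Call loadConfig(process.env) once at startup, before createApp, so a misconfigured deployment fails immediately instead of at the first request.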

Next Steps

Environment Variables

Complete reference for all environment variables

HTTP API Reference

Detailed endpoint documentation
