The LLM Gateway HTTP API provides a streaming interface for interacting with AI agents through Server-Sent Events (SSE). It exposes the agent orchestrator to clients and manages per-session orchestrators with automatic cleanup.

Base URL

The default server runs on:
http://localhost:4000

Endpoints

GET /models

List the available models and the default model

POST /chat

Stream chat completions with tool execution

POST /chat/relay/:relayId

Resolve permission requests for tool execution
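As a minimal sketch, the `GET /models` endpoint can be queried with a plain `fetch` call; the shape of the JSON response is not documented here, so it is returned untyped:

```typescript
// Build the URL for the models endpoint, defaulting to the
// documented base URL.
function modelsUrl(baseUrl: string = "http://localhost:4000"): string {
  return `${baseUrl}/models`;
}

// Sketch: fetch the list of available models. The response body
// shape is an assumption (the gateway returns JSON, but the exact
// fields are not documented here).
async function listModels(baseUrl?: string): Promise<unknown> {
  const res = await fetch(modelsUrl(baseUrl));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```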

Architecture

Each POST /chat request creates a fresh AgentOrchestrator for that session. Server-Sent Events flow from the orchestrator until the stream closes or an error occurs. The orchestrator is automatically cleaned up when the connection ends.

Event Streaming

All streaming endpoints use Server-Sent Events (SSE) with the following format:
event: <event_type>
data: <json_payload>
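A minimal client-side parser for this framing might look like the following sketch. It assumes single-line `data:` fields and no SSE comments or `id:` lines, which the real stream may include:

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Parse a raw SSE chunk of the form
//   "event: <type>\ndata: <payload>\n\n"
// into structured events. Blocks without a data field are skipped.
function parseSse(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of chunk.split("\n\n")) {
    let event = "message"; // SSE default event type
    let data = "";
    for (const line of block.split("\n")) {
      if (line.startsWith("event: ")) event = line.slice(7);
      else if (line.startsWith("data: ")) data = line.slice(6);
    }
    if (data) events.push({ event, data });
  }
  return events;
}
```

In a browser, the built-in `EventSource` API handles this parsing automatically, but `EventSource` only supports GET requests, so streaming `POST /chat` responses requires reading the body manually as above.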

Common Event Types

  • connected - Initial connection established with session ID
  • harness_start - Agent harness begins processing
  • text - Text content from the AI
  • reasoning - Internal reasoning steps (if supported by model)
  • tool_call - Agent is calling a tool
  • tool_result - Result from tool execution
  • relay - Permission request for tool execution
  • usage - Token usage information
  • harness_end - Agent harness completed processing
  • error - Error occurred during processing
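A client typically dispatches on the event type. The sketch below is one way to do that; the payload field names for `text` and `tool_call` events are assumptions (only the `error` payload shape is documented below):

```typescript
// Sketch: dispatch on the SSE event type and produce a display
// string. Payload fields like `text` and `name` are assumptions,
// not a documented schema; `message` on error events matches the
// documented error payload.
function handleEvent(event: string, data: string): string {
  const payload = JSON.parse(data);
  switch (event) {
    case "text":
      return `AI: ${payload.text ?? data}`;
    case "tool_call":
      return `calling tool: ${payload.name ?? "unknown"}`;
    case "error":
      return `error: ${payload.message}`;
    default:
      return `[${event}]`; // connected, usage, harness_start, ...
  }
}
```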

Authentication

The HTTP API currently does not require authentication. If you need it, add an authentication layer (for example, a reverse proxy or API gateway) in your deployment environment.

Error Handling

All endpoints return standard HTTP status codes:
  • 200 - Success
  • 400 - Bad Request (invalid JSON or missing required fields)
  • 404 - Not Found (session or relay not found)
  • 500 - Internal Server Error
Error responses include a JSON body:
{
  "error": "Error message describing what went wrong"
}
SSE streams may also emit error events:
event: error
data: {"type":"error","message":"Error details"}
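A client can turn a non-2xx response into a typed error using the documented `{ "error": string }` body, falling back gracefully when the body is not JSON. A sketch:

```typescript
// Sketch: convert a gateway error response into an Error.
// The { "error": string } body shape is documented above;
// non-JSON bodies fall back to the bare status code.
function toError(status: number, body: string): Error {
  try {
    const parsed = JSON.parse(body);
    if (typeof parsed.error === "string") {
      return new Error(`HTTP ${status}: ${parsed.error}`);
    }
  } catch {
    // body was not JSON; fall through
  }
  return new Error(`HTTP ${status}`);
}
```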

Starting the Server

bun run dev:server  # starts with hot reload
Configure the server with environment variables:
  • PORT - Server port (default: 4000)
  • DEFAULT_MODEL - Default model when not specified in requests
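Resolving these variables on the server side might look like the following sketch (the actual implementation is not shown here; `DEFAULT_MODEL` has no documented fallback, so it stays optional):

```typescript
// Sketch: resolve server configuration from environment variables,
// falling back to the documented default port of 4000.
function resolveConfig(env: Record<string, string | undefined>) {
  return {
    port: Number(env.PORT ?? 4000),
    defaultModel: env.DEFAULT_MODEL, // undefined unless set
  };
}
```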
