Overview
LLM Gateway provides streaming LLM calls through provider harnesses. Each harness implements a simple async generator interface that yields events as tokens arrive from the API.

Quick Start
Choose a Provider
Select from available providers:
zen (OpenAI-compatible), anthropic, openai, or openrouter.

Event Types
Provider harnesses yield these events:

| Event | Description | Fields |
|---|---|---|
| harness_start | Stream begins | runId |
| text | Streamed text token | id, runId, content |
| reasoning | Streamed reasoning token | id, runId, content |
| tool_call | Model requested a tool | id, runId, name, input |
| usage | Token usage stats | runId, inputTokens, outputTokens |
| error | Error occurred | runId, message |
| harness_end | Stream complete | runId |
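The event shapes in the table map naturally onto a discriminated union, and a harness can be consumed with a for await loop. A minimal sketch follows, with a mock harness standing in for a real provider; the type and function names here are illustrative, not part of the library's confirmed API:

```typescript
// Event shapes follow the table above. The union type name and the
// mock harness are illustrative, not the library's confirmed API.
type HarnessEvent =
  | { type: "harness_start"; runId: string }
  | { type: "text"; id: string; runId: string; content: string }
  | { type: "reasoning"; id: string; runId: string; content: string }
  | { type: "tool_call"; id: string; runId: string; name: string; input: unknown }
  | { type: "usage"; runId: string; inputTokens: number; outputTokens: number }
  | { type: "error"; runId: string; message: string }
  | { type: "harness_end"; runId: string };

// A mock provider harness: an async generator that yields events
// in the same order a real stream would.
async function* mockHarness(): AsyncGenerator<HarnessEvent> {
  yield { type: "harness_start", runId: "run-1" };
  yield { type: "text", id: "t1", runId: "run-1", content: "Hello" };
  yield { type: "text", id: "t2", runId: "run-1", content: ", world" };
  yield { type: "usage", runId: "run-1", inputTokens: 12, outputTokens: 2 };
  yield { type: "harness_end", runId: "run-1" };
}

// Consume any harness, concatenating the streamed text tokens.
async function collectText(stream: AsyncIterable<HarnessEvent>): Promise<string> {
  let text = "";
  for await (const event of stream) {
    if (event.type === "text") text += event.content;
  }
  return text;
}

collectText(mockHarness()).then((text) => console.log(text)); // logs "Hello, world"
```

Because the interface is just an async iterable, the same consumer loop works unchanged against any provider harness.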
Provider-Specific Configuration
- Zen (OpenAI-compatible)
- Anthropic
- OpenAI
- OpenRouter
The Zen provider works with OpenAI-compatible APIs and supports reasoning content:
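A configuration for an OpenAI-compatible endpoint might look like the following sketch; the field names, interface name, and endpoint URL are illustrative assumptions, not the library's confirmed API:

```typescript
// Illustrative configuration shape for an OpenAI-compatible provider.
// Field names and the endpoint URL are assumptions, not confirmed API.
interface ZenConfig {
  baseUrl: string; // any OpenAI-compatible endpoint
  apiKey: string;
  model: string;
}

const config: ZenConfig = {
  baseUrl: "https://api.deepseek.com/v1",
  apiKey: process.env.ZEN_API_KEY ?? "",
  model: "deepseek-reasoner", // a model that streams reasoning content
};
```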
Reasoning vs Text
Models that support extended thinking (like DeepSeek, o1) emit separate streams:
- reasoning events: internal model thinking process (not part of the final answer)
- text events: final output tokens
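The two streams can be folded into a hidden thinking trace and a visible answer. A sketch over a pre-collected event array, with field names taken from the Event Types table (the helper itself is illustrative):

```typescript
// Minimal event shapes for this sketch; fields follow the Event Types
// table, the helper function is illustrative.
type StreamEvent =
  | { type: "reasoning"; id: string; runId: string; content: string }
  | { type: "text"; id: string; runId: string; content: string };

// Fold a stream of events into separate reasoning and answer strings.
function splitStreams(events: StreamEvent[]): { reasoning: string; answer: string } {
  let reasoning = "";
  let answer = "";
  for (const event of events) {
    if (event.type === "reasoning") reasoning += event.content;
    else answer += event.content;
  }
  return { reasoning, answer };
}

const { reasoning, answer } = splitStreams([
  { type: "reasoning", id: "r1", runId: "run-1", content: "2 + 2 is 4." },
  { type: "text", id: "t1", runId: "run-1", content: "The answer is 4." },
]);
// reasoning holds the hidden trace; answer holds the final output.
```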
Message History
Build multi-turn conversations by accumulating messages.

System Prompts
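A sketch of accumulating a multi-turn history whose first entry is a system message; the message shape is an assumption modeled on OpenAI-style chat APIs, not the library's confirmed type:

```typescript
// Message shape assumed from OpenAI-style chat APIs; the real
// library's message type may differ.
type Message = { role: "system" | "user" | "assistant"; content: string };

// Start every conversation with a system message to set behavior.
const messages: Message[] = [
  { role: "system", content: "You are a terse assistant. Answer in one sentence." },
];

// Turn 1: append the user message, run the harness, append the reply.
messages.push({ role: "user", content: "What is a harness?" });
const reply = "A streaming wrapper around one provider's API."; // collected from text events
messages.push({ role: "assistant", content: reply });

// Turn 2: later turns see the full accumulated history.
messages.push({ role: "user", content: "Name one event it yields." });
```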
Include system messages to set behavior.

Error Handling
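One way to surface stream failures is to turn error events into thrown exceptions at the consumer. A sketch over a pre-collected event array, with fields from the Event Types table (the helper is illustrative):

```typescript
// Minimal event shapes for this sketch; fields follow the Event Types table.
type Ev =
  | { type: "text"; id: string; runId: string; content: string }
  | { type: "error"; runId: string; message: string };

// Collect text, converting an error event into a thrown exception.
function collectOrThrow(events: Ev[]): string {
  let text = "";
  for (const event of events) {
    if (event.type === "error") throw new Error(event.message);
    text += event.content;
  }
  return text;
}

try {
  collectOrThrow([
    { type: "text", id: "t1", runId: "run-1", content: "partial " },
    { type: "error", runId: "run-1", message: "rate limited" },
  ]);
} catch (err) {
  // Recover or retry here; any partial text can be discarded or kept.
  console.error("stream failed:", (err as Error).message);
}
```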
Handle errors gracefully rather than letting a failed stream crash the consumer.

Tracking Token Usage
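Usage events carry inputTokens and outputTokens (see Event Types), so summing them across the stream gives a running total. The reducer name is illustrative:

```typescript
// Usage event fields follow the Event Types table.
type UsageEvent = { type: "usage"; runId: string; inputTokens: number; outputTokens: number };

// Sum usage across however many usage events a stream emits.
function totalUsage(events: UsageEvent[]): { inputTokens: number; outputTokens: number } {
  return events.reduce(
    (acc, e) => ({
      inputTokens: acc.inputTokens + e.inputTokens,
      outputTokens: acc.outputTokens + e.outputTokens,
    }),
    { inputTokens: 0, outputTokens: 0 },
  );
}

const totals = totalUsage([
  { type: "usage", runId: "run-1", inputTokens: 120, outputTokens: 40 },
  { type: "usage", runId: "run-2", inputTokens: 80, outputTokens: 25 },
]);
// totals: { inputTokens: 200, outputTokens: 65 }
```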
Accumulate token counts across the stream.

Run IDs and Provenance
Every event carries a runId that identifies the LLM invocation. Nested runs each get their own runId, and child runs include parentId to preserve the call graph.
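Grouping events by runId and following parentId links reconstructs that call graph. A sketch with an illustrative helper; the only assumed structure is what is described above (runId on every event, parentId on events from child runs):

```typescript
// Provenance fields as described above: every event has a runId,
// and events from child runs also carry parentId.
type ProvEvent = { runId: string; parentId?: string };

// Map each child run to its parent, reconstructing the call graph.
function callGraph(events: ProvEvent[]): Map<string, string> {
  const parents = new Map<string, string>();
  for (const e of events) {
    if (e.parentId) parents.set(e.runId, e.parentId);
  }
  return parents;
}

const graph = callGraph([
  { runId: "agent-1" },                    // top-level run, no parent
  { runId: "llm-1", parentId: "agent-1" }, // provider call made by the agent
  { runId: "llm-2", parentId: "agent-1" },
]);
// graph.get("llm-1") and graph.get("llm-2") both resolve to "agent-1".
```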
Composition
Provider harnesses compose with other harnesses. See the Tool Calling and Multi-Agent guides for wrapping providers with agentic capabilities.

Next Steps
- Tool Calling: add tools to let the model execute actions
- Multi-Agent: orchestrate multiple concurrent agents
