Architecture Diagram
Key Properties
Security
API keys never leave the server. No secrets in client binaries.
Centralized Credentials
Single source of truth for API keys. Easy rotation and auditing.
Unified Observability
All LLM requests flow through proxy layer for logging and monitoring.
Multi-Provider Support
Server can host multiple providers simultaneously with separate credentials.
Request Flow
1. Client Creates Provider-Specific Client
Clients use the ProxyLlmClient from the loom-server-llm-proxy crate.
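As a rough illustration of the client side, the sketch below models a proxy client that targets the /proxy/{provider}/... endpoint family described later in this document. The struct and method names here are assumptions for illustration, not the actual loom-server-llm-proxy API.

```rust
// Illustrative sketch only; the real ProxyLlmClient lives in the
// loom-server-llm-proxy crate and its API may differ.
struct ProxyLlmClient {
    base_url: String, // loom-server base URL
    provider: String, // which proxy endpoint family to target
}

impl ProxyLlmClient {
    fn new(base_url: &str, provider: &str) -> Self {
        Self {
            base_url: base_url.trim_end_matches('/').to_string(),
            provider: provider.to_string(),
        }
    }

    // Endpoint paths follow the /proxy/{provider}/{complete|stream} pattern.
    fn complete_endpoint(&self) -> String {
        format!("{}/proxy/{}/complete", self.base_url, self.provider)
    }

    fn stream_endpoint(&self) -> String {
        format!("{}/proxy/{}/stream", self.base_url, self.provider)
    }
}

fn main() {
    let client = ProxyLlmClient::new("https://loom.example.com/", "anthropic");
    // Prints: https://loom.example.com/proxy/anthropic/complete
    println!("{}", client.complete_endpoint());
}
```

Note that the client never holds a provider API key, only the server URL (and, optionally, a bearer token for client authentication).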
2. Client Sends Request to Provider-Specific Endpoint
The ProxyLlmClient implements the LlmClient trait and forwards requests to provider-specific endpoints.
3. Server Routes to Provider Client
The server's LlmService manages all provider clients and routes requests.
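The routing idea can be sketched as follows. This is not the real LlmService API; the enum, fields, and method are stand-ins to show the pattern: each provider client is optional and present only when its credentials are configured.

```rust
// Sketch of the routing pattern, not the actual loom-server-llm-service API.
#[derive(Clone, Copy, Debug)]
enum Provider {
    Anthropic,
    OpenAi,
    Vertex,
    Zai,
}

struct LlmService {
    // String stand-ins for the real provider client types; None means
    // that provider's credentials were not configured on this server.
    anthropic: Option<String>,
    openai: Option<String>,
    vertex: Option<String>,
    zai: Option<String>,
}

impl LlmService {
    // Route a request to the configured client for `provider`, if any.
    fn client_for(&self, provider: Provider) -> Option<&String> {
        match provider {
            Provider::Anthropic => self.anthropic.as_ref(),
            Provider::OpenAi => self.openai.as_ref(),
            Provider::Vertex => self.vertex.as_ref(),
            Provider::Zai => self.zai.as_ref(),
        }
    }
}

fn main() {
    let svc = LlmService {
        anthropic: Some("anthropic-client".into()),
        openai: None,
        vertex: None,
        zai: Some("zai-client".into()),
    };
    assert!(svc.client_for(Provider::Anthropic).is_some());
    // An unconfigured provider yields None; the proxy would return an error.
    assert!(svc.client_for(Provider::OpenAi).is_none());
}
```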
4. Provider Client Makes API Call
Provider-specific clients handle the actual API communication.
5. Response Streams Back Through Proxy
SSE events flow from provider → server → client.
Proxy Endpoints
Per-Provider Endpoints
Each provider has dedicated complete and stream endpoints.

Anthropic Claude
- Non-streaming completion. Request Body: LlmRequest JSON. Response: LlmResponse JSON.
- SSE streaming completion. Request Body: LlmRequest JSON. Response: SSE stream of LlmEvent JSON.

OpenAI GPT
- Non-streaming completion. Request Body: LlmRequest JSON. Response: LlmResponse JSON.
- SSE streaming completion. Request Body: LlmRequest JSON. Response: SSE stream of LlmEvent JSON.

Google Vertex AI
- Non-streaming completion. Request Body: LlmRequest JSON. Response: LlmResponse JSON.
- SSE streaming completion. Request Body: LlmRequest JSON. Response: SSE stream of LlmEvent JSON.

Z.ai (Zhipu AI)
- Non-streaming completion. Request Body: LlmRequest JSON. Response: LlmResponse JSON.
- SSE streaming completion. Request Body: LlmRequest JSON. Response: SSE stream of LlmEvent JSON.

Wire Format
Request Format (All Providers)
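Every endpoint accepts an LlmRequest serialized as JSON. The exact schema is defined by the LlmRequest type (see the LLM Client Spec); the field names below are illustrative assumptions only, not the actual wire schema:

```json
{
  "model": "provider-model-name",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "max_tokens": 1024
}
```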
Complete Response Format
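Non-streaming endpoints return an LlmResponse as JSON. As with the request, the real schema is defined by the LlmResponse type; this shape is a hypothetical illustration:

```json
{
  "content": [
    { "type": "text", "text": "Hello!" }
  ],
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 12, "output_tokens": 4 }
}
```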
Streaming Event Format
Streaming endpoints return an SSE stream with LlmEvent JSON payloads.
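A minimal sketch of consuming this stream: frames are separated by a blank line (\n\n), and each data: line carries one JSON-encoded LlmEvent. The event payloads in the sample input are illustrative; the real LlmEvent schema is defined in the LLM Client Spec.

```rust
// Split an SSE body into its `data:` payloads. Sketch only; a production
// client would parse incrementally as bytes arrive rather than on a full body.
fn parse_sse_payloads(stream: &str) -> Vec<String> {
    stream
        .split("\n\n") // \n\n is the event delimiter
        .filter_map(|event| {
            event
                .lines()
                .find(|line| line.starts_with("data:"))
                .map(|line| line.trim_start_matches("data:").trim().to_string())
        })
        .collect()
}

fn main() {
    // Illustrative event payloads, not the actual LlmEvent schema.
    let body = "data: {\"type\":\"text_delta\",\"text\":\"Hel\"}\n\n\
                data: {\"type\":\"text_delta\",\"text\":\"lo\"}\n\n\
                data: {\"type\":\"done\"}\n\n";
    let payloads = parse_sse_payloads(body);
    assert_eq!(payloads.len(), 3);
    assert!(payloads[0].contains("text_delta"));
}
```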
The SSE format uses \n\n as the event delimiter. Each data: line contains a JSON-encoded LlmEvent.
LlmService Architecture
The LlmService crate (loom-server-llm-service) provides server-side provider abstraction.
Configuration
Provider Availability Checks
Provider-Specific Methods
The server can have all providers configured simultaneously. Clients choose which provider to use by selecting the appropriate endpoint path.
Client Authentication
The proxy supports optional bearer token authentication.
Provider Implementations
Anthropic Client (loom-server-llm-anthropic)
- API: POST /v1/messages
- Headers: x-api-key, anthropic-version: 2023-06-01
- System messages: Extracted to top-level system field
- Tool results: Sent as tool_result content blocks
- Streaming: SSE with message_start → content_block_delta → message_stop
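As a concrete illustration of the system-message and tool-result points, a translated Anthropic request might look like the following. The model name and tool are placeholders; the block shapes follow Anthropic's public Messages API:

```json
{
  "model": "claude-model-placeholder",
  "system": "You are a helpful assistant.",
  "max_tokens": 1024,
  "messages": [
    { "role": "user", "content": "What time is it?" },
    {
      "role": "assistant",
      "content": [
        { "type": "tool_use", "id": "toolu_01", "name": "get_time", "input": {} }
      ]
    },
    {
      "role": "user",
      "content": [
        { "type": "tool_result", "tool_use_id": "toolu_01", "content": "12:00" }
      ]
    }
  ]
}
```

Note how the system prompt sits at the top level rather than in the messages array, and the tool result travels as a content block inside a user message.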
OpenAI Client (loom-server-llm-openai)
- API: POST /chat/completions
- Headers: Authorization: Bearer {api_key}
- Tool choice: Defaults to "auto" when tools provided
- Streaming: SSE with data: [DONE] marker
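An abbreviated sketch of the tail of an OpenAI-style SSE stream, showing the data: [DONE] terminator (chunk payloads truncated for illustration):

```text
data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
```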
Vertex AI Client (loom-server-llm-vertex)
- API: Google Cloud Vertex AI API
- Auth: Service account credentials
- Models: Gemini Pro, Gemini Flash, etc.
Z.ai Client (loom-server-llm-zai)
- API: POST /api/paas/v4/chat/completions (OpenAI-compatible)
- Headers: Authorization: Bearer {api_key}
- Models: glm-4.7, glm-4.6, glm-4.5, glm-4.5-flash, etc.
- Streaming: SSE with data: [DONE] marker (OpenAI-compatible)
Benefits
Security
- No API keys in client binaries or repositories
- Centralized credential rotation
- Audit logging at proxy layer
- Token-based client authentication
Observability
- All LLM requests logged server-side
- Unified metrics across providers
- Cost tracking per user/organization
- Performance monitoring
Flexibility
- Add providers without client updates
- A/B test different models
- Dynamic provider selection
- Fallback to alternate providers
Cost Control
- Rate limiting per user/organization
- Budget enforcement
- Provider pooling (e.g., Claude subscription sharing)
- Usage analytics
Adding a New Provider
Add proxy endpoints
Add /proxy/{provider}/complete and /proxy/{provider}/stream routes in loom-server.
Related Documentation
Architecture Overview
High-level system architecture
State Machine
Agent state machine design
LLM Client Spec
Detailed LLM client specification
Anthropic OAuth Pool
Claude subscription pooling