Loom uses a server-side proxy architecture for all LLM interactions. API keys are stored exclusively on the server, and clients communicate through HTTP proxy endpoints. This design provides enhanced security, centralized credential management, and unified observability.

Key Properties

Security

API keys never leave the server. No secrets in client binaries.

Centralized Credentials

Single source of truth for API keys. Easy rotation and auditing.

Unified Observability

All LLM requests flow through the proxy layer for logging and monitoring.

Multi-Provider Support

Server can host multiple providers simultaneously with separate credentials.

Request Flow

1. Client Creates Provider-Specific Client

Clients use ProxyLlmClient from the loom-server-llm-proxy crate:
use loom_server_llm_proxy::{ProxyLlmClient, LlmProvider};

// Convenience constructors for specific providers
let anthropic_client = ProxyLlmClient::anthropic("https://loom.example.com")?;
let openai_client = ProxyLlmClient::openai("https://loom.example.com")?;
let vertex_client = ProxyLlmClient::vertex("https://loom.example.com")?;
let zai_client = ProxyLlmClient::zai("https://loom.example.com")?;

// Or explicit provider selection
let client = ProxyLlmClient::new(
    "https://loom.example.com",
    LlmProvider::Anthropic
)?;

2. Client Sends Request to Provider-Specific Endpoint

The ProxyLlmClient implements the LlmClient trait and forwards requests to provider-specific endpoints:
let request = LlmRequest::new("claude-sonnet-4-20250514")
    .with_messages(messages)
    .with_tools(tools)
    .with_max_tokens(4096);

let response = anthropic_client.complete(request).await?;
// Sends POST to /proxy/anthropic/complete

3. Server Routes to Provider Client

The server’s LlmService manages all provider clients and routes requests:
// Server startup (loom-server-llm-service)
let service = LlmService::from_env()?;
// Reads ANTHROPIC_API_KEY, OPENAI_API_KEY, VERTEX_API_KEY, ZAI_API_KEY

// Proxy endpoint handler (loom-server)
async fn handle_anthropic_stream(
    State(service): State<Arc<LlmService>>,
    Json(request): Json<LlmRequest>,
) -> Result<Sse<impl Stream<Item = Event>>, StatusCode> {
    let stream = service
        .complete_streaming_anthropic(request)
        .await
        .map_err(|_| StatusCode::BAD_GATEWAY)?; // LlmError mapped to an HTTP status
    Ok(Sse::new(stream))
}

4. Provider Client Makes API Call

Provider-specific clients handle the actual API communication:
// loom-server-llm-anthropic
impl AnthropicClient {
    async fn complete_streaming(&self, request: LlmRequest) 
        -> Result<LlmStream, LlmError> 
    {
        let anthropic_request = convert_to_anthropic_format(request);
        
        let response = self.http_client
            .post(&self.config.base_url)
            .header("x-api-key", self.config.api_key.expose())
            .header("anthropic-version", "2023-06-01")
            .json(&anthropic_request)
            .send()
            .await?;
        
        let stream = parse_anthropic_sse_stream(response.bytes_stream());
        Ok(LlmStream::new(Box::pin(stream)))
    }
}

5. Response Streams Back Through Proxy

SSE events flow from provider → server → client:
Anthropic API → AnthropicClient → LlmService → Proxy Endpoint → ProxyLlmClient → CLI

Proxy Endpoints

Per-Provider Endpoints

Each provider has dedicated complete and stream endpoints. All accept a POST with an LlmRequest JSON body:

POST /proxy/anthropic/complete
  Non-streaming completion for Anthropic Claude. Response: LlmResponse JSON.

POST /proxy/anthropic/stream
  SSE streaming completion for Anthropic Claude. Response: SSE stream of LlmEvent JSON.

POST /proxy/openai/complete
  Non-streaming completion for OpenAI GPT. Response: LlmResponse JSON.

POST /proxy/openai/stream
  SSE streaming completion for OpenAI GPT. Response: SSE stream of LlmEvent JSON.

POST /proxy/vertex/complete
  Non-streaming completion for Google Vertex AI. Response: LlmResponse JSON.

POST /proxy/vertex/stream
  SSE streaming completion for Google Vertex AI. Response: SSE stream of LlmEvent JSON.

POST /proxy/zai/complete
  Non-streaming completion for Z.ai (Zhipu AI). Response: LlmResponse JSON.

POST /proxy/zai/stream
  SSE streaming completion for Z.ai. Response: SSE stream of LlmEvent JSON.
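The provider-to-path mapping these endpoints imply can be sketched as follows. This is illustrative only: the enum variants mirror LlmProvider from the docs, but the helper methods are hypothetical, not the actual ProxyLlmClient internals.

```rust
// Illustrative sketch: deriving proxy endpoint paths from a provider enum.
// The `slug`/`complete_path`/`stream_path` helpers are hypothetical.
#[derive(Clone, Copy)]
enum LlmProvider {
    Anthropic,
    OpenAi,
    Vertex,
    Zai,
}

impl LlmProvider {
    // URL segment used in the proxy routes
    fn slug(self) -> &'static str {
        match self {
            LlmProvider::Anthropic => "anthropic",
            LlmProvider::OpenAi => "openai",
            LlmProvider::Vertex => "vertex",
            LlmProvider::Zai => "zai",
        }
    }

    fn complete_path(self) -> String {
        format!("/proxy/{}/complete", self.slug())
    }

    fn stream_path(self) -> String {
        format!("/proxy/{}/stream", self.slug())
    }
}
```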

Wire Format

Request Format (All Providers)

{
  "model": "claude-sonnet-4-20250514",
  "messages": [
    {
      "role": "user",
      "content": "Hello, world!"
    }
  ],
  "tools": [
    {
      "name": "read_file",
      "description": "Read a file from the filesystem",
      "input_schema": {
        "type": "object",
        "properties": {
          "path": { "type": "string" }
        },
        "required": ["path"]
      }
    }
  ],
  "max_tokens": 4096,
  "temperature": 0.7
}

Complete Response Format

{
  "message": {
    "role": "assistant",
    "content": "Hello! How can I help you today?"
  },
  "tool_calls": [],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 9,
    "total_tokens": 24
  },
  "finish_reason": "end_turn"
}

Streaming Event Format

SSE stream with LlmEvent JSON payloads:
data: {"type":"text_delta","content":"Hello"}

data: {"type":"text_delta","content":"!"}

data: {"type":"completed","response":{"message":{...},"usage":{...}}}
The SSE format uses \n\n as the event delimiter. Each data: line contains a JSON-encoded LlmEvent.
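Given the \n\n delimiter and data: prefix above, splitting a stream body into event payloads can be sketched like this. It is a simplification: the real client parses an incremental byte stream, not a complete string, and the function name is illustrative.

```rust
// Minimal sketch: split a raw SSE body into its `data:` payloads using the
// `\n\n` event delimiter. Events without a `data: ` prefix are skipped.
fn sse_data_payloads(raw: &str) -> Vec<&str> {
    raw.split("\n\n")
        .filter_map(|event| event.trim_end().strip_prefix("data: "))
        .collect()
}
```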

LlmService Architecture

The LlmService crate (loom-server-llm-service) provides server-side provider abstraction:

Configuration

pub struct LlmServiceConfig {
    pub anthropic_api_key: Option<SecretString>,
    pub openai_api_key: Option<SecretString>,
    pub vertex_api_key: Option<SecretString>,
    pub zai_api_key: Option<SecretString>,
}

impl LlmService {
    // Reads from environment variables
    pub fn from_env() -> Result<Self, LlmServiceError>;
}

Provider Availability Checks

impl LlmService {
    pub fn has_anthropic(&self) -> bool;
    pub fn has_openai(&self) -> bool;
    pub fn has_vertex(&self) -> bool;
    pub fn has_zai(&self) -> bool;
}
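Each availability check reduces to an is_some() on the corresponding stored key; a sketch under the same simplification (String instead of SecretString, one provider shown):

```rust
// Sketch: a provider is "available" exactly when its API key was present
// at startup. SecretString simplified to String.
struct LlmService {
    anthropic_api_key: Option<String>,
}

impl LlmService {
    fn has_anthropic(&self) -> bool {
        self.anthropic_api_key.is_some()
    }
}
```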

Provider-Specific Methods

impl LlmService {
    // Anthropic
    pub async fn complete_anthropic(
        &self,
        request: LlmRequest
    ) -> Result<LlmResponse, LlmError>;
    
    pub async fn complete_streaming_anthropic(
        &self,
        request: LlmRequest
    ) -> Result<LlmStream, LlmError>;
    
    // OpenAI
    pub async fn complete_openai(
        &self,
        request: LlmRequest
    ) -> Result<LlmResponse, LlmError>;
    
    pub async fn complete_streaming_openai(
        &self,
        request: LlmRequest
    ) -> Result<LlmStream, LlmError>;
    
    // Vertex AI
    pub async fn complete_vertex(
        &self,
        request: LlmRequest
    ) -> Result<LlmResponse, LlmError>;
    
    pub async fn complete_streaming_vertex(
        &self,
        request: LlmRequest
    ) -> Result<LlmStream, LlmError>;
    
    // Z.ai
    pub async fn complete_zai(
        &self,
        request: LlmRequest
    ) -> Result<LlmResponse, LlmError>;
    
    pub async fn complete_streaming_zai(
        &self,
        request: LlmRequest
    ) -> Result<LlmStream, LlmError>;
}
The server can have all providers configured simultaneously. Clients choose which provider to use by selecting the appropriate endpoint path.

Client Authentication

The proxy supports optional bearer token authentication:
let client = ProxyLlmClient::anthropic("https://loom.example.com")?
    .with_auth_token(SecretString::new("user-token".into()));
The server validates tokens and enforces authorization policies based on user identity.
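On the server side, extracting the bearer token before policy checks can be sketched as follows. The helper is hypothetical; the actual validation and authorization logic is not specified here.

```rust
// Hypothetical helper: pull the bearer token out of an Authorization
// header, if present. Token lookup and policy enforcement would follow.
fn bearer_token(auth_header: Option<&str>) -> Option<&str> {
    auth_header?.strip_prefix("Bearer ")
}
```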

Provider Implementations

Anthropic Client (loom-server-llm-anthropic)

  • API: POST /v1/messages
  • Headers: x-api-key, anthropic-version: 2023-06-01
  • System messages: Extracted to top-level system field
  • Tool results: Sent as tool_result content blocks
  • Streaming: SSE with message_start, content_block_delta, and message_stop events

OpenAI Client (loom-server-llm-openai)

  • API: POST /chat/completions
  • Headers: Authorization: Bearer {api_key}
  • Tool choice: Defaults to "auto" when tools provided
  • Streaming: SSE with data: [DONE] marker

Vertex AI Client (loom-server-llm-vertex)

  • API: Google Cloud Vertex AI API
  • Auth: Service account credentials
  • Models: Gemini Pro, Gemini Flash, etc.

Z.ai Client (loom-server-llm-zai)

  • API: POST /api/paas/v4/chat/completions (OpenAI-compatible)
  • Headers: Authorization: Bearer {api_key}
  • Models: glm-4.7, glm-4.6, glm-4.5, glm-4.5-flash, etc.
  • Streaming: SSE with data: [DONE] marker (OpenAI-compatible)
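For the OpenAI-compatible streams (OpenAI and Z.ai), termination is signaled in-band by the data: [DONE] marker. A sketch of consuming payloads until that marker (illustrative names, operating on pre-split lines):

```rust
// Illustrative sketch: OpenAI-compatible streams end with `data: [DONE]`.
// Collect JSON payloads until that marker is seen.
fn payloads_until_done<'a>(lines: impl Iterator<Item = &'a str>) -> Vec<&'a str> {
    lines
        .filter_map(|line| line.strip_prefix("data: "))
        .take_while(|payload| *payload != "[DONE]")
        .collect()
}
```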

Benefits

Security

  • No API keys in client binaries or repositories
  • Centralized credential rotation
  • Audit logging at proxy layer
  • Token-based client authentication

Observability

  • All LLM requests logged server-side
  • Unified metrics across providers
  • Cost tracking per user/organization
  • Performance monitoring

Flexibility

  • Add providers without client updates
  • A/B test different models
  • Dynamic provider selection
  • Fallback to alternate providers

Cost Control

  • Rate limiting per user/organization
  • Budget enforcement
  • Provider pooling (e.g., Claude subscription sharing)
  • Usage analytics

Adding a New Provider

1. Create provider client crate
   Create loom-server-llm-{provider} with an LlmClient trait implementation.

2. Add to LlmService
   Update loom-server-llm-service to include the new provider client.

3. Add proxy endpoints
   Add /proxy/{provider}/complete and /proxy/{provider}/stream routes in loom-server.

4. Update ProxyLlmClient
   Add a convenience constructor in loom-server-llm-proxy (e.g., ProxyLlmClient::new_provider()).
No client-side changes required! Once the server is updated, all clients automatically gain access to the new provider through the proxy.
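Step 1 can be sketched as follows. This is a hypothetical, synchronous simplification: the real LlmClient trait is async and uses the full request/response types shown earlier.

```rust
// Hypothetical skeleton for a new provider crate. Types are simplified
// stand-ins; the real LlmClient trait is async and uses the richer
// LlmRequest/LlmResponse types.
struct LlmRequest {
    model: String,
}

struct LlmResponse {
    text: String,
}

trait LlmClient {
    fn complete(&self, request: LlmRequest) -> Result<LlmResponse, String>;
}

struct ExampleProviderClient {
    api_key: String,
}

impl LlmClient for ExampleProviderClient {
    fn complete(&self, request: LlmRequest) -> Result<LlmResponse, String> {
        // A real implementation would convert LlmRequest to the provider's
        // wire format, POST it with `self.api_key`, and parse the response.
        let _ = &self.api_key;
        Ok(LlmResponse {
            text: format!("stub response for {}", request.model),
        })
    }
}
```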

See also:

  • Architecture Overview: high-level system architecture
  • State Machine: agent state machine design
  • LLM Client Spec: detailed LLM client specification
  • Anthropic OAuth Pool: Claude subscription pooling
