Overview
The LLM proxy endpoints allow clients to make LLM completion requests through the Loom server without exposing API keys. The server handles:
- API key management and rotation
- Request/response logging and auditing
- Rate limiting and retry logic
- Token usage tracking
- Server-to-client queries during streaming
Supported Providers:
- Anthropic (Claude models)
- OpenAI (GPT models)
- Vertex AI (Google Cloud)
- Z.ai (Z.ai models)
All proxy endpoints accept a standard LlmRequest payload with:
- Model identifier (e.g., "claude-sonnet-4", "gpt-4o")
- Array of message objects with role and content
- Tool definitions for function calling
- Maximum tokens to generate
- Sampling temperature (0.0-1.0)
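Assembled in client code, such a payload looks like the following (a minimal Python sketch; the field names `tools` and `temperature` are assumed to follow the common provider convention, while `model`, `messages`, and `max_tokens` appear verbatim in the examples below):

```python
import json

def build_llm_request(model, messages, max_tokens=None, temperature=None, tools=None):
    """Assemble an LlmRequest payload; only model and messages are always present."""
    payload = {"model": model, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens      # cap on generated tokens
    if temperature is not None:
        payload["temperature"] = temperature    # sampling temperature, 0.0-1.0
    if tools is not None:
        payload["tools"] = tools                # tool definitions for function calling
    return payload

req = build_llm_request(
    "claude-sonnet-4",
    [{"role": "user", "content": "Explain Rust ownership in one sentence."}],
    max_tokens=100,
)
body = json.dumps(req)  # JSON string for the request body
```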
Anthropic Complete
POST /proxy/anthropic/complete
Synchronous Anthropic completion. Returns the full response when complete.
Request Body
See Request Format above.
Response
- message: Assistant message with role: "assistant" and content
- tool_calls: Array of tool call objects (if any)
- usage: Token usage: {input_tokens: number, output_tokens: number}
- finish_reason: Reason for completion: "stop", "length", "tool_use", etc.
Example
curl -X POST https://loom.ghuntley.com/proxy/anthropic/complete \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "user", "content": "Explain Rust ownership in one sentence."}
],
"max_tokens": 100
}'
Response (200 OK):
{
"message": {
"role": "assistant",
"content": "Rust ownership ensures memory safety by enforcing that each value has a single owner, automatically deallocating memory when the owner goes out of scope."
},
"tool_calls": [],
"usage": {
"input_tokens": 18,
"output_tokens": 31
},
"finish_reason": "stop"
}
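Client code can unpack this response by field (a Python sketch keyed to the field names in the example above):

```python
import json

# The example response body from above (content abbreviated).
raw = '''{
  "message": {"role": "assistant", "content": "Rust ownership ensures memory safety..."},
  "tool_calls": [],
  "usage": {"input_tokens": 18, "output_tokens": 31},
  "finish_reason": "stop"
}'''

resp = json.loads(raw)
text = resp["message"]["content"]
total = resp["usage"]["input_tokens"] + resp["usage"]["output_tokens"]

# finish_reason distinguishes a natural stop from truncation ("length")
# or a pending tool invocation ("tool_use").
if resp["finish_reason"] == "tool_use":
    pending_calls = resp["tool_calls"]
```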
Anthropic Stream
POST /proxy/anthropic/stream
Streaming Anthropic completion via Server-Sent Events (SSE).
Request Body
See Request Format above.
Response
Returns text/event-stream with events tagged as event: llm.
Event Types
Text Delta
{"type": "text_delta", "content": "Rust ownership"}
Tool Call Delta
{
"type": "tool_call_delta",
"call_id": "call_abc123",
"tool_name": "read_file",
"arguments_fragment": "{\"path\":\"/src"
}
Server Query
Server requests information from the client:
{
"type": "server_query",
"id": "Q-abc123",
"kind": {"ReadFile": {"path": "/test.txt"}},
"sent_at": "2026-03-03T12:00:00Z",
"timeout_secs": 30,
"metadata": {}
}
Client must respond via POST /api/sessions/{session_id}/query-response.
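A streaming client therefore needs a handler that maps each server_query to a reply. The sketch below only builds the target URL and an illustrative reply body; the reply schema (`query_id`/`result` here) is an assumption, so consult the query-response endpoint documentation for the real shape:

```python
def build_query_response(event, session_id, result,
                         base_url="https://loom.ghuntley.com"):
    """Return the (url, body) pair for answering one server_query event.

    The body shape is illustrative; the actual schema is defined by the
    /api/sessions/{session_id}/query-response endpoint.
    """
    url = f"{base_url}/api/sessions/{session_id}/query-response"
    body = {"query_id": event["id"], "result": result}
    return url, body

# A server_query event as shown above.
event = {
    "type": "server_query",
    "id": "Q-abc123",
    "kind": {"ReadFile": {"path": "/test.txt"}},
    "timeout_secs": 30,
}
url, body = build_query_response(event, "sess-1", {"content": "hello"})
```

Note that the client must answer within `timeout_secs`, or the server will give up on the query.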
Completed
{
"type": "completed",
"response": {
"message": {...},
"tool_calls": [...],
"usage": {...},
"finish_reason": "stop"
}
}
Error
{"type": "error", "message": "Rate limited; retry after 30 seconds"}
Example
curl -N -X POST https://loom.ghuntley.com/proxy/anthropic/stream \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "user", "content": "Write a haiku about coding."}
]
}'
Stream output:
event: llm
data: {"type":"text_delta","content":"Code"}
event: llm
data: {"type":"text_delta","content":" flows like"}
event: llm
data: {"type":"text_delta","content":" water\n"}
event: llm
data: {"type":"completed","response":{"message":{...},"tool_calls":[],"usage":{...},"finish_reason":"stop"}}
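The frames above can be decoded with a small SSE parser (a Python sketch that handles the single-line `data:` frames shown here; it does not cover SSE comments or multi-line data fields):

```python
import json

def parse_llm_events(stream_text):
    """Yield decoded JSON payloads from `event: llm` SSE frames."""
    current_event = None
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            current_event = line[len("event:"):].strip()
        elif line.startswith("data:") and current_event == "llm":
            yield json.loads(line[len("data:"):].strip())
        elif not line.strip():
            current_event = None  # a blank line ends the frame

sample = (
    "event: llm\n"
    'data: {"type":"text_delta","content":"Code"}\n'
    "\n"
    "event: llm\n"
    'data: {"type":"text_delta","content":" flows"}\n'
)
text = "".join(e["content"] for e in parse_llm_events(sample)
               if e["type"] == "text_delta")
# text == "Code flows"
```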
OpenAI Complete
POST /proxy/openai/complete
Synchronous OpenAI completion.
Request/Response
Same format as Anthropic endpoints. See Request Format and Anthropic Complete.
Example
curl -X POST https://loom.ghuntley.com/proxy/openai/complete \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
OpenAI Stream
POST /proxy/openai/stream
Streaming OpenAI completion via SSE. Same event format as Anthropic Stream.
Example
curl -N -X POST https://loom.ghuntley.com/proxy/openai/stream \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Count to 5."}
]
}'
Vertex Complete
POST /proxy/vertex/complete
Synchronous Vertex AI completion (Google Cloud).
Request/Response
Same format as Anthropic endpoints. See Request Format and Anthropic Complete.
Example
curl -X POST https://loom.ghuntley.com/proxy/vertex/complete \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.0-flash-exp",
"messages": [
{"role": "user", "content": "Summarize quantum computing."}
]
}'
Vertex Stream
POST /proxy/vertex/stream
Streaming Vertex AI completion via SSE. Same event format as Anthropic Stream.
Example
curl -N -X POST https://loom.ghuntley.com/proxy/vertex/stream \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.0-flash-exp",
"messages": [
{"role": "user", "content": "Explain async/await."}
]
}'
Z.ai Complete
POST /proxy/zai/complete
Synchronous Z.ai completion.
Request/Response
Same format as Anthropic endpoints. See Request Format and Anthropic Complete.
Example
curl -X POST https://loom.ghuntley.com/proxy/zai/complete \
-H "Content-Type: application/json" \
-d '{
"model": "z.ai-model",
"messages": [
{"role": "user", "content": "Hello, world!"}
]
}'
Z.ai Stream
POST /proxy/zai/stream
Streaming Z.ai completion via SSE. Same event format as Anthropic Stream.
Example
curl -N -X POST https://loom.ghuntley.com/proxy/zai/stream \
-H "Content-Type: application/json" \
-d '{
"model": "z.ai-model",
"messages": [
{"role": "user", "content": "Tell me a joke."}
]
}'
Error Handling
All endpoints return errors in this format:
{
"error": "service_unavailable",
"message": "Anthropic provider is not configured on the server"
}
Common Errors
| Status | Error Code | Description |
|---|---|---|
| 503 | service_unavailable | Provider not configured or unavailable |
| 503 | rate_limited | Upstream rate limit hit |
| 504 | timeout | LLM request timed out |
| 500 | upstream_error | Provider returned an error |
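One way to act on these codes is to treat the 503/504 classes as retryable and the 500 upstream_error as terminal (a sketch; the retry policy is a client-side choice, not something the API mandates):

```python
# Transient failures worth retrying; upstream_error (500) is surfaced to the caller.
RETRYABLE_CODES = {"service_unavailable", "rate_limited", "timeout"}

def should_retry(status, error_code):
    """True when the error is transient per the table above."""
    return status in (503, 504) and error_code in RETRYABLE_CODES

should_retry(503, "rate_limited")    # True
should_retry(500, "upstream_error")  # False
```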
Rate Limiting
When rate limited, the response includes retry information:
{
"error": "rate_limited",
"message": "LLM rate limited; retry after 30 seconds"
}
Clients should respect the retry delay and implement exponential backoff.
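The retry hint can seed an exponential backoff, for example (a sketch; the 30-second base matches the message above, and full jitter is one common strategy, not a server requirement):

```python
import random

def backoff_delays(base_secs=30.0, attempts=5, cap_secs=300.0):
    """Exponential backoff with full jitter: delay n is uniform in [0, min(cap, base * 2**n)]."""
    return [random.uniform(0, min(cap_secs, base_secs * 2 ** n))
            for n in range(attempts)]

delays = backoff_delays()  # e.g. sleep(delays[n]) before retry attempt n
```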
Audit Logging
All LLM requests are logged for audit purposes:
- LlmRequestStarted: Provider, model, message count
- LlmRequestCompleted: Provider, model, tool call count
- LlmRequestFailed: Provider, model, error message
Logs are queryable via the admin audit log endpoints.