OpenAIProxy implements the System protocol and requires no boilerplate HTTP code. Point it at any proxy that serves the /v1/chat/completions endpoint and it handles message construction, authentication, retries, and token usage tracking automatically.
Constructor
Parameters

- `base_url`: Root URL of the proxy, e.g. `"http://localhost:8080"`. A trailing slash is stripped automatically. OpenAIProxy appends `/v1/chat/completions` to this URL for every request.
- `model`: Model name sent in the `"model"` field of each request body. The proxy is responsible for routing or mapping this value.
- `name`: Display name used in benchmark results and the `EvalRow.system` field. If `None`, the name is derived from the host portion of `base_url`; for example, `"http://localhost:8080"` becomes `"openai_proxy_localhost:8080"`.
- `api_key`: Bearer token sent in the `Authorization` header. Falls back to the `OPENAI_API_KEY` environment variable when `None`. If neither is set, the header is omitted entirely.
- `system_prompt`: When provided, prepended as a `{"role": "system", "content": ...}` message before any other messages in every request. Useful for injecting a fixed instruction without modifying your dataset.
- `build_messages`: Full override for message construction. Receives the example dict and must return a list of message dicts in OpenAI chat format. When set, `system_prompt` is ignored for single-turn examples (it is still applied in `process_conversation`).
- `timeout`: HTTP request timeout in seconds per attempt. Does not include retry delays.
- `extra_body`: Additional keys merged into the request body, e.g. `{"temperature": 0, "max_tokens": 256}`. Useful for controlling generation parameters without subclassing.
- `max_retries`: Number of retries on transient failures: HTTP 429 (rate limit), 5xx server errors, and connection errors. Set to `0` to disable retries. Retries use exponential backoff starting at `retry_base_delay` seconds. The `Retry-After` response header is respected for 429 responses.
- `retry_base_delay`: Base delay in seconds for exponential backoff between retries. The delay for attempt `n` is `retry_base_delay * 2^n`.

System protocol
OpenAIProxy satisfies the System protocol:
| Member | Description |
|---|---|
| `.name` | The display name (set by `name` or derived from `base_url`) |
| `.process(example)` | Sends the example to the proxy and returns the dict with a `"response"` key added |
| `.process_conversation(turns)` | Sends a multi-turn conversation turn-by-turn and returns all assistant responses |
`process()` behavior

`process(example)` builds a message list from the example dict using the following rules (unless `build_messages` is provided):

- If `system_prompt` is set, it is prepended as a system message.
- If the example has a `"turns"` key, those messages are used directly (multi-turn passthrough).
- If the example has a `"question"` key, `context` is sent as a system message and `question` as a user message.
- Otherwise, `context` is sent as a single user message.

The returned dict includes a `"response"` key containing the assistant's reply. If the proxy returns usage data, an `"api_usage"` key is also added.
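The message-building rules can be sketched as a standalone function (a simplified reconstruction for illustration; the library's internal implementation may differ):

```python
def default_messages(example, system_prompt=None):
    """Approximate the message-building rules described above."""
    messages = []
    if system_prompt is not None:
        # Fixed instruction prepended before any other messages.
        messages.append({"role": "system", "content": system_prompt})
    if "turns" in example:
        # Multi-turn passthrough: use the provided messages directly.
        messages.extend(example["turns"])
    elif "question" in example:
        # Context travels as a system message, the question as the user turn.
        messages.append({"role": "system", "content": example["context"]})
        messages.append({"role": "user", "content": example["question"]})
    else:
        # Single user message carrying the whole context.
        messages.append({"role": "user", "content": example["context"]})
    return messages
```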
Token counts in `EvalRow` are measured by context-bench's own tokenizer (tiktoken by default), not by the proxy's reported usage. Proxy usage (if available) is stored in `EvalRow.metadata` under `prompt_tokens`, `completion_tokens`, and `total_tokens`.

Environment variable
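When no explicit key is passed to the constructor, the `OPENAI_API_KEY` environment variable is used. It can be set in the shell before running, or in-process, e.g.:

```python
import os

# OpenAIProxy falls back to OPENAI_API_KEY when no explicit key is given.
# Setting it in-process (e.g. in a notebook); a shell `export` works too.
os.environ["OPENAI_API_KEY"] = "sk-your-key-here"  # placeholder value
```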
Examples
Basic usage
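A minimal sketch; the import path `context_bench` is an assumption, and a proxy must be listening on the given URL:

```python
from context_bench import OpenAIProxy  # import path assumed

proxy = OpenAIProxy(
    base_url="http://localhost:8080",
    model="my-model",
)

example = {"context": "The sky is blue.", "question": "What color is the sky?"}
result = proxy.process(example)
print(result["response"])
```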
With a system prompt
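A sketch assuming the same import path as above; the system prompt is prepended to every request's message list:

```python
from context_bench import OpenAIProxy  # import path assumed

proxy = OpenAIProxy(
    base_url="http://localhost:8080",
    model="my-model",
    system_prompt="Answer in one sentence.",
)
```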
With extra body parameters
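A sketch passing generation parameters through to the request body (the `extra_body` keyword name follows the parameter list above and is an assumption):

```python
from context_bench import OpenAIProxy  # import path assumed

proxy = OpenAIProxy(
    base_url="http://localhost:8080",
    model="my-model",
    # Merged into every request body alongside "model" and "messages".
    extra_body={"temperature": 0, "max_tokens": 256},
)
```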
With an explicit API key
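A sketch passing the key directly instead of relying on `OPENAI_API_KEY` (the `api_key` keyword name follows the parameter list above and is an assumption):

```python
from context_bench import OpenAIProxy  # import path assumed

proxy = OpenAIProxy(
    base_url="https://proxy.example.com",
    model="my-model",
    api_key="sk-your-key-here",  # sent as a Bearer token; placeholder value
)
```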
Comparing two proxies
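A sketch running the same benchmark against two endpoints; distinct `name` values keep the systems apart in results:

```python
from context_bench import OpenAIProxy  # import path assumed

baseline = OpenAIProxy(
    base_url="http://localhost:8080",
    model="my-model",
    name="baseline",
)
candidate = OpenAIProxy(
    base_url="http://localhost:9090",
    model="my-model",
    name="candidate",
)
# Run your benchmark once per system and compare EvalRow.system values.
```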
Custom message construction
Use `build_messages` for full control over the request format:
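A sketch of a custom builder. Only the builder's contract (example dict in, OpenAI-format message list out) comes from the docs above; the builder itself is illustrative:

```python
def terse_messages(example):
    """Build a chat-format message list from an example dict."""
    return [
        {"role": "system", "content": "Answer with a single word."},
        {"role": "user", "content": example["context"]},
    ]

# Passed at construction time (import path assumed):
# proxy = OpenAIProxy(base_url="http://localhost:8080",
#                     model="my-model",
#                     build_messages=terse_messages)
```

Note that when `build_messages` is set, `system_prompt` is ignored for single-turn examples, so any fixed instruction belongs inside the builder.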
