Overview
The LLM interface provides two layers of abstraction:

- `get_response_from_llm`: a thin, retry-wrapped wrapper around LiteLLM for single-turn completions.
- `chat_with_agent`: a higher-level loop that injects tool descriptions into the prompt and dispatches tool calls until the model stops requesting them.
The implementation lives in `agent/llm.py` and `agent/llm_withtools.py`.
Model Constants
All supported models are exposed as module-level constants in `agent/llm.py`.
| Constant | Model string | Provider |
|---|---|---|
| `CLAUDE_MODEL` | `anthropic/claude-sonnet-4-5-20250929` | Anthropic |
| `CLAUDE_HAIKU_MODEL` | `anthropic/claude-3-haiku-20240307` | Anthropic |
| `CLAUDE_35NEW_MODEL` | `anthropic/claude-3-5-sonnet-20241022` | Anthropic |
| `OPENAI_MODEL` | `openai/gpt-4o` | OpenAI |
| `OPENAI_MINI_MODEL` | `openai/gpt-4o-mini` | OpenAI |
| `OPENAI_O3_MODEL` | `openai/o3` | OpenAI |
| `OPENAI_O3MINI_MODEL` | `openai/o3-mini` | OpenAI |
| `OPENAI_O4MINI_MODEL` | `openai/o4-mini` | OpenAI |
| `OPENAI_GPT52_MODEL` | `openai/gpt-5.2` | OpenAI |
| `OPENAI_GPT5_MODEL` | `openai/gpt-5` | OpenAI |
| `OPENAI_GPT5MINI_MODEL` | `openai/gpt-5-mini` | OpenAI |
| `GEMINI_3_MODEL` | `gemini/gemini-3-pro-preview` | Google |
| `GEMINI_MODEL` | `gemini/gemini-2.5-pro` | Google |
| `GEMINI_FLASH_MODEL` | `gemini/gemini-2.5-flash` | Google |
get_response_from_llm
Signature
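The signature can be sketched as follows. This is an illustrative stub, not the actual implementation: `msg`, `model`, `temperature`, and `max_tokens` follow the parameter descriptions below, but the defaults, the `msg_history` name, and the placeholder body are assumptions.

```python
def get_response_from_llm(msg, model, temperature=0.7, max_tokens=4096,
                          msg_history=None):
    """Single-turn completion via LiteLLM (sketch; defaults are assumed)."""
    history = list(msg_history or [])
    history.append({"role": "user", "text": msg})
    response_text = "..."  # would come from litellm.completion(...) in the real code
    history.append({"role": "assistant", "text": response_text})
    return response_text, history, {}
```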
Parameters
- `msg`: The new user message to send to the model.
- `model`: LiteLLM model string. Must be prefixed with the provider namespace, e.g. `openai/gpt-4o` or `anthropic/claude-sonnet-4-5-20250929`. Use one of the module-level constants listed above.
- `temperature`: Sampling temperature. Ignored for `openai/gpt-5` and `openai/gpt-5-mini`, which only support the API default of 1.0.
- `max_tokens`: Maximum tokens to generate. Model-specific capping applies (see below).
- `msg_history`: Existing conversation history. Each entry is a dict with `"role"` and `"text"` keys. The function converts `"text"` to `"content"` before calling LiteLLM and converts back on return, keeping the internal format consistent.

Return Value
Returns a 3-tuple `(response_text, new_msg_history, info)`:

- `response_text`: The model's reply text.
- `new_msg_history`: Updated conversation history including the new user message and the model reply. Entries use the `"text"` key (not `"content"`).
- `info`: Currently always an empty dict `{}`. Reserved for future metadata.

Model-Specific Behaviors
| Condition | Behavior |
|---|---|
| `model` is `openai/gpt-5` or `openai/gpt-5-mini` | `temperature` is not passed to the API |
| `"gpt-5"` appears anywhere in `model` | `max_completion_tokens` is used instead of `max_tokens` |
| `"claude-3-haiku"` appears in `model` | `max_tokens` is capped at `min(max_tokens, 4096)` |
Message History Format
Conversation history entries use the `"text"` key internally. Before each API call, `get_response_from_llm` converts `"text"` → `"content"`, and restores `"content"` → `"text"` on the way out. This keeps history compatible with HyperAgents' internal format while remaining transparent to callers.
Retry Behaviour
get_response_from_llm is decorated with an exponential-backoff retry.
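A decorator with this behavior could look like the following. This is a hand-rolled sketch for illustration only; the actual project may use a library such as `tenacity` or `backoff`, and the parameter values here are assumptions.

```python
import functools
import random
import time

def retry_with_exponential_backoff(max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry a function with exponentially growing, jittered delays (sketch)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise  # out of retries: propagate the last error
                    delay = min(base_delay * 2 ** attempt, max_delay)
                    time.sleep(delay + random.uniform(0, 0.1))
        return wrapper
    return decorator
```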
chat_with_agent
Runs a full tool-use loop: injects available tool descriptions into the system prompt, calls the LLM, dispatches any tool calls found in the response, feeds results back, and repeats until the model produces a response with no tool calls or the call limit is hit.
Signature
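A hypothetical signature stub, inferred from the parameter descriptions below. Only `msg`, `model`, and the `"text"`-keyed history format are confirmed by the surrounding docs; all other parameter names and defaults are invented for illustration.

```python
def chat_with_agent(msg, model, msg_history=None, logging=print,
                    tools='all', process_all_tool_calls=False,
                    max_tool_calls=-1):
    """Tool-use loop around get_response_from_llm (hypothetical stub)."""
    ...
```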
Parameters
- `msg`: The user message to start the conversation.
- `model`: LiteLLM model string. Same format as `get_response_from_llm`.
- `msg_history`: Prior conversation history in the internal `"text"`-keyed format. Passed through to `get_response_from_llm` unchanged.
- Logging function: called with a single string argument for each significant event (input message, model output, tool output, errors).
- Tool selection: controls which tools are loaded:
  - `[]` (empty list): no tools are loaded; the model receives no tool instructions.
  - `'all'`: every tool module found in `agent/tools/` is loaded.
  - A list of tool name strings, e.g. `['bash', 'edit']`: only the named tools are loaded.
- Parallel tool-call flag: when `True`, all tool calls present in a single model response are dispatched in sequence. When `False` (the default), only the first tool call is processed per response.
- Tool-call limit: maximum total tool calls allowed across the entire conversation. Set to `-1` for unlimited. When the limit is reached, the loop exits and the accumulated history is returned.

Return Value
The complete conversation history after all tool-use rounds, in the `"text"`-keyed internal format.

Tool-Use Loop
1. Tool descriptions are formatted via `get_tooluse_prompt()` and prepended to `msg`.
2. `get_response_from_llm` is called with the combined prompt.
3. The response is scanned for `<json>...</json>` blocks using the regex `r'<json>\s*(\{.*?\})\s*</json>'`.
4. Each block must contain a JSON object with `tool_name` and `tool_input` keys.
5. `process_tool_call` dispatches to the matching tool function and the output is wrapped.
6. The wrapped result is sent back to the model as the next user message.
7. The loop continues until no tool calls are found in a response.
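The extraction step can be sketched as follows. The regex comes from the documentation above; the example response string is invented for illustration.

```python
import json
import re

# Regex documented for the tool-use loop: captures the JSON object
# between <json> and </json> markers.
TOOL_CALL_RE = re.compile(r'<json>\s*(\{.*?\})\s*</json>')

response = """I'll list the files first.
<json>
{"tool_name": "bash", "tool_input": {"command": "ls"}}
</json>"""

tool_uses = []
for block in TOOL_CALL_RE.findall(response):
    call = json.loads(block)
    # Only blocks with both required keys count as tool calls.
    if "tool_name" in call and "tool_input" in call:
        tool_uses.append(call)

print(tool_uses[0]["tool_name"])  # → bash
```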
Truncated Tool Call Retry
should_retry_tool_use(response, tool_uses) detects responses where the model started forming a tool call but was cut off by the output token limit. It returns True when all of the following hold:
- No valid tool calls were parsed from the response.
- The response contains `<json>`, `tool_name`, and `tool_input` markers, in that order.
- `len(response) >= 2000`.

When a truncated tool call is detected, the message `"Error: Output context exceeded. Please try again."` is sent back to the model.
process_tool_call
Signature
Parameters
- `tools`: Mapping of tool name → tool dict as built by `load_tools`. Each value has `info` and `function` keys.
- `tool_name`: Name of the tool to call, e.g. `"bash"` or `"editor"`.
- `tool_input`: Keyword arguments forwarded to the tool's function via `**tool_input`.

Return Value

The string output from the tool function, or an error message prefixed with `"Error: "` if the tool is not found or raises an exception.