
Overview

The LLM interface provides two layers of abstraction:
  • get_response_from_llm — a thin, retry-wrapped interface over LiteLLM for single-turn completions.
  • chat_with_agent — a higher-level loop that injects tool descriptions into the prompt and dispatches tool calls until the model stops requesting them.
Both are in agent/llm.py and agent/llm_withtools.py.

Model Constants

All supported models are exposed as module-level constants in agent/llm.py.
Constant               Model string                            Provider
---------------------  --------------------------------------  ---------
CLAUDE_MODEL           anthropic/claude-sonnet-4-5-20250929    Anthropic
CLAUDE_HAIKU_MODEL     anthropic/claude-3-haiku-20240307       Anthropic
CLAUDE_35NEW_MODEL     anthropic/claude-3-5-sonnet-20241022    Anthropic
OPENAI_MODEL           openai/gpt-4o                           OpenAI
OPENAI_MINI_MODEL      openai/gpt-4o-mini                      OpenAI
OPENAI_O3_MODEL        openai/o3                               OpenAI
OPENAI_O3MINI_MODEL    openai/o3-mini                          OpenAI
OPENAI_O4MINI_MODEL    openai/o4-mini                          OpenAI
OPENAI_GPT52_MODEL     openai/gpt-5.2                          OpenAI
OPENAI_GPT5_MODEL      openai/gpt-5                            OpenAI
OPENAI_GPT5MINI_MODEL  openai/gpt-5-mini                       OpenAI
GEMINI_3_MODEL         gemini/gemini-3-pro-preview             Google
GEMINI_MODEL           gemini/gemini-2.5-pro                   Google
GEMINI_FLASH_MODEL     gemini/gemini-2.5-flash                 Google
The default token budget is:
MAX_TOKENS = 16384

get_response_from_llm

from agent.llm import get_response_from_llm

response_text, new_msg_history, info = get_response_from_llm(
    msg="What is the capital of France?",
    model="openai/gpt-4o",
)

Signature

def get_response_from_llm(
    msg: str,
    model: str = OPENAI_MODEL,
    temperature: float = 0.0,
    max_tokens: int = MAX_TOKENS,
    msg_history=None,
) -> Tuple[str, list, dict]

Parameters

msg
str
required
The new user message to send to the model.
model
str
default:"openai/gpt-4o"
LiteLLM model string. Must be prefixed with the provider namespace, e.g. openai/gpt-4o or anthropic/claude-sonnet-4-5-20250929. Use one of the module-level constants listed above.
temperature
float
default:"0.0"
Sampling temperature. Ignored for openai/gpt-5 and openai/gpt-5-mini — those models only support the API default of 1.0.
max_tokens
int
default:"16384"
Maximum tokens to generate. Model-specific capping applies (see below).
msg_history
list | None
default:"None"
Existing conversation history. Each entry is a dict with "role" and "text" keys. The function converts "text" to "content" before calling LiteLLM and converts back on return, keeping the internal format consistent.

Return Value

Returns a 3-tuple (response_text, new_msg_history, info):
response_text
str
The model’s reply text.
new_msg_history
list
Updated conversation history including the new user message and the model reply. Entries use the "text" key (not "content").
info
dict
Currently always an empty dict {}. Reserved for future metadata.

Model-Specific Behaviors

Condition                                    Behavior
-------------------------------------------  ------------------------------------------------
model is openai/gpt-5 or openai/gpt-5-mini   temperature is not passed to the API
"gpt-5" appears anywhere in model            Uses max_completion_tokens instead of max_tokens
"claude-3-haiku" appears in model            max_tokens is capped at min(max_tokens, 4096)

Message History Format

Conversation history entries use the "text" key internally:
msg_history = [
    {"role": "user",      "text": "Hello!"},
    {"role": "assistant", "text": "Hi, how can I help?"},
]
Before the LiteLLM API call the function rewrites "text" → "content", and restores "content" → "text" on the way out. This keeps history compatible with HyperAgents’ internal format while remaining transparent to callers.
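The round-trip conversion can be sketched as two pure functions. These helpers (to_litellm / from_litellm) are illustrative names, not the actual internals of agent/llm.py:

```python
# Hedged sketch of the "text" <-> "content" key conversion described above.
def to_litellm(history):
    """Rewrite internal "text"-keyed entries into LiteLLM's "content" key."""
    return [{"role": m["role"], "content": m["text"]} for m in history]

def from_litellm(history):
    """Restore the internal "text"-keyed format on the way out."""
    return [{"role": m["role"], "text": m["content"]} for m in history]

history = [
    {"role": "user",      "text": "Hello!"},
    {"role": "assistant", "text": "Hi, how can I help?"},
]
# Round-tripping through the API format is lossless for these keys.
assert from_litellm(to_litellm(history)) == history
```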

Retry Behaviour

get_response_from_llm is decorated with an exponential-backoff retry:
@backoff.on_exception(
    backoff.expo,
    (requests.exceptions.RequestException, json.JSONDecodeError, KeyError),
    max_time=600,   # give up after 10 minutes
    max_value=60,   # cap individual delay at 60 s
)
Network errors, malformed JSON responses, and missing keys in the API response are all retried automatically.

chat_with_agent

Runs a full tool-use loop: injects available tool descriptions into the system prompt, calls the LLM, dispatches any tool calls found in the response, feeds results back, and repeats until the model produces a response with no tool calls or the call limit is hit.
from agent.llm_withtools import chat_with_agent

new_history = chat_with_agent(
    msg="List files in /tmp",
    model="anthropic/claude-sonnet-4-5-20250929",
    tools_available="all",
)

Signature

def chat_with_agent(
    msg,
    model="claude-4-sonnet-genai",
    msg_history=None,
    logging=print,
    tools_available=[],
    multiple_tool_calls=False,
    max_tool_calls=40,
) -> list

Parameters

msg
str
required
The user message to start the conversation.
model
str
default:"claude-4-sonnet-genai"
LiteLLM model string. Same format as get_response_from_llm.
msg_history
list | None
default:"None"
Prior conversation history in the internal "text"-keyed format. Passed through to get_response_from_llm unchanged.
logging
callable
default:"print"
Logging function. Called with a single string argument for each significant event (input message, model output, tool output, errors).
tools_available
list | 'all'
default:"[]"
Controls which tools are loaded:
  • [] (empty list) — no tools are loaded; the model receives no tool instructions.
  • 'all' — every tool module found in agent/tools/ is loaded.
  • A list of tool name strings, e.g. ['bash', 'edit'] — only the named tools are loaded.
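The three accepted forms can be summarized with a small resolver. This is a hedged sketch of the selection logic only; the real loading goes through load_tools and a scan of agent/tools/, and both resolve_tool_names and the discovered tuple here are assumptions for illustration:

```python
# Illustrative resolver for the three tools_available forms described above.
# resolve_tool_names and the `discovered` default are hypothetical; the
# source uses load_tools over the agent/tools/ directory.
def resolve_tool_names(tools_available, discovered=("bash", "edit", "search")):
    if tools_available == "all":
        return list(discovered)   # every tool module found in agent/tools/
    return list(tools_available)  # [] -> no tools; explicit list -> as given
```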
multiple_tool_calls
bool
default:"false"
When True, all tool calls present in a single model response are dispatched in sequence. When False (default), only the first tool call is processed per response.
max_tool_calls
int
default:"40"
Maximum total tool calls allowed across the entire conversation. Set to -1 for unlimited. When the limit is reached the loop exits and the accumulated history is returned.

Return Value

new_msg_history
list
The complete conversation history after all tool-use rounds, in the "text"-keyed internal format.

Tool-Use Loop

  1. Tool descriptions are formatted via get_tooluse_prompt() and prepended to msg.
  2. get_response_from_llm is called with the combined prompt.
  3. The response is scanned for <json>...</json> blocks using the regex r'<json>\s*(\{.*?\})\s*</json>'.
  4. Each block must contain a JSON object with tool_name and tool_input keys:
    {
      "tool_name": "bash",
      "tool_input": {"command": "ls /tmp"}
    }
    
  5. process_tool_call dispatches to the matching tool function and the output is wrapped:
    {
      "tool_name": "bash",
      "tool_input": {"command": "ls /tmp"},
      "tool_output": "file1.txt\nfile2.txt"
    }
    
  6. The wrapped result is sent back to the model as the next user message.
  7. The loop continues until no tool calls are found in a response.
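Steps 3–4 can be exercised directly with the quoted regex. The sample response below keeps each JSON object on a single line, since `.` does not cross newlines unless the source also passes a flag such as re.DOTALL (not shown in the quoted pattern):

```python
import json
import re

# A sample model response containing one tool call in a <json> block.
response = (
    "I'll list the directory first.\n"
    "<json>\n"
    '{"tool_name": "bash", "tool_input": {"command": "ls /tmp"}}\n'
    "</json>"
)

# The regex from step 3, applied verbatim; each match is parsed as JSON.
calls = [
    json.loads(block)
    for block in re.findall(r'<json>\s*(\{.*?\})\s*</json>', response)
]
# calls[0] == {"tool_name": "bash", "tool_input": {"command": "ls /tmp"}}
```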

Truncated Tool Call Retry

should_retry_tool_use(response, tool_uses) detects responses where the model started forming a tool call but was cut off by the output token limit. It returns True when all of the following hold:
  • No valid tool calls were parsed from the response.
  • The response contains <json>, tool_name, and tool_input markers, in that order.
  • len(response) >= 2000.
When a retry is needed, the message "Error: Output context exceeded. Please try again." is sent back to the model.

process_tool_call

from agent.llm_withtools import process_tool_call

result = process_tool_call(tools_dict, "bash", {"command": "echo hello"})

Signature

def process_tool_call(tools_dict: dict, tool_name: str, tool_input: dict) -> str

Parameters

tools_dict
dict
required
Mapping of tool name → tool dict as built by load_tools. Each value has info and function keys.
tool_name
str
required
Name of the tool to call, e.g. "bash" or "editor".
tool_input
dict
required
Keyword arguments forwarded to the tool’s function via **tool_input.

Return Value

result
str
The string output from the tool function, or an error message prefixed with "Error: " if the tool is not found or raises an exception.
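A minimal sketch consistent with the behavior described above. The dispatch and the exact error strings are assumptions; only the tools_dict shape (info and function keys) and the "Error: " prefix come from this page:

```python
# Hedged sketch of process_tool_call's dispatch and error wrapping.
def process_tool_call(tools_dict, tool_name, tool_input):
    if tool_name not in tools_dict:
        return f"Error: tool '{tool_name}' not found"
    try:
        # Forward tool_input as keyword arguments to the tool's function.
        return tools_dict[tool_name]["function"](**tool_input)
    except Exception as exc:
        return f"Error: {exc}"

# A toy tools_dict in the load_tools shape: name -> {"info", "function"}.
tools_dict = {
    "bash": {
        "info": {"name": "bash"},
        "function": lambda command: f"ran: {command}",
    },
}
```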
