LocalAgent is a self-contained agent designed for use with local Ollama models. Its sole responsibility is context compression: it takes verbose text and returns a condensed version that retains all critical facts while dramatically reducing token count. Unlike the other agents, it does not extend BaseAgent and is not exported from the agents package; it must be imported directly.
Import
from agents.local_agent import LocalAgent
LocalAgent is intentionally excluded from agents/__init__.py. It is a specialised utility; import it from agents.local_agent to make the dependency explicit.
Constructor
LocalAgent(
    llm: BaseChatModel,
    system_prompt: str = "You are a specialized summarization agent. ...",
    agent_name: str = "LocalAgent"
)
llm: The LangChain chat model to use for summarization. Optimised for local models via ChatOllama, but any BaseChatModel implementation works.
system_prompt: System instruction controlling summarization behaviour. The default prompt instructs the model to compress aggressively while preserving every important detail. Override it for domain-specific compression styles.
agent_name: Display name used in log messages.
Methods
invoke
invoke(user_input: str, **kwargs: Any) -> Any
Sends user_input through a ChatPromptTemplate | llm chain and returns the compressed text. The interface is intentionally identical to BaseAgent.invoke() so that LocalAgent can be used as a drop-in compressor wherever that signature is expected.
user_input: The text to be summarized or compressed.
**kwargs: Additional key-value pairs merged into the prompt template’s format call.
Returns: str — the response.content string containing the compressed text.
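The flow behind invoke() can be pictured as: format the system prompt and user input into messages, send them through the model, and return response.content. The sketch below is a hypothetical illustration of that flow, not the actual implementation; a stub stands in for a real BaseChatModel so the example runs without Ollama.

```python
from dataclasses import dataclass


@dataclass
class StubResponse:
    content: str


class StubChatModel:
    """Stands in for ChatOllama / any BaseChatModel in this sketch."""

    def invoke(self, messages):
        # Pretend to "compress" by returning a prefix of the user message.
        user_text = messages[-1][1]
        return StubResponse(content=user_text[:40] + "...")


def invoke(llm, system_prompt, user_input, **kwargs):
    # Roughly equivalent to a ChatPromptTemplate | llm chain: extra kwargs
    # are made available to the prompt template's format call.
    messages = [
        ("system", system_prompt.format(**kwargs)),
        ("user", user_input),
    ]
    return llm.invoke(messages).content


compressed = invoke(
    StubChatModel(),
    "You are a specialized summarization agent.",
    "A very long passage " * 10,
)
```

Because the return value is always the plain response.content string, callers can treat LocalAgent interchangeably with any other component that maps text to text.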
Usage example
Standalone summarization
from langchain_ollama import ChatOllama
from agents.local_agent import LocalAgent
llm = ChatOllama(model="llama3")
compressor = LocalAgent(llm=llm)
long_text = """
The history of computing spans several decades and includes many pivotal
inventions: the transistor, the integrated circuit, the personal computer,
the internet, and now artificial intelligence. Each era built on the last,
accelerating the pace of change until today's models can reason, plan, and
generate human-quality text at scale.
"""
summary = compressor.invoke(long_text)
print(summary)
# → "Computing evolved from transistors → ICs → PCs → internet → AI,
# each era accelerating innovation."
As a compressor inside an orchestrator
from langchain_ollama import ChatOllama
from agents import PlanningAgent, ExecutionAgent, MonitoringAgent
from agents.local_agent import LocalAgent
from orchestators import LangGraphOrchestrator
llm = ChatOllama(model="llama3")
compressor = LocalAgent(llm=llm)
# Pass LocalAgent as the context compressor instead of CompressContextTool
orchestrator = LangGraphOrchestrator(
    planner=PlanningAgent(llm=llm),
    executor=ExecutionAgent(llm=llm),
    monitor=MonitoringAgent(llm=llm),
    compressor=compressor,
    max_retries=2,
)
orchestrator.run("Summarise the Q3 financial report and highlight key risks.")
LocalAgent is particularly effective in long-running multi-step pipelines where intermediate results accumulate. Compress each step’s output before passing it as context to the next ExecutionAgent.execute_step() call to stay within the model’s context window.
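That compress-between-steps pattern can be sketched as below. The loop, the execute_step stand-in, and the two-step plan are hypothetical; only the invoke() signature comes from this page, and a toy truncating compressor is used so the sketch runs without a model.

```python
class TruncatingCompressor:
    """Toy compressor with the same invoke() signature as LocalAgent."""

    def invoke(self, user_input, **kwargs):
        # A real LocalAgent would summarize; truncation just keeps the
        # sketch deterministic and dependency-free.
        return user_input[:60]


def run_steps(steps, execute_step, compressor):
    context = ""
    for step in steps:
        result = execute_step(step, context)   # verbose intermediate output
        context = compressor.invoke(result)    # compressed before next step
    return context


final_context = run_steps(
    steps=["collect figures", "assess risks"],
    execute_step=lambda step, ctx: f"[{step}] detailed output (prior: {ctx})" * 5,
    compressor=TruncatingCompressor(),
)
```

Because each step only ever sees the compressed context, the prompt size stays bounded no matter how many steps the pipeline runs.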