SQLiteShortTermMemory provides two things: a conversation store keyed by session ID, and a global cache that skips all LLM calls when an exact-match answer already exists in the database.
The cache is checked at the very start of main.py — before any agent is created — so a cache hit costs nothing beyond a single SQLite query.
Initializing
from memory.short_term_memory import SQLiteShortTermMemory
# Persists to disk (default path: short_term_memory.db)
memory = SQLiteShortTermMemory()
# Ephemeral storage — data lives only for the duration of the process
memory = SQLiteShortTermMemory(db_path=':memory:')
SQLiteShortTermMemory.__init__ creates the database and schema automatically on first call. An index on session_id is created for fast per-session lookups.
Use db_path=':memory:' during tests or when you do not want to persist state between runs.
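For reference, here is a minimal sketch of the kind of schema and index __init__ sets up, written against the sqlite3 standard library. The column names and DDL are assumptions based on the fields shown later in this document, not the class's exact implementation:

```python
import sqlite3

# Hypothetical sketch of the schema __init__ might create; the exact DDL
# and column names are assumptions, not the class's real implementation.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS memory (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT NOT NULL,
        role TEXT NOT NULL,
        content TEXT NOT NULL,
        metadata TEXT,
        timestamp TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
# Index on session_id for fast per-session lookups
conn.execute("CREATE INDEX IF NOT EXISTS idx_session ON memory (session_id)")
conn.commit()
```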
Caching pattern
This is the exact pattern from main.py. Check the cache before doing any expensive work:
task = "What is the capital of Andorra?"
# 1. Check cache
cached_response = memory.get_exact_match_answer(task)
if cached_response:
    print(f"[CACHE HIT] Found previous answer for task: '{task}'")
    print(f"Cached Answer:\n{cached_response}")
    # Exit early — no LLM calls needed
else:
    print(f"[CACHE MISS] No previous answer found for: '{task}'")
    # ... run orchestrator ...
get_exact_match_answer looks for a row in the memory table where:
- role = 'user' and content exactly matches the query string
- the immediately following row in the same session has role = 'assistant'

It returns the assistant content string, or None if no match exists.
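The lookup described above can be sketched directly in SQL. The following is a hypothetical reimplementation using sqlite3, assuming an integer primary key makes "immediately following row" mean id + 1; it is not the library's actual query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory (id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
)
conn.executemany("INSERT INTO memory VALUES (?, ?, ?, ?)", [
    (1, "global_cache_session", "user", "What is the capital of Andorra?"),
    (2, "global_cache_session", "assistant", "Andorra la Vella."),
])

def get_exact_match_answer(query):
    # Pair a 'user' row with the immediately following 'assistant' row
    # in the same session (a sketch of the lookup, not the library's SQL).
    row = conn.execute(
        """
        SELECT a.content
        FROM memory u
        JOIN memory a ON a.session_id = u.session_id AND a.id = u.id + 1
        WHERE u.role = 'user' AND u.content = ? AND a.role = 'assistant'
        LIMIT 1
        """,
        (query,),
    ).fetchone()
    return row[0] if row else None
```

Note that the match is case- and whitespace-sensitive: only the exact string stored as the user row returns the cached answer.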
Saving to cache after a successful run
After the orchestrator completes successfully, store the question and aggregated answer under the global cache session:
if orchestrator_result['status'] == 'success':
    session = "global_cache_session"

    # Store the user query
    memory.add_memory(session_id=session, role="user", content=task)

    # Combine all validated step results into a single answer string
    final_answer = "\n".join([
        f"Step: {r['step']}\nResult: {r['result']}"
        for r in orchestrator_result.get('results', [])
        if r.get('status') == 'validated'
    ])

    # Store the answer
    memory.add_memory(session_id=session, role="assistant", content=final_answer)
    print("[CACHE] Saved to SQLite ShortTermMemory.")
The next time the same task string is passed to get_exact_match_answer, it will return final_answer immediately.
Session ID usage
Every call to add_memory and get_context takes a session_id string. This scopes memory entries so different workflows, users, or test runs do not see each other’s data.
| Session ID | Typical use |
|---|---|
| "global_cache_session" | Cross-run answer cache (the pattern in main.py) |
| "user-abc-run-1" | Isolate a single user's conversation history |
| "test-session" | Unit tests; combine with db_path=':memory:' |
Storing and retrieving conversation history
Add messages to a session
session_id = "my-session-001"
memory.add_memory(session_id=session_id, role="user", content="Who founded Python?")
memory.add_memory(session_id=session_id, role="assistant", content="Guido van Rossum.")
memory.add_memory(session_id=session_id, role="user", content="When was it first released?")
memory.add_memory(session_id=session_id, role="assistant", content="Python 0.9.0 was released in February 1991.")
Optionally attach a metadata dictionary to any message:

memory.add_memory(
    session_id=session_id,
    role="tool",
    content="Search result: ...",
    metadata={"tool_name": "CurlSearchTool", "tokens_used": 142},
)
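Since SQLite has no native dict type, the metadata dictionary is presumably serialized to JSON text on write and parsed back on read. A self-contained sketch of that round trip (an assumption about internals, using only the sqlite3 and json standard libraries):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory (session_id TEXT, role TEXT, content TEXT, metadata TEXT)"
)

# Serialize the dict to JSON text for the TEXT column (assumed behaviour)
meta = {"tool_name": "CurlSearchTool", "tokens_used": 142}
conn.execute(
    "INSERT INTO memory VALUES (?, ?, ?, ?)",
    ("my-session-001", "tool", "Search result: ...", json.dumps(meta)),
)

# Parse it back into a dict on read
stored = conn.execute("SELECT metadata FROM memory").fetchone()[0]
restored = json.loads(stored)
```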
Retrieve recent context
get_context returns up to limit messages in chronological order (oldest first within the window):

history = memory.get_context(session_id=session_id, limit=10)
# [
# {"role": "user", "content": "Who founded Python?", "metadata": None, "timestamp": "..."},
# {"role": "assistant", "content": "Guido van Rossum.", "metadata": None, "timestamp": "..."},
# ...
# ]
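The "up to limit messages, oldest first within the window" behaviour can be sketched as: select the newest limit rows, then reverse them so they read chronologically. A hypothetical stand-in using sqlite3, not the library's actual code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory (id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO memory (session_id, role, content) VALUES (?, ?, ?)",
    [("s", "user", f"q{i}") for i in range(5)],
)

def get_context(session_id, limit):
    # Grab the newest `limit` rows, then reverse so the window
    # reads oldest-first, matching the documented behaviour.
    rows = conn.execute(
        "SELECT role, content FROM memory WHERE session_id = ? ORDER BY id DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return [{"role": r, "content": c} for r, c in reversed(rows)]
```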
Format context for LLM injection
format_as_string wraps get_context and returns a plain-text block ready to paste into a prompt:

context_block = memory.format_as_string(session_id=session_id, limit=10)
print(context_block)
# USER: Who founded Python?
#
# ASSISTANT: Guido van Rossum.
#
# USER: When was it first released?
#
# ASSISTANT: Python 0.9.0 was released in February 1991.
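Based on the output shown above, format_as_string plausibly upper-cases each role and joins messages with blank lines. A minimal stand-alone sketch; the exact separator is an assumption inferred from the sample output:

```python
# Sketch of how format_as_string might render a history list
# (assumed formatting, inferred from the example output above).
history = [
    {"role": "user", "content": "Who founded Python?"},
    {"role": "assistant", "content": "Guido van Rossum."},
]

def format_as_string(history):
    # "ROLE: content" lines, separated by blank lines
    return "\n\n".join(f"{m['role'].upper()}: {m['content']}" for m in history)
```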
Clear a session
Delete all messages for a session (e.g., at the end of a test or when a user logs out):

memory.clear_session(session_id=session_id)
On a cache hit, the entire orchestration pipeline — planner, executor (with LLM calls, possible tool use, and retries), and monitor — is bypassed. The only work done is a single indexed SQLite SELECT.
This is especially valuable for:
- Development iteration where you run the same task repeatedly
- Production systems where identical user queries are common
- Testing downstream logic without incurring LLM API costs
The cache key is the exact task string. Even a single character difference is a cache miss. Normalise whitespace and casing in your task variable before calling get_exact_match_answer if you want fuzzy deduplication.
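A hypothetical normalization helper along those lines (normalize_task is not part of SQLiteShortTermMemory; apply it to the task string both when saving and when looking up, so the keys agree):

```python
def normalize_task(task: str) -> str:
    # Collapse all runs of whitespace to single spaces and lowercase,
    # so trivially different phrasings dedupe to one cache entry.
    return " ".join(task.split()).lower()
```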