SQLiteShortTermMemory provides two things: a conversation store keyed by session ID, and a global cache that skips all LLM calls when an exact-match answer already exists in the database. The cache is checked at the very start of main.py — before any agent is created — so a cache hit costs nothing beyond a single SQLite query.

Initializing

from memory.short_term_memory import SQLiteShortTermMemory

# Persists to disk (default path: short_term_memory.db)
memory = SQLiteShortTermMemory()

# Ephemeral storage — data lives only for the duration of the process
memory = SQLiteShortTermMemory(db_path=':memory:')
SQLiteShortTermMemory.__init__ creates the database and schema automatically on first call. A session-scoped index on session_id is created for fast lookups.
Use db_path=':memory:' during tests or when you do not want to persist state between runs.
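The rest of this page assumes rows shaped like the dictionaries returned by get_context (role, content, metadata, timestamp), keyed by session_id. As a sketch only, here is one plausible schema matching that shape; the actual column names and types in SQLiteShortTermMemory are not confirmed by this page:

```python
import sqlite3

# Hypothetical schema sketch -- column names inferred from the get_context
# output shape shown later on this page; the real table layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS memory (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT NOT NULL,
        role       TEXT NOT NULL,
        content    TEXT NOT NULL,
        metadata   TEXT,                             -- JSON-encoded dict, or NULL
        timestamp  TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
# The session-scoped index mentioned above, for fast per-session lookups
conn.execute("CREATE INDEX IF NOT EXISTS idx_memory_session ON memory(session_id)")
conn.commit()
```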

Caching pattern

This is the exact pattern from main.py. Check the cache before doing any expensive work:
task = "What is the capital of Andorra?"

# 1. Check cache
cached_response = memory.get_exact_match_answer(task)
if cached_response:
    print(f"[CACHE HIT] Found previous answer for task: '{task}'")
    print(f"Cached Answer:\n{cached_response}")
    # Exit early — no LLM calls needed
else:
    print(f"[CACHE MISS] No previous answer found for: '{task}'")
    # ... run orchestrator ...
get_exact_match_answer looks for a row in the memory table where:
  • role = 'user' and content exactly matches the query string
  • The immediately following row in the same session has role = 'assistant'
It returns the assistant content string, or None if no match exists.
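That lookup can be expressed as a self-join in SQL. The sketch below is an illustration, not the library's actual query; it assumes a memory table with (id, session_id, role, content) columns and uses consecutive row IDs as a stand-in for "immediately following row":

```python
import sqlite3

# Standalone sketch: minimal table plus one cached question/answer pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory (id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO memory (session_id, role, content) VALUES (?, ?, ?)",
    [
        ("global_cache_session", "user", "What is the capital of Andorra?"),
        ("global_cache_session", "assistant", "Andorra la Vella."),
    ],
)

def get_exact_match_answer_sketch(query):
    # Find a user row whose content exactly matches the query and whose
    # next row in the same session is an assistant message.
    row = conn.execute(
        """
        SELECT a.content
        FROM memory u
        JOIN memory a
          ON a.session_id = u.session_id
         AND a.id = u.id + 1          -- "immediately following row" (sketch)
         AND a.role = 'assistant'
        WHERE u.role = 'user' AND u.content = ?
        """,
        (query,),
    ).fetchone()
    return row[0] if row else None
```

Note the exact-match behaviour: a query that differs only in casing returns None.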

Saving to cache after a successful run

After the orchestrator completes successfully, store the question and aggregated answer under the global cache session:
if orchestrator_result['status'] == 'success':
    session = "global_cache_session"

    # Store the user query
    memory.add_memory(session_id=session, role="user", content=task)

    # Combine all validated step results into a single answer string
    final_answer = "\n".join([
        f"Step: {r['step']}\nResult: {r['result']}"
        for r in orchestrator_result.get('results', [])
        if r.get('status') == 'validated'
    ])

    # Store the answer
    memory.add_memory(session_id=session, role="assistant", content=final_answer)
    print("[CACHE] Saved to SQLite ShortTermMemory.")
The next time the same task string is passed to get_exact_match_answer, it will return final_answer immediately.
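The aggregation step is plain string building over the validated results. With a hypothetical orchestrator_result (the field names mirror the snippet above; the values here are invented for illustration):

```python
# Hypothetical orchestrator_result -- field names mirror the snippet above,
# values are illustrative only.
orchestrator_result = {
    "status": "success",
    "results": [
        {"step": "lookup", "result": "Andorra la Vella", "status": "validated"},
        {"step": "verify", "result": "Confirmed",        "status": "validated"},
        {"step": "retry",  "result": "timeout",          "status": "failed"},  # excluded
    ],
}

# Only validated steps make it into the cached answer
final_answer = "\n".join(
    f"Step: {r['step']}\nResult: {r['result']}"
    for r in orchestrator_result.get("results", [])
    if r.get("status") == "validated"
)
print(final_answer)
```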

Session ID usage

Every call to add_memory and get_context takes a session_id string. This scopes memory entries so different workflows, users, or test runs do not see each other’s data.
Session ID                Typical use
"global_cache_session"    Cross-run answer cache (the pattern in main.py)
"user-abc-run-1"          Isolate a single user’s conversation history
"test-session"            Unit tests; combine with db_path=':memory:'

Storing and retrieving conversation history

1. Add messages to a session

session_id = "my-session-001"

memory.add_memory(session_id=session_id, role="user",      content="Who founded Python?")
memory.add_memory(session_id=session_id, role="assistant", content="Guido van Rossum.")
memory.add_memory(session_id=session_id, role="user",      content="When was it first released?")
memory.add_memory(session_id=session_id, role="assistant", content="Python 0.9.0 was released in February 1991.")
Optionally attach a metadata dictionary to any message:
memory.add_memory(
    session_id=session_id,
    role="tool",
    content="Search result: ...",
    metadata={"tool_name": "CurlSearchTool", "tokens_used": 142},
)
2. Retrieve recent context

get_context returns up to limit messages in chronological order (oldest first within the window):
history = memory.get_context(session_id=session_id, limit=10)
# [
#   {"role": "user",      "content": "Who founded Python?", "metadata": None, "timestamp": "..."},
#   {"role": "assistant", "content": "Guido van Rossum.",   "metadata": None, "timestamp": "..."},
#   ...
# ]
3. Format context for LLM injection

format_as_string wraps get_context and returns a plain-text block ready to paste into a prompt:
context_block = memory.format_as_string(session_id=session_id, limit=10)
print(context_block)
# USER: Who founded Python?
#
# ASSISTANT: Guido van Rossum.
#
# USER: When was it first released?
#
# ASSISTANT: Python 0.9.0 was released in February 1991.
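The output format above can be reproduced from a get_context-style message list. This is a sketch of the formatting logic only, assuming the "ROLE: content" blocks separated by blank lines shown above; the library's actual implementation may differ:

```python
def format_as_string_sketch(messages, limit=10):
    # Render the last `limit` messages as "ROLE: content" blocks,
    # separated by blank lines, oldest first within the window.
    window = messages[-limit:]
    return "\n\n".join(f"{m['role'].upper()}: {m['content']}" for m in window)

history = [
    {"role": "user", "content": "Who founded Python?"},
    {"role": "assistant", "content": "Guido van Rossum."},
]
print(format_as_string_sketch(history))
```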
4. Clear a session

Delete all messages for a session (e.g., at the end of a test or when a user logs out):
memory.clear_session(session_id=session_id)

Performance benefit

On a cache hit, the entire orchestration pipeline — planner, executor (with LLM calls, possible tool use, and retries), and monitor — is bypassed. The only work done is a single indexed SQLite SELECT (the self-join described above). This is especially valuable for:
  • Development iteration where you run the same task repeatedly
  • Production systems where identical user queries are common
  • Testing downstream logic without incurring LLM API costs
The cache key is the exact task string. Even a single character difference is a cache miss. Normalise whitespace and casing in your task variable before calling get_exact_match_answer if you want fuzzy deduplication.
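One way to do that normalisation, as a hypothetical helper (not part of the library): collapse whitespace runs and lowercase the string before lookup.

```python
import re

def normalise_task(task):
    # Collapse runs of whitespace and lowercase so trivially different
    # phrasings map to the same cache key.
    return re.sub(r"\s+", " ", task).strip().lower()
```

If you normalise before calling get_exact_match_answer, store the normalised string with add_memory as well, or the saved key will never match a later lookup.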
