SQLiteShortTermMemory provides two things: a conversation store keyed by session ID, and a global cache that skips all LLM calls when an exact-match answer already exists in the database. The cache is checked at the very start of main.py — before any agent is created — so a cache hit costs nothing beyond a single SQLite query.

Initializing

from memory.short_term_memory import SQLiteShortTermMemory

# Persists to disk (default path: short_term_memory.db)
memory = SQLiteShortTermMemory()

# Ephemeral storage — data lives only for the duration of the process
memory = SQLiteShortTermMemory(db_path=':memory:')
SQLiteShortTermMemory.__init__ creates the database and schema automatically on first call. A session-scoped index on session_id is created for fast lookups.
Use db_path=':memory:' during tests or when you do not want to persist state between runs.
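The rest of this page assumes rows shaped like the dictionaries returned by get_context (role, content, metadata, timestamp), keyed by session_id. As a sketch only, here is one plausible schema matching that shape; the actual column names and types in SQLiteShortTermMemory are not confirmed by this page:

```python
import sqlite3

# Hypothetical schema sketch -- column names inferred from the get_context
# output shape shown later on this page; the real table layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS memory (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT NOT NULL,
        role       TEXT NOT NULL,
        content    TEXT NOT NULL,
        metadata   TEXT,                             -- JSON-encoded dict, or NULL
        timestamp  TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
# The session-scoped index mentioned above, for fast per-session lookups
conn.execute("CREATE INDEX IF NOT EXISTS idx_memory_session ON memory(session_id)")
conn.commit()
```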

Caching pattern

This is the exact pattern from main.py. Check the cache before doing any expensive work:
task = "What is the capital of Andorra?"

# 1. Check cache
cached_response = memory.get_exact_match_answer(task)
if cached_response:
    print(f"[CACHE HIT] Found previous answer for task: '{task}'")
    print(f"Cached Answer:\n{cached_response}")
    # Exit early — no LLM calls needed
else:
    print(f"[CACHE MISS] No previous answer found for: '{task}'")
    # ... run orchestrator ...
get_exact_match_answer looks for a row in the memory table where:
  • role = 'user' and content exactly matches the query string
  • The immediately following row in the same session has role = 'assistant'
It returns the assistant content string, or None if no match exists.
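That lookup can be expressed as a self-join in SQL. The sketch below is an illustration, not the library's actual query; it assumes a memory table with (id, session_id, role, content) columns and uses consecutive row IDs as a stand-in for "immediately following row":

```python
import sqlite3

# Standalone sketch: minimal table plus one cached question/answer pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory (id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO memory (session_id, role, content) VALUES (?, ?, ?)",
    [
        ("global_cache_session", "user", "What is the capital of Andorra?"),
        ("global_cache_session", "assistant", "Andorra la Vella."),
    ],
)

def get_exact_match_answer_sketch(query):
    # Find a user row whose content exactly matches the query and whose
    # next row in the same session is an assistant message.
    row = conn.execute(
        """
        SELECT a.content
        FROM memory u
        JOIN memory a
          ON a.session_id = u.session_id
         AND a.id = u.id + 1          -- "immediately following row" (sketch)
         AND a.role = 'assistant'
        WHERE u.role = 'user' AND u.content = ?
        """,
        (query,),
    ).fetchone()
    return row[0] if row else None
```

Note the exact-match behaviour: a query that differs only in casing returns None.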

Saving to cache after a successful run

After the orchestrator completes successfully, store the question and aggregated answer under the global cache session:
if orchestrator_result['status'] == 'success':
    session = "global_cache_session"

    # Store the user query
    memory.add_memory(session_id=session, role="user", content=task)

    # Combine all validated step results into a single answer string
    final_answer = "\n".join([
        f"Step: {r['step']}\nResult: {r['result']}"
        for r in orchestrator_result.get('results', [])
        if r.get('status') == 'validated'
    ])

    # Store the answer
    memory.add_memory(session_id=session, role="assistant", content=final_answer)
    print("[CACHE] Saved to SQLite ShortTermMemory.")
The next time the same task string is passed to get_exact_match_answer, it will return final_answer immediately.
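The aggregation step is plain string building over the validated results. With a hypothetical orchestrator_result (the field names mirror the snippet above; the values here are invented for illustration):

```python
# Hypothetical orchestrator_result -- field names mirror the snippet above,
# values are illustrative only.
orchestrator_result = {
    "status": "success",
    "results": [
        {"step": "lookup", "result": "Andorra la Vella", "status": "validated"},
        {"step": "verify", "result": "Confirmed",        "status": "validated"},
        {"step": "retry",  "result": "timeout",          "status": "failed"},  # excluded
    ],
}

# Only validated steps make it into the cached answer
final_answer = "\n".join(
    f"Step: {r['step']}\nResult: {r['result']}"
    for r in orchestrator_result.get("results", [])
    if r.get("status") == "validated"
)
print(final_answer)
```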

Session ID usage

Every call to add_memory and get_context takes a session_id string. This scopes memory entries so different workflows, users, or test runs do not see each other’s data.
Session ID                Typical use
"global_cache_session"    Cross-run answer cache (the pattern in main.py)
"user-abc-run-1"          Isolate a single user’s conversation history
"test-session"            Unit tests; combine with db_path=':memory:'

Storing and retrieving conversation history

1. Add messages to a session

session_id = "my-session-001"

memory.add_memory(session_id=session_id, role="user",      content="Who founded Python?")
memory.add_memory(session_id=session_id, role="assistant", content="Guido van Rossum.")
memory.add_memory(session_id=session_id, role="user",      content="When was it first released?")
memory.add_memory(session_id=session_id, role="assistant", content="Python 0.9.0 was released in February 1991.")
Optionally attach a metadata dictionary to any message:
memory.add_memory(
    session_id=session_id,
    role="tool",
    content="Search result: ...",
    metadata={"tool_name": "CurlSearchTool", "tokens_used": 142},
)
2. Retrieve recent context

get_context returns up to limit messages in chronological order (oldest first within the window):
history = memory.get_context(session_id=session_id, limit=10)
# [
#   {"role": "user",      "content": "Who founded Python?", "metadata": None, "timestamp": "..."},
#   {"role": "assistant", "content": "Guido van Rossum.",   "metadata": None, "timestamp": "..."},
#   ...
# ]
3. Format context for LLM injection

format_as_string wraps get_context and returns a plain-text block ready to paste into a prompt:
context_block = memory.format_as_string(session_id=session_id, limit=10)
print(context_block)
# USER: Who founded Python?
#
# ASSISTANT: Guido van Rossum.
#
# USER: When was it first released?
#
# ASSISTANT: Python 0.9.0 was released in February 1991.
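The output format above can be reproduced from a get_context-style message list. This is a sketch of the formatting logic only, assuming the "ROLE: content" blocks separated by blank lines shown above; the library's actual implementation may differ:

```python
def format_as_string_sketch(messages, limit=10):
    # Render the last `limit` messages as "ROLE: content" blocks,
    # separated by blank lines, oldest first within the window.
    window = messages[-limit:]
    return "\n\n".join(f"{m['role'].upper()}: {m['content']}" for m in window)

history = [
    {"role": "user", "content": "Who founded Python?"},
    {"role": "assistant", "content": "Guido van Rossum."},
]
print(format_as_string_sketch(history))
```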
4. Clear a session

Delete all messages for a session (e.g., at the end of a test or when a user logs out):
memory.clear_session(session_id=session_id)

Performance benefit

On a cache hit, the entire orchestration pipeline — planner, executor (with LLM calls, possible tool use, and retries), and monitor — is bypassed. The only work done is a single indexed SQLite SELECT (the self-join described above). This is especially valuable for:
  • Development iteration where you run the same task repeatedly
  • Production systems where identical user queries are common
  • Testing downstream logic without incurring LLM API costs
The cache key is the exact task string. Even a single character difference is a cache miss. Normalise whitespace and casing in your task variable before calling get_exact_match_answer if you want fuzzy deduplication.
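One way to do that normalisation, as a hypothetical helper (not part of the library): collapse whitespace runs and lowercase the string before lookup.

```python
import re

def normalise_task(task):
    # Collapse runs of whitespace and lowercase so trivially different
    # phrasings map to the same cache key.
    return re.sub(r"\s+", " ", task).strip().lower()
```

If you normalise before calling get_exact_match_answer, store the normalised string with add_memory as well, or the saved key will never match a later lookup.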
