SQLiteShortTermMemory provides a persistent, session-aware message store backed by SQLite. It supports context retrieval for LLM injection, exact-match caching to avoid redundant LLM calls, and full session cleanup.

Schema

The database contains a single memory table:
CREATE TABLE IF NOT EXISTS memory (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT    NOT NULL,
    role       TEXT    NOT NULL,
    content    TEXT    NOT NULL,
    metadata   TEXT,
    timestamp  DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_session_id ON memory(session_id);
Sessions are isolated by session_id. All queries filter by session_id, so different sessions never see each other’s messages.
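The isolation can be verified directly against the schema with a raw sqlite3 connection (a standalone sketch that does not use the class itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS memory (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT    NOT NULL,
    role       TEXT    NOT NULL,
    content    TEXT    NOT NULL,
    metadata   TEXT,
    timestamp  DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_session_id ON memory(session_id);
""")

conn.execute("INSERT INTO memory (session_id, role, content) VALUES (?, ?, ?)",
             ("session-a", "user", "hello from A"))
conn.execute("INSERT INTO memory (session_id, role, content) VALUES (?, ?, ?)",
             ("session-b", "user", "hello from B"))

# Filtering by session_id hides the other session's rows entirely
rows = conn.execute(
    "SELECT content FROM memory WHERE session_id = ?", ("session-a",)
).fetchall()
print(rows)  # [('hello from A',)]
```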

Initialization

from memory.short_term_memory import SQLiteShortTermMemory

# Persistent storage
memory = SQLiteShortTermMemory(db_path="short_term_memory.db")

# Ephemeral in-memory storage (no file written to disk)
memory = SQLiteShortTermMemory(db_path=":memory:")
Use db_path=":memory:" in tests or whenever state should not persist across process restarts.

Storing messages

memory.add_memory(
    session_id="session-abc",
    role="user",
    content="What is the capital of Andorra?"
)

memory.add_memory(
    session_id="session-abc",
    role="assistant",
    content="The capital of Andorra is Andorra la Vella.",
    metadata={"model": "llama3", "tokens": 42}
)
role is a free-form string — common values are "user", "assistant", "system", and "tool". metadata is an optional dict serialized to JSON for storage.
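The JSON serialization of metadata can be sketched as follows (a hypothetical rendering of the internals, not the exact source):

```python
import json
import sqlite3

def add_memory_sketch(conn, session_id, role, content, metadata=None):
    """Store one message; a dict metadata is serialized to a JSON string column."""
    conn.execute(
        "INSERT INTO memory (session_id, role, content, metadata) VALUES (?, ?, ?, ?)",
        (session_id, role, content,
         json.dumps(metadata) if metadata is not None else None),
    )

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE memory (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL,
    metadata TEXT, timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)""")

add_memory_sketch(conn, "s1", "assistant", "hi", metadata={"model": "llama3"})
stored = conn.execute("SELECT metadata FROM memory").fetchone()[0]
print(stored)  # '{"model": "llama3"}'
```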

Retrieving context

As structured data

messages = memory.get_context(session_id="session-abc", limit=10)
# [
#   {"role": "user",      "content": "...", "metadata": None, "timestamp": "..."},
#   {"role": "assistant", "content": "...", "metadata": {...}, "timestamp": "..."}
# ]
get_context() fetches the most recent limit messages (ordered by timestamp DESC, id DESC), then reverses them to return oldest-first chronological order.
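The "fetch newest, then reverse" pattern described above can be sketched like this (assumed internals; the function name here is illustrative):

```python
import sqlite3

def get_context_sketch(conn, session_id, limit=10):
    """Fetch the newest `limit` rows, then reverse into chronological order."""
    rows = conn.execute(
        "SELECT role, content FROM memory "
        "WHERE session_id = ? ORDER BY timestamp DESC, id DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return list(reversed(rows))  # oldest-first

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE memory (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL,
    metadata TEXT, timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)""")
for role, content in [("user", "q1"), ("assistant", "a1"), ("user", "q2")]:
    conn.execute("INSERT INTO memory (session_id, role, content) VALUES (?, ?, ?)",
                 ("s", role, content))

# With limit=2, the oldest message "q1" is dropped; the rest come back in order
print(get_context_sketch(conn, "s", limit=2))
# [('assistant', 'a1'), ('user', 'q2')]
```

The secondary id DESC sort matters because CURRENT_TIMESTAMP has one-second resolution: messages inserted within the same second still come back in insertion order.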

As a formatted string for LLM injection

context_str = memory.format_as_string(session_id="session-abc", limit=10)
# "USER: What is the capital of Andorra?\n\nASSISTANT: The capital of Andorra is Andorra la Vella."
Each message is formatted as ROLE: content (role uppercased) with \n\n between messages. This string can be passed directly as context to any agent’s invoke() or execute_step() call.
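The formatting step amounts to an upper-cased join (a minimal sketch over already-retrieved messages):

```python
def format_as_string_sketch(messages):
    """Join messages as 'ROLE: content' blocks separated by blank lines."""
    return "\n\n".join(f"{m['role'].upper()}: {m['content']}" for m in messages)

msgs = [
    {"role": "user", "content": "What is the capital of Andorra?"},
    {"role": "assistant", "content": "The capital of Andorra is Andorra la Vella."},
]
print(format_as_string_sketch(msgs))
# USER: What is the capital of Andorra?
#
# ASSISTANT: The capital of Andorra is Andorra la Vella.
```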

Clearing a session

memory.clear_session(session_id="session-abc")
# Deletes all rows where session_id = 'session-abc'

Exact-match caching

get_exact_match_answer(query) looks for a previous user message that exactly matches query and returns the immediately following assistant message. This acts as a simple response cache.
cached = memory.get_exact_match_answer("What is the capital of Andorra?")
if cached:
    print(f"Cache hit: {cached}")
The SQL query joins the table on itself:
SELECT m2.content
FROM memory m1
JOIN memory m2
  ON m1.session_id = m2.session_id AND m2.id > m1.id
WHERE m1.role = 'user'
  AND m1.content = ?
  AND m2.role = 'assistant'
ORDER BY m2.id ASC
LIMIT 1
The match is exact and global — it searches across all sessions. This is suitable for deterministic queries (e.g., factual lookups) where the same question should always return the same cached answer.
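The self-join can be exercised directly against an in-memory database (standalone sketch reproducing the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE memory (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL,
    metadata TEXT, timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)""")
for sid, role, content in [
    ("s1", "user", "What is the capital of Andorra?"),
    ("s1", "assistant", "Andorra la Vella."),
]:
    conn.execute("INSERT INTO memory (session_id, role, content) VALUES (?, ?, ?)",
                 (sid, role, content))

# m2.id > m1.id plus ORDER BY m2.id ASC LIMIT 1 selects the first
# assistant message written after the matching user message
row = conn.execute("""
    SELECT m2.content FROM memory m1
    JOIN memory m2 ON m1.session_id = m2.session_id AND m2.id > m1.id
    WHERE m1.role = 'user' AND m1.content = ? AND m2.role = 'assistant'
    ORDER BY m2.id ASC LIMIT 1
""", ("What is the capital of Andorra?",)).fetchone()
print(row[0])  # Andorra la Vella.
```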

Caching pattern from main.py

The main.py entry point demonstrates the full cache-check-before-LLM pattern:
from memory.short_term_memory import SQLiteShortTermMemory

memory = SQLiteShortTermMemory()
task = "What is the capital of Andorra?"

# 1. Check cache before invoking any LLM
cached_response = memory.get_exact_match_answer(task)
if cached_response:
    print(f"[CACHE HIT] {cached_response}")
    # Exit early — no LLM call needed
else:
    # 2. Run the full orchestrator pipeline
    result = orchestrator.run(task=task)

    # 3. On success, save to cache for future lookups
    if result["status"] == "success":
        session = "global_cache_session"
        # Key name depends on your orchestrator's result schema
        final_answer = result["final_answer"]
        memory.add_memory(session_id=session, role="user",      content=task)
        memory.add_memory(session_id=session, role="assistant", content=final_answer)
Use a dedicated session_id (e.g., "global_cache_session") for cached responses, separate from interactive conversation sessions. This prevents cache entries from appearing in session-scoped get_context() calls.

API reference

| Method | Description |
| --- | --- |
| add_memory(session_id, role, content, metadata=None) | Store a message in memory. |
| get_context(session_id, limit=10) | Retrieve the most recent limit messages in chronological order. |
| format_as_string(session_id, limit=10) | Format context as ROLE: content lines for LLM injection. |
| get_exact_match_answer(query) | Find an exact user query in history and return the following assistant response. |
| clear_session(session_id) | Delete all messages for a session. |
