Overview
The cache utilities module provides two caching mechanisms:
- LRUCache: In-memory thread-safe LRU cache for embeddings and temporary data
- ExtractionSidecarCache: Persistent JSON sidecar cache for entity extraction results
Source: src/utils/cache_utils.py, src/utils/extraction_cache.py
LRUCache
Overview
Thread-safe LRU (Least Recently Used) cache backed by an OrderedDict. Evicts the least-recently-used entry when max_items is reached.
Thread Safety: All public methods are serialized with a lock, making the cache safe for concurrent access from extraction workers.
Source: src/utils/cache_utils.py:13-70
Constructor
max_items: Maximum number of items to store. When exceeded, the least-recently-used item is evicted. Minimum value is 1.
Methods
get
Returns the value for key, or None on miss. Promotes the key to most-recently-used on hit.
set
Stores value under key. Evicts the LRU entry if over capacity.
clear
Removes all entries from the cache.
stats
Returns cache statistics, including 'hit_rate'.
Example: Embedding Cache
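A minimal sketch of the documented API, assuming the behavior described above (OrderedDict backing, a single lock, LRU eviction at max_items, and a stats dict with 'hit_rate'). The real class lives in src/utils/cache_utils.py; internals here are illustrative only.

```python
# Illustrative re-implementation of the documented LRUCache behavior,
# used to cache embedding vectors keyed by a SHA-256 of the input text.
import hashlib
import threading
from collections import OrderedDict

class LRUCache:
    def __init__(self, max_items: int = 4096):
        self.max_items = max(1, max_items)   # documented minimum of 1
        self._data = OrderedDict()
        self._lock = threading.Lock()
        self._hits = 0
        self._misses = 0

    def get(self, key):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)  # promote to most-recently-used
                self._hits += 1
                return self._data[key]
            self._misses += 1
            return None

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.max_items:
                self._data.popitem(last=False)  # evict least-recently-used

    @property
    def stats(self):
        with self._lock:
            total = self._hits + self._misses
            return {"hit_rate": self._hits / total if total else 0.0}

# Cache an embedding vector under a hash of the source text.
cache = LRUCache(max_items=4096)
key = hashlib.sha256("Guantanamo Bay".encode("utf-8")).hexdigest()
cache.set(key, [0.1, 0.2, 0.3])  # pretend this came from an embedding model
assert cache.get(key) == [0.1, 0.2, 0.3]
```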
Hashing Utilities
sha256_text
- Text to hash.
Returns: 64-character hexadecimal SHA-256 digest.
sha256_jsonable
Hashes a serialized form of the object; values that are not JSON-serializable fall back to str(obj), so this never raises.
- JSON-serializable object (dict, list, etc.).
Returns: 64-character hexadecimal SHA-256 digest.
Example: Schema Hashing
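A sketch of schema hashing under the documented contract (canonical JSON serialization, str() fallback so hashing never raises). The exact serialization used in src/utils/cache_utils.py may differ; the schema dict below is hypothetical.

```python
# Sketch of sha256_jsonable: hash a canonical JSON serialization, with
# default=str so non-serializable values fall back to str(obj).
import hashlib
import json

def sha256_jsonable(obj) -> str:
    payload = json.dumps(obj, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hash a (hypothetical) extraction schema; key order does not matter
# because sort_keys=True makes the serialization deterministic.
schema = {"title": "Person", "properties": {"name": {"type": "string"}}}
digest = sha256_jsonable(schema)
assert len(digest) == 64
assert digest == sha256_jsonable(
    {"properties": {"name": {"type": "string"}}, "title": "Person"}
)
```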
ExtractionSidecarCache
Overview
Persistent JSON sidecar cache for entity extraction results. Stores extraction outputs keyed on all output-affecting inputs:
- Content hash (article text)
- Model name
- Entity type
- Prompt hash
- Schema hash
- Temperature
Versioning: Bumping cache.extraction.version in the domain config causes reads from the old vN/ directory to stop matching, effectively invalidating the entire cache without deleting files.
Source: src/utils/extraction_cache.py
Constructor
- Base directory for cache files (typically the domain data directory).
- Subdirectory within base_dir for the extraction cache.
- Cache version. Changing this invalidates all existing cache entries. Typically loaded from cache.extraction.version in domain config.
- Whether caching is enabled. If False, read() always returns None and write() is a no-op.
Methods
make_key
- Article text being extracted from.
- System prompt used for extraction.
- Pydantic model or List[Model] defining the extraction schema.
- LLM model identifier (e.g., "gemini/gemini-2.0-flash-exp").
- Entity type: "people", "organizations", "locations", or "events".
- Model temperature (affects output randomness).
Returns: 64-character hexadecimal cache key.
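The key derivation can be sketched as one digest over all of the inputs above. This is a hypothetical reconstruction: the actual field order and separators used by make_key in src/utils/extraction_cache.py are not shown here.

```python
# Hypothetical sketch of how make_key could combine all output-affecting
# inputs into a single 64-character hex digest.
import hashlib
import json

def make_key(text, prompt, schema, model, entity_type, temperature, version):
    parts = [
        hashlib.sha256(text.encode("utf-8")).hexdigest(),    # content hash
        hashlib.sha256(prompt.encode("utf-8")).hexdigest(),  # prompt hash
        hashlib.sha256(json.dumps(schema, sort_keys=True, default=str)
                       .encode("utf-8")).hexdigest(),        # schema hash
        model,
        entity_type,
        str(temperature),
        str(version),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

key = make_key("Some article text.", "Extract people.", {"title": "Person"},
               "gemini/gemini-2.0-flash-exp", "people", 0.0, 1)
assert len(key) == 64
```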
read
Reads the cached record for key, or None on miss.
- Cache key from make_key().
Returns: Cached extraction record (see build_cache_record) or None if not found.
write
Writes record as JSON for key.
Atomicity: Writes to a temp file in the same directory, then uses os.replace() (atomic on POSIX). Concurrent reads are safe — readers see either the old file or the new one, never a partial write.
- Cache key from make_key().
- Cache record to write (see build_cache_record).
enabled
Whether caching is enabled (mirrors the constructor flag).
stats
Helper Functions
build_cache_record
- Extraction output (typically List[Dict[str, Any]] or list of Pydantic models).
- Entity type being extracted.
- LLM model identifier.
- Model temperature.
- SHA-256 hash of article text.
- SHA-256 hash of system prompt.
- SHA-256 hash of response model schema.
- Cache version number.
Returns: Cache record containing the extraction output plus the metadata fields above.
Complete Example
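An end-to-end sketch assuming the behavior documented on this page: a vN/ version directory, 2-level sharding ({key[:2]}/{key[2:4]}/), and atomic writes via os.replace(). The class internals here are illustrative, not the real implementation in src/utils/extraction_cache.py.

```python
# Illustrative sidecar cache matching the documented layout and atomicity.
import json
import os
import tempfile

class ExtractionSidecarCache:
    def __init__(self, base_dir, version=1, enabled=True):
        self.root = os.path.join(base_dir, "extraction_cache", f"v{version}")
        self.enabled = enabled

    def _path(self, key):
        # 2-level sharding keeps any single directory small.
        return os.path.join(self.root, key[:2], key[2:4], f"{key}.json")

    def read(self, key):
        if not self.enabled:
            return None
        try:
            with open(self._path(key), "r", encoding="utf-8") as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return None

    def write(self, key, record):
        if not self.enabled:
            return
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Write to a temp file in the same directory, then atomically replace,
        # so concurrent readers never observe a partial file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(record, f)
        os.replace(tmp, path)  # atomic on POSIX

with tempfile.TemporaryDirectory() as d:
    cache = ExtractionSidecarCache(d, version=1)
    key = "ab12" + "0" * 60
    assert cache.read(key) is None                      # cold miss
    cache.write(key, {"entities": [{"name": "Alice"}]})
    assert cache.read(key)["entities"][0]["name"] == "Alice"
```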
Configuration
Domain Config
Cache settings are typically loaded from the domain YAML file, configs/guantanamo/config.yaml.
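A minimal fragment, assuming only the two settings documented on this page (cache.extraction.version and the enabled flag); any other fields in the real config are not shown.

```yaml
cache:
  extraction:
    enabled: true   # if false, read() returns None and write() is a no-op
    version: 1      # bump to invalidate all existing cache entries
```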
Accessing Config
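A hypothetical sketch of pulling the cache settings out of an already-parsed domain config; the real code path for loading the YAML is not shown, so a plain dict stands in for it here.

```python
# Stand-in for a parsed domain config (normally loaded from the domain YAML).
config = {"cache": {"extraction": {"enabled": True, "version": 1}}}

ext_cfg = config.get("cache", {}).get("extraction", {})
enabled = ext_cfg.get("enabled", True)   # default values are assumptions
version = ext_cfg.get("version", 1)
assert (enabled, version) == (True, 1)
```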
Performance Considerations
LRU Cache Sizing
Embedding Cache:
- Each embedding vector is ~4KB (1024 dimensions × 4 bytes/float)
- Default 4096 items ≈ 16MB memory
- Increase for domains with many unique entity names
- Decrease for memory-constrained environments
- Monitor cache.stats['hit_rate']
- Good hit rate: >50% for typical workloads
- Low hit rate indicates max_items is too small or text variety is very high
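The sizing figures above can be checked with a quick calculation:

```python
# Quick check of the sizing arithmetic: 1024 dims x 4 bytes/float32 per
# vector, at the default 4096 items.
bytes_per_vector = 1024 * 4
total = 4096 * bytes_per_vector
assert bytes_per_vector == 4096           # ~4KB per embedding
assert total == 16 * 1024 * 1024          # ~16MB at the default size
```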
Extraction Cache Sharding
Why Shard:
- Some filesystems struggle with >10K files in a single directory
- 2-level sharding ({key[:2]}/{key[2:4]}/) limits directories to ~256 files each
- Each record is 1-10KB (depends on entity count)
- 10K cached articles ≈ 10-100MB disk space
- No automatic eviction; old versions accumulate in v1/, v2/, etc.
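The shard path derivation can be sketched as follows; since each path segment is two hex characters, each level has at most 256 subdirectories.

```python
# Derive the documented 2-level shard path from a hex cache key.
import os

def shard_path(root: str, key: str) -> str:
    return os.path.join(root, key[:2], key[2:4], f"{key}.json")

key = "ab12cd" + "0" * 58
assert shard_path("v1", key) == os.path.join("v1", "ab", "12", key + ".json")
```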
Thread Safety
LRUCache:
- All methods use a single lock (threading.Lock)
- Safe for concurrent access from worker threads
- Lock contention is minimal for typical workloads
ExtractionSidecarCache:
- Writes use os.replace() (atomic on POSIX)
- Concurrent reads of the same key are safe
- Concurrent writes to the same key are serialized by the filesystem
Cache Invalidation
Extraction Cache:
- Bump cache.extraction.version in domain config
- Old vN/ directories can be deleted manually after version bump
- Cache keys include all output-affecting inputs (prompt, schema, model, temperature)
Embedding Cache:
- Keyed by (fingerprint, text_hash) where fingerprint is "{model}:{dimension}"
- Changing model or dimension automatically invalidates old entries
- No manual invalidation needed
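The keying scheme above can be sketched as a pure function; the model names below are hypothetical, and only the (fingerprint, text_hash) structure comes from this page.

```python
# Sketch of the documented embedding-cache key: (fingerprint, text_hash),
# where fingerprint is "{model}:{dimension}".
import hashlib

def embedding_key(model: str, dimension: int, text: str):
    fingerprint = f"{model}:{dimension}"
    text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return (fingerprint, text_hash)

old = embedding_key("text-embed-v1", 1024, "Guantanamo Bay")
new = embedding_key("text-embed-v2", 1024, "Guantanamo Bay")
assert old != new  # a model change invalidates old entries automatically
```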
See Also
- Embeddings - Uses LRUCache for embedding vectors
- Extractors - Uses ExtractionSidecarCache to avoid redundant LLM calls
- Quality Controls - Validation applied to cached extraction results