Overview
The /siaa/cache endpoint provides access to the LRU response cache, which stores frequently asked questions and their answers. Use GET to view statistics and DELETE to clear the cache.
Cache Design
SIAA implements an LRU (Least Recently Used) cache for document-based queries:
- Max entries: 200 responses
- TTL: 3600 seconds (1 hour)
- Key normalization: Case-insensitive, accent-insensitive, punctuation-removed
- Thread-safe: Protected with locks for concurrent access
- Selective caching: Only document queries are cached (not conversational queries)
What Gets Cached
✅ Cached:
- Document-based queries (“¿Cuándo reportar SIERJU?”)
- Questions containing judicial/technical terms
- Queries longer than 8 characters that aren’t conversational
❌ Not cached:
- Conversational queries (“Hola”, “Gracias”)
- Negative responses (“No encontré esa información…”)
- Empty responses
- Queries that trigger clarification
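The rules above could be expressed as a small predicate. This is only a sketch: the function name, the conversational phrase set, and the negative-response prefix are hypothetical stand-ins, and this page does not show how SIAA actually detects conversational queries, judicial terms, or clarification.

```python
# Hypothetical illustration of the caching rules listed above; not SIAA's code.
CONVERSATIONAL = {"hola", "gracias"}   # assumed examples of conversational input
NEGATIVE_PREFIX = "No encontré"        # assumed marker of a negative response
MIN_QUERY_LENGTH = 8                   # queries must be longer than this


def should_cache(query: str, response: str, needs_clarification: bool = False) -> bool:
    """Return True when a query/response pair is eligible for caching."""
    q = query.strip().lower()
    if len(q) <= MIN_QUERY_LENGTH or q in CONVERSATIONAL:
        return False                   # too short or conversational
    if not response.strip():
        return False                   # empty response
    if response.startswith(NEGATIVE_PREFIX):
        return False                   # negative response
    return not needs_clarification     # clarification requests are not cached
```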
Cache Key Normalization
Queries are normalized before hashing, so variants such as the following all produce the same cache key:
- “¿Cuándo debo reportar?”
- “cuando debo reportar”
- “CUANDO DEBO REPORTAR”
- “Cuándo debo reportar.” (punctuation removed)
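A minimal sketch of this normalization, assuming Unicode decomposition is used to strip accents and SHA-256 is used for hashing. The function names and the choice of hash are illustrative assumptions; the page only states that queries are normalized before hashing.

```python
import hashlib
import unicodedata


def normalize_query(query: str) -> str:
    """Lowercase, strip accents, drop punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFD", query.lower())
    kept = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category == "Mn":           # combining accent left over from NFD
            continue
        if category.startswith("P"):   # punctuation such as ¿ ? .
            continue
        kept.append(ch)
    return " ".join("".join(kept).split())


def cache_key(query: str) -> str:
    """Hash the normalized query (SHA-256 is an assumption, not confirmed)."""
    return hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()


variants = [
    "¿Cuándo debo reportar?",
    "cuando debo reportar",
    "CUANDO DEBO REPORTAR",
    "Cuándo debo reportar.",
]
keys = {cache_key(v) for v in variants}
print(len(keys))  # 1: all four variants share a single key
```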
GET /siaa/cache
Retrieve cache statistics.
Request
Response
Current number of cached responses (0-200)
Maximum cache capacity (200)
Total cache hits since server startup or last cache clear
Total cache misses since server startup or last cache clear
Cache hit rate as a percentage (e.g., “38.5%”)
Time-to-live for each cache entry in seconds (3600)
Example Response
Example Usage
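One way to call this endpoint from Python, using only the standard library. The base URL is a placeholder to adjust for your deployment, the helper names are hypothetical, and the sketch assumes the endpoint returns a JSON object; it iterates over whatever fields come back rather than assuming their names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: replace with your SIAA host and port


def build_stats_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the cache statistics endpoint."""
    return urllib.request.Request(f"{base_url}/siaa/cache", method="GET")


def fetch_cache_stats(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (assumed to be an object)."""
    request = build_stats_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage against a running server:
#   for field, value in fetch_cache_stats().items():
#       print(f"{field}: {value}")
```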
DELETE /siaa/cache
Clear all cached responses.
Request
Response
Always true if the operation completed successfully
Confirmation message: “Caché limpiado correctamente” (“Cache cleared successfully”)
Example Response
Example Usage
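Clearing the cache from Python might look like the following. As before, the base URL and helper names are assumptions, and the response is assumed to be a JSON object.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: replace with your SIAA host and port


def build_clear_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a DELETE request that clears all cached responses."""
    return urllib.request.Request(f"{base_url}/siaa/cache", method="DELETE")


def clear_cache(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the DELETE request and return the decoded JSON confirmation."""
    request = build_clear_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage against a running server:
#   print(clear_cache())
```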
When to Clear Cache
After Document Updates
When source documents are modified, cached responses may contain outdated information. Reloading documents with GET /siaa/recargar clears the cache automatically.
Manual Cache Clear (Without Document Reload)
If you only want to clear cached responses without reloading documents, send DELETE /siaa/cache.
Testing New Configurations
When testing changes to chunk selection, routing algorithms, or prompts, clear the cache first so that answers are regenerated under the new configuration.
Debugging Response Quality
If users report incorrect answers, clear the cache to eliminate stale responses.
Cache Performance Metrics
Understanding Hit Rate
Hit rate = hits / (hits + misses)
- > 30%: Excellent - Many users ask similar questions
- 20-30%: Good - Cache is effective
- 10-20%: Fair - Questions are diverse
- < 10%: Low - Consider increasing cache size or TTL
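The formula and bands above can be sketched as a pair of helpers. The function names are hypothetical and the hit and miss counts in the usage line are made-up illustration values, not measurements.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Return the hit rate as a percentage; 0.0 when there were no lookups."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


def rate_band(rate_percent: float) -> str:
    """Map a hit-rate percentage onto the bands listed above."""
    if rate_percent > 30:
        return "Excellent"
    if rate_percent >= 20:
        return "Good"
    if rate_percent >= 10:
        return "Fair"
    return "Low"


rate = hit_rate(hits=77, misses=123)        # illustrative counts only
print(f"{rate:.1f}% -> {rate_band(rate)}")  # 38.5% -> Excellent
```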
Expected Performance
For 26 judicial offices (despachos):
- Without cache: ~44 seconds per query (LLM inference)
- With cache hit: ~5ms per query (8800x faster)
- Expected hit rate: 30-40% (based on common procedural questions)
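As a rough worked example, the figures above imply the following speedup and average latency per query. This is back-of-the-envelope arithmetic on the approximate numbers quoted in the list, not a benchmark.

```python
UNCACHED_SECONDS = 44.0   # approximate LLM inference time quoted above
CACHED_SECONDS = 0.005    # approximate cache-hit time quoted above (5 ms)


def expected_latency(hit_fraction: float) -> float:
    """Average seconds per query for a given fraction of cache hits."""
    return hit_fraction * CACHED_SECONDS + (1.0 - hit_fraction) * UNCACHED_SECONDS


print(round(UNCACHED_SECONDS / CACHED_SECONDS))  # 8800
for h in (0.30, 0.40):
    print(f"{h:.0%} hits -> {expected_latency(h):.1f} s average")
# 30% hits -> 30.8 s average
# 40% hits -> 26.4 s average
```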
Cache Size Utilization
Cache Hit Rate Over Time
LRU Eviction
When the cache reaches 200 entries:
- New entry arrives: Cache is full
- Eviction: Least recently used entry is removed (front of OrderedDict)
- Insertion: New entry added to the back
Lookups affect the ordering as follows:
- Cache hit → Entry moved to back (most recently used)
- Cache miss → No change to existing entries
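The eviction behavior described above can be sketched with collections.OrderedDict and a lock. The class and method names are hypothetical; this is a minimal illustration of the technique, not SIAA's internal implementation, and it omits TTL handling.

```python
from collections import OrderedDict
from threading import Lock


class LRUCache:
    """Minimal LRU sketch: the front of the OrderedDict is least recently used."""

    def __init__(self, max_entries: int = 200):
        self.max_entries = max_entries
        self._entries = OrderedDict()
        self._lock = Lock()

    def get(self, key):
        with self._lock:
            if key not in self._entries:
                return None                        # miss: ordering unchanged
            self._entries.move_to_end(key)         # hit: now most recently used
            return self._entries[key]

    def put(self, key, value):
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)
            elif len(self._entries) >= self.max_entries:
                self._entries.popitem(last=False)  # evict from the front
            self._entries[key] = value             # new entries land at the back

    def keys(self):
        with self._lock:
            return list(self._entries)


cache = LRUCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" becomes most recently used
cache.put("c", 3)     # cache is full, so "b" is evicted
print(cache.keys())   # ['a', 'c']
```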
Cache Validation (TTL)
Each cache entry has a timestamp. On lookup:
- Entry found: Check current_time - entry.timestamp > TTL
- If expired: Delete entry, return cache miss
- If valid: Increment hit counter, move to back, return cached response
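The lookup steps above might be written as follows. The entry layout (a dict with "response" and "timestamp" keys), the stats dict, and the choice to count an expired entry as a miss are assumptions made for this sketch; the clock is injectable so the behavior can be shown without waiting an hour.

```python
import time
from collections import OrderedDict

TTL_SECONDS = 3600


def lookup(entries: OrderedDict, key, stats: dict, now=None):
    """Return the cached response, or None on a miss or an expired entry."""
    now = time.time() if now is None else now
    entry = entries.get(key)
    if entry is None:
        stats["misses"] += 1
        return None
    if now - entry["timestamp"] > TTL_SECONDS:
        del entries[key]              # expired: drop it and report a miss
        stats["misses"] += 1
        return None
    stats["hits"] += 1
    entries.move_to_end(key)          # valid: mark as most recently used
    return entry["response"]


entries = OrderedDict(q1={"response": "cached answer", "timestamp": 1000.0})
stats = {"hits": 0, "misses": 0}
print(lookup(entries, "q1", stats, now=2000.0))  # cached answer
print(lookup(entries, "q1", stats, now=5000.0))  # None (4000 s old, expired)
print(stats)                                     # {'hits': 1, 'misses': 1}
```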
Monitoring Cache Health
Daily Stats Collection
Analyze Cache Trends
Alert on Low Hit Rate
Cache Data Structure
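Internal structure (not exposed via API): an OrderedDict of entries ordered from least to most recently used, where each entry carries the cached response and the timestamp used for TTL checks.
Cache Headers in Chat Response
When a chat query hits the cache, the response includes an X-Cache: HIT header, which indicates the response was served from cache.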
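A client can check for this header with a small helper. The function name is hypothetical, and the case-insensitive comparison is a defensive choice for this sketch, since HTTP header names are case-insensitive.

```python
from typing import Mapping


def served_from_cache(headers: Mapping[str, str]) -> bool:
    """Return True when the response headers contain X-Cache: HIT."""
    for name, value in headers.items():
        if name.lower() == "x-cache":
            return value.strip().upper() == "HIT"
    return False


print(served_from_cache({"X-Cache": "HIT"}))                    # True
print(served_from_cache({"Content-Type": "application/json"}))  # False
```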
Examples
Example 1: Monitor Cache Growth
Example 2: Clear and Verify
Example 3: Cache Performance Report
Related Endpoints
- GET /siaa/status - Includes cache stats as part of overall system status
- GET /siaa/recargar - Automatically clears cache when reloading documents
- POST /siaa/chat - Uses cache for fast responses on repeated queries
- GET /siaa/log - Shows CACHE_HIT entries in quality logs