GET/DELETE /siaa/cache

Overview

The /siaa/cache endpoint provides access to the LRU response cache, which stores frequently asked questions and their answers. Use GET to view statistics and DELETE to clear the cache.

Cache Design

SIAA implements an LRU (Least Recently Used) cache for document-based queries:

Max entries: 200 responses
TTL: 3600 seconds (1 hour)
Key normalization: Case-insensitive, accent-insensitive, punctuation-removed
Thread-safe: Protected with locks for concurrent access
Selective caching: Only document queries are cached (not conversational queries)

What Gets Cached

✅ Cached:

Document-based queries (“¿Cuándo reportar SIERJU?”)
Questions containing judicial/technical terms
Queries longer than 8 characters that aren’t conversational

❌ NOT cached:

Conversational queries (“Hola”, “Gracias”)
Negative responses (“No encontré esa información…”)
Empty responses
Queries that trigger clarification
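
The rules above can be sketched as a single predicate. This is only an illustration: the function name `is_cacheable`, the `CONVERSATIONAL` set, and the `NEGATIVE_PREFIX` check are assumptions built from the examples on this page, not the actual SIAA implementation, and the checks for judicial/technical terms and clarification triggers are left out because their details are not documented here.

```python
# Hypothetical values taken from the examples on this page; the real lists may differ.
CONVERSATIONAL = {"hola", "gracias"}
NEGATIVE_PREFIX = "no encontré"
MIN_QUERY_LENGTH = 8


def is_cacheable(query: str, response: str) -> bool:
    """Return True if a (query, response) pair passes the documented cache filters."""
    stripped_query = query.strip()
    stripped_response = response.strip()
    # Empty responses are never cached.
    if not stripped_response:
        return False
    # Negative responses ("No encontré esa información…") are never cached.
    if stripped_response.lower().startswith(NEGATIVE_PREFIX):
        return False
    # Conversational queries are never cached.
    if stripped_query.lower().strip("¡!¿?.,") in CONVERSATIONAL:
        return False
    # Only queries longer than 8 characters qualify.
    return len(stripped_query) > MIN_QUERY_LENGTH
```

For example, `is_cacheable("Hola", "¡Hola!")` is rejected as conversational, while a document question with a non-empty, non-negative answer is accepted.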

Cache Key Normalization

Queries are normalized before hashing:

Query: "¿Cuándo debo reportar?"
Normalized: "cuando debo reportar"
Hash: SHA256[:16]

This means these queries hit the same cache entry:

“¿Cuándo debo reportar?”
“cuando debo reportar”
“CUANDO DEBO REPORTAR”
“Cuándo debo reportar.” (punctuation removed)
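
A minimal sketch of this normalization and hashing, assuming Unicode NFD decomposition is used to strip accents and that the key is the first 16 hex characters of the SHA-256 digest. The function names `normalize_query` and `cache_key` are illustrative, not taken from the SIAA source.

```python
import hashlib
import re
import unicodedata


def normalize_query(query: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    # Decompose accented characters ("á" becomes "a" plus a combining mark),
    # then drop the combining marks (Unicode category "Mn").
    decomposed = unicodedata.normalize("NFD", query.lower())
    without_accents = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Replace anything that is not a word character or whitespace with a space.
    without_punct = re.sub(r"[^\w\s]", " ", without_accents)
    return " ".join(without_punct.split())


def cache_key(query: str) -> str:
    """First 16 hex characters of the SHA-256 digest of the normalized query."""
    digest = hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()
    return digest[:16]
```

With this sketch, all four variants listed above normalize to `cuando debo reportar` and therefore share one key. Note that NFD stripping also maps "ñ" to "n", which may or may not match the real behavior.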

GET /siaa/cache

Retrieve cache statistics.

Request

GET /siaa/cache

No parameters required.

Response

entradas (number): Current number of cached responses (0-200)
max (number): Maximum cache capacity (200)
hits (number): Total cache hits since server startup or last cache clear
misses (number): Total cache misses since server startup or last cache clear
hit_rate (string): Cache hit rate as a percentage (e.g., “38.5%”)
ttl_seg (number): Time-to-live for each cache entry in seconds (3600)

Example Response

{
  "entradas": 87,
  "max": 200,
  "hits": 245,
  "misses": 392,
  "hit_rate": "38.5%",
  "ttl_seg": 3600
}

Example Usage

curl http://localhost:5000/siaa/cache

DELETE /siaa/cache

Clear all cached responses.

Request

DELETE /siaa/cache

No parameters required.

Response

vaciado (boolean): Always true if the operation completed successfully
mensaje (string): Confirmation message: “Caché limpiado correctamente”

Example Response

{
  "vaciado": true,
  "mensaje": "Caché limpiado correctamente"
}

Example Usage

curl -X DELETE http://localhost:5000/siaa/cache
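
Both methods can also be called from Python using only the standard library. The sketch below assumes the same host and port as the curl examples; the helper names `build_request` and `call_cache_endpoint` are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # same host and port as the curl examples


def build_request(method: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET or DELETE request for /siaa/cache without sending it."""
    return urllib.request.Request(f"{base_url}/siaa/cache", method=method)


def call_cache_endpoint(method: str, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON body into a dict."""
    request = build_request(method, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running SIAA server):
#   stats = call_cache_endpoint("GET")      # cache statistics
#   result = call_cache_endpoint("DELETE")  # clears the cache
```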

When to Clear Cache

After Document Updates

When source documents are modified, cached responses may contain outdated information:

# 1. Update document
vim /opt/siaa/fuentes/acuerdo_psaa16-10476.md

# 2. Reload documents and clear cache
curl http://localhost:5000/siaa/recargar

# Note: /siaa/recargar clears the cache automatically, so no separate
# DELETE /siaa/cache call is needed after a reload

Manual Cache Clear (Without Document Reload)

If you only want to clear cached responses without reloading documents:

curl -X DELETE http://localhost:5000/siaa/cache

Testing New Configurations

When testing changes to chunk selection, routing algorithms, or prompts:

# Clear cache to ensure queries use new logic
curl -X DELETE http://localhost:5000/siaa/cache

# Test query
curl -X POST http://localhost:5000/siaa/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Test query"}]}'

Debugging Response Quality

If users report incorrect answers, clear cache to eliminate stale responses:

# Clear cache
curl -X DELETE http://localhost:5000/siaa/cache

# Reproduce user query
curl -X POST http://localhost:5000/siaa/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"User query here"}]}'

Cache Performance Metrics

Understanding Hit Rate

Hit rate = hits / (hits + misses)

> 30%: Excellent - Many users ask similar questions
20-30%: Good - Cache is effective
10-20%: Fair - Questions are diverse
< 10%: Low - Consider increasing cache size or TTL
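
The formula and the bands above can be expressed as two small helpers. The function names are illustrative, and treating exactly 30% as "Good" and exactly 20% or 10% as the higher band is an assumption, since the ranges above overlap at their boundaries.

```python
def hit_rate_percent(hits: int, misses: int) -> float:
    """Hit rate as a percentage; 0.0 when no lookups have happened yet."""
    total = hits + misses
    return 0.0 if total == 0 else 100.0 * hits / total


def rate_band(percent: float) -> str:
    """Map a hit-rate percentage to the bands listed above."""
    if percent > 30:
        return "Excellent"
    if percent >= 20:
        return "Good"
    if percent >= 10:
        return "Fair"
    return "Low"
```

Using the example response from this page, `hit_rate_percent(245, 392)` rounds to 38.5, which falls in the "Excellent" band.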

Expected Performance

For 26 judicial offices (despachos):

Without cache: ~44 seconds per query (LLM inference)
With cache hit: ~5ms per query (8800x faster)
Expected hit rate: 30-40% (based on common procedural questions)
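
To see what these figures imply for average latency, the arithmetic can be worked through as follows. This is a back-of-the-envelope sketch that assumes every miss costs about 44 seconds and every hit about 5 ms; the function names are illustrative.

```python
MISS_SECONDS = 44.0   # approximate LLM inference time per uncached query
HIT_SECONDS = 0.005   # approximate time per cache hit (5 ms)


def speedup() -> float:
    """How many times faster a cache hit is than an uncached query."""
    return MISS_SECONDS / HIT_SECONDS


def expected_latency(hit_rate: float) -> float:
    """Average seconds per query for a hit rate given as a fraction (0.0 to 1.0)."""
    return hit_rate * HIT_SECONDS + (1.0 - hit_rate) * MISS_SECONDS
```

Under these assumptions, a 30-40% hit rate brings the average from about 44 seconds down to roughly 26 to 31 seconds per query, because the remaining misses still dominate.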

Cache Size Utilization

# Check cache fullness
curl -s http://localhost:5000/siaa/cache | jq '{
  utilization: (.entradas / .max * 100 | tostring + "%"),
  entradas,
  max
}'

Output:

{
  "utilization": "43.5%",
  "entradas": 87,
  "max": 200
}

Cache Hit Rate Over Time

# Monitor hit rate every 10 seconds
watch -n 10 'curl -s http://localhost:5000/siaa/cache | jq .hit_rate'

LRU Eviction

When the cache reaches 200 entries:

New entry arrives: Cache is full
Eviction: Least recently used entry is removed (front of OrderedDict)
Insertion: New entry added to the back

Access pattern updates LRU order:

Cache hit → Entry moved to back (most recently used)
Cache miss → No change to existing entries
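
The eviction and reordering rules above can be sketched with `collections.OrderedDict` and a lock. The class name `LRUCache` and its methods are hypothetical and only illustrate the technique; the actual SIAA code may be organized differently.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """Minimal thread-safe LRU cache: front is least recent, back is most recent."""

    def __init__(self, max_entries: int = 200) -> None:
        self.max_entries = max_entries
        self._entries: OrderedDict = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key not in self._entries:
                return None  # miss: order of existing entries is unchanged
            self._entries.move_to_end(key)  # hit: mark as most recently used
            return self._entries[key]

    def put(self, key, value) -> None:
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)
            elif len(self._entries) >= self.max_entries:
                self._entries.popitem(last=False)  # evict least recently used
            self._entries[key] = value  # new entries land at the back

    def keys(self) -> list:
        with self._lock:
            return list(self._entries)
```

With `max_entries=2`, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because the read moved `a` to the back.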

Cache Validation (TTL)

Each cache entry has a timestamp. On lookup:

Entry found: Check current_time - entry.timestamp > TTL
If expired: Delete entry, return cache miss
If valid: Increment hit counter, move to back, return cached response

This ensures that a cached response is never served more than 1 hour after it was stored. Expired entries are removed lazily, at lookup time.
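
A minimal sketch of this lookup, assuming entries shaped like the internal structure described on this page (`respuesta`, `ts`, `hits`). The function name `lookup`, the `stats` dict, and counting an expired entry as a miss in the statistics are assumptions made for illustration.

```python
import time
from collections import OrderedDict
from typing import Optional

TTL_SECONDS = 3600


def lookup(cache: OrderedDict, stats: dict, key: str, now: Optional[float] = None) -> Optional[str]:
    """Return the cached response for key, or None on a miss or expired entry."""
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is None:
        stats["misses"] += 1
        return None
    if now - entry["ts"] > TTL_SECONDS:
        del cache[key]  # expired: drop the entry and report a miss
        stats["misses"] += 1
        return None
    stats["hits"] += 1
    entry["hits"] += 1
    cache.move_to_end(key)  # valid hit: mark as most recently used
    return entry["respuesta"]
```

Passing `now` explicitly makes the expiry logic easy to exercise without waiting an hour.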

Monitoring Cache Health

Daily Stats Collection

#!/bin/bash
# Script: collect_cache_stats.sh

TIMESTAMP=$(date +%Y-%m-%d_%H:%M:%S)
STATS=$(curl -s http://localhost:5000/siaa/cache)

echo "$TIMESTAMP $STATS" >> /var/log/siaa/cache_stats.log

Run via cron every hour:

0 * * * * /opt/siaa/scripts/collect_cache_stats.sh

Analyze Cache Trends

# Extract hit rates from log
grep -oP '"hit_rate":\s*"\K[0-9.]+' /var/log/siaa/cache_stats.log | tail -24
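
The same log can be parsed in Python for further analysis. This sketch assumes each line has the form written by the collection script (a timestamp, a space, then the JSON body on a single line); the function name `parse_hit_rates` is illustrative.

```python
import json


def parse_hit_rates(lines):
    """Return (timestamp, hit_rate_percent) pairs, skipping malformed lines."""
    rates = []
    for line in lines:
        # The timestamp contains no spaces, so split on the first space only.
        timestamp, _, payload = line.strip().partition(" ")
        try:
            stats = json.loads(payload)
            rate = float(stats["hit_rate"].rstrip("%"))
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # not a valid stats line
        rates.append((timestamp, rate))
    return rates


# Usage:
#   with open("/var/log/siaa/cache_stats.log") as log:
#       last_day = parse_hit_rates(log)[-24:]
```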

Alert on Low Hit Rate

#!/bin/bash
# Alert if hit rate drops below 15%

HIT_RATE=$(curl -s http://localhost:5000/siaa/cache | jq -r '.hit_rate' | tr -d '%')

if (( $(echo "$HIT_RATE < 15" | bc -l) )); then
  echo "ALERT: Cache hit rate is ${HIT_RATE}%" | \
    mail -s "SIAA Cache Performance Warning" [email protected]
fi

Cache Data Structure

Internal structure (not exposed via API):

_cache_respuestas = OrderedDict()  # LRU ordering

# Each entry:
{
  "<hash_key>": {
    "respuesta": "<full response text>",
    "cita": "<citation footer>",
    "ts": 1234567890.123,  # Unix timestamp
    "hits": 42  # Number of times this entry was reused
  }
}

Cache Headers in Chat Response

When a chat query hits the cache, the response includes:

HTTP/1.1 200 OK
Content-Type: text/event-stream
X-Cache: HIT

The X-Cache: HIT header indicates the response was served from cache.

Examples

Example 1: Monitor Cache Growth

# Initial state
curl -s http://localhost:5000/siaa/cache
# {"entradas": 0, "hits": 0, "misses": 0, ...}

# Ask a question
curl -X POST http://localhost:5000/siaa/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"¿Cuándo reportar SIERJU?"}]}'

# Check cache (miss + insert)
curl -s http://localhost:5000/siaa/cache
# {"entradas": 1, "hits": 0, "misses": 1, "hit_rate": "0%"}

# Ask same question again
curl -X POST http://localhost:5000/siaa/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"¿Cuándo reportar SIERJU?"}]}'

# Check cache (hit)
curl -s http://localhost:5000/siaa/cache
# {"entradas": 1, "hits": 1, "misses": 1, "hit_rate": "50.0%"}

Example 2: Clear and Verify

# Clear cache
curl -X DELETE http://localhost:5000/siaa/cache
# {"vaciado": true, "mensaje": "Caché limpiado correctamente"}

# Verify cleared
curl -s http://localhost:5000/siaa/cache
# {"entradas": 0, "hits": 0, "misses": 0, "hit_rate": "0%", ...}

Example 3: Cache Performance Report

curl -s http://localhost:5000/siaa/cache | jq '{
  performance: {
    hit_rate,
    total_queries: (.hits + .misses),
    cache_hits: .hits,
    cache_misses: .misses
  },
  capacity: {
    used: .entradas,
    available: (.max - .entradas),
    utilization: ((.entradas / .max * 100 | floor | tostring) + "%")
  },
  ttl_hours: (.ttl_seg / 3600)
}'

Output:

{
  "performance": {
    "hit_rate": "38.5%",
    "total_queries": 637,
    "cache_hits": 245,
    "cache_misses": 392
  },
  "capacity": {
    "used": 87,
    "available": 113,
    "utilization": "43%"
  },
  "ttl_hours": 1
}

Related Endpoints

GET /siaa/status - Includes cache stats as part of overall system status
GET /siaa/recargar - Automatically clears cache when reloading documents
POST /siaa/chat - Uses cache for fast responses on repeated queries
GET /siaa/log - Shows CACHE_HIT entries in quality logs
