Overview
SIAA provides comprehensive monitoring capabilities through the /siaa/status endpoint and automatic health checking systems. This page covers all monitoring features and how to interpret system metrics.
Status Endpoint
The primary monitoring interface is the /siaa/status endpoint.
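As an illustration, the endpoint could be polled from a small Python script. This is a minimal sketch, not part of SIAA: the base URL (host and port) is a placeholder, the response is assumed to be a JSON object, and the `opener` parameter exists only so the function can be exercised without a running server.

```python
import json
import urllib.request

# Placeholder: replace with the real host and port of the SIAA server.
BASE_URL = "http://localhost:5000"


def fetch_status(base_url=BASE_URL, opener=urllib.request.urlopen, timeout=5):
    """Return the parsed JSON body of GET <base_url>/siaa/status."""
    url = base_url.rstrip("/") + "/siaa/status"
    with opener(url, timeout=timeout) as response:
        # Assumption: the endpoint answers with a UTF-8 encoded JSON object.
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_status()` returns a dictionary, so individual fields such as `estado` can be read with `status.get("estado")`.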
Response Format
Status Fields Reference
SIAA system version (e.g., “2.1.25”)
Overall system state: "ok" or "error". Returns "error" if Ollama is unavailable.
Whether the Ollama AI service is currently available and responding.
Number of consecutive failed Ollama health checks. Resets to 0 on successful check.
Currently configured Ollama model identifier.
Whether the initial model warm-up completed successfully.
- true: Model loaded in RAM, ready for fast inference
- false: Warm-up failed (check Ollama logs)
- null: Warm-up not yet attempted
Number of currently active query sessions.
Total number of queries processed since server start (cumulative counter).
Total documents loaded across all collections.
Total pre-computed chunks across all documents.
Number of unique terms in the document density index.
Ollama Health Check System
Automatic Health Monitoring
SIAA runs a background thread that checks Ollama health every 15 seconds.
Health Check Interval
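A background check on a fixed interval can be sketched with a daemon thread and an event-based sleep. The code below is an illustration under assumptions, not SIAA's actual implementation: `HealthState`, `monitor`, and `start_monitor` are hypothetical names, and `check` stands for whatever probe decides if Ollama is responding. Only the 15-second interval and the reset-on-success failure counter come from this page.

```python
import threading

CHECK_INTERVAL_SECONDS = 15  # interval stated in this section


class HealthState:
    """Tracks availability and consecutive failures (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.available = False
        self.consecutive_failures = 0

    def record(self, ok):
        with self._lock:
            self.available = ok
            # Reset on success, otherwise count one more failure in a row.
            self.consecutive_failures = 0 if ok else self.consecutive_failures + 1


def monitor(state, check, stop_event, interval=CHECK_INTERVAL_SECONDS):
    """Call check() every `interval` seconds until stop_event is set."""
    while not stop_event.is_set():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # any error raised by the probe counts as a failed check
        state.record(ok)
        stop_event.wait(interval)  # sleeps, but wakes early when stopped


def start_monitor(state, check, interval=CHECK_INTERVAL_SECONDS):
    """Start the loop in a daemon thread and return (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=monitor, args=(state, check, stop_event, interval), daemon=True
    )
    thread.start()
    return thread, stop_event
```

Waiting on the event instead of calling `time.sleep` lets the loop stop promptly at shutdown rather than after a full interval.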
The monitoring loop runs continuously.
Manual Health Check
An immediate health check can also be triggered on demand.
Ollama Warm-up Monitoring
What is Warm-up?
When SIAA starts, it preloads the AI model into RAM to avoid first-query latency.
Checking Warm-up Status
- true: Model successfully loaded, ready for queries
- false: Warm-up failed (check logs)
- null: Warm-up not yet attempted
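Because the flag has three states, a client should distinguish `null` from `false` rather than testing plain truthiness. The helper below is a hypothetical sketch of that check; it assumes the status payload has been parsed from JSON, so `null` arrives as Python's `None`.

```python
from typing import Optional


def describe_warmup(warmup_completado: Optional[bool]) -> str:
    """Map the tri-state warm-up flag to a readable message."""
    if warmup_completado is None:  # JSON null: warm-up has not run yet
        return "warm-up not yet attempted"
    if warmup_completado:
        return "model loaded, ready for queries"
    return "warm-up failed, check logs"
```

Checking `is None` first matters: a plain `if not warmup_completado` would report a pending warm-up as a failure.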
Warm-up Console Output
Active Users Tracking
Real-Time User Count
SIAA tracks concurrent active queries.
Implementation
Active user count is managed with thread-safe counters.
Monitoring Load
- Low Load (usuarios_activos: 0-2): Normal operation. Queries process quickly with minimal queuing.
- Medium Load
- High Load
Cache Statistics
Cache Metrics
The status endpoint includes detailed cache performance data.
Cache Performance Indicators
Hit Rate 40%+ (Excellent)
Optimal performance. 40% or more of queries are served from cache, substantially reducing AI processing load.
Hit Rate 20-40% (Good)
Healthy cache utilization. Common queries are being cached effectively.
Hit Rate <20% (Review)
Low cache utilization. Consider:
- Increasing CACHE_MAX_ENTRADAS
- Increasing CACHE_TTL_SEGUNDOS
- Checking if queries are too diverse
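The three bands above translate directly into a small classifier. In this sketch the `hits` and `misses` parameters are hypothetical, since the names of the cache counters in the status response are not shown here; only the 40% and 20% thresholds come from this page.

```python
def hit_rate_percent(hits: int, misses: int) -> float:
    """Percentage of lookups served from cache; 0.0 when there were none."""
    total = hits + misses
    return 0.0 if total == 0 else 100.0 * hits / total


def classify_hit_rate(rate_percent: float) -> str:
    """Apply the bands from this page: 40%+ excellent, 20-40% good, below 20% review."""
    if rate_percent >= 40:
        return "excellent"
    if rate_percent >= 20:
        return "good"
    return "review"
```

For example, 45 hits and 55 misses give a 45% hit rate, which falls in the "excellent" band.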
Cache Saturation
Monitor cache capacity. If entradas consistently equals max, the cache is full and is evicting entries with LRU. Consider increasing CACHE_MAX_ENTRADAS.
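"Consistently" implies looking at several readings rather than one. The following sketch assumes each reading is a dictionary carrying the `entradas` and `max` values mentioned above. The `min_samples` window is a hypothetical choice, not a SIAA setting.

```python
def is_saturated(samples, min_samples=3):
    """True if each of the last `min_samples` readings was at capacity.

    Each sample is expected to look like {"entradas": 200, "max": 200}.
    """
    if len(samples) < min_samples:
        return False  # not enough history to call it consistent
    recent = samples[-min_samples:]
    return all(sample["entradas"] >= sample["max"] for sample in recent)
```

A single full reading is normal under bursty load, which is why the check requires a run of them before suggesting a larger CACHE_MAX_ENTRADAS.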
Connection Status Indicators
Understanding Estado Field
The estado field provides a quick health summary:
- “ok”: All systems operational (Ollama available)
- “error”: Critical failure (Ollama unavailable)
Failure Detection
When Ollama fails, the system responds gracefully.
System Metrics Available
Document Processing Metrics
Collection Breakdown
Monitoring Best Practices
Regular Health Checks
Set up periodic monitoring with cron.
Alerting on Failures
Monitor ollama_fallos for sustained failures.
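One way to turn this into an alert rule is to compare the counter against a threshold. This is a hedged sketch: the threshold of 4 is a hypothetical value (about one minute of failures at a 15-second check interval), and the function assumes `ollama_fallos` is present in the parsed status payload.

```python
FAILURE_THRESHOLD = 4  # hypothetical: about one minute at a 15-second interval


def should_alert(status, threshold=FAILURE_THRESHOLD):
    """Return True when ollama_fallos has reached the threshold."""
    # A missing field is treated as zero failures rather than as an error.
    return int(status.get("ollama_fallos", 0)) >= threshold
```

Because the counter resets to 0 on a successful check, a threshold above 1 filters out isolated blips and fires only on sustained outages.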
Dashboard Integration
Integrate with monitoring dashboards.
Troubleshooting
Ollama Unavailable
Symptom: "ollama": false in the status endpoint
Check:
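One possible check is to query Ollama directly, bypassing SIAA. The sketch below calls Ollama's `/api/tags` endpoint, which lists the locally available models. It assumes Ollama listens on its default address, `http://localhost:11434`, so adjust the URL if your deployment differs. The function name and the `opener` parameter are illustrative.

```python
import json
import urllib.error
import urllib.request

# Ollama's default address; an assumption about this deployment.
OLLAMA_URL = "http://localhost:11434"


def list_ollama_models(base_url=OLLAMA_URL, opener=urllib.request.urlopen, timeout=5):
    """Return model names from GET /api/tags, or None if Ollama is unreachable."""
    try:
        with opener(base_url.rstrip("/") + "/api/tags", timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None  # connection refused, timeout, or a non-JSON reply
    return [model.get("name") for model in payload.get("models", [])]
```

A result of `None` points at the Ollama service itself, while a list that lacks the configured model points at a missing or misnamed model.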
Warm-up Failures
Symptom: "warmup_completado": false
Solutions:
- Check Ollama logs: journalctl -u ollama -n 50
- Verify model exists: ollama list
- Increase warm-up timeout in code (currently 35s)
High Active Users with No Activity
Symptom: usuarios_activos stays high despite no queries
Cause: Possible exception preventing dec_activos() call
Solution: Check application logs for uncaught exceptions
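A common guard against this kind of leak is to pair the increment and decrement in a `try`/`finally` block, so the decrement runs even when the query raises. The sketch below illustrates the pattern and is not SIAA's code: only `dec_activos` is named on this page, while `ActiveCounter`, `inc_activos`, and `run_query` are hypothetical.

```python
import threading


class ActiveCounter:
    """Lock-protected count of in-flight queries (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def inc_activos(self):
        with self._lock:
            self._value += 1

    def dec_activos(self):
        with self._lock:
            # Never go below zero, even if a decrement is called twice.
            self._value = max(0, self._value - 1)

    @property
    def value(self):
        with self._lock:
            return self._value


def run_query(counter, handler, *args, **kwargs):
    """Run handler while it is counted as active; always decrement afterwards."""
    counter.inc_activos()
    try:
        return handler(*args, **kwargs)
    finally:
        counter.dec_activos()  # executes on success and on exception alike
```

With this structure an uncaught exception still propagates to the caller and the logs, but the counter returns to its previous value.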
Next Steps
Log Analysis
Analyze query performance and quality trends
Cache Management
Optimize cache for better performance