Overview
The /siaa/recargar endpoint reloads all documents from the source directory (/opt/siaa/fuentes/), recalculates TF-IDF indexes, regenerates chunks, and clears the response cache. Use this endpoint after adding, updating, or removing documents.
Endpoint
Request
No parameters required.
Response
- Always true if the operation completed successfully
- Object mapping collection names to document counts
- Total number of documents loaded across all collections
- Total number of pre-computed chunks across all documents
Example Response
What Gets Reloaded
The reload operation performs the following steps:
1. Document Scanning
- Scans /opt/siaa/fuentes/ and all subdirectories
- Loads all .md and .txt files
- Reads file contents with UTF-8 encoding (ignoring errors)
2. Tokenization
- Extracts alphanumeric tokens (3+ characters)
- Includes mixed alphanumeric terms (e.g., “psaa16”, “art5”)
- Includes numbers with 4+ digits (e.g., “10476”, “2016”)
- Removes Spanish stopwords
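The tokenization rules above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the function name tokenize, the regular expression, and the short stopword set are assumptions, and the rule that purely numeric tokens shorter than 4 digits are dropped is inferred from the "4+ digits" bullet.

```python
import re

# Illustrative subset only; the real Spanish stopword list is not shown here.
STOPWORDS = {"para", "como", "pero", "los", "las", "del", "una", "por", "con", "que"}

# Runs of lowercase letters (including Spanish accented letters) and digits.
TOKEN_RE = re.compile(r"[a-z0-9áéíóúüñ]+")


def tokenize(text):
    """Return the tokens of `text` that survive the filtering rules."""
    tokens = []
    for tok in TOKEN_RE.findall(text.lower()):
        if len(tok) < 3:
            continue  # too short to be useful
        if tok.isdigit() and len(tok) < 4:
            continue  # pure numbers need 4+ digits (assumed reading of the rule)
        if tok in STOPWORDS:
            continue
        tokens.append(tok)
    return tokens


sample = "El acuerdo PSAA16 del 2016, art5, numero 10476 y 123 para los jueces"
print(tokenize(sample))
```

Mixed terms such as "psaa16" and "art5" pass because they are not purely numeric, while "123" is dropped.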
3. TF-IDF Calculation
- Calculates term frequency (TF) for each document
- Calculates inverse document frequency (IDF) across the collection
- Generates top 20 keywords per document
- Combines auto-generated keywords with manual keyword mappings
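As a rough illustration of the keyword step, the sketch below scores each term by TF times IDF and keeps the top 20 per document. The exact IDF formula used by the server is not stated in this document, so the smoothed variant here is an assumption, as are the names top_keywords and manual_keywords.

```python
import math
from collections import Counter


def top_keywords(docs_tokens, manual_keywords=None, top_n=20):
    """Map each document name to its manual keywords plus top TF-IDF terms.

    docs_tokens: dict of document name -> list of tokens.
    manual_keywords: optional dict of document name -> list of keywords.
    """
    manual_keywords = manual_keywords or {}
    n_docs = len(docs_tokens)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in docs_tokens.values():
        doc_freq.update(set(tokens))

    keywords = {}
    for name, tokens in docs_tokens.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1.
        scores = {
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
        # Manual keywords come first; auto-generated ones fill in after.
        merged = list(manual_keywords.get(name, []))
        merged += [t for t in ranked if t not in merged]
        keywords[name] = merged
    return keywords


docs = {"a": ["juez", "juez", "acuerdo"], "b": ["acuerdo", "sala"]}
print(top_keywords(docs, manual_keywords={"b": ["tutela"]}))
```

Terms that appear in every document (here "acuerdo") receive the lowest IDF, so rarer terms rank ahead of them.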
4. Chunk Generation
- Splits documents into fixed-size chunks (800 characters)
- Applies overlap between chunks (300 characters)
- Preserves section context (last heading before chunk)
- Pre-computes all chunks for fast retrieval
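With 800-character chunks and a 300-character overlap, each chunk starts 500 characters after the previous one. The sketch below shows one way to do this while recording the last heading seen before each chunk. It assumes Markdown-style "#" headings and uses hypothetical names (chunk_document, the dict keys); the server's real chunker may differ.

```python
import re

# Markdown-style headings at the start of a line (assumed heading format).
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.*)$", re.MULTILINE)


def chunk_document(text, size=800, overlap=300):
    """Split `text` into overlapping fixed-size chunks with section context."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap  # 500 characters with the documented defaults
    headings = [(m.start(), m.group(1).strip()) for m in HEADING_RE.finditer(text)]

    chunks = []
    for start in range(0, len(text), step):
        # Section context: the last heading at or before the chunk start.
        section = ""
        for pos, title in headings:
            if pos > start:
                break
            section = title
        chunks.append(
            {"start": start, "section": section, "text": text[start:start + size]}
        )
        if start + size >= len(text):
            break  # this chunk already reaches the end of the document
    return chunks


doc = "# Intro\n" + "a" * 1000 + "\n## Detalle\n" + "b" * 1000
for chunk in chunk_document(doc):
    print(chunk["start"], chunk["section"], len(chunk["text"]))
```

The overlap means text near a chunk boundary appears in two chunks, which reduces the chance that a relevant passage is cut in half.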
5. Density Index
- Creates inverted index: term → [(density, document), …]
- Density = term_frequency / total_tokens_in_document
- Sorted by density descending for fast document routing
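The density index described above can be sketched as a dictionary of posting lists. The function name build_density_index and the tie-breaking by document name are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_density_index(docs_tokens):
    """Build term -> [(density, document), ...], sorted by density descending.

    docs_tokens: dict of document name -> list of tokens.
    """
    index = defaultdict(list)
    for name, tokens in docs_tokens.items():
        total = len(tokens)
        if total == 0:
            continue  # an empty document contributes nothing
        for term, count in Counter(tokens).items():
            # Density = term_frequency / total_tokens_in_document
            index[term].append((count / total, name))

    for postings in index.values():
        # Highest density first; ties broken by document name (assumption).
        postings.sort(key=lambda p: (-p[0], p[1]))
    return dict(index)


docs = {"a": ["juez", "juez", "sala", "acuerdo"], "b": ["juez", "sala"]}
index = build_density_index(docs)
print(index["sala"])  # [(0.5, 'b'), (0.25, 'a')]
```

Because each posting list is already sorted, routing a query term to its most relevant documents only requires reading the head of the list.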
6. Cache Invalidation
- Clears all cached responses (the cache holds at most 200 entries)
- Reason: Cached answers may reference old document content
- New queries will regenerate cache based on updated documents
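A bounded response cache with a clear operation might look like the sketch below. Only the 200-entry limit comes from this document; the class name ResponseCache and the least-recently-used eviction policy are assumptions.

```python
from collections import OrderedDict


class ResponseCache:
    """Small bounded cache; evicts the least recently used entry when full."""

    def __init__(self, max_entries=200):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None  # cache miss
        self._entries.move_to_end(key)  # mark as recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry

    def clear(self):
        """Drop every entry, as a reload does."""
        self._entries.clear()

    def __len__(self):
        return len(self._entries)


cache = ResponseCache(max_entries=2)
cache.put("q1", "answer 1")
cache.put("q2", "answer 2")
cache.clear()
print(len(cache))  # 0
```

After clear(), every lookup is a miss until new queries repopulate the cache.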
When to Use
After Adding New Documents
After Updating Existing Documents
After Removing Documents
After Server Restart
Reload is NOT needed after server restart. Documents are automatically loaded on startup. However, you may want to reload if:
- Documents were updated while the server was down
- You want to force cache clearing
Performance Impact
Reload Time
For typical document sets:
- 5-10 documents: ~1-2 seconds
- 20-50 documents: ~3-5 seconds
- 100+ documents: ~10-15 seconds
Reload time depends on:
- Document size (larger files take longer to tokenize)
- Number of documents (TF-IDF is O(n²) worst case)
- Disk I/O speed
System Impact During Reload
- Chat queries continue to work using old index
- Once reload completes, new index atomically replaces old one
- Brief lock contention possible during index swap (~1ms)
- All active streaming responses complete unaffected
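The behavior above, where queries keep using the old index until the new one is ready, is commonly implemented by building the new index outside any lock and swapping a single reference under a short lock. The sketch below illustrates that pattern; the class IndexHolder and its methods are hypothetical and not taken from the server's code.

```python
import threading


class IndexHolder:
    """Holds the current index and swaps it in one short critical section."""

    def __init__(self, index):
        self._lock = threading.Lock()
        self._index = index

    def current(self):
        # Readers grab a reference; they keep using it even if a swap follows.
        with self._lock:
            return self._index

    def reload(self, build):
        # The slow rebuild runs outside the lock, so queries are not blocked.
        new_index = build()
        # Only the reference swap is locked, which keeps contention brief.
        with self._lock:
            self._index = new_index


holder = IndexHolder({"version": 1})
in_flight = holder.current()            # a query that started before the reload
holder.reload(lambda: {"version": 2})
print(in_flight, holder.current())      # {'version': 1} {'version': 2}
```

A request that obtained the old index before the swap finishes with that index, which matches the note that active streaming responses complete unaffected.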
Cache Impact
- All cached responses are cleared
- First query after reload will be slower (cache miss)
- Cache rebuilds naturally as queries arrive
- Typical hit rate recovers within 30-60 minutes
Error Handling
The endpoint does not return errors even if:
- Some documents fail to parse (they’re skipped)
- Directory doesn’t exist (it’s created)
- No valid documents found (returns empty collections)
Check the server logs for details; relevant entries are tagged [Doc], [KW], and [IDX].
Example Workflow: Adding New Regulatory Document
Monitoring Reload Operations
Check Current Document Count
Compare Before and After
Verify Specific Document Loaded
Automation
Cron Job: Daily Reload
Reload index daily at 3 AM to pick up any document updates:
File Watcher: Auto-reload on Change
Use inotifywait to trigger reload when documents change:
Related Endpoints
- GET /siaa/status - Check current document count and collections
- GET /siaa/cache - View cache statistics (will be empty after reload)
- DELETE /siaa/cache - Clear cache without reloading documents
- GET /siaa/ver/<nombre_doc> - View individual documents