Overview
The /siaa/densidad/<termino> endpoint returns a ranked list of documents containing a specific term, ordered by density (frequency relative to document length). This helps debug why certain documents are being selected for specific queries.
Endpoint
Parameters
termino: The term to search for in the density index. Must be lowercase and at least 3 characters. Examples: sierju, periodicidad, 10476
Response
The response contains the following fields:
- The searched term (normalized to lowercase)
- Top 10 documents containing this term, ordered by density (highest first)
- Total number of documents containing this term (total_docs)
Error Response
If the term is not found in the density index, the endpoint responds with status 404 and an error message indicating that the term is not in the index.
Example
Request
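A request could be built along the following lines. This is only a sketch: the base URL is a placeholder, the use of GET and of a JSON response body are assumptions, and the helper names validate_term, density_url, and fetch_density are hypothetical. The term checks mirror the documented constraints (lowercase, at least 3 characters).

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Placeholder: the real host and port are not stated in this document.
BASE_URL = "http://localhost:5000"


def validate_term(term):
    """Apply the documented constraints: lowercase, at least 3 characters."""
    if len(term) < 3:
        raise ValueError("term must have at least 3 characters")
    if term != term.lower():
        raise ValueError("term must be lowercase")
    return term


def density_url(term, base_url=BASE_URL):
    """Build the URL for /siaa/densidad/<termino>."""
    quoted = urllib.parse.quote(validate_term(term), safe="")
    return f"{base_url.rstrip('/')}/siaa/densidad/{quoted}"


def fetch_density(term, base_url=BASE_URL):
    """Call the endpoint (GET and JSON are assumed). Returns None on 404."""
    try:
        with urllib.request.urlopen(density_url(term, base_url), timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Documented error case: the term is not in the density index.
            return None
        raise
```

Validating the term on the client side avoids a round trip for terms that cannot be in the index, such as a two-letter word.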
Response
Use Cases
Understanding Query Routing
When debugging why a particular document was selected for a query, check the density of the query's key terms.
Validating Alphanumeric Term Extraction
Verify that alphanumeric codes (e.g., psaa16) are correctly indexed.
Finding Document Distribution
Determine which documents discuss a specific topic. Use total_docs to see how many documents contain the term across the entire corpus.
Testing Numeric Code Indexing
Confirm that important numeric codes (4+ digits), such as 10476, are indexed.
Notes
- Density is calculated as: term_frequency / total_tokens_in_document
- Only terms with 3+ characters are indexed
- Stopwords are filtered out during indexing
- Pure numeric terms are only indexed if they have 4+ digits (e.g., “2016”, “10476”)
- Alphanumeric terms with letters are always indexed (e.g., “psaa16”, “pcsja19”)
- The density index is built during document loading and cached in memory
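
The indexing rules above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the tokenizer regex, the stopword list, the function names, and the term and docs keys are all hypothetical, and it assumes that total_tokens_in_document counts every token before filtering.

```python
import re
from collections import Counter

# Hypothetical stopword list; the real one is not given in this document.
STOPWORDS = {"con", "del", "las", "los", "para", "por", "que", "una"}

# Assumed tokenizer: runs of lowercase letters (including Spanish accents) and digits.
TOKEN_RE = re.compile(r"[a-z0-9áéíóúñü]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def is_indexable(token):
    """Apply the rules from the notes above."""
    if len(token) < 3 or token in STOPWORDS:
        return False
    if token.isdigit():
        return len(token) >= 4  # pure numbers need 4+ digits
    return True  # words and alphanumeric codes such as "psaa16"


def build_density_index(docs):
    """Map term -> {document name: term_frequency / total_tokens_in_document}."""
    index = {}
    for name, text in docs.items():
        tokens = tokenize(text)
        if not tokens:
            continue
        counts = Counter(t for t in tokens if is_indexable(t))
        for term, freq in counts.items():
            index.setdefault(term, {})[name] = freq / len(tokens)
    return index


def lookup(index, term, limit=10):
    """Return the top documents by density, or None if the term is not indexed."""
    postings = index.get(term.lower())
    if postings is None:
        return None
    ranked = sorted(postings.items(), key=lambda kv: kv[1], reverse=True)
    return {"term": term.lower(), "docs": ranked[:limit], "total_docs": len(postings)}
```

In this sketch, a document of 7 tokens that mentions sierju twice gets a density of 2/7 for that term, so it ranks above a 7-token document that mentions it once.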