GET /siaa/densidad/<termino>

Overview

The /siaa/densidad/<termino> endpoint returns a ranked list of documents containing a specific term, ordered by density (frequency relative to document length). This helps debug why certain documents are being selected for specific queries.

Endpoint

GET /siaa/densidad/<termino>

Parameters

termino

string

required

The term to search for in the density index. Must be lowercase and at least 3 characters. Examples: sierju, periodicidad, 10476

Response

termino

string

The searched term (normalized to lowercase)

top_docs

array of objects

Top 10 documents containing this term, ordered by density (highest first)

Each object has the following properties:

doc

string

Document filename

densidad

number

Term density score (frequency / total_tokens in document). Higher values indicate the term appears more frequently relative to document length.

total_docs

integer

Total number of documents containing this term

Error Response

If the term is not found in the density index:

error

string

Error message indicating the term is not in the index

Status code: 404

Example

Request

curl http://localhost:5000/siaa/densidad/sierju

Response

{
  "termino": "sierju",
  "top_docs": [
    {
      "doc": "acuerdo_no._psaa16-10476.md",
      "densidad": 0.042156
    },
    {
      "doc": "guia_sierju_despachos.md",
      "densidad": 0.038947
    },
    {
      "doc": "acuerdo_pcsja19-11207.md",
      "densidad": 0.021338
    },
    {
      "doc": "instructivo_periodicidad.md",
      "densidad": 0.018562
    }
  ],
  "total_docs": 4
}
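The same request can be made from a script. The following is a minimal sketch, assuming the service is reachable at the host and port used in the curl example and that a missing term produces an HTTP 404 as described under Error Response. The helper names (density_url, fetch_density, format_top_docs) are hypothetical and are not part of the service.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: same host and port as the curl example above.
BASE_URL = "http://localhost:5000"


def density_url(term: str, base_url: str = BASE_URL) -> str:
    """Build the endpoint URL; the term is lowercased and percent-encoded."""
    encoded = urllib.parse.quote(term.lower(), safe="")
    return f"{base_url}/siaa/densidad/{encoded}"


def fetch_density(term: str, base_url: str = BASE_URL) -> Optional[dict]:
    """Return the parsed JSON body, or None when the endpoint answers 404."""
    try:
        with urllib.request.urlopen(density_url(term, base_url), timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None  # the term is not in the density index
        raise  # any other HTTP error is unexpected here


def format_top_docs(payload: dict) -> list:
    """Render one 'doc<TAB>density' line per entry in top_docs."""
    return [f"{d['doc']}\t{d['densidad']:.6f}" for d in payload["top_docs"]]


if __name__ == "__main__":
    result = fetch_density("sierju")
    if result is None:
        print("term not found in the density index")
    else:
        print("\n".join(format_top_docs(result)))
        print("total_docs:", result["total_docs"])
```

Returning None for a 404 keeps the "term not indexed" case separate from genuine failures such as a connection error, which still raise.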

Use Cases

Understanding Query Routing

When debugging why a particular document was selected for a query, check the density of key terms:

curl http://localhost:5000/siaa/densidad/periodicidad

Documents with higher density scores for query terms are more likely to be selected by the routing algorithm.
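When a query has several key terms, it can help to look at their densities side by side. The sketch below sums the per-term densities for each document across several already-fetched responses. This is only a debugging aid: the source does not describe how the routing algorithm combines terms, so the summation is an assumption, and combine_densities is a hypothetical helper. Because top_docs holds at most 10 entries per term, the totals are partial for widely used terms.

```python
from collections import defaultdict


def combine_densities(responses):
    """Sum densidad per document over several endpoint responses.

    `responses` is a list of parsed JSON bodies, one per query term.
    Returns (doc, total_density) pairs, highest total first.
    Assumption: a plain sum, which may differ from the real routing logic.
    """
    totals = defaultdict(float)
    for payload in responses:
        for entry in payload.get("top_docs", []):
            totals[entry["doc"]] += entry["densidad"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative payloads only, not real output from the service.
    responses = [
        {"termino": "periodicidad",
         "top_docs": [{"doc": "a.md", "densidad": 0.04},
                      {"doc": "b.md", "densidad": 0.01}]},
        {"termino": "sierju",
         "top_docs": [{"doc": "b.md", "densidad": 0.05}]},
    ]
    for doc, total in combine_densities(responses):
        print(f"{doc}\t{total:.6f}")
```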

Validating Alphanumeric Term Extraction

Verify that alphanumeric codes are correctly indexed:

curl http://localhost:5000/siaa/densidad/psaa16

If the request returns a 404, the term is not in the density index: either the tokenizer filtered it out (check the tokenization rules) or no loaded document contains it.

Finding Document Distribution

Determine which documents discuss a specific topic:

curl http://localhost:5000/siaa/densidad/sancion

Use total_docs to see how many documents contain the term across the entire corpus.

Testing Numeric Code Indexing

Confirm that important numeric codes (4+ digits) are indexed:

curl http://localhost:5000/siaa/densidad/10476

This should return documents containing the decree number “10476”.

Notes

Density is calculated as: term_frequency / total_tokens_in_document
Only terms with 3+ characters are indexed
Stopwords are filtered out during indexing
Pure numeric terms are only indexed if they have 4+ digits (e.g., “2016”, “10476”)
Alphanumeric terms with letters are always indexed (e.g., “psaa16”, “pcsja19”)
The density index is built during document loading and cached in memory
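The rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the service's actual code: the stopword list is a small hypothetical sample, the tokenizer regex is a guess, and total_tokens_in_document is assumed to count every token before filtering. The function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample; the real stopword list is not given in this page.
STOPWORDS = {"de", "la", "el", "los", "las", "para", "por", "con", "que"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9áéíóúñü]+", text.lower())


def is_indexable(token):
    """Apply the indexing rules listed in the notes above."""
    if len(token) < 3 or token in STOPWORDS:
        return False
    if token.isdigit():
        return len(token) >= 4  # pure numbers need 4+ digits
    return True  # words and alphanumeric codes such as "psaa16"


def term_density(text, term):
    """Return term_frequency / total_tokens_in_document for one document."""
    tokens = tokenize(text)
    if not tokens or not is_indexable(term):
        return 0.0
    # Assumption: the denominator counts all tokens, including filtered ones.
    return Counter(tokens)[term] / len(tokens)


if __name__ == "__main__":
    doc = "El acuerdo PSAA16-10476 regula el reporte sierju"
    for term in ("sierju", "psaa16", "10476", "el"):
        print(term, is_indexable(term), round(term_density(doc, term), 4))
```

With this tokenizer, a hyphenated code such as "PSAA16-10476" yields two tokens, "psaa16" and "10476", which matches the examples above where each part is looked up separately.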
