Overview
The /siaa/chat endpoint is the primary interface for interacting with SIAA. It processes both conversational queries (greetings, general questions) and document-based queries (procedural, regulatory questions), returning responses via Server-Sent Events (SSE) streaming.
Endpoint
Request
Headers
- Content-Type: must be application/json
Body Parameters
Example Request Body
Response
The endpoint returns a Server-Sent Events (SSE) stream with Content-Type: text/event-stream.
Response Headers
- Content-Type: text/event-stream
- Cache-Control: no-cache
- X-Accel-Buffering: no (prevents proxy buffering)
- X-Cache: present with value HIT when the response is served from cache
Stream Format
Each chunk in the stream follows this format:
Example Response Stream
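The exact chunk schema is not reproduced here. As a minimal sketch, assuming chunks arrive as standard SSE `data:` lines (the payload below is illustrative, not captured from the real endpoint), a client can reassemble the streamed text like this:

```python
def parse_sse(raw: str) -> list[str]:
    """Extract the payload of each `data:` line from a raw SSE stream.

    Assumes one payload per `data:` line, per the SSE wire format;
    the actual SIAA chunk schema may differ.
    """
    chunks = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            chunks.append(line[len("data:"):].strip())
    return chunks

# Illustrative stream (not real SIAA output):
stream = "data: Hola,\n\ndata: ¿en qué puedo ayudarle?\n\n"
print("".join(parse_sse(stream)))
```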
Query Types
Conversational Queries
Short phrases like greetings, thanks, or questions about SIAA itself are handled conversationally without document search:
- “Hola”, “Buenos días”
- “¿Qué es SIAA?”
- “Gracias”, “Adiós”
Document Queries
Questions about judicial procedures, regulations, or administrative processes trigger document retrieval and RAG-based responses:
- Questions containing judicial/technical terms (SIERJU, PSAA, acuerdo, juzgado, etc.)
- Questions longer than 8 characters that aren’t conversational
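The classification rules above can be sketched as a small function. The term and phrase lists are illustrative assumptions drawn from the examples in this document, not SIAA's actual configuration:

```python
# Illustrative lists, assembled from the examples above; the real
# system's vocabulary is certainly larger.
CONVERSATIONAL = {"hola", "buenos días", "gracias", "adiós", "¿qué es siaa?"}
JUDICIAL_TERMS = {"sierju", "psaa", "acuerdo", "juzgado"}

def classify(query: str) -> str:
    """Rough reconstruction of the routing heuristic described above."""
    q = query.strip().lower()
    if q in CONVERSATIONAL:
        return "conversational"
    if any(term in q for term in JUDICIAL_TERMS):
        return "document"
    if len(q) > 8:
        return "document"
    return "conversational"
```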
Clarification Responses
When a query is ambiguous (e.g., “juzgado civil” without specifying municipal or circuito), SIAA responds with clarification options:
Cache Behavior
Cache Hit: If the same question has been asked recently (within 1 hour), the cached response is returned with the header X-Cache: HIT. Cache keys are normalized (case-insensitive, accent-insensitive, punctuation removed).
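The key normalization can be sketched with the standard library; this is an assumption about the mechanics (Unicode decomposition plus punctuation stripping), not SIAA's exact code:

```python
import string
import unicodedata

def cache_key(question: str) -> str:
    """Normalize a question into a cache key: lowercase,
    accent-insensitive, punctuation removed. A sketch of the
    normalization described above; exact rules may differ."""
    decomposed = unicodedata.normalize("NFD", question.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = no_accents.translate(
        str.maketrans("", "", string.punctuation + "¿¡"))
    return " ".join(no_punct.split())

print(cache_key("¿Qué es el SIERJU?"))  # -> "que es el sierju"
```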
Cache Miss: New queries trigger:
- Document routing (TF-IDF + density + filename matching)
- Chunk extraction (sliding window with overlap)
- LLM inference (Ollama)
- Cache storage (if successful)
- TTL: 3600 seconds (1 hour)
- LRU eviction when cache reaches 200 entries
- Cleared on document reload
- Negative responses (“no encontré”) are NOT cached
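The cache behavior above (1-hour TTL, 200-entry LRU cap, negative responses skipped) can be sketched with an `OrderedDict`. This is a minimal model of the described behavior, not the actual implementation:

```python
import time
from collections import OrderedDict

class ResponseCache:
    """LRU cache with TTL, mirroring the behavior described above
    (1-hour TTL, 200-entry cap). A sketch, not SIAA's code."""

    def __init__(self, max_entries: int = 200, ttl: float = 3600.0):
        self.max_entries = max_entries
        self.ttl = ttl
        self._data: OrderedDict[str, tuple[float, str]] = OrderedDict()

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]          # expired
            return None
        self._data.move_to_end(key)      # mark as recently used
        return response

    def put(self, key: str, response: str):
        if "no encontré" in response:    # negative responses are not cached
            return
        self._data[key] = (time.monotonic(), response)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict least-recently-used

    def clear(self):                     # called on document reload
        self._data.clear()
```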
Error Responses
Error messages are streamed as SSE events:
- ⏳ Sistema ocupado. Intente en 30 segundos. - Max concurrent requests reached
- ⚠ IA no responde. Intente de nuevo. - Ollama connection timeout
- ⏱ Consulta tomó demasiado tiempo. - Response generation timeout (180s)
- ⚠ Servidor IA reiniciándose. Espere 1 minuto. - Ollama service unavailable
Examples
Example 1: Document Query
Example 2: Conversational Query
Example 3: Multi-turn Conversation
Implementation Notes
Document Routing
The system uses a multi-level routing algorithm:
- TF-IDF keywords (auto-generated + manual)
- Density index (term frequency normalized by document)
- Filename matching (pattern detection for PSAA, PCSJA, acuerdo, etc.)
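Combining the three signals might look like the toy scorer below. The weights and the `doc` fields (`keywords`, `term_density`, `filename`) are illustrative assumptions, not SIAA's actual values:

```python
import re

def route_score(query: str, doc: dict) -> float:
    """Toy multi-level score combining keyword hits, density, and
    filename patterns, as described above. Weights are invented."""
    terms = set(re.findall(r"\w+", query.lower()))
    keyword_hits = len(terms & set(doc["keywords"]))
    density = sum(doc["term_density"].get(t, 0.0) for t in terms)
    filename_bonus = 1.0 if any(p in doc["filename"].lower()
                                for p in ("psaa", "pcsja", "acuerdo")) else 0.0
    return 2.0 * keyword_hits + density + filename_bonus
```

The best-matching document would then simply be the one with the highest score over the corpus.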
Chunk Selection Strategy
Depending on query confidence, the system selects chunks dynamically:
- Francotirador (ratio ≥3.0): 1 chunk (~800 chars) - high confidence
- Binóculo (ratio ≥1.8): 2 chunks (~1600 chars) - medium confidence
- Escopeta (ratio <1.8): 3 chunks (~2400 chars) - low confidence
- Listado (enumeration queries): Minimum 2 chunks regardless of ratio
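The tiering above reduces to a small mapping; a sketch of the described thresholds, assuming an `is_listing` flag marks enumeration queries:

```python
def select_chunk_count(ratio: float, is_listing: bool) -> int:
    """Map routing-confidence ratio to chunk count per the
    francotirador/binóculo/escopeta tiers described above."""
    if ratio >= 3.0:
        n = 1      # francotirador: high confidence
    elif ratio >= 1.8:
        n = 2      # binóculo: medium confidence
    else:
        n = 3      # escopeta: low confidence
    if is_listing:
        n = max(n, 2)   # listado: enumeration queries get at least 2 chunks
    return n
```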
Context Window Management
num_ctx is dynamically adjusted based on context size:
- <400 tokens: num_ctx=1024
- 400-900 tokens: num_ctx=2048
- >900 tokens: num_ctx=3072
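As a sketch of those thresholds (the handling of exactly 400 and 900 tokens is an assumption):

```python
def pick_num_ctx(context_tokens: int) -> int:
    """Tier Ollama's num_ctx by context size, per the thresholds above."""
    if context_tokens < 400:
        return 1024
    if context_tokens <= 900:
        return 2048
    return 3072
```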
Quality Monitoring
All queries are logged to /opt/siaa/logs/calidad.jsonl with:
- Timestamp, query type, question, response preview
- Documents used, context size, response time
- Automatic alert detection (POSIBLE_ALUCINACION, SIN_CONTEXTO)
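Appending such records in JSONL form is one JSON object per line; the field names in the example record are illustrative, not SIAA's actual schema:

```python
import json

def log_quality(path: str, record: dict) -> None:
    """Append one JSON object per line (JSONL), as used by calidad.jsonl.
    Field names in callers' records are up to the application."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```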
Rate Limiting
The system enforces:
- Max concurrent Ollama requests: 2 (configurable via MAX_OLLAMA_SIMULTANEOS)
- Connection timeout: 8 seconds
- Response timeout: 180 seconds
Requests that exceed the concurrency limit fail with a COLA_LLENA error.
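A concurrency cap like this is commonly enforced with a semaphore; a minimal sketch (not SIAA's implementation) that rejects excess work with COLA_LLENA:

```python
import threading

# Cap mirrors MAX_OLLAMA_SIMULTANEOS = 2 from the list above.
OLLAMA_SLOTS = threading.BoundedSemaphore(2)

def with_ollama_slot(fn):
    """Run fn() only if an Ollama slot is free; otherwise fail fast
    with COLA_LLENA instead of queueing. A sketch, not SIAA's code."""
    if not OLLAMA_SLOTS.acquire(blocking=False):
        raise RuntimeError("COLA_LLENA")
    try:
        return fn()
    finally:
        OLLAMA_SLOTS.release()
```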