Overview
SIAA (Sistema Inteligente de Apoyo Administrativo) is an AI-powered judicial document management system that uses Ollama with the Qwen2.5:3b model for intelligent document search and question answering.
System Requirements
- Python: version 3.8 or higher
- LibreOffice: LibreOffice headless for document conversion
- Tesseract: Tesseract OCR for scanned PDF processing
- Ollama: Ollama with the Qwen2.5:3b model
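The requirements above can be checked from Python before you start the installation. This is a minimal sketch, not part of SIAA: the function name `check_requirements` is hypothetical, and the executable names (`soffice`, `libreoffice`, `tesseract`, `ollama`) are assumptions about how these tools appear on your PATH.

```python
import shutil
import sys

# Executable names are assumptions; adjust them for your distribution.
REQUIRED_TOOLS = {
    "LibreOffice": ("soffice", "libreoffice"),
    "Tesseract": ("tesseract",),
    "Ollama": ("ollama",),
}


def check_requirements(min_python=(3, 8), which=shutil.which):
    """Return a dict mapping each requirement to True if it appears to be met."""
    results = {"Python": sys.version_info >= min_python}
    for name, candidates in REQUIRED_TOOLS.items():
        # A tool counts as present if any candidate executable is on PATH.
        results[name] = any(which(exe) is not None for exe in candidates)
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

This only confirms that the executables can be found. It does not check that the Qwen2.5:3b model has been pulled.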
Installation Steps
Install System Dependencies
Install LibreOffice headless and Tesseract OCR with your distribution's package manager (Fedora/RHEL or Debian/Ubuntu).
LibreOffice headless is required for converting .doc and .pdf files to .docx format.
Install Python Dependencies
Install the required Python packages listed under Core Dependencies.
Core Dependencies
- Flask & flask-cors: Web server and API framework
- pandas: Excel file processing
- python-docx: Word document parsing
- pymupdf4llm: PDF to Markdown conversion
- pdf2image & pytesseract: OCR for scanned documents
- openpyxl & xlrd: Excel format support
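After installing the packages, you can confirm that each one is importable. The sketch below is an illustration only: `missing_packages` is a hypothetical helper, and the mapping assumes the usual import names for these packages (for example, python-docx is imported as `docx`).

```python
from importlib.util import find_spec

# pip package name -> importable top-level module name
CORE_MODULES = {
    "Flask": "flask",
    "flask-cors": "flask_cors",
    "pandas": "pandas",
    "python-docx": "docx",
    "pymupdf4llm": "pymupdf4llm",
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "openpyxl": "openpyxl",
    "xlrd": "xlrd",
}


def missing_packages(modules=CORE_MODULES):
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All core dependencies found")
```

`find_spec` locates a module without importing it, so the check stays fast and has no side effects.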
Install and Configure Ollama
Install Ollama from ollama.ai, pull the Qwen2.5:3b model, and then verify the installation. You should see qwen2.5:3b in the output.
SIAA is configured to use the qwen2.5:3b model running on http://localhost:11434. This is defined in siaa_proxy.py:182.
Create Directory Structure
Create the required SIAA directories, then set appropriate permissions on them.
Directory Purpose
| Directory | Purpose |
|---|---|
| /opt/siaa/fuentes | Converted Markdown documents for indexing |
| /opt/siaa/fuentes/normativa | Legal documents and norms |
| /opt/siaa/logs | System logs including quality monitoring |
| /opt/siaa/instructivos | Source Word/Excel files |
| /opt/siaa/pdfs_origen | PDF files for conversion |
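The directory layout in the table can be created programmatically. The following is a sketch under stated assumptions: `create_siaa_dirs` is a hypothetical helper, and writing under /opt normally requires elevated privileges. It does not set ownership or permissions.

```python
from pathlib import Path

# Paths relative to the SIAA root, taken from the table above.
SIAA_SUBDIRS = (
    "fuentes",
    "fuentes/normativa",
    "logs",
    "instructivos",
    "pdfs_origen",
)


def create_siaa_dirs(root="/opt/siaa"):
    """Create the SIAA directory tree under root and return the paths."""
    created = []
    for sub in SIAA_SUBDIRS:
        path = Path(root) / sub
        # exist_ok makes the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for p in create_siaa_dirs():
        print(p)
```

Passing a different `root` is useful for trying the layout in a scratch directory first.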
Configure Environment Variables
Set the server IP and port (optional). To make these permanent, add them to ~/.bashrc or ~/.profile.
These environment variables are read in siaa_proxy.py:168-170. If not set, the system uses the IP from the HTTP Host header and defaults to port 5000.
Configuration Details
Proxy Server Settings
The Flask proxy server (siaa_proxy.py) has the following key configuration:
The cache system (an LRU cache) stores up to 200 frequently asked questions with a 1-hour TTL, returning cached responses in about 5 ms versus 44 s for uncached queries.
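To illustrate how an LRU cache with a TTL behaves, here is a minimal sketch using the limits quoted above (200 entries, 3600 seconds). It is not the code in siaa_proxy.py: the class name `TTLLRUCache` and its methods are hypothetical.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_size=200, ttl_seconds=3600, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._data = OrderedDict()  # key -> (stored_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._data[key]  # entry has expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
```

A proxy would typically call `get` with a normalized question string and fall through to the model only on a miss, then `put` the answer.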
Document Converter Settings
The document converter (convertidor.py) uses these default paths:
PDF Converter Settings
The PDF converter (convertidor_pdf.py) configuration:
Verification
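Verify your installation by confirming that Python, LibreOffice, Tesseract, and Ollama with the qwen2.5:3b model are all available.
Next Steps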
- Quickstart Guide: Learn how to start the system and make your first query
- Document Conversion: Convert your institutional documents to Markdown
Troubleshooting
Ollama connection failed
Check whether Ollama is running and start it if needed. It can also be run manually.
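Reachability can also be checked from Python. The sketch below assumes Ollama's HTTP API exposes a model list at `/api/tags` on the address given in this guide (http://localhost:11434) and that each entry carries a `name` field. The helper names `model_listed` and `check_ollama` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def model_listed(tags_response, model="qwen2.5:3b"):
    """Return True if model appears in a parsed /api/tags response."""
    models = tags_response.get("models", [])
    return any(m.get("name") == model for m in models)


def check_ollama(base_url="http://localhost:11434", model="qwen2.5:3b", timeout=5):
    """Return (reachable, model_present) for the Ollama server at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not JSON.
        return False, False
    return True, model_listed(tags, model)


if __name__ == "__main__":
    reachable, present = check_ollama()
    print(f"Ollama reachable: {reachable}, qwen2.5:3b present: {present}")
```

If the server is reachable but the model is not listed, pull the model again before retrying SIAA.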
LibreOffice conversion errors
Ensure LibreOffice headless is properly installed, then test a conversion manually.
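A manual conversion test can be scripted as follows. This is a sketch, not the logic in convertidor.py: the helper names are hypothetical, and it assumes the LibreOffice executable is named `soffice` or `libreoffice` and accepts the standard `--headless`, `--convert-to`, and `--outdir` options.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, target_format="docx", executable="soffice"):
    """Build the argument list for a headless LibreOffice conversion."""
    return [
        executable, "--headless",
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def try_conversion(source, outdir, target_format="docx", timeout=120):
    """Convert source and return True if the expected output file exists."""
    exe = shutil.which("soffice") or shutil.which("libreoffice")
    if exe is None:
        return False  # LibreOffice is not on PATH
    Path(outdir).mkdir(parents=True, exist_ok=True)
    cmd = build_convert_command(source, outdir, target_format, exe)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    expected = Path(outdir) / f"{Path(source).stem}.{target_format}"
    # Check the output file too, since the exit code alone may not show a failure.
    return result.returncode == 0 and expected.exists()
```

Run `try_conversion` on a small .doc file first. A `False` result with LibreOffice installed usually points to a permissions or profile problem rather than to the document itself.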
Permission denied errors
Check the directory permissions and fix them if needed.
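To see which SIAA directories the current user can actually use, a short report helps narrow down the problem. This is an illustrative sketch: `permission_report` is a hypothetical helper, and it only inspects access without changing ownership or modes.

```python
import os

# Directories from the installation section of this guide.
SIAA_DIRS = (
    "/opt/siaa/fuentes",
    "/opt/siaa/fuentes/normativa",
    "/opt/siaa/logs",
    "/opt/siaa/instructivos",
    "/opt/siaa/pdfs_origen",
)


def permission_report(dirs=SIAA_DIRS):
    """Map each directory to 'ok', 'missing', or 'no access' for this user."""
    report = {}
    for d in dirs:
        if not os.path.isdir(d):
            report[d] = "missing"
        elif os.access(d, os.R_OK | os.W_OK | os.X_OK):
            report[d] = "ok"
        else:
            report[d] = "no access"
    return report


if __name__ == "__main__":
    for path, status in permission_report().items():
        print(f"{status:10} {path}")
```

Run it as the same user that starts the SIAA proxy, since access depends on the effective user.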
Python package conflicts
Use a virtual environment to isolate SIAA's dependencies from system packages.
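A virtual environment can be created with the standard library `venv` module. The sketch below is one way to do it from Python. The helper name `create_environment` and the directory name `siaa-venv` are hypothetical.

```python
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip so packages can be installed inside it.
    venv.create(env_path, with_pip=with_pip)
    return env_path


if __name__ == "__main__":
    print(create_environment("siaa-venv"))
```

After creating the environment, activate it in your shell and install the core dependencies inside it so they do not conflict with system packages.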