Architecture

Serenata de Amor consists of two independently deployable subsystems — Rosie and Jarbas — connected by a shared data artifact: suspicions.xz.

System overview

┌─────────────────────────────────────────────────────────────────┐
│  serenata-toolbox (pip)                                         │
│  Downloads reimbursement CSVs + companies.xz from open sources  │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│  Rosie  (Python / scikit-learn)                                 │
│  Adapter → Core → Classifiers → suspicions.xz                  │
└────────────────────────┬────────────────────────────────────────┘
                         │  suspicions.xz
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│  Jarbas  (Django / DRF / Elm)                                   │
│  manage.py suspicions → PostgreSQL → REST API + Dashboard       │
└─────────────────────────────────────────────────────────────────┘

Rosie subsystem

Rosie is a pure Python application with no persistent service. It is run on demand and exits after writing suspicions.xz.

Module structure

rosie/
├── rosie.py                        # CLI entry point (docopt)
└── rosie/
    ├── core/
    │   ├── __init__.py             # Core pipeline class
    │   └── classifiers/
    │       └── invalid_cnpj_cpf_classifier.py
    ├── chamber_of_deputies/
    │   ├── __init__.py             # main() wires Adapter + Core
    │   ├── adapter.py              # Downloads + prepares dataset
    │   ├── settings.py             # CLASSIFIERS dict + UNIQUE_IDS
    │   └── classifiers/
    │       ├── election_expenses_classifier.py
    │       ├── irregular_companies_classifier.py
    │       ├── meal_price_outlier_classifier.py
    │       ├── monthly_subquota_limit_classifier.py
    │       └── traveled_speeds_classifier.py
    └── federal_senate/
        └── __init__.py             # Equivalent structure for Senate

Core pipeline class

rosie/rosie/core/__init__.py implements the Core class, which is the generic analysis engine:

import os


class Core:
    # Excerpt: load_trained_model(), predict() and the self.suspicions
    # DataFrame are defined elsewhere on the class and omitted here.

    def __init__(self, settings, adapter):
        self.settings = settings
        self.dataset = adapter.dataset      # cleaned pandas DataFrame
        self.data_path = adapter.path

    def __call__(self):
        for name, classifier in self.settings.CLASSIFIERS.items():
            model = self.load_trained_model(classifier)
            self.predict(model, name)        # writes column into suspicions df

        output = os.path.join(self.data_path, 'suspicions.xz')
        self.suspicions.to_csv(output, compression='xz', encoding='utf-8', index=False)

Trained models are persisted as .pkl files via joblib so subsequent runs skip the fit step. MonthlySubquotaLimitClassifier is exempt from caching because its serialized size exceeds joblib’s practical limits.
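
The fit-once, reuse-later behaviour described above can be sketched as follows. This is a minimal illustration, not Rosie's actual load_trained_model: it uses the standard-library pickle module in place of joblib so that it runs without third-party packages, and the names load_or_fit, fit, and cacheable are hypothetical.

```python
import os
import pickle
import tempfile


def load_or_fit(name, fit, data_path, cacheable=True):
    """Return a fitted model, reusing <data_path>/<name>.pkl when present.

    Hypothetical helper: `fit` is any zero-argument callable that returns
    a picklable fitted model. Pass cacheable=False for models that are too
    large to serialize, so they are refitted on every run.
    """
    path = os.path.join(data_path, name + '.pkl')
    if cacheable and os.path.isfile(path):
        with open(path, 'rb') as handle:
            return pickle.load(handle)  # cache hit: skip the fit step

    model = fit()
    if cacheable:
        with open(path, 'wb') as handle:
            pickle.dump(model, handle)
    return model


if __name__ == '__main__':
    calls = []

    def fit_toy_model():
        calls.append(1)  # record that a (possibly expensive) fit happened
        return {'threshold': 40.0}  # made-up fitted parameters

    with tempfile.TemporaryDirectory() as data_path:
        load_or_fit('meal_price_outlier', fit_toy_model, data_path)
        load_or_fit('meal_price_outlier', fit_toy_model, data_path)
        print(len(calls))  # number of times the model was actually fitted
```

The cacheable flag mirrors the exemption mentioned above: a model that cannot reasonably be serialized is refitted every time and never leaves a .pkl file behind.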

Adapter

The Adapter (rosie/rosie/chamber_of_deputies/adapter.py) is responsible for:

Calling serenata-toolbox to download companies and per-year reimbursement CSVs
Merging the two datasets on cnpj_cpf / cnpj
Normalizing column names, document types, and date fields

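from datetime import date

from serenata_toolbox.chamber_of_deputies.reimbursements import Reimbursements
from serenata_toolbox.datasets import fetch

class Adapter:
    STARTING_YEAR = 2009

    def update_reimbursements(self, years=None):
        # Default to every year from STARTING_YEAR through the current one;
        # a caller-supplied iterable of years is used as given.
        if not years:
            next_year = date.today().year + 1
            years = range(self.STARTING_YEAR, next_year)
        for year in years:
            Reimbursements(year, self.path)()  # download that year's CSV into self.path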
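
The merge and normalization steps listed above amount to a left join of reimbursements against companies. The sketch below shows the idea with plain dictionaries rather than the pandas DataFrames the real Adapter works with. The helper names and the sample column situation are assumptions made for illustration.

```python
import re


def only_digits(document):
    """Strip punctuation so '12.345.678/0001-95' matches '12345678000195'."""
    return re.sub(r'\D', '', document or '')


def merge_with_companies(reimbursements, companies):
    """Left-join reimbursement rows to company rows on cnpj_cpf == cnpj.

    Every reimbursement is kept; rows whose supplier is not in the
    companies dataset (for example a CPF) simply gain no extra columns.
    """
    by_cnpj = {only_digits(company['cnpj']): company for company in companies}
    merged = []
    for row in reimbursements:
        key = only_digits(row['cnpj_cpf'])
        company = by_cnpj.get(key, {}) if key else {}
        extra = {name: value for name, value in company.items() if name != 'cnpj'}
        merged.append({**row, **extra})
    return merged


if __name__ == '__main__':
    # Made-up sample rows; real datasets have many more columns.
    reimbursements = [
        {'document_id': 1, 'cnpj_cpf': '12.345.678/0001-95'},
        {'document_id': 2, 'cnpj_cpf': '529.982.247-25'},
    ]
    companies = [{'cnpj': '12345678000195', 'situation': 'ACTIVE'}]
    for row in merge_with_companies(reimbursements, companies):
        print(row)
```

Normalizing both keys to digits before joining matters because the same document number can appear with or without punctuation in different sources.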

Classifiers

Each classifier lives in rosie/rosie/chamber_of_deputies/classifiers/ and inherits from scikit-learn’s TransformerMixin. The full set configured in settings.py:

Key in suspicions.xz	Class
`meal_price_outlier`	`MealPriceOutlierClassifier`
`over_monthly_subquota_limit`	`MonthlySubquotaLimitClassifier`
`suspicious_traveled_speed_day`	`TraveledSpeedsClassifier`
`invalid_cnpj_cpf`	`InvalidCnpjCpfClassifier`
`election_expenses`	`ElectionExpensesClassifier`
`irregular_companies_classifier`	`IrregularCompaniesClassifier`

InvalidCnpjCpfClassifier lives in rosie/rosie/core/classifiers/ because it is shared between the Chamber of Deputies and Federal Senate modules.
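
To illustrate the classifier interface, here is a rough sketch of a rule-based classifier in the spirit of InvalidCnpjCpfClassifier. It is not the project's implementation: it checks only CPF numbers, works on a plain list instead of a DataFrame, and does not inherit from TransformerMixin so that it runs without scikit-learn. The names is_valid_cpf and CpfCheckDigitClassifier are hypothetical.

```python
import re


def is_valid_cpf(document):
    """Validate an 11-digit CPF using its two check digits."""
    digits = [int(char) for char in re.sub(r'\D', '', document or '')]
    if len(digits) != 11 or len(set(digits)) == 1:
        return False  # wrong length, or a repeated-digit sequence
    for size in (9, 10):
        # Weights run from size + 1 down to 2 over the first `size` digits.
        weights = range(size + 1, 1, -1)
        total = sum(digit * weight for digit, weight in zip(digits[:size], weights))
        expected = (total * 10) % 11 % 10  # a remainder of 10 maps to 0
        if digits[size] != expected:
            return False
    return True


class CpfCheckDigitClassifier:
    """Hypothetical classifier exposing the fit / transform / predict trio."""

    def fit(self, documents):
        return self  # rule-based: there is nothing to learn

    def transform(self, documents=None):
        return self

    def predict(self, documents):
        # True marks a suspicious row, i.e. an invalid document number.
        return [not is_valid_cpf(document) for document in documents]


if __name__ == '__main__':
    classifier = CpfCheckDigitClassifier().fit([])
    print(classifier.predict(['529.982.247-25', '529.982.247-26']))
```

Each classifier contributes one boolean column to the output, which is why predict returns one flag per input row.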

External dependency: serenata-toolbox

Rosie depends on serenata-toolbox (a separate pip package at github.com/okfn-brasil/serenata-toolbox) for all I/O against government data sources. Rosie itself contains no direct HTTP calls to government APIs.

Jarbas subsystem

Jarbas is a long-running web application. It serves a REST API and an Elm-compiled frontend dashboard, backed by PostgreSQL and Celery/RabbitMQ for background task processing.

Components

Django + DRF

The web layer. Django REST Framework handles all API serialization and filtering. Gunicorn serves the WSGI application. URL routing is defined in jarbas/urls.py.

Elm frontend

The dashboard UI is compiled from Elm source. Static assets are generated by Node.js (npm run assets) and collected via Django’s collectstatic.

Celery workers

Background task workers (tasks service) process long-running import jobs dispatched by management commands. A beat service handles scheduled tasks such as the search vector refresh.

PostgreSQL

The primary datastore. Jarbas requires PostgreSQL specifically because it uses Django’s JSONField and full-text search via SearchVector.

URL structure

Defined in jarbas/urls.py:

urlpatterns = [
    path('dashboard/', include('jarbas.dashboard.urls')),
    path('layers/', include('jarbas.layers.urls', namespace='layers')),
    path('api/', include('jarbas.core.urls', namespace='core')),
    path('api/chamber_of_deputies/',
         include('jarbas.chamber_of_deputies.urls',
                 namespace='chamber_of_deputies')),
    path('healthcheck/', healthcheck, name='healthcheck'),
]

The /api/chamber_of_deputies/ prefix hosts the reimbursement, subquota, applicant, and company endpoints documented in the API reference.

Data loading via management commands

Jarbas does not pull data directly from Rosie. Data is loaded manually using Django management commands:

python manage.py reimbursements <path/to/reimbursements.csv>
python manage.py companies      <path/to/companies.xz>
python manage.py suspicions     <path/to/suspicions.xz>
python manage.py searchvector
python manage.py tweets

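The reimbursements, companies, and suspicions commands read the CSV files produced by the toolbox and Rosie (xz-compressed where the extension is .xz) and write records into PostgreSQL via the Django ORM. The searchvector and tweets commands take no file argument.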
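
Files such as suspicions.xz are xz-compressed CSVs, so they can be inspected with the standard library alone. The sketch below is not the code of the suspicions command; it only shows how such a file can be streamed row by row. The function name read_suspicions and the sample values are illustrative assumptions.

```python
import csv
import lzma
import os
import tempfile


def read_suspicions(path):
    """Yield one dict per row from an xz-compressed CSV file."""
    with lzma.open(path, mode='rt', encoding='utf-8', newline='') as handle:
        yield from csv.DictReader(handle)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as data_path:
        path = os.path.join(data_path, 'suspicions.xz')
        # Write a tiny sample file in the same format (made-up values).
        with lzma.open(path, mode='wt', encoding='utf-8', newline='') as handle:
            writer = csv.writer(handle)
            writer.writerow(['document_id', 'meal_price_outlier', 'invalid_cnpj_cpf'])
            writer.writerow(['1001', 'True', 'False'])
        for row in read_suspicions(path):
            flagged = [key for key, value in row.items() if value == 'True']
            print(row['document_id'], flagged)
```

Streaming with a generator keeps memory use flat, which helps when a file holds many years of reimbursements.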

Docker Compose topology

The docker-compose.yml defines seven services:

Application services

Service	Image	Role
`django`	`serenata/django`	Gunicorn WSGI server for the API and dashboard
`tasks`	`serenata/django`	Celery worker: processes import task queues
`beat`	`serenata/django`	Celery beat: runs scheduled jobs (e.g., search vector refresh)
`elm`	`serenata/elm`	Compiles and serves Elm frontend assets

Infrastructure services

Service	Image	Role
`queue`	`rabbitmq:3.7.3-alpine`	Message broker for Celery
`cache`	`memcached:1.5.8-alpine`	Application-level cache

Analysis service

Service	Image	Role
`rosie`	`serenata/rosie`	Runs Rosie analysis on demand; not a long-running service

Dependency graph

django
  └── depends_on: cache, elm, tasks
        tasks
          └── depends_on: queue (rabbitmq)
        beat
          └── depends_on: queue (rabbitmq)

rosie has no depends_on entries — it runs independently and shares a data volume (/tmp/serenata-data) with the host.
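
One way to read the graph is as a start-up order: a service can start only after everything it depends on. The sketch below encodes the depends_on entries listed above and uses the standard-library graphlib module (Python 3.9+) to derive one valid order. The DEPENDS_ON dictionary is transcribed from this page rather than parsed from docker-compose.yml, and Docker Compose resolves this ordering itself, so the code is purely explanatory.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it depends on.
DEPENDS_ON = {
    'django': {'cache', 'elm', 'tasks'},
    'tasks': {'queue'},
    'beat': {'queue'},
    'rosie': set(),  # independent: no depends_on entries
}


def start_order(depends_on):
    """Return the services in an order where dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == '__main__':
    print(start_order(DEPENDS_ON))
```

Several orders satisfy the constraints. The only guarantees are that queue precedes tasks and beat, and that cache, elm, and tasks precede django.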

Running Rosie via Docker

docker run --rm \
  -v /tmp/serenata-data:/tmp/serenata-data \
  serenata/rosie \
  python rosie.py run chamber_of_deputies

The --rm flag removes the container after it exits. The volume mount makes suspicions.xz available on the host for subsequent import into Jarbas.

Data flow summary

serenata-toolbox
    │
    │  reimbursements-<year>.csv, companies.xz
    ▼
Adapter (rosie/chamber_of_deputies/adapter.py)
    │
    │  merged + cleaned pandas DataFrame
    ▼
Core (rosie/core/__init__.py)
    │
    │  runs 6 classifiers sequentially
    ▼
suspicions.xz   ◄── written to /tmp/serenata-data/
    │
    │  manual copy to Jarbas /mnt/data/
    ▼
manage.py suspicions
    │
    │  INSERT rows into PostgreSQL
    ▼
Django REST API + Elm dashboard
    │
    │  JSON responses to browser / API clients
    ▼
Twitter (@RosieDaSerenata)

PostgreSQL is the only mandatory database for Jarbas. The JSONField usage and SearchVector full-text search are PostgreSQL-specific features — other databases are not supported.
