Source Registry
BR-ACC ingests data from 45+ sources across federal, state, and international datasets. Each source has:
- Tier (P0-P3): Priority for ingestion
- Status: `loaded`, `partial`, `stale`, `blocked_external`, `not_built`
- Frequency: Update cadence (daily, monthly, biennial, etc.)
- Access Mode: `file`, `api`, `bigquery`, `web`
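The four attributes above map naturally onto a small typed record. The following is a minimal sketch only: the names `SourceEntry`, `Status`, and `AccessMode` are hypothetical and are not taken from the BR-ACC codebase; only the allowed values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    LOADED = "loaded"
    PARTIAL = "partial"
    STALE = "stale"
    BLOCKED_EXTERNAL = "blocked_external"
    NOT_BUILT = "not_built"


class AccessMode(str, Enum):
    FILE = "file"
    API = "api"
    BIGQUERY = "bigquery"
    WEB = "web"


@dataclass(frozen=True)
class SourceEntry:
    """One registry row (hypothetical shape, for illustration only)."""

    source_id: str
    tier: int  # 0-3, i.e. P0 (critical) through P3 (low)
    status: Status
    frequency: str  # e.g. "daily", "monthly", "biennial"
    access_mode: AccessMode

    def __post_init__(self) -> None:
        # Reject tiers outside the documented P0-P3 range.
        if not 0 <= self.tier <= 3:
            raise ValueError(f"tier must be P0-P3, got P{self.tier}")

    @property
    def tier_label(self) -> str:
        return f"P{self.tier}"


cnpj = SourceEntry("cnpj", 0, Status.LOADED, "monthly", AccessMode.FILE)
print(cnpj.tier_label, cnpj.status.value)  # P0 loaded
```

Using enums for status and access mode means a typo such as `"stal"` fails at parse time instead of silently creating a new category.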
Status Legend
Loaded
Fully ingested and operational
Partial
Limited coverage or missing features
Stale
Needs freshness backfill
Not Built
Discovered but not implemented
P0 Sources (Critical)
Core identity and relationship data that powers the knowledge graph.
CNPJ
50M companies · Receita Federal · Monthly · ✅ Loaded
TSE
Elections & Donations · TSE · Biennial · ✅ Loaded
Transparência
Federal Contracts · Portal da Transparência · Monthly · ✅ Loaded
Sanctions
CEIS + CNEP · Portal da Transparência · Monthly · ✅ Loaded
PGFN
25M debt records · PGFN · Monthly · ✅ Loaded
TransfereGov
Federal Transfers · TransfereGov · Monthly · ✅ Loaded
DOU
Official Gazette · Imprensa Nacional · Daily · ✅ Loaded
Leniency Agreements
Anti-Corruption · CGU · Monthly · ✅ Loaded
CNPJ (Receita Federal)
Pipeline: `cnpj` | Tier: P0 | Status: ✅ Loaded
The CNPJ dataset is the foundation of BR-ACC’s knowledge graph. All other sources link to companies via CNPJ.
- 50M+ companies (empresas)
- 80M+ partners (socios): shareholders, administrators, legal representatives
- 50M+ establishments (estabelecimentos): physical locations
Source:
- Primary: Nextcloud share (arquivos.receitafederal.gov.br)
- Fallback: Legacy dadosabertos.rfb.gov.br

File Format:
- Headerless CSV (`;` delimiter, `latin-1` encoding)
- 10 files per type: `Empresas0.zip` through `Empresas9.zip`
- ~100GB compressed, ~500GB extracted

Graph Nodes (etl/src/bracc_etl/pipelines/cnpj.py:29):
- `Company` (CNPJ as key)
- `Person` (CPF as key, from socios with valid CPF)
- `Partner` (partial identities: masked/invalid CPFs)

Relationships:
- `Person -[:SOCIO_DE]-> Company`
- `Partner -[:SOCIO_DE]-> Company`
- `Company -[:SOCIO_DE]-> Company` (corporate ownership)
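The split between `Person` (valid CPF) and `Partner` (masked or invalid CPF) hinges on a CPF validity check. The sketch below uses the standard CPF check-digit rule (two modulo-11 digits); whether the pipeline applies exactly this test is an assumption, and the function names `is_valid_cpf` and `node_label` are hypothetical.

```python
import re


def is_valid_cpf(raw: str) -> bool:
    """Return True if `raw` contains 11 digits with correct CPF check digits."""
    digits = re.sub(r"\D", "", raw)  # drop dots, dashes, and mask characters
    # Masked values lose digits here; repeated digits (e.g. 111...1) are invalid.
    if len(digits) != 11 or digits == digits[0] * 11:
        return False
    nums = [int(d) for d in digits]
    for n in (9, 10):
        # Weights run from n+1 down to 2 over the first n digits.
        total = sum(v * w for v, w in zip(nums[:n], range(n + 1, 1, -1)))
        check = (total * 10) % 11 % 10  # a remainder of 10 maps to 0
        if check != nums[n]:
            return False
    return True


def node_label(cpf: str) -> str:
    """Pick the graph label for a socio record (illustrative rule only)."""
    return "Person" if is_valid_cpf(cpf) else "Partner"


print(node_label("529.982.247-25"))  # Person
print(node_label("***982247**"))     # Partner
```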
Performance:
- Streaming mode: 6 hours for full dataset
- Memory: 2GB (fixed)
- Output: 50M Company nodes, 60M Person nodes, 80M SOCIO_DE relationships
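Fixed memory use on a dataset of this size implies reading rows one at a time rather than extracting whole files. The sketch below shows one way to stream headerless, `;`-delimited, latin-1 CSV rows straight out of a ZIP using only the standard library. The function name `stream_rows` is hypothetical, and the actual pipeline may read the data differently.

```python
from __future__ import annotations

import csv
import io
import zipfile
from typing import Iterator

# Shard names as documented: Empresas0.zip through Empresas9.zip.
EMPRESAS_SHARDS = [f"Empresas{i}.zip" for i in range(10)]


def stream_rows(zip_path: str) -> Iterator[list[str]]:
    """Yield one row at a time from every member of a CNPJ-style ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            with archive.open(member) as raw:
                # Decode lazily; newline="" lets the csv module handle line endings.
                text = io.TextIOWrapper(raw, encoding="latin-1", newline="")
                # The files are headerless, so every line is a data row.
                yield from csv.reader(text, delimiter=";")
```

Because the function is a generator, memory stays roughly constant regardless of file size; a caller would loop over `EMPRESAS_SHARDS` and pass each local path to `stream_rows`.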
TSE (Tribunal Superior Eleitoral)
Pipeline: `tse` | Tier: P0 | Status: ✅ Loaded
Coverage:
- Elections: Candidates, results, party affiliations (1996-2024)
- Donations: Campaign finance records (30M+ donations)
- Candidate Assets: Declared assets (`tse_bens`)
- Party Memberships: Filiação partidária (`tse_filiados`)

Source: dadosabertos.tse.jus.br

Graph Nodes:
- `Person` (candidates, donors)
- `Company` (corporate donors)
- `Election`, `Party`

Relationships:
- `Person -[:CANDIDATO_EM]-> Election`
- `Person -[:DOOU_PARA]-> Person` (candidate)
- `Company -[:DOOU_PARA]-> Person`
- `Person -[:FILIADO_A]-> Party`
Transparência (Portal da Transparência)
Pipeline: `transparencia` | Tier: P0 | Status: ✅ Loaded
Coverage:
- Contracts (`compras`): Federal government procurement
- Servidores: Public servants registry + salaries
- Emendas: Parliamentary amendments execution

Source: portaldatransparencia.gov.br/download-de-dados
File Format: Monthly ZIP files, ;-delimited CSV (latin-1)
CLI:
Graph Nodes:
- `Company` (contractors)
- `Person` (public servants, amendment authors)
- `Contract`, `Amendment`

Relationships:
- `Company -[:VENCEU_CONTRATO]-> Contract`
- `Person -[:SERVIDOR_EM]-> Company` (government agency)
- `Person -[:AUTOR_DE]-> Amendment`
Sanctions (CEIS + CNEP)
Pipeline: `sanctions` | Tier: P0 | Status: ✅ Loaded
Coverage:
- CEIS: Administrative sanctions (Cadastro de Empresas Inidôneas e Suspensas)
- CNEP: Punishment registry (Cadastro Nacional de Empresas Punidas)
Source: portaldatransparencia.gov.br/sancoes/consulta

Graph Nodes:
- `Company`, `Person`
- `Sanction`

Relationships:
- `Company -[:SANCIONADA_EM]-> Sanction`
- `Person -[:SANCIONADA_EM]-> Sanction`
P1 Sources (High Priority)
Enrichment sources that add depth to entity profiles.
PEP CGU
Politically Exposed Persons · CGU · Monthly · ✅ Loaded
BNDES
Development Bank Loans · BNDES · Monthly · ✅ Loaded
IBAMA
Environmental Embargos · IBAMA · Monthly · ✅ Loaded
TCU
Audit Sanctions · TCU · Monthly · ✅ Loaded
ICIJ Offshore Leaks
Offshore Entities · ICIJ · Yearly · ✅ Loaded
OpenSanctions
Global PEPs · OpenSanctions · Monthly · ✅ Loaded
CVM
Market Proceedings · CVM · Monthly · ✅ Loaded
RAIS
Labor Statistics · ME · Annual · ✅ Loaded
P2 Sources (Medium Priority)
INEP
School Census · INEP · Annual · ✅ Loaded
DATASUS
Health Establishments · DATASUS · Monthly · ✅ Loaded
CPGF
Government Card Expenses · CGU · Monthly · ✅ Loaded
Viagens
Official Travel · CGU · Monthly · ✅ Loaded
International Sources
OFAC
US Sanctions · Treasury · Monthly · ✅ Loaded
EU Sanctions
EU Financial Sanctions · EU · Monthly · ✅ Loaded
UN Sanctions
UN Sanctions · UNSC · Monthly · ✅ Loaded
World Bank
Debarment List · World Bank · Monthly · ✅ Loaded
Partial/Stale Sources
Sources with known issues requiring attention.

| Source | Status | Issue | Owner |
|---|---|---|---|
| ComprasNet | 🟡 Stale | Needs freshness backfill | Agent C |
| PNCP | 🟡 Stale | Freshness SLA pending | Agent C |
| SICONFI | 🟡 Partial | No CNPJ direct links | Agent C |
| SIOP | 🟡 Partial | Author linkage limited | Agent C |
| Câmara Inquiries | 🟡 Partial | Sessions still low | Agent E |
| Senado CPIs | 🟡 Partial | Needs richer sessions | Agent E |
| CAGED | 🟡 Stale | Aggregate-only implementation | Agent H |
| Querido Diário | 🟡 Partial | Text availability gap | Agent H |
| DataJud | 🔴 Blocked | Credentials not operational | Agent D |
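A source becomes stale when its last successful load is older than its update cadence allows. The sketch below illustrates that idea only: the grace windows in `MAX_AGE` are assumed values, not BR-ACC's actual freshness SLAs, and `is_stale` is a hypothetical helper.

```python
from datetime import date, timedelta

# Assumed grace windows per cadence; the real freshness SLAs are not stated here.
MAX_AGE = {
    "daily": timedelta(days=2),
    "monthly": timedelta(days=45),
    "annual": timedelta(days=400),
    "yearly": timedelta(days=400),
    "biennial": timedelta(days=800),
}


def is_stale(frequency: str, last_loaded: date, today: date) -> bool:
    """Return True when the last load is older than the cadence's grace window."""
    return today - last_loaded > MAX_AGE[frequency]


# A monthly source last loaded 60 days ago exceeds the assumed 45-day window.
print(is_stale("monthly", date(2024, 1, 1), date(2024, 3, 1)))  # True
```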
Not Built (Discovered)
Sources identified but not yet implemented (61 total).
High-Value Targets (P1-P2)
STJ Dados Abertos
Superior court decisions · STJ · Monthly · P1
CNCIAI Improbidade
Misconduct convictions · CNJ · Monthly · P1
CVM Full Ownership Chain
Shareholder graph · CVM · Monthly · P1
Receita DIRBI
Tax benefit declarations · RFB · Monthly · P1
MapBiomas Alerta
Deforestation alerts · MapBiomas · Monthly · P1
SiCAR
Rural property registry · MAPA · Quarterly · P1
ANM Mining Rights
Mining permits · ANM · Monthly · P1
Tesouro Emendas
Budget execution · Tesouro · Monthly · P0 🔥
SIGA Brasil
Federal budget traces · Senado · Monthly · P0 🔥
Regulatory Agencies (P2-P3)
27 sources from agencies like ANEEL, ANATEL, ANTT, ANP, ANVISA, ANS covering concessions, licenses, and regulatory registrations.
State Audit Courts (P2-P3)
24 sources from TCE-SP, TCE-RJ, TCE-MG and state transparency portals.
Complete Registry
Full source registry with status, tier, and access mode:
View Complete Source Registry (109 sources)
| Source ID | Name | Category | Tier | Status | Frequency | Access Mode |
|---|---|---|---|---|---|---|
| cnpj | Receita Federal CNPJ | identity | P0 | ✅ loaded | monthly | file |
| tse | TSE elections and donations | electoral | P0 | ✅ loaded | biennial | file |
| transparencia | Portal da Transparencia contracts | contracts | P0 | ✅ loaded | monthly | file |
| sanctions | CEIS CNEP sanctions | sanctions | P0 | ✅ loaded | monthly | file |
| pep_cgu | CGU PEP list | integrity | P1 | ✅ loaded | monthly | file |
| bndes | BNDES financings | finance | P1 | ✅ loaded | monthly | file |
| pgfn | PGFN divida ativa | fiscal | P0 | ✅ loaded | monthly | file |
| ibama | IBAMA embargos | environment | P1 | ✅ loaded | monthly | file |
| comprasnet | ComprasNet contracts | contracts | P0 | 🟡 stale | monthly | file |
| tcu | TCU sanctions | audit | P1 | ✅ loaded | monthly | file |
| transferegov | TransfereGov emendas e convenios | transfers | P0 | ✅ loaded | monthly | file |
| rais | RAIS aggregated labor | labor | P1 | ✅ loaded | annual | bigquery |
| inep | INEP school census | education | P2 | ✅ loaded | annual | file |
| dou | Diario Oficial da Uniao | gazette | P0 | ✅ loaded | daily | bigquery |
| datasus | DATASUS CNES | health | P1 | ✅ loaded | monthly | file |
| icij | ICIJ offshore leaks | offshore | P1 | ✅ loaded | yearly | file |
| opensanctions | OpenSanctions global PEP | sanctions | P1 | ✅ loaded | monthly | file |
| cvm | CVM proceedings | market | P1 | ✅ loaded | monthly | file |
| cvm_funds | CVM fund registry | market | P1 | ✅ loaded | monthly | file |
| camara | Camara CEAP expenses | legislative | P1 | ✅ loaded | monthly | api |
| camara_inquiries | Camara inquiries and requirements | legislative | P0 | 🟡 partial | daily | api |
| senado | Senado CEAPS expenses | legislative | P1 | ✅ loaded | monthly | api |
| ceaf | CEAF expelled servants | integrity | P1 | ✅ loaded | monthly | file |
| cepim | CEPIM barred NGOs | integrity | P1 | ✅ loaded | monthly | file |
| cpgf | CPGF gov card expenses | spending | P2 | ✅ loaded | monthly | file |
| leniency | Acordos de leniencia | integrity | P0 | ✅ loaded | monthly | file |
| ofac | OFAC sanctions | sanctions | P1 | ✅ loaded | monthly | file |
| holdings | Brasil IO holdings | ownership | P1 | ✅ loaded | monthly | file |
| viagens | Viagens a servico | spending | P2 | ✅ loaded | monthly | file |
| siop | SIOP emendas | budget | P0 | 🟡 partial | annual | api |
| pncp | PNCP bids and contracts | contracts | P0 | 🟡 stale | monthly | api |
| renuncias | Renuncias fiscais | fiscal | P1 | ✅ loaded | annual | file |
| siconfi | SICONFI municipal finance | fiscal | P1 | 🟡 partial | annual | api |
| tse_bens | TSE candidate assets | electoral | P1 | ✅ loaded | biennial | file |
| tse_filiados | TSE party memberships | electoral | P1 | ✅ loaded | monthly | file |
| bcb | BCB penalties | finance | P1 | ✅ loaded | monthly | file |
| stf | STF court data | judiciary | P1 | ✅ loaded | monthly | bigquery |
| caged | CAGED labor movements | labor | P1 | 🟡 stale | monthly | file |
| eu_sanctions | EU sanctions | sanctions | P1 | ✅ loaded | monthly | file |
| un_sanctions | UN sanctions | sanctions | P1 | ✅ loaded | monthly | file |
| world_bank | World Bank debarment | sanctions | P1 | ✅ loaded | monthly | file |
| senado_cpis | Senado CPIs | legislative | P0 | 🟡 partial | yearly | api |
| mides | MiDES municipal procurement | municipal | P0 | ✅ loaded | daily | bigquery |
| querido_diario | Querido Diario gazettes | municipal | P1 | 🟡 partial | daily | api |
| datajud | CNJ DataJud | judiciary | P0 | 🔴 blocked | monthly | api |
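A registry in this shape is easy to query programmatically, for example to list critical sources that are not fully loaded. The sketch below parses a small inline sample with the standard `csv` module. The column names (`source_id`, `tier`, `status`, and so on) are assumed from the table headers and may not match the real `source_registry_br_v1.csv`; `load_registry` and `needs_attention` are hypothetical helpers.

```python
import csv
import io
from collections import Counter

# Inline sample using assumed column names; values mirror rows from the table.
SAMPLE = """source_id,name,category,tier,status,frequency,access_mode
cnpj,Receita Federal CNPJ,identity,P0,loaded,monthly,file
comprasnet,ComprasNet contracts,contracts,P0,stale,monthly,file
siop,SIOP emendas,budget,P0,partial,annual,api
datajud,CNJ DataJud,judiciary,P0,blocked_external,monthly,api
rais,RAIS aggregated labor,labor,P1,loaded,annual,bigquery
"""


def load_registry(handle):
    """Read registry rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def needs_attention(rows, tier="P0"):
    """Return IDs of sources in `tier` whose status is anything but loaded."""
    return [r["source_id"] for r in rows if r["tier"] == tier and r["status"] != "loaded"]


rows = load_registry(io.StringIO(SAMPLE))
status_counts = Counter(r["status"] for r in rows)
print(status_counts)
print(needs_attention(rows))  # ['comprasnet', 'siop', 'datajud']
```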
Adding a New Source
To add a new source to the registry:
- Identify source: URL, update frequency, access mode
- Assign tier: P0 (critical), P1 (high), P2 (medium), P3 (low)
- Add to registry: Edit `source_registry_br_v1.csv`
- Create pipeline: See Creating Pipelines
- Register runner: Add to `PIPELINES` dict in `etl/src/bracc_etl/runner.py:54`
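The final step, registering the runner, amounts to adding one entry to a name-to-pipeline mapping. The sketch below shows the general pattern only: the real contents of `runner.py` are not shown here, so the classes `CnpjPipeline` and `MyNewSourcePipeline`, their `run` method, and `run_pipeline` are all hypothetical stand-ins.

```python
# Hypothetical stand-ins: the real pipeline classes and runner are not shown here.
class CnpjPipeline:
    name = "cnpj"

    def run(self) -> str:
        return f"{self.name}: done"


class MyNewSourcePipeline:
    name = "my_new_source"

    def run(self) -> str:
        return f"{self.name}: done"


# Name-to-class registry; adding a source means adding one entry.
PIPELINES: dict[str, type] = {
    "cnpj": CnpjPipeline,
    "my_new_source": MyNewSourcePipeline,  # the newly registered source
}


def run_pipeline(name: str) -> str:
    """Look up a pipeline by name, instantiate it, and run it."""
    try:
        pipeline_cls = PIPELINES[name]
    except KeyError:
        raise ValueError(f"unknown pipeline {name!r}; known: {sorted(PIPELINES)}") from None
    return pipeline_cls().run()


print(run_pipeline("my_new_source"))  # my_new_source: done
```

Keeping the registry as a plain dict means an unregistered name fails fast with a list of the known pipelines.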
Next Steps
Running Pipelines
Run any of these 45+ pipelines locally
Creating Pipelines
Build a pipeline for a new data source
Pipeline Architecture
Learn about design patterns
Overview
Back to ETL framework overview