Source Registry

BR-ACC ingests data from 45+ sources across federal, state, and international datasets. Each source has:
  • Tier (P0-P3): Priority for ingestion
  • Status: loaded, partial, stale, blocked_external, not_built
  • Frequency: Update cadence (daily, monthly, biennial, etc.)
  • Access Mode: file, api, bigquery, web
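
The four attributes above map naturally onto a small typed record. The following is a minimal sketch, assuming Python enums and a dataclass; the class and field names (`SourceEntry`, `is_operational`, and so on) are hypothetical and are not taken from the BR-ACC codebase. Only the allowed values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    P0 = "P0"
    P1 = "P1"
    P2 = "P2"
    P3 = "P3"


class Status(Enum):
    LOADED = "loaded"
    PARTIAL = "partial"
    STALE = "stale"
    BLOCKED_EXTERNAL = "blocked_external"
    NOT_BUILT = "not_built"


class AccessMode(Enum):
    FILE = "file"
    API = "api"
    BIGQUERY = "bigquery"
    WEB = "web"


@dataclass(frozen=True)
class SourceEntry:
    """Hypothetical in-memory form of one registry row."""

    source_id: str
    tier: Tier
    status: Status
    frequency: str  # free text: daily, monthly, biennial, ...
    access_mode: AccessMode

    @property
    def is_operational(self) -> bool:
        # Only fully loaded sources count as operational in this sketch.
        return self.status is Status.LOADED


# Enum lookups by value reject anything outside the documented vocabulary.
cnpj = SourceEntry("cnpj", Tier("P0"), Status("loaded"), "monthly", AccessMode("file"))
print(cnpj.is_operational)
```

Parsing the raw strings through the enums means a typo such as `"loadded"` raises a `ValueError` at load time instead of silently creating a new status.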

Status Legend

Loaded

Fully ingested and operational

Partial

Limited coverage or missing features

Stale

Needs freshness backfill

Not Built

Discovered but not implemented

P0 Sources (Critical)

Core identity and relationship data that powers the knowledge graph.

CNPJ

50M companies · Receita Federal · Monthly · ✅ Loaded

TSE

Elections & Donations · TSE · Biennial · ✅ Loaded

Transparência

Federal Contracts · Portal da Transparência · Monthly · ✅ Loaded

Sanctions

CEIS + CNEP · Portal da Transparência · Monthly · ✅ Loaded

PGFN

25M debt records · PGFN · Monthly · ✅ Loaded

TransfereGov

Federal Transfers · TransfereGov · Monthly · ✅ Loaded

DOU

Official Gazette · Imprensa Nacional · Daily · ✅ Loaded

Leniency Agreements

Anti-Corruption · CGU · Monthly · ✅ Loaded

CNPJ (Receita Federal)

Pipeline: cnpj | Tier: P0 | Status: ✅ Loaded
The CNPJ dataset is the foundation of BR-ACC’s knowledge graph. All other sources link to companies via CNPJ.
Coverage:
  • 50M+ companies (empresas)
  • 80M+ partners (socios): shareholders, administrators, legal representatives
  • 50M+ establishments (estabelecimentos): physical locations
Update Frequency: Monthly (released ~15th of each month)
Access Mode: File download
  • Primary: Nextcloud share (arquivos.receitafederal.gov.br)
  • Fallback: Legacy dadosabertos.rfb.gov.br
File Format:
  • Headerless CSV (; delimiter, latin-1 encoding)
  • 10 files per type: Empresas0.zip through Empresas9.zip
  • ~100GB compressed, ~500GB extracted
Schema (see etl/src/bracc_etl/pipelines/cnpj.py:29):
Empresas: cnpj_basico, razao_social, natureza_juridica, capital_social, ...
Socios: cnpj_basico, identificador_socio, nome_socio, cpf_cnpj_socio, ...
Estabelecimentos: cnpj_basico, cnpj_ordem, cnpj_dv, situacao_cadastral, ...
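
Because the files have no header row, column names have to be supplied by the reader. The sketch below shows one way to do that with Python's standard `csv` module, using the `;` delimiter and latin-1 encoding described above. The helper name `read_headerless_csv`, the `_extra` key, and the sample row are hypothetical; the authoritative column order is the schema in `cnpj.py`, so only the two leading columns are named here.

```python
import csv
import io
from typing import Iterator, Sequence, TextIO


def read_headerless_csv(stream: TextIO, columns: Sequence[str]) -> Iterator[dict]:
    """Yield one dict per row, naming the leading fields after `columns`."""
    reader = csv.reader(stream, delimiter=";", quotechar='"')
    for row in reader:
        record = dict(zip(columns, row))
        # Fields beyond the named columns are preserved rather than dropped.
        record["_extra"] = row[len(columns):]
        yield record


# Hypothetical sample bytes standing in for one extracted Empresas file.
raw = '"12345678";"AÇÚCAR EXEMPLO LTDA";"2062";"1000,00"\n'.encode("latin-1")

# Decode as latin-1; newline="" lets the csv module handle line endings itself.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1", newline="")
rows = list(read_headerless_csv(text, ["cnpj_basico", "razao_social"]))
print(rows[0]["razao_social"])
```

For a real file, the same function can be fed `open(path, encoding="latin-1", newline="")`. Since it is a generator, rows are produced one at a time rather than loaded in bulk.
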
Graph Nodes:
  • Company (CNPJ as key)
  • Person (CPF as key, from socios with valid CPF)
  • Partner (partial identities: masked/invalid CPFs)
Relationships:
  • Person -[:SOCIO_DE]-> Company
  • Partner -[:SOCIO_DE]-> Company
  • Company -[:SOCIO_DE]-> Company (corporate ownership)
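
The split between Person and Partner nodes depends on whether a partner's CPF is valid. The following is a rough sketch of such a rule, using the standard CPF check-digit algorithm. The function names and the exact decision order are assumptions for illustration; the actual pipeline may apply different or additional rules (for example, it may also validate CNPJ check digits, which this sketch does not).

```python
import re


def is_valid_cpf(cpf: str) -> bool:
    """Check the two CPF verification digits (modulo-11 scheme)."""
    digits = re.sub(r"\D", "", cpf)
    # Reject wrong lengths and trivial sequences such as 11111111111.
    if len(digits) != 11 or digits == digits[0] * 11:
        return False
    nums = [int(d) for d in digits]
    for pos in (9, 10):
        # Weights run from pos + 1 down to 2 over the digits before `pos`.
        total = sum(n * w for n, w in zip(nums[:pos], range(pos + 1, 1, -1)))
        check = (total * 10) % 11 % 10
        if check != nums[pos]:
            return False
    return True


def classify_socio(document: str) -> str:
    """Return the node label this sketch would assign to a partner document."""
    if "*" in document:
        return "Partner"  # masked identity
    digits = re.sub(r"\D", "", document)
    if len(digits) == 14:
        return "Company"  # CNPJ-length document; check digits not verified here
    return "Person" if is_valid_cpf(digits) else "Partner"


print(classify_socio("529.982.247-25"))
print(classify_socio("***982247**"))
```
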
CLI:
# Download CNPJ data
bracc-etl download --output-dir ./data/cnpj --files 10

# Run pipeline (streaming mode for large datasets)
bracc-etl run --source cnpj \
  --neo4j-password secret \
  --data-dir ./data \
  --streaming
Performance:
  • Streaming mode: 6 hours for full dataset
  • Memory: 2GB (fixed)
  • Output: 50M Company nodes, 60M Person nodes, 80M SOCIO_DE relationships
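
Fixed memory use in a streaming run typically comes from processing rows in bounded batches instead of loading a whole file. The sketch below illustrates that general idea only; it is not the pipeline's own code, and `in_batches` is a hypothetical helper name.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, holding one batch in memory at a time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Seven items in batches of three: the last batch is shorter.
batches = list(in_batches(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

In a loader, each batch would be written to the database and then discarded, so peak memory depends on the batch size rather than on the dataset size.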

TSE (Tribunal Superior Eleitoral)

Pipeline: tse | Tier: P0 | Status: ✅ Loaded
Coverage:
  • Elections: Candidates, results, party affiliations (1996-2024)
  • Donations: Campaign finance records (30M+ donations)
  • Candidate Assets: Declared patrimony (tse_bens)
  • Party Memberships: Filiação partidária (tse_filiados)
Update Frequency: Biennial (after elections)
Access Mode: File download from dadosabertos.tse.jus.br
Graph Nodes:
  • Person (candidates, donors)
  • Company (corporate donors)
  • Election, Party
Relationships:
  • Person -[:CANDIDATO_EM]-> Election
  • Person -[:DOOU_PARA]-> Person (candidate)
  • Company -[:DOOU_PARA]-> Person
  • Person -[:FILIADO_A]-> Party

Transparência (Portal da Transparência)

Pipeline: transparencia | Tier: P0 | Status: ✅ Loaded
Coverage:
  • Contracts (compras): Federal government procurement
  • Servidores: Public servants registry + salaries
  • Emendas: Parliamentary amendments execution
Update Frequency: Monthly
Access Mode: File download (portaldatransparencia.gov.br/download-de-dados)
File Format: Monthly ZIP files, ;-delimited CSV (latin-1)
CLI:
python etl/scripts/download_transparencia.py \
  --year 2025 \
  --datasets compras,servidores,emendas
Graph Nodes:
  • Company (contractors)
  • Person (public servants, amendment authors)
  • Contract, Amendment
Relationships:
  • Company -[:VENCEU_CONTRATO]-> Contract
  • Person -[:SERVIDOR_EM]-> Company (government agency)
  • Person -[:AUTOR_DE]-> Amendment

Sanctions (CEIS + CNEP)

Pipeline: sanctions | Tier: P0 | Status: ✅ Loaded
Coverage:
  • CEIS: Administrative sanctions (Cadastro de Empresas Inidôneas e Suspensas)
  • CNEP: Punishment registry (Cadastro Nacional de Empresas Punidas)
Update Frequency: Monthly
Access Mode: API query (portaldatransparencia.gov.br/sancoes/consulta)
Graph Nodes:
  • Company, Person
  • Sanction
Relationships:
  • Company -[:SANCIONADA_EM]-> Sanction
  • Person -[:SANCIONADA_EM]-> Sanction
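
Because contracts and sanctions attach to the same Company nodes, the two relationship types can be combined to surface entities that appear in both. The following is a toy sketch over in-memory tuples, not a query against the real graph. The node identifiers and the helper `sources_by_relationship` are made up for illustration; only the relationship names come from this page.

```python
from collections import defaultdict

# Hypothetical edges in (start, relationship, end) form.
edges = [
    ("company:A", "VENCEU_CONTRATO", "contract:1"),
    ("company:B", "VENCEU_CONTRATO", "contract:2"),
    ("company:A", "SANCIONADA_EM", "sanction:9"),
    ("person:X", "SANCIONADA_EM", "sanction:10"),
]


def sources_by_relationship(edge_list):
    """Index edges as relationship -> start node -> set of end nodes."""
    index = defaultdict(lambda: defaultdict(set))
    for start, relationship, end in edge_list:
        index[relationship][start].add(end)
    return index


index = sources_by_relationship(edges)

# Entities that both won a contract and carry a sanction.
flagged = sorted(set(index["VENCEU_CONTRATO"]) & set(index["SANCIONADA_EM"]))
print(flagged)  # ['company:A']
```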

P1 Sources (High Priority)

Enrichment sources that add depth to entity profiles.

PEP CGU

Politically Exposed Persons · CGU · Monthly · ✅ Loaded

BNDES

Development Bank Loans · BNDES · Monthly · ✅ Loaded

IBAMA

Environmental Embargos · IBAMA · Monthly · ✅ Loaded

TCU

Audit Sanctions · TCU · Monthly · ✅ Loaded

ICIJ Offshore Leaks

Offshore Entities · ICIJ · Yearly · ✅ Loaded

OpenSanctions

Global PEPs · OpenSanctions · Monthly · ✅ Loaded

CVM

Market Proceedings · CVM · Monthly · ✅ Loaded

RAIS

Labor Statistics · ME · Annual · ✅ Loaded

P2 Sources (Medium Priority)

INEP

School Census · INEP · Annual · ✅ Loaded

DATASUS

Health Establishments · DATASUS · Monthly · ✅ Loaded

CPGF

Government Card Expenses · CGU · Monthly · ✅ Loaded

Viagens

Official Travel · CGU · Monthly · ✅ Loaded

International Sources

OFAC

US Sanctions · Treasury · Monthly · ✅ Loaded

EU Sanctions

EU Financial Sanctions · EU · Monthly · ✅ Loaded

UN Sanctions

UN Sanctions · UNSC · Monthly · ✅ Loaded

World Bank

Debarment List · World Bank · Monthly · ✅ Loaded

Partial/Stale Sources

Sources with known issues requiring attention. Most are ingested but have data quality or freshness gaps; DataJud is blocked because its credentials are not yet operational.
Source | Status | Issue | Owner
ComprasNet | 🟡 Stale | Needs freshness backfill | Agent C
PNCP | 🟡 Stale | Freshness SLA pending | Agent C
SICONFI | 🟡 Partial | No CNPJ direct links | Agent C
SIOP | 🟡 Partial | Author linkage limited | Agent C
Câmara Inquiries | 🟡 Partial | Sessions still low | Agent E
Senado CPIs | 🟡 Partial | Needs richer sessions | Agent E
CAGED | 🟡 Stale | Aggregate-only implementation | Agent H
Querido Diário | 🟡 Partial | Text availability gap | Agent H
DataJud | 🔴 Blocked | Credentials not operational | Agent D

Not Built (Discovered)

Sources identified but not yet implemented (61 total).

High-Value Targets (P0-P1)

STJ Dados Abertos

Superior court decisions · STJ · Monthly · P1

CNCIAI Improbidade

Misconduct convictions · CNJ · Monthly · P1

CVM Full Ownership Chain

Shareholder graph · CVM · Monthly · P1

Receita DIRBI

Tax benefit declarations · RFB · Monthly · P1

MapBiomas Alerta

Deforestation alerts · MapBiomas · Monthly · P1

SiCAR

Rural property registry · MAPA · Quarterly · P1

ANM Mining Rights

Mining permits · ANM · Monthly · P1

Tesouro Emendas

Budget execution · Tesouro · Monthly · P0 🔥

SIGA Brasil

Federal budget traces · Senado · Monthly · P0 🔥

Regulatory Agencies (P2-P3)

27 sources from agencies like ANEEL, ANATEL, ANTT, ANP, ANVISA, ANS covering concessions, licenses, and regulatory registrations.

State Audit Courts (P2-P3)

24 sources from TCE-SP, TCE-RJ, TCE-MG and state transparency portals.

Complete Registry

Full source registry with status, tier, and access mode:
Source ID | Name | Category | Tier | Status | Frequency | Access Mode
cnpj | Receita Federal CNPJ | identity | P0 | ✅ loaded | monthly | file
tse | TSE elections and donations | electoral | P0 | ✅ loaded | biennial | file
transparencia | Portal da Transparencia contracts | contracts | P0 | ✅ loaded | monthly | file
sanctions | CEIS CNEP sanctions | sanctions | P0 | ✅ loaded | monthly | file
pep_cgu | CGU PEP list | integrity | P1 | ✅ loaded | monthly | file
bndes | BNDES financings | finance | P1 | ✅ loaded | monthly | file
pgfn | PGFN divida ativa | fiscal | P0 | ✅ loaded | monthly | file
ibama | IBAMA embargos | environment | P1 | ✅ loaded | monthly | file
comprasnet | ComprasNet contracts | contracts | P0 | 🟡 stale | monthly | file
tcu | TCU sanctions | audit | P1 | ✅ loaded | monthly | file
transferegov | TransfereGov emendas e convenios | transfers | P0 | ✅ loaded | monthly | file
rais | RAIS aggregated labor | labor | P1 | ✅ loaded | annual | bigquery
inep | INEP school census | education | P2 | ✅ loaded | annual | file
dou | Diario Oficial da Uniao | gazette | P0 | ✅ loaded | daily | bigquery
datasus | DATASUS CNES | health | P1 | ✅ loaded | monthly | file
icij | ICIJ offshore leaks | offshore | P1 | ✅ loaded | yearly | file
opensanctions | OpenSanctions global PEP | sanctions | P1 | ✅ loaded | monthly | file
cvm | CVM proceedings | market | P1 | ✅ loaded | monthly | file
cvm_funds | CVM fund registry | market | P1 | ✅ loaded | monthly | file
camara | Camara CEAP expenses | legislative | P1 | ✅ loaded | monthly | api
camara_inquiries | Camara inquiries and requirements | legislative | P0 | 🟡 partial | daily | api
senado | Senado CEAPS expenses | legislative | P1 | ✅ loaded | monthly | api
ceaf | CEAF expelled servants | integrity | P1 | ✅ loaded | monthly | file
cepim | CEPIM barred NGOs | integrity | P1 | ✅ loaded | monthly | file
cpgf | CPGF gov card expenses | spending | P2 | ✅ loaded | monthly | file
leniency | Acordos de leniencia | integrity | P0 | ✅ loaded | monthly | file
ofac | OFAC sanctions | sanctions | P1 | ✅ loaded | monthly | file
holdings | Brasil IO holdings | ownership | P1 | ✅ loaded | monthly | file
viagens | Viagens a servico | spending | P2 | ✅ loaded | monthly | file
siop | SIOP emendas | budget | P0 | 🟡 partial | annual | api
pncp | PNCP bids and contracts | contracts | P0 | 🟡 stale | monthly | api
renuncias | Renuncias fiscais | fiscal | P1 | ✅ loaded | annual | file
siconfi | SICONFI municipal finance | fiscal | P1 | 🟡 partial | annual | api
tse_bens | TSE candidate assets | electoral | P1 | ✅ loaded | biennial | file
tse_filiados | TSE party memberships | electoral | P1 | ✅ loaded | monthly | file
bcb | BCB penalties | finance | P1 | ✅ loaded | monthly | file
stf | STF court data | judiciary | P1 | ✅ loaded | monthly | bigquery
caged | CAGED labor movements | labor | P1 | 🟡 stale | monthly | file
eu_sanctions | EU sanctions | sanctions | P1 | ✅ loaded | monthly | file
un_sanctions | UN sanctions | sanctions | P1 | ✅ loaded | monthly | file
world_bank | World Bank debarment | sanctions | P1 | ✅ loaded | monthly | file
senado_cpis | Senado CPIs | legislative | P0 | 🟡 partial | yearly | api
mides | MiDES municipal procurement | municipal | P0 | ✅ loaded | daily | bigquery
querido_diario | Querido Diario gazettes | municipal | P1 | 🟡 partial | daily | api
datajud | CNJ DataJud | judiciary | P0 | 🔴 blocked | monthly | api
… plus 64 not_built sources (P1-P3) covering regulatory agencies, state audit courts, and specialized datasets
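
Since the registry is maintained as a CSV file, it can be summarised with a few lines of Python. The sketch below assumes a comma-delimited file with a header row. The header names (`source_id`, `tier`, `status`, and so on) and the `blocked_external` spelling for DataJud are assumptions based on the column titles and status vocabulary on this page, not a confirmed file layout. The sample rows are taken from the registry table above.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; a real run would open the registry CSV file instead.
sample = """source_id,name,category,tier,status,frequency,access_mode
cnpj,Receita Federal CNPJ,identity,P0,loaded,monthly,file
comprasnet,ComprasNet contracts,contracts,P0,stale,monthly,file
siconfi,SICONFI municipal finance,fiscal,P1,partial,annual,api
datajud,CNJ DataJud,judiciary,P0,blocked_external,monthly,api
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# How many sources sit in each status.
status_counts = Counter(row["status"] for row in rows)

# Critical sources that are not fully loaded deserve attention first.
p0_not_loaded = [
    row["source_id"]
    for row in rows
    if row["tier"] == "P0" and row["status"] != "loaded"
]
print(p0_not_loaded)  # ['comprasnet', 'datajud']
```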

Adding a New Source

To add a new source to the registry:
  1. Identify source: URL, update frequency, access mode
  2. Assign tier: P0 (critical), P1 (high), P2 (medium), P3 (low)
  3. Add to registry: Edit source_registry_br_v1.csv
  4. Create pipeline: See Creating Pipelines
  5. Register runner: Add to PIPELINES dict in etl/src/bracc_etl/runner.py:54
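
Step 5 amounts to adding one entry to a lookup table keyed by source ID. The sketch below shows the general pattern only. The real `PIPELINES` dict in `runner.py` may store different values or take different constructor arguments, and `ExamplePipeline`, `example_source`, and `run_source` are hypothetical names.

```python
from typing import Callable, Dict


class ExamplePipeline:
    """Hypothetical pipeline with a single run() entry point."""

    def __init__(self, data_dir: str) -> None:
        self.data_dir = data_dir

    def run(self) -> str:
        return f"ran example_source from {self.data_dir}"


# Assumed shape: source ID -> callable that builds a pipeline.
PIPELINES: Dict[str, Callable[[str], ExamplePipeline]] = {
    "example_source": ExamplePipeline,
}


def run_source(source: str, data_dir: str) -> str:
    """Look up a pipeline by source ID and run it."""
    try:
        factory = PIPELINES[source]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None
    return factory(data_dir).run()


print(run_source("example_source", "./data"))
```

With a table like this, an unregistered source fails fast with a clear error instead of being silently skipped.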

Next Steps

Running Pipelines

Run any of these 45+ pipelines locally

Creating Pipelines

Build a pipeline for a new data source

Pipeline Architecture

Learn about design patterns

Overview

Back to ETL framework overview
