
Prerequisites

Before you begin, ensure you have the following installed:
  • Docker (20.10+)
  • Docker Compose (2.0+)
  • Git
  • 8GB+ RAM recommended (minimum 4GB)
  • 10GB+ disk space for Docker images and initial data
For production deployments with 40M+ nodes, you’ll need significantly more resources (32GB+ RAM recommended, 64GB ideal).
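To confirm the version minimums above before you start, a minimal Python sketch along these lines could help. The helper names are illustrative and not part of BR-ACC, and the sketch assumes the CLI output contains a dotted version number such as "Docker version 24.0.7".

```python
import re
import subprocess

# Minimum versions taken from the prerequisites list above.
MINIMUMS = {"docker": (20, 10), "compose": (2, 0)}


def parse_version(text: str) -> tuple[int, int]:
    """Extract the first MAJOR.MINOR pair found in a version string."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text: str, minimum: tuple[int, int]) -> bool:
    """Return True if the version in `text` is at least `minimum`."""
    return parse_version(text) >= minimum


def check_installed() -> dict[str, bool]:
    """Query the real CLIs; requires Docker to be installed and on PATH."""
    commands = {
        "docker": ["docker", "--version"],
        "compose": ["docker", "compose", "version"],
    }
    results = {}
    for name, cmd in commands.items():
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        results[name] = meets_minimum(out, MINIMUMS[name])
    return results
```

Calling check_installed() on a machine with Docker returns a mapping such as {"docker": True, "compose": True} when both tools are recent enough.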

Quick Start (3 Steps)

Get BR-ACC running with sample data in under 5 minutes:
1. Clone the repository

git clone https://github.com/World-Open-Graph/br-acc.git
cd br-acc
2. Configure environment

cp .env.example .env
The default values in .env.example are suitable for local development. For production, you should:
  • Set a strong NEO4J_PASSWORD (not “changeme”)
  • Generate a secure JWT_SECRET_KEY (32+ characters)
  • Adjust memory settings based on your hardware
Example memory settings for a larger host (8G of heap plus 12G of page cache needs well over 20GB of RAM):
NEO4J_HEAP_INITIAL=4G
NEO4J_HEAP_MAX=8G
NEO4J_PAGECACHE=12G
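The production checklist above can be turned into a small pre-flight check. The sketch below is only an illustration: the function names are hypothetical, it takes settings as a plain dictionary, and it treats "changeme" and the "change-me" placeholder prefix from .env.example as the values to reject.

```python
import secrets


def generate_jwt_secret() -> str:
    """Return 64 hex characters (32 random bytes), like `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def production_problems(env: dict[str, str]) -> list[str]:
    """Return a list of problems with production settings; empty means OK."""
    problems = []

    password = env.get("NEO4J_PASSWORD", "")
    if not password or password == "changeme":
        problems.append("NEO4J_PASSWORD is unset or still the default")

    secret = env.get("JWT_SECRET_KEY", "")
    if len(secret) < 32:
        problems.append("JWT_SECRET_KEY is shorter than 32 characters")
    elif secret.startswith("change-me"):
        problems.append("JWT_SECRET_KEY is still the placeholder")

    return problems
```

An empty list from production_problems(...) means the password and secret pass these two checks. It does not say anything about memory sizing.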
3. Start the stack and load seed data

docker compose up -d --build
bash infra/scripts/seed-dev.sh
This will:
  • Build and start Neo4j, API, and Frontend containers
  • Wait for services to become healthy
  • Load deterministic development seed data into Neo4j

Verify Installation

Once the services are running, verify that everything is working:
curl http://localhost:8000/health
# Expected: {"status":"ok"}
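Because the API may need a little time after startup, it can be convenient to poll the health endpoint instead of running curl by hand. This is a minimal sketch using only the standard library. The function names, the retry count, and the delay are assumptions chosen for illustration.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/health"


def fetch_health(url: str = HEALTH_URL) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> bool:
    """Poll until the body reports status "ok" or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: not ready yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

The fetch parameter is injectable so the loop can be exercised without a running stack. With the defaults, wait_until_healthy() gives the API roughly a minute to come up.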

The services are available at:
  • API Documentation: interactive Swagger UI at http://localhost:8000/docs
  • Frontend: React application at http://localhost:3000
  • Neo4j Browser: graph database UI at http://localhost:7474
  • API Health: health endpoint at http://localhost:8000/health

Your First Query

Start by querying the public metadata endpoint:
curl http://localhost:8000/api/v1/public/meta | jq
Returns statistics about the graph including:
  • Total nodes and relationships
  • Company, contract, and sanction counts
  • Source health status
  • Data quality metrics
Example response:
{
  "product": "World Transparency Graph",
  "mode": "public_safe",
  "total_nodes": 1523,
  "total_relationships": 2847,
  "company_count": 856,
  "contract_count": 234,
  "sanction_count": 45,
  "source_health": {
    "implemented_sources": 45,
    "loaded_sources": 12,
    "healthy_sources": 10
  }
}
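To show how the fields in this response relate to each other, here is a short Python sketch that derives a few summary figures from it. The function name and the derived keys are illustrative. Only the field names that appear in the example response are assumed to exist.

```python
import json

# The example response from above, embedded so the sketch is self-contained.
EXAMPLE_META = json.loads("""
{
  "product": "World Transparency Graph",
  "mode": "public_safe",
  "total_nodes": 1523,
  "total_relationships": 2847,
  "company_count": 856,
  "contract_count": 234,
  "sanction_count": 45,
  "source_health": {
    "implemented_sources": 45,
    "loaded_sources": 12,
    "healthy_sources": 10
  }
}
""")


def summarize_meta(meta: dict) -> dict:
    """Derive a few ratios from a /api/v1/public/meta response."""
    health = meta["source_health"]
    loaded = health["loaded_sources"]
    nodes = meta["total_nodes"]
    return {
        "loaded_of_implemented": f"{loaded}/{health['implemented_sources']}",
        "unhealthy_loaded_sources": loaded - health["healthy_sources"],
        # Average number of relationships per node, guarding an empty graph.
        "relationships_per_node": (
            round(meta["total_relationships"] / nodes, 2) if nodes else 0.0
        ),
    }
```

For the seed data shown above, the summary reports 12 of 45 sources loaded, 2 loaded sources that are not healthy, and about 1.87 relationships per node.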

Loading Real Data

The seed data is minimal and synthetic. Real Brazilian public data can be loaded one source at a time or all at once.
Full data ingestion is resource-intensive and can take hours or days depending on your connection and hardware. Start with individual sources to test.

Option 1: Single Source

# Start the ETL service
docker compose --profile etl up -d

# Run a single pipeline (example: CNPJ company registry)
docker compose exec etl python -m bracc_etl.runner --source cnpj
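If you want to test a handful of sources one after another, the single-pipeline command can be wrapped in a small loop. This sketch simply rebuilds the command shown above for each source. The helper names are hypothetical, and "cnpj" is the only source identifier confirmed here, so substitute identifiers that your checkout actually supports.

```python
import subprocess
from typing import Callable, Iterable


def etl_command(source: str) -> list[str]:
    """Build the docker compose command for one ETL source."""
    return [
        "docker", "compose", "exec", "etl",
        "python", "-m", "bracc_etl.runner",
        "--source", source,
    ]


def run_sources(
    sources: Iterable[str],
    runner: Callable = subprocess.run,
) -> dict[str, int]:
    """Run each pipeline in order and record its exit code.

    A failing source does not stop the loop; inspect the returned codes.
    """
    results = {}
    for source in sources:
        completed = runner(etl_command(source))
        results[source] = completed.returncode
    return results
```

For example, run_sources(["cnpj"]) runs the CNPJ pipeline and returns its exit code keyed by source name. The ETL service must already be running.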

Option 2: Full Ingestion (Heavy)

# Load all 45+ implemented sources
make bootstrap-all
This command:
  • Prompts for confirmation before resetting the database
  • Runs all implemented ETL pipelines in order
  • Continues on errors and generates detailed reports
  • Can take several hours to complete
  • Writes audit reports to audit-results/bootstrap-all/
To review the outcome for each source, run:
make bootstrap-all-report
The report shows a per-source status:
  • loaded - Successfully ingested
  • blocked_external - Source unavailable
  • blocked_credentials - Requires API keys
  • failed_download - Download error
  • failed_pipeline - Processing error
  • skipped - Intentionally skipped
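The status values above lend themselves to a quick tally. The on-disk format of the reports in audit-results/bootstrap-all/ is not described here, so this sketch assumes you already have a mapping from source name to status. The function names and the sample source names are hypothetical.

```python
from collections import Counter

# Status values as listed in the report description above.
KNOWN_STATUSES = (
    "loaded",
    "blocked_external",
    "blocked_credentials",
    "failed_download",
    "failed_pipeline",
    "skipped",
)


def tally_statuses(per_source: dict[str, str]) -> dict[str, int]:
    """Count how many sources ended in each known status."""
    unknown = set(per_source.values()) - set(KNOWN_STATUSES)
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")
    counts = Counter(per_source.values())
    return {status: counts.get(status, 0) for status in KNOWN_STATUSES}


def failed_sources(per_source: dict[str, str]) -> list[str]:
    """Return the sources whose status starts with "failed_", sorted by name."""
    return sorted(
        name for name, status in per_source.items()
        if status.startswith("failed_")
    )
```

Sources returned by failed_sources(...) are natural candidates for a rerun. The blocked_* statuses usually point to something outside the pipeline, such as a missing API key or an unavailable upstream source.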

Managing Services

# Start all core services (Neo4j + API + Frontend)
docker compose up -d

# Start with ETL service
docker compose --profile etl up -d

# Rebuild and start
docker compose up -d --build

Environment Configuration

Key environment variables in .env:
NEO4J_PASSWORD=changeme                  # Change in production!
NEO4J_URI=bolt://localhost:7687
NEO4J_DATABASE=neo4j

# Memory settings (adjust based on your hardware)
NEO4J_HEAP_INITIAL=512m
NEO4J_HEAP_MAX=1G
NEO4J_PAGECACHE=512m

API_HOST=0.0.0.0
API_PORT=8000
LOG_LEVEL=info
APP_ENV=dev

# Generate with: openssl rand -hex 32
JWT_SECRET_KEY=change-me-generate-with-openssl-rand-hex-32

CORS_ORIGINS=http://localhost:3000

# Product tier (community or enterprise)
PRODUCT_TIER=community

# Public mode (restricts sensitive data access)
PUBLIC_MODE=false
PUBLIC_ALLOW_PERSON=false
PUBLIC_ALLOW_ENTITY_LOOKUP=false
PUBLIC_ALLOW_INVESTIGATIONS=false

# Pattern detection (experimental)
PATTERNS_ENABLED=false

# For BigQuery-based sources (TSE, Base dos Dados)
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json

# Optional API tokens
WORLD_BANK_API_KEY=
EU_SANCTIONS_TOKEN=
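To inspect these settings from a script, a small reader for the KEY=VALUE format is enough. The sketch below is a simplified parser written for illustration, not the loader BR-ACC or Docker Compose uses. It skips comment lines, strips trailing inline comments, and does not handle quoted values.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "# Change in production!".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def is_enabled(values: dict[str, str], key: str) -> bool:
    """Treat the literal string "true" (any case) as enabled."""
    return values.get(key, "").lower() == "true"
```

For example, parse_env(open(".env").read()) returns a dictionary you can check before starting the stack, and is_enabled(values, "PUBLIC_MODE") reports whether public mode is switched on.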

Troubleshooting

Symptom: API returns “neo4j: error” or connection refused
Solutions:
  1. Check Neo4j is running: docker compose ps neo4j
  2. View Neo4j logs: docker compose logs neo4j
  3. Wait for health check: Neo4j takes 10-30 seconds to start
  4. Verify that NEO4J_PASSWORD in .env matches the password Neo4j is using
  5. Check port 7687 is not in use: lsof -i :7687
Symptom: Neo4j crashes or becomes unresponsive
Solutions:
  1. Increase heap and pagecache in .env:
    NEO4J_HEAP_INITIAL=2G
    NEO4J_HEAP_MAX=4G
    NEO4J_PAGECACHE=2G
    
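  2. Recreate the Neo4j container so the new values take effect: docker compose up -d neo4j (docker compose restart alone does not pick up changes to .env)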
  3. Consider loading data in smaller batches
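Before raising the heap and page cache, it helps to check that the new values still fit in the machine's RAM. The sketch below adds NEO4J_HEAP_MAX and NEO4J_PAGECACHE and compares the total with the available memory. The function names are illustrative, and the 2GB of headroom left for the OS and the other containers is an assumption you should adjust.

```python
import re

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(value: str) -> int:
    """Convert a size such as "512m" or "4G" to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG])", value.strip())
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def neo4j_memory_bytes(env: dict[str, str]) -> int:
    """Maximum heap plus page cache, the two large Neo4j allocations."""
    return parse_size(env["NEO4J_HEAP_MAX"]) + parse_size(env["NEO4J_PAGECACHE"])


def fits_in_ram(env: dict[str, str], total_ram_gb: float,
                headroom_gb: float = 2.0) -> bool:
    """Return True if heap + page cache fit after reserving headroom."""
    budget = (total_ram_gb - headroom_gb) * 1024**3
    return neo4j_memory_bytes(env) <= budget
```

With the values suggested above (4G maximum heap and 2G page cache), the total is 6GB, which fits on an 8GB machine under this headroom assumption but not on a 4GB one.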
Symptom: “port is already allocated” error
Solutions:
  1. Check what’s using the port: lsof -i :7474 (or :8000, :3000)
  2. Stop conflicting service
  3. Or change ports in docker-compose.yml
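As a cross-platform alternative to lsof, the ports used by the stack can be probed from Python before starting it. This is a minimal sketch: the port list is taken from the URLs and commands in this guide, while the names in STACK_PORTS and the function names are illustrative.

```python
import socket

# Ports mentioned in this guide.
STACK_PORTS = {
    "frontend": 3000,
    "neo4j_browser": 7474,
    "neo4j_bolt": 7687,
    "api": 8000,
}


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports: dict[str, int] = STACK_PORTS) -> dict[str, int]:
    """Return the subset of ports that already have a listener."""
    return {name: port for name, port in ports.items() if port_in_use(port)}
```

Run busy_ports() while the stack is stopped. Any entry it returns is held by another process and would cause the allocation error.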
Symptom: seed-dev.sh returns errors
Solutions:
  1. Ensure Neo4j is fully started: wait 30 seconds after docker compose up
  2. Check NEO4J_PASSWORD is set: echo $NEO4J_PASSWORD
  3. Export password: export NEO4J_PASSWORD=changeme
  4. Run script again: bash infra/scripts/seed-dev.sh

Next Steps

  • Architecture Overview: learn how BR-ACC works under the hood
  • API Reference: explore all available API endpoints
  • ETL Pipelines: deep dive into data pipelines
  • Contributing: contribute to the project
Need help? Join our Discord community or open an issue on GitHub.
