Infrahub is a distributed system built on a microservices architecture. This page explains the components, their interactions, and deployment patterns.

Core components

Infrahub server

The Infrahub server is the main API server that handles:
  • GraphQL API for data queries and mutations
  • REST API for configuration and management
  • WebSocket connections for real-time updates
  • Schema management and validation
  • Authentication and authorization
  • Git repository integration
The server runs as a Python application using FastAPI and Gunicorn. In production deployments, multiple replicas provide high availability and load distribution. Default configuration:
  • Port: 8000
  • Workers: 4 (configurable via WEB_CONCURRENCY)
  • Protocol: HTTP/HTTPS
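As a minimal sketch of how the worker count might be resolved from the `WEB_CONCURRENCY` variable mentioned above (the helper name and fallback logic here are illustrative, not Infrahub's actual implementation):

```python
def resolve_worker_count(env: dict[str, str], default: int = 4) -> int:
    """Resolve the Gunicorn worker count, falling back to the documented default of 4."""
    raw = env.get("WEB_CONCURRENCY")
    if raw is None:
        return default
    count = int(raw)
    if count < 1:
        raise ValueError("WEB_CONCURRENCY must be a positive integer")
    return count

print(resolve_worker_count({}))                          # default: 4
print(resolve_worker_count({"WEB_CONCURRENCY": "8"}))    # override: 8
```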

Task workers

Task workers execute background operations asynchronously:
  • Schema migrations and updates
  • Git repository synchronization
  • Data validation and transformations
  • Artifact generation
  • Long-running queries and operations
Workers communicate with the task manager via Prefect and scale horizontally to handle workload. Default configuration:
  • Worker type: infrahubasync
  • Replicas: 2
  • Polling interval: 2 seconds
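The polling behavior described above can be sketched as a simple drain-and-sleep loop. This is a toy stand-in for a worker polling Prefect for scheduled runs; the function name and in-memory queue are hypothetical:

```python
import time
from collections import deque

def poll_for_tasks(queue: deque, handler, polling_interval: float = 2.0, max_polls: int = 3):
    """Drain tasks from a queue, sleeping `polling_interval` seconds between polls."""
    results = []
    for _ in range(max_polls):
        while queue:
            results.append(handler(queue.popleft()))
        time.sleep(polling_interval)
    return results

tasks = deque(["sync-repo", "generate-artifact"])
print(poll_for_tasks(tasks, handler=str.upper, polling_interval=0.01))
```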

Neo4j database

Neo4j is the graph database that stores:
  • Object data (nodes, relationships, attributes)
  • Schema definitions
  • Branch and version history
  • Temporal data across branches
Infrahub supports both standalone Neo4j Community and clustered Neo4j Enterprise deployments. Default configuration:
  • Protocol: bolt
  • Port: 7687
  • HTTP port: 7474
  • Version: Neo4j 2025.10.1
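A client connects using a URI built from the defaults above. A minimal sketch (the host name `database` is a hypothetical service name, e.g. from a compose file):

```python
def neo4j_uri(host: str, port: int = 7687, scheme: str = "bolt") -> str:
    """Build the connection URI a Neo4j driver would use (defaults from the list above)."""
    return f"{scheme}://{host}:{port}"

print(neo4j_uri("database"))  # bolt://database:7687
```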

Message queue (RabbitMQ)

RabbitMQ provides asynchronous messaging between components:
  • Event notifications
  • Task distribution
  • Inter-service communication
  • Webhook triggers
Default configuration:
  • Port: 5672
  • Management UI: 15692
  • Virtual host: /
  • Driver: RabbitMQ
Alternatively, NATS can be used as the message broker by setting INFRAHUB_BROKER_DRIVER=nats.
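The driver choice can be modeled as a simple environment lookup. This sketch assumes `rabbitmq` is the default value and that `nats` is the only alternative, as described above; the exact default string is an assumption:

```python
def select_broker_driver(env: dict[str, str]) -> str:
    """Pick the message broker driver from INFRAHUB_BROKER_DRIVER (assumed default: rabbitmq)."""
    driver = env.get("INFRAHUB_BROKER_DRIVER", "rabbitmq").lower()
    if driver not in {"rabbitmq", "nats"}:
        raise ValueError(f"Unsupported broker driver: {driver}")
    return driver

print(select_broker_driver({}))                                  # rabbitmq
print(select_broker_driver({"INFRAHUB_BROKER_DRIVER": "nats"}))  # nats
```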

Cache (Redis)

Redis provides distributed caching and locking:
  • Query result caching
  • Session storage
  • Distributed locks for concurrency control
  • Temporary data storage
Default configuration:
  • Port: 6379
  • Database: 0
  • Driver: Redis
Alternatively, NATS can be used as the cache backend by setting INFRAHUB_CACHE_DRIVER=nats.
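Distributed locking as listed above typically follows Redis's `SET key value NX PX ttl` pattern: acquire succeeds only if no unexpired lock exists, and only the owner may release. A toy in-memory model of those semantics (not Infrahub's actual lock implementation):

```python
import time

class InMemoryLock:
    """Toy model of a Redis-style lock with SET NX + TTL semantics."""
    def __init__(self):
        self._locks = {}  # name -> (owner, expires_at)

    def acquire(self, name: str, owner: str, ttl: float) -> bool:
        now = time.monotonic()
        held = self._locks.get(name)
        if held and held[1] > now:
            return False  # another owner holds an unexpired lock
        self._locks[name] = (owner, now + ttl)
        return True

    def release(self, name: str, owner: str) -> bool:
        held = self._locks.get(name)
        if held and held[0] == owner:  # only the owner may release
            del self._locks[name]
            return True
        return False

locks = InMemoryLock()
print(locks.acquire("branch-merge", "worker-1", ttl=5.0))  # True
print(locks.acquire("branch-merge", "worker-2", ttl=5.0))  # False: already held
```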

Task manager (Prefect)

Prefect manages workflow orchestration:
  • Task scheduling and execution
  • Flow run tracking
  • Worker pool management
  • Execution history and logs
Components:
  • Prefect server: API and UI (port 4200)
  • PostgreSQL database: Flow run state and logs
  • Background workers: Task execution and cleanup
Default configuration:
  • API port: 4200
  • Database: PostgreSQL 18
  • Worker type: infrahubasync

Object storage

Object storage persists artifacts and files:
  • Generated artifacts (configurations, scripts)
  • Uploaded files
  • Git repository clones
  • Export data
Storage drivers:
  • Local filesystem (default): /opt/infrahub/storage
  • S3-compatible storage: AWS S3, MinIO, etc.
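A sketch of how the two drivers map an artifact identifier to a storage location. The local root matches the documented default; the bucket name and function are hypothetical:

```python
from pathlib import PurePosixPath

def artifact_location(driver: str, identifier: str,
                      local_root: str = "/opt/infrahub/storage",
                      bucket: str = "infrahub-artifacts") -> str:
    """Return where an artifact would live under each storage driver."""
    if driver == "local":
        return str(PurePosixPath(local_root) / identifier)
    if driver == "s3":
        return f"s3://{bucket}/{identifier}"
    raise ValueError(f"Unknown storage driver: {driver}")

print(artifact_location("local", "device1-config"))  # /opt/infrahub/storage/device1-config
print(artifact_location("s3", "device1-config"))     # s3://infrahub-artifacts/device1-config
```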

Deployment architectures

Single-node deployment

Suitable for development, testing, and small production deployments:
┌─────────────────────────────────────────┐
│              Docker Host                │
│                                         │
│  ┌──────────────┐  ┌─────────────┐    │
│  │ Infrahub     │  │ Task        │    │
│  │ Server       │  │ Worker      │    │
│  └──────────────┘  └─────────────┘    │
│                                         │
│  ┌──────────────┐  ┌─────────────┐    │
│  │ Neo4j        │  │ RabbitMQ    │    │
│  └──────────────┘  └─────────────┘    │
│                                         │
│  ┌──────────────┐  ┌─────────────┐    │
│  │ Redis        │  │ Prefect     │    │
│  └──────────────┘  └─────────────┘    │
└─────────────────────────────────────────┘
Resource requirements:
  • CPU: 4 cores minimum
  • RAM: 8 GB minimum
  • Storage: 50 GB minimum
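A preflight check against these minimums can be expressed as a pure function (the function itself is illustrative; the thresholds are the documented minimums):

```python
def meets_single_node_minimum(cpu_cores: int, ram_gb: int, storage_gb: int) -> bool:
    """Check a host against the single-node minimums: 4 cores, 8 GB RAM, 50 GB storage."""
    return cpu_cores >= 4 and ram_gb >= 8 and storage_gb >= 50

print(meets_single_node_minimum(4, 8, 50))    # True
print(meets_single_node_minimum(2, 16, 100))  # False: too few cores
```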

High availability deployment

Suitable for production environments requiring resilience:
┌────────────────────────────────────────────────────────┐
│                  Load Balancer                         │
│            (HAProxy, NGINX, Ingress)                   │
└────────────────────────────────────────────────────────┘

              ├──────────┬──────────┬──────────┐
              │          │          │          │
         ┌────────┐ ┌────────┐ ┌────────┐    │
         │ API    │ │ API    │ │ API    │    │
         │ Pod 1  │ │ Pod 2  │ │ Pod 3  │    │
         └────────┘ └────────┘ └────────┘    │
              │          │          │          │
         ┌────────┐ ┌────────┐ ┌────────┐    │
         │ Worker │ │ Worker │ │ Worker │    │
         │ Pod 1  │ │ Pod 2  │ │ Pod 3  │    │
         └────────┘ └────────┘ └────────┘    │
              │          │          │          │
              └──────────┴──────────┴──────────┘

         ┌───────────────┼───────────────┐
         │               │               │
    ┌─────────┐   ┌──────────┐   ┌──────────┐
    │ Neo4j   │   │ RabbitMQ │   │ Redis    │
    │ Cluster │   │ Cluster  │   │ Sentinel │
    │ (3 node)│   │ (3 node) │   │ (3 node) │
    └─────────┘   └──────────┘   └──────────┘
         │               │               │
    ┌─────────┐   ┌──────────┐   ┌──────────┐
    │ S3      │   │ Prefect  │   │ Postgres │
    │ Storage │   │ Cluster  │   │ Cluster  │
    └─────────┘   └──────────┘   └──────────┘
High availability features:
  • Multiple API server replicas with load balancing
  • Multiple task worker replicas
  • Neo4j cluster with 3+ nodes (Enterprise only)
  • RabbitMQ cluster with quorum queues
  • Redis Sentinel for cache failover
  • PostgreSQL replication for Prefect state
  • S3-compatible object storage for shared artifacts
Resource requirements:
  • Small (16 GB RAM): 4 API workers, 2 task workers
  • Medium (32 GB RAM): 4 API workers, 4 task workers
  • Large (64 GB RAM): 4 API workers, 8 task workers
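The sizing tiers above can be encoded directly; this sketch returns the documented worker counts for a given amount of RAM (the cutoffs between tiers are an assumption):

```python
def sizing_tier(ram_gb: int) -> tuple[int, int]:
    """Map total RAM to (API workers, task workers) per the sizing list above."""
    if ram_gb >= 64:
        return (4, 8)   # Large
    if ram_gb >= 32:
        return (4, 4)   # Medium
    if ram_gb >= 16:
        return (4, 2)   # Small
    raise ValueError("HA sizing assumes at least 16 GB of RAM")

print(sizing_tier(32))  # (4, 4)
```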

Network architecture

Infrahub components communicate over the following ports:
| Component       | Port  | Protocol   | Purpose          |
| --------------- | ----- | ---------- | ---------------- |
| Infrahub Server | 8000  | HTTP/HTTPS | API and UI       |
| Neo4j           | 7687  | Bolt       | Database queries |
| Neo4j           | 7474  | HTTP       | Admin interface  |
| RabbitMQ        | 5672  | AMQP       | Message queue    |
| RabbitMQ        | 15692 | HTTP       | Management UI    |
| Redis           | 6379  | Redis      | Cache            |
| Prefect         | 4200  | HTTP       | Task manager API |
| PostgreSQL      | 5432  | PostgreSQL | Prefect database |
Firewall considerations:
  • External access: Port 8000 (Infrahub API)
  • Internal network: All component ports
  • Database ports should not be exposed publicly
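The firewall rule above reduces to a small allowlist: of all the component ports, only the Infrahub API should be reachable externally. A sketch (the port-to-purpose mapping is taken from the table above):

```python
PORTS = {
    8000: "Infrahub API",
    7687: "Neo4j Bolt",
    7474: "Neo4j HTTP",
    5672: "RabbitMQ AMQP",
    15692: "RabbitMQ management",
    6379: "Redis",
    4200: "Prefect API",
    5432: "PostgreSQL",
}
EXTERNAL = {8000}  # only the Infrahub API is exposed outside the internal network

def publicly_exposed(port: int) -> bool:
    """Return True only for ports that should be open on the external firewall."""
    return port in EXTERNAL

print(sorted(p for p in PORTS if publicly_exposed(p)))  # [8000]
```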

Data flow

GraphQL query execution

  1. Client sends GraphQL query to Infrahub server
  2. Server authenticates and authorizes request
  3. Server queries Redis cache for cached results
  4. If not cached, server queries Neo4j database
  5. Server processes and transforms results
  6. Server caches results in Redis
  7. Server returns response to client
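The cache-aside read path in steps 3-7 can be sketched as follows. Authentication (step 2) and result transformation (step 5) are omitted, and plain dicts stand in for Redis and Neo4j:

```python
def execute_query(query: str, cache: dict, database: dict) -> str:
    """Cache-aside read path: check the cache, fall back to the database, populate the cache."""
    if query in cache:        # step 3: return the cached result
        return cache[query]
    result = database[query]  # step 4: query the database on a cache miss
    cache[query] = result     # step 6: cache the result for subsequent queries
    return result             # step 7: return the response

db = {"{ devices { name } }": '{"devices": [{"name": "leaf1"}]}'}
cache = {}
print(execute_query("{ devices { name } }", cache, db))  # first call hits the database
print(execute_query("{ devices { name } }", cache, db))  # second call is served from cache
```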

Background task execution

  1. Server creates task definition
  2. Server sends task to Prefect via API
  3. Prefect schedules task in worker pool
  4. Task worker polls Prefect for available tasks
  5. Worker executes task logic
  6. Worker updates task state in Prefect
  7. Worker sends results to RabbitMQ message queue
  8. Server receives notification and processes results
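The worker's state updates (steps 4-6) follow a simple lifecycle. This sketch uses illustrative state names, not Prefect's actual state model:

```python
# Allowed transitions for a background task, simplified from the flow above.
TRANSITIONS = {
    "SCHEDULED": {"RUNNING"},
    "RUNNING": {"COMPLETED", "FAILED"},
    "COMPLETED": set(),
    "FAILED": set(),
}

def advance(state: str, new_state: str) -> str:
    """Move a task to a new state, rejecting transitions the lifecycle does not allow."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"Illegal transition {state} -> {new_state}")
    return new_state

state = "SCHEDULED"
state = advance(state, "RUNNING")    # worker picks up the task
state = advance(state, "COMPLETED")  # worker reports success
print(state)  # COMPLETED
```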

Git repository synchronization

  1. User triggers repository sync via API
  2. Server creates background task
  3. Task worker clones/pulls repository
  4. Worker validates repository contents
  5. Worker imports schemas, checks, and generators
  6. Worker updates Neo4j database
  7. Worker stores repository in object storage
  8. Worker sends completion notification

Scaling considerations

Horizontal scaling

API servers:
  • Add replicas to handle increased API traffic
  • Configure load balancer with sticky sessions for WebSocket
  • Set INFRAHUB_STORAGE_DRIVER=s3 when running multiple replicas
Task workers:
  • Add replicas to handle increased background workload
  • Workers automatically register with Prefect pool
  • Set INFRAHUB_WORKFLOW_WORKER_POLLING_INTERVAL to balance load
Database:
  • Neo4j Community: Single node only
  • Neo4j Enterprise: Cluster with 3+ nodes for high availability
  • Add read replicas for read-heavy workloads

Vertical scaling

API servers:
  • Increase CPU for faster query processing
  • Increase RAM for larger in-memory caches
  • Increase WEB_CONCURRENCY for more Gunicorn workers
Task workers:
  • Increase CPU for faster task execution
  • Increase RAM for larger datasets
  • Increase INFRAHUB_BROKER_MAXIMUM_CONCURRENT_MESSAGES for parallelism
Database:
  • Increase RAM for larger page cache (see NEO4J_dbms_memory_pagecache_size)
  • Increase heap size for query execution (see NEO4J_dbms_memory_heap_max__size)
  • Add SSD storage for better I/O performance
