Overview

Mimir AIP consists of two core binaries and an optional web frontend, designed for scalable data processing, ML model training, and digital twin management across distributed Kubernetes environments.

System Architecture

┌──────────────────────────────────────────────────────┐
│                      Client Layer                    │
│   Web Frontend (port 3000)   │   MCP Client / Agent  │
└──────────────┬───────────────┴──────────┬────────────┘
               │  REST API                │  SSE (MCP)
               ▼                          ▼
┌─────────────────────────────────────────────────────┐
│                    Orchestrator                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────┐  │
│  │ Projects │  │Pipelines │  │   ML Models      │  │
│  │ Ontology │  │Schedules │  │   Digital Twins  │  │
│  │ Storage  │  │  Queue   │  │   MCP Server     │  │
│  └──────────┘  └────┬─────┘  └──────────────────┘  │
│          SQLite     │                               │
└─────────────────────┼───────────────────────────────┘
                      │  Kubernetes Jobs

           ┌─────────────────────┐
           │       Workers       │
           │  (pipeline, train,  │
           │   infer, DT sync)   │
           └──────────┬──────────┘

          ┌───────────▼───────────┐
          │   Storage Backends    │
          │  Filesystem · Postgres│
          │  MySQL · MongoDB · S3 │
          │  Redis · ES · Neo4j   │
          └───────────────────────┘

Components

Orchestrator

The orchestrator is a long-running HTTP server that serves as the central control plane for Mimir AIP.
  • Manages all persistent metadata in SQLite
    • Projects, pipelines, ontologies
    • ML models, digital twins
    • Storage configurations and schedules
  • Exposes REST API for all platform operations
  • Serves MCP tools via Server-Sent Events (SSE) endpoint
  • Dispatches worker jobs to Kubernetes clusters
  • Handles authentication and authorization
The orchestrator uses SQLite as its metadata store, persisting to a volume-mounted directory (/app/data by default). All application data flows through storage backends configured per project, not through the orchestrator's database.
While the orchestrator is a single-instance service, it supports:
  • Concurrent worker jobs (configurable pool size)
  • Multi-cluster job dispatch (primary + overflow clusters)
  • Horizontal scaling of workers across Kubernetes nodes
Configuration is managed via environment variables. See Configuration Reference.
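The multi-cluster dispatch behavior above can be sketched as a single decision function. This is an illustrative model only; the function name, the saturation rule, and the return values are assumptions, not Mimir AIP internals.

```python
# Hypothetical sketch of the orchestrator's dispatch decision. The actual
# queue-threshold logic lives in the orchestrator and is configurable.

def pick_cluster(queue_depth: int, max_pool: int, overflow_available: bool) -> str:
    """Dispatch to the primary cluster until its worker pool is saturated,
    then spill new jobs to the overflow cluster if one is configured."""
    if queue_depth < max_pool:
        return "primary"
    if overflow_available:
        return "overflow"
    return "queued"  # hold the task until a worker slot frees up
```

With a pool of four workers, the fifth concurrent job would spill to the overflow cluster when one is configured, and wait in the queue otherwise.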

Worker

Workers are short-lived Kubernetes Jobs spawned by the orchestrator to execute compute-intensive tasks.

Pipeline Execution

Runs ingestion, processing, and output pipeline steps sequentially, storing results as CIR objects.

ML Training

Trains decision trees, random forests, regression models, or neural networks from CIR data.

ML Inference

Executes trained models against new data for predictions or recommendations.

Digital Twin Sync

Synchronizes digital twin entities with storage backends, applying ontology constraints.
Workers are stateless and read all configuration from environment variables and orchestrator API calls. They report results back to the orchestrator upon completion.
Worker Lifecycle:
  1. Orchestrator enqueues a task (e.g., pipeline execution)
  2. Job scheduler spawns a Kubernetes Job with task parameters
  3. Worker pod starts, calls orchestrator API for full config
  4. Worker executes the task (pipeline, training, inference, sync)
  5. Worker reports results via orchestrator API
  6. Job completes, pod is removed by Kubernetes
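The six steps above form a strictly linear lifecycle, which can be modeled as a tiny state machine. State names here mirror the numbered steps for illustration; they are not the orchestrator's actual status values.

```python
# Illustrative-only model of the worker lifecycle described above.
LIFECYCLE = ["enqueued", "spawned", "running", "reporting", "completed"]

def advance(state: str) -> str:
    """Move a task to the next lifecycle stage; 'completed' is terminal."""
    i = LIFECYCLE.index(state)
    return LIFECYCLE[min(i + 1, len(LIFECYCLE) - 1)]
```

Because workers are stateless, a failed pod can simply be respawned: the task re-enters the lifecycle at "spawned" with the same parameters fetched fresh from the orchestrator.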

Frontend

A lightweight React/TypeScript single-page application for managing Mimir resources.
  • Communicates exclusively with the orchestrator REST API
  • Served by a minimal Go HTTP server
  • Runs on port 3000 by default
  • Optional component (not required for MCP or programmatic use)
The frontend is ideal for visual exploration and configuration, but all operations are also available via REST API or MCP tools.
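Since every frontend operation maps to a REST call, any HTTP client can drive the platform. The sketch below builds (but does not send) a request; the `/api/projects` route and the payload shape are assumptions for illustration, not documented endpoints.

```python
import json
import urllib.request

# Hypothetical endpoint and payload -- consult the REST API reference for
# the real routes. Port 8080 is the orchestrator's default.
body = json.dumps({"name": "demo-project"}).encode()
req = urllib.request.Request(
    "http://localhost:8080/api/projects",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit it; not executed here.
```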

Data Flow

Ingestion Pipeline Flow

  1. Pipeline worker connects to data source (API, database, file)
  2. Raw data is converted to CIR format
  3. CIR is stored in configured storage backend
  4. Metadata is reported back to orchestrator
  5. Orchestrator updates job status and notifies watchers
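Step 2 of the flow above wraps raw records in CIR. The CIR format has its own deep-dive page; the envelope shape below is a simplified guess for illustration, not the actual schema.

```python
# Hypothetical CIR-style envelope: a typed wrapper that carries the raw
# attributes plus provenance about where the worker ingested them.

def to_cir(source: str, record: dict) -> dict:
    """Wrap a raw record in a minimal CIR-style envelope."""
    return {
        "kind": "record",
        "source": source,       # provenance: API URL, table, or file path
        "attributes": record,   # the raw fields, unmodified
    }

obj = to_cir("https://api.example.com/users", {"id": 1, "name": "Ada"})
```

The envelope is what gets written to the configured storage backend in step 3; only the metadata (not the attributes) travels back to the orchestrator in step 4.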

ML Training Flow

  1. Orchestrator spawns training worker with model definition
  2. Worker retrieves training data as CIR from storage
  3. Ontology constraints are applied for feature engineering
  4. Model is trained with specified hyperparameters
  5. Model artifact is saved to persistent storage
  6. Training metrics reported to orchestrator
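Step 3, applying ontology constraints for feature engineering, can be pictured as filtering record fields against type declarations. The constraint format below is invented for illustration; the real ontology model is richer.

```python
# Sketch under an assumed ontology format: a flat mapping from attribute
# name to declared type. Only numeric attributes become model features.

def select_features(record: dict, ontology: dict) -> dict:
    """Keep only the attributes the ontology declares as numeric."""
    return {
        name: value
        for name, value in record.items()
        if ontology.get(name) == "numeric"
    }

ontology = {"age": "numeric", "income": "numeric", "email": "string"}
features = select_features({"age": 41, "income": 72000, "email": "a@b.c"}, ontology)
```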

Digital Twin Sync Flow

  1. Sync worker fetches CIR data from storage
  2. Ontology is loaded to define entity types and relationships
  3. Entities are created or updated in digital twin
  4. Relationships are established based on ontology rules
  5. Digital twin state is persisted
  6. Sync status reported to orchestrator
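Steps 3 and 4 of the sync flow can be sketched as an upsert followed by rule-driven linking. The rule format (child entity type mapped to the attribute naming its parent) and all field names are illustrative assumptions.

```python
# Toy model of digital twin sync: upsert entities by id, then derive
# relationships from hypothetical ontology rules of the form
# {child_type: parent_attribute}.

def sync_entities(twin: dict, records: list, rules: dict) -> dict:
    """Upsert entities, then link each one its ontology rule applies to."""
    for rec in records:
        twin.setdefault("entities", {})[rec["id"]] = rec
    twin["relationships"] = [
        (rec["id"], rec[parent_field])
        for rec in twin["entities"].values()
        for child_type, parent_field in rules.items()
        if rec["type"] == child_type and parent_field in rec
    ]
    return twin
```

Re-running the sync is idempotent in this sketch: upserting the same records and re-deriving relationships yields the same twin state, which matches the stateless-worker model.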

Deployment Modes

Local development mode — runs orchestrator and frontend in containers.
docker compose up --build
Worker jobs are not available in Docker Compose mode. Use Kubernetes for full functionality.
Suitable for:
  • Local development and testing
  • API exploration
  • MCP integration testing
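For reference, a compose file for this mode might look like the sketch below. The service and image layout is an assumption; only the ports (8080, 3000) and the /app/data volume come from this page.

```yaml
# Hypothetical docker-compose.yml -- build contexts and service names are
# illustrative, not taken from the Mimir AIP repository.
services:
  orchestrator:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - mimir-data:/app/data   # SQLite metadata persists here
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
volumes:
  mimir-data:
```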

Network Communication

Source         Destination       Protocol   Port     Purpose
Frontend       Orchestrator      HTTP       8080     REST API calls
MCP Client     Orchestrator      SSE        8080     MCP tool requests
Worker         Orchestrator      HTTP       8080     Config fetch, result reporting
Worker         Storage Backend   TCP        varies   Data read/write operations
Orchestrator   Kubernetes API    HTTPS      6443     Job creation, status monitoring
Client-facing orchestrator traffic (REST and SSE) is plain HTTP; only the connection to the Kubernetes API uses HTTPS. For production, use an ingress controller with TLS termination.

Configuration

See Configuration Reference for complete environment variable documentation. Key configuration areas:
  • Orchestrator: Port, storage directory, log level
  • Workers: Min/max pool size, queue threshold, namespace
  • Kubernetes: Service account, cluster credentials
  • Storage: Backend plugins, connection strings
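As a starting point, an environment file for the orchestrator might look like the fragment below. Only LOG_LEVEL is confirmed on this page; every other variable name is a placeholder, so check the Configuration Reference for the real names.

```shell
LOG_LEVEL=info            # structured JSON log verbosity (confirmed name)
PORT=8080                 # orchestrator HTTP port (assumed name)
DATA_DIR=/app/data        # SQLite metadata directory (assumed name)
WORKER_POOL_MIN=1         # minimum concurrent worker jobs (assumed name)
WORKER_POOL_MAX=4         # maximum concurrent worker jobs (assumed name)
```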

Observability

Logs

Structured JSON logs from orchestrator and workers. Configure log level via LOG_LEVEL environment variable.

Metrics

Worker job status and duration tracked in orchestrator database. Query via REST API or MCP tools.

Health Checks

/health endpoint on orchestrator. Returns 200 OK when service is ready.

Job Status

Real-time job status via /api/tasks endpoint. Monitor pipeline, training, and sync operations.

Next Steps

Terminology

Learn key terms and concepts used throughout Mimir AIP.

Data Model

Understand the core data structures and relationships.

CIR Format

Deep dive into the Common Internal Representation format.

Quick Start

Get Mimir AIP running in your environment.

Build docs developers (and LLMs) love