Overview

Mimir AIP consists of two core binaries and an optional web frontend, designed for scalable data processing, ML model training, and digital twin management across distributed Kubernetes environments.

System Architecture

┌──────────────────────────────────────────────────────┐
│                      Client Layer                    │
│   Web Frontend (port 3000)   │   MCP Client / Agent  │
└──────────────┬───────────────┴──────────┬────────────┘
               │  REST API                │  SSE (MCP)
               ▼                          ▼
┌─────────────────────────────────────────────────────┐
│                    Orchestrator                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────┐  │
│  │ Projects │  │Pipelines │  │   ML Models      │  │
│  │ Ontology │  │Schedules │  │   Digital Twins  │  │
│  │ Storage  │  │  Queue   │  │   MCP Server     │  │
│  └──────────┘  └────┬─────┘  └──────────────────┘  │
│          SQLite     │                               │
└─────────────────────┼───────────────────────────────┘
                      │  Kubernetes Jobs

           ┌─────────────────────┐
           │       Workers       │
           │  (pipeline, train,  │
           │   infer, DT sync)   │
           └──────────┬──────────┘

          ┌───────────▼───────────┐
          │   Storage Backends    │
          │  Filesystem · Postgres│
          │  MySQL · MongoDB · S3 │
          │  Redis · ES · Neo4j   │
          └───────────────────────┘

Components

Orchestrator

The orchestrator is a long-running HTTP server that serves as the central control plane for Mimir AIP.
  • Manages all persistent metadata in SQLite
    • Projects, pipelines, ontologies
    • ML models, digital twins
    • Storage configurations and schedules
  • Exposes REST API for all platform operations
  • Serves MCP tools via Server-Sent Events (SSE) endpoint
  • Dispatches worker jobs to Kubernetes clusters
  • Handles authentication and authorization
The orchestrator uses SQLite as its metadata store, persisting to a volume-mounted directory (/app/data by default). All application data flows through storage backends configured per project, not through the orchestrator's database.
While the orchestrator is a single-instance service, it supports:
  • Concurrent worker jobs (configurable pool size)
  • Multi-cluster job dispatch (primary + overflow clusters)
  • Horizontal scaling of workers across Kubernetes nodes
Configuration is managed via environment variables. See Configuration Reference.
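The multi-cluster dispatch behavior above can be sketched as a single decision function. This is an illustrative model only; the function name, the saturation rule, and the return values are assumptions, not Mimir AIP internals.

```python
# Hypothetical sketch of the orchestrator's dispatch decision. The actual
# queue-threshold logic lives in the orchestrator and is configurable.

def pick_cluster(queue_depth: int, max_pool: int, overflow_available: bool) -> str:
    """Dispatch to the primary cluster until its worker pool is saturated,
    then spill new jobs to the overflow cluster if one is configured."""
    if queue_depth < max_pool:
        return "primary"
    if overflow_available:
        return "overflow"
    return "queued"  # hold the task until a worker slot frees up
```

With a pool of four workers, the fifth concurrent job would spill to the overflow cluster when one is configured, and wait in the queue otherwise.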

Worker

Workers are short-lived Kubernetes Jobs spawned by the orchestrator to execute compute-intensive tasks.

Pipeline Execution

Runs ingestion, processing, and output pipeline steps sequentially, storing results as CIR objects.

ML Training

Trains decision trees, random forests, regression models, or neural networks from CIR data.

ML Inference

Executes trained models against new data for predictions or recommendations.

Digital Twin Sync

Synchronizes digital twin entities with storage backends, applying ontology constraints.
Workers are stateless and read all configuration from environment variables and orchestrator API calls. They report results back to the orchestrator upon completion.
Worker Lifecycle:
  1. Orchestrator enqueues a task (e.g., pipeline execution)
  2. Job scheduler spawns a Kubernetes Job with task parameters
  3. Worker pod starts, calls orchestrator API for full config
  4. Worker executes the task (pipeline, training, inference, sync)
  5. Worker reports results via orchestrator API
  6. Job completes, pod is removed by Kubernetes
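The six steps above form a strictly linear lifecycle, which can be modeled as a tiny state machine. State names here mirror the numbered steps for illustration; they are not the orchestrator's actual status values.

```python
# Illustrative-only model of the worker lifecycle described above.
LIFECYCLE = ["enqueued", "spawned", "running", "reporting", "completed"]

def advance(state: str) -> str:
    """Move a task to the next lifecycle stage; 'completed' is terminal."""
    i = LIFECYCLE.index(state)
    return LIFECYCLE[min(i + 1, len(LIFECYCLE) - 1)]
```

Because workers are stateless, a failed pod can simply be respawned: the task re-enters the lifecycle at "spawned" with the same parameters fetched fresh from the orchestrator.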

Frontend

A lightweight React/TypeScript single-page application for managing Mimir resources.
  • Communicates exclusively with the orchestrator REST API
  • Served by a minimal Go HTTP server
  • Runs on port 3000 by default
  • Optional component (not required for MCP or programmatic use)
The frontend is ideal for visual exploration and configuration, but all operations are also available via REST API or MCP tools.
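Since every frontend operation maps to a REST call, any HTTP client can drive the platform. The sketch below builds (but does not send) a request; the `/api/projects` route and the payload shape are assumptions for illustration, not documented endpoints.

```python
import json
import urllib.request

# Hypothetical endpoint and payload -- consult the REST API reference for
# the real routes. Port 8080 is the orchestrator's default.
body = json.dumps({"name": "demo-project"}).encode()
req = urllib.request.Request(
    "http://localhost:8080/api/projects",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit it; not executed here.
```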

Data Flow

Ingestion Pipeline Flow

  1. Pipeline worker connects to data source (API, database, file)
  2. Raw data is converted to CIR format
  3. CIR is stored in configured storage backend
  4. Metadata is reported back to orchestrator
  5. Orchestrator updates job status and notifies watchers
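Step 2 of the flow above wraps raw records in CIR. The CIR format has its own deep-dive page; the envelope shape below is a simplified guess for illustration, not the actual schema.

```python
# Hypothetical CIR-style envelope: a typed wrapper that carries the raw
# attributes plus provenance about where the worker ingested them.

def to_cir(source: str, record: dict) -> dict:
    """Wrap a raw record in a minimal CIR-style envelope."""
    return {
        "kind": "record",
        "source": source,       # provenance: API URL, table, or file path
        "attributes": record,   # the raw fields, unmodified
    }

obj = to_cir("https://api.example.com/users", {"id": 1, "name": "Ada"})
```

The envelope is what gets written to the configured storage backend in step 3; only the metadata (not the attributes) travels back to the orchestrator in step 4.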

ML Training Flow

  1. Orchestrator spawns training worker with model definition
  2. Worker retrieves training data as CIR from storage
  3. Ontology constraints are applied for feature engineering
  4. Model is trained with specified hyperparameters
  5. Model artifact is saved to persistent storage
  6. Training metrics reported to orchestrator
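Step 3, applying ontology constraints for feature engineering, can be pictured as filtering record fields against type declarations. The constraint format below is invented for illustration; the real ontology model is richer.

```python
# Sketch under an assumed ontology format: a flat mapping from attribute
# name to declared type. Only numeric attributes become model features.

def select_features(record: dict, ontology: dict) -> dict:
    """Keep only the attributes the ontology declares as numeric."""
    return {
        name: value
        for name, value in record.items()
        if ontology.get(name) == "numeric"
    }

ontology = {"age": "numeric", "income": "numeric", "email": "string"}
features = select_features({"age": 41, "income": 72000, "email": "a@b.c"}, ontology)
```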

Digital Twin Sync Flow

  1. Sync worker fetches CIR data from storage
  2. Ontology is loaded to define entity types and relationships
  3. Entities are created or updated in digital twin
  4. Relationships are established based on ontology rules
  5. Digital twin state is persisted
  6. Sync status reported to orchestrator
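Steps 3 and 4 of the sync flow can be sketched as an upsert followed by rule-driven linking. The rule format (child entity type mapped to the attribute naming its parent) and all field names are illustrative assumptions.

```python
# Toy model of digital twin sync: upsert entities by id, then derive
# relationships from hypothetical ontology rules of the form
# {child_type: parent_attribute}.

def sync_entities(twin: dict, records: list, rules: dict) -> dict:
    """Upsert entities, then link each one its ontology rule applies to."""
    for rec in records:
        twin.setdefault("entities", {})[rec["id"]] = rec
    twin["relationships"] = [
        (rec["id"], rec[parent_field])
        for rec in twin["entities"].values()
        for child_type, parent_field in rules.items()
        if rec["type"] == child_type and parent_field in rec
    ]
    return twin
```

Re-running the sync is idempotent in this sketch: upserting the same records and re-deriving relationships yields the same twin state, which matches the stateless-worker model.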

Deployment Modes

Local development mode — runs orchestrator and frontend in containers.
docker compose up --build
Worker jobs are not available in Docker Compose mode. Use Kubernetes for full functionality.
Suitable for:
  • Local development and testing
  • API exploration
  • MCP integration testing
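For reference, a compose file for this mode might look like the sketch below. The service and image layout is an assumption; only the ports (8080, 3000) and the /app/data volume come from this page.

```yaml
# Hypothetical docker-compose.yml -- build contexts and service names are
# illustrative, not taken from the Mimir AIP repository.
services:
  orchestrator:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - mimir-data:/app/data   # SQLite metadata persists here
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
volumes:
  mimir-data:
```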

Network Communication

Source         Destination       Protocol   Port     Purpose
Frontend       Orchestrator      HTTP       8080     REST API calls
MCP Client     Orchestrator      SSE        8080     MCP tool requests
Worker         Orchestrator      HTTP       8080     Config fetch, result reporting
Worker         Storage Backend   TCP        varies   Data read/write operations
Orchestrator   Kubernetes API    HTTPS      6443     Job creation, status monitoring
Client-facing orchestrator traffic (REST and SSE) is plain HTTP; only the connection to the Kubernetes API uses HTTPS. For production, use an ingress controller with TLS termination.

Configuration

See Configuration Reference for complete environment variable documentation. Key configuration areas:
  • Orchestrator: Port, storage directory, log level
  • Workers: Min/max pool size, queue threshold, namespace
  • Kubernetes: Service account, cluster credentials
  • Storage: Backend plugins, connection strings
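As a starting point, an environment file for the orchestrator might look like the fragment below. Only LOG_LEVEL is confirmed on this page; every other variable name is a placeholder, so check the Configuration Reference for the real names.

```shell
LOG_LEVEL=info            # structured JSON log verbosity (confirmed name)
PORT=8080                 # orchestrator HTTP port (assumed name)
DATA_DIR=/app/data        # SQLite metadata directory (assumed name)
WORKER_POOL_MIN=1         # minimum concurrent worker jobs (assumed name)
WORKER_POOL_MAX=4         # maximum concurrent worker jobs (assumed name)
```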

Observability

Logs

Structured JSON logs from orchestrator and workers. Configure log level via LOG_LEVEL environment variable.

Metrics

Worker job status and duration tracked in orchestrator database. Query via REST API or MCP tools.

Health Checks

/health endpoint on orchestrator. Returns 200 OK when service is ready.

Job Status

Real-time job status via /api/tasks endpoint. Monitor pipeline, training, and sync operations.

Next Steps

Terminology

Learn key terms and concepts used throughout Mimir AIP.

Data Model

Understand the core data structures and relationships.

CIR Format

Deep dive into the Common Internal Representation format.

Quick Start

Get Mimir AIP running in your environment.

Build docs developers (and LLMs) love