Overview
Mimir AIP consists of two core binaries, the orchestrator and the worker, plus an optional web frontend. It is designed for scalable data processing, ML model training, and digital twin management across distributed Kubernetes environments.
System Architecture
Components
Orchestrator
The orchestrator is a long-running HTTP server that serves as the central control plane for Mimir AIP.
Core Responsibilities
- Manages all persistent metadata in SQLite:
  - Projects, pipelines, ontologies
  - ML models, digital twins
  - Storage configurations and schedules
- Exposes REST API for all platform operations
- Serves MCP tools via Server-Sent Events (SSE) endpoint
- Dispatches worker jobs to Kubernetes clusters
- Handles authentication and authorization
Storage
The orchestrator uses SQLite as its metadata store, persisting to a volume-mounted directory (/app/data by default). All application data flows through storage backends configured per-project, not through the orchestrator’s database.
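The split between metadata and application data can be illustrated with a minimal sketch. The database file name, the `projects` table, and its columns are assumptions made for illustration only; the actual Mimir AIP schema is not described here.

```python
import os
import sqlite3
from typing import Optional


def open_metadata_store(data_dir: str) -> sqlite3.Connection:
    """Open (or create) a SQLite metadata database inside data_dir."""
    os.makedirs(data_dir, exist_ok=True)
    # "metadata.db" is a hypothetical file name.
    conn = sqlite3.connect(os.path.join(data_dir, "metadata.db"))
    # Hypothetical schema: the row records WHERE a project's data lives,
    # never the data itself.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        " id TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " storage_backend TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def register_project(conn: sqlite3.Connection, project_id: str,
                     name: str, storage_backend: str) -> None:
    """Persist project metadata, including its per-project storage backend."""
    conn.execute(
        "INSERT INTO projects VALUES (?, ?, ?)",
        (project_id, name, storage_backend),
    )
    conn.commit()


def storage_backend_for(conn: sqlite3.Connection,
                        project_id: str) -> Optional[str]:
    """Look up which backend holds a project's application data."""
    row = conn.execute(
        "SELECT storage_backend FROM projects WHERE id = ?", (project_id,)
    ).fetchone()
    return row[0] if row else None
```

In this sketch a worker would ask the orchestrator which backend a project uses and then read or write data directly against that backend.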
Scalability
While the orchestrator is a single-instance service, it supports:
- Concurrent worker jobs (configurable pool size)
- Multi-cluster job dispatch (primary + overflow clusters)
- Horizontal scaling of workers across Kubernetes nodes
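One way to picture pool-limited, multi-cluster dispatch is a priority-ordered capacity check. This is a sketch under stated assumptions: the `Cluster` type, the per-cluster `max_jobs` cap, and the "first cluster with spare capacity" rule are hypothetical and may differ from how the orchestrator actually decides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Cluster:
    name: str
    max_jobs: int          # hypothetical per-cluster concurrency cap
    running_jobs: int = 0

    def has_capacity(self) -> bool:
        return self.running_jobs < self.max_jobs


def pick_cluster(clusters: Sequence[Cluster]) -> Optional[Cluster]:
    """Return the first cluster with spare capacity, in priority order."""
    for cluster in clusters:
        if cluster.has_capacity():
            return cluster
    return None


def dispatch(job_id: str, clusters: Sequence[Cluster]) -> Optional[str]:
    """Assign job_id to a cluster, or return None so the job stays queued."""
    cluster = pick_cluster(clusters)
    if cluster is None:
        return None
    cluster.running_jobs += 1
    return cluster.name


primary = Cluster("primary", max_jobs=2)
overflow = Cluster("overflow", max_jobs=1)
placements = [dispatch(f"job-{i}", [primary, overflow]) for i in range(4)]
print(placements)  # ['primary', 'primary', 'overflow', None]
```

Listing the primary cluster first means overflow clusters only receive work once the primary pool is full.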
Worker
Workers are short-lived Kubernetes Jobs spawned by the orchestrator to execute compute-intensive tasks.
Pipeline Execution
Runs ingestion, processing, and output pipeline steps sequentially, storing results as CIR objects.
ML Training
Trains decision trees, random forests, regression models, or neural networks from CIR data.
ML Inference
Executes trained models against new data for predictions or recommendations.
Digital Twin Sync
Synchronizes digital twin entities with storage backends, applying ontology constraints.
Workers are stateless and read all configuration from environment variables and orchestrator API calls. They report results back to the orchestrator upon completion. A typical job runs through these steps:
- Orchestrator enqueues a task (e.g., pipeline execution)
- Job scheduler spawns a Kubernetes Job with task parameters
- Worker pod starts, calls orchestrator API for full config
- Worker executes the task (pipeline, training, inference, sync)
- Worker reports results via orchestrator API
- Job completes, pod is removed by Kubernetes
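The worker side of the steps above can be sketched as a single function. The `TASK_ID` variable name, the config shape (a `type` key), and the result payload are assumptions for illustration. The orchestrator calls are injected as plain callables so the sketch does not invent API routes.

```python
from typing import Any, Callable, Dict, Mapping

TaskHandler = Callable[[Dict[str, Any]], Any]


def run_worker(
    env: Mapping[str, str],
    fetch_config: Callable[[str], Dict[str, Any]],
    report_result: Callable[[str, Dict[str, Any]], None],
    handlers: Mapping[str, TaskHandler],
) -> Dict[str, Any]:
    """Run one task: read env, fetch config, execute, report back."""
    # TASK_ID is a hypothetical variable name set on the Kubernetes Job.
    task_id = env["TASK_ID"]
    # The worker is stateless, so the full config comes from the orchestrator.
    config = fetch_config(task_id)
    try:
        handler = handlers[config["type"]]  # pipeline, training, inference, sync
        result = {"status": "succeeded", "output": handler(config)}
    except Exception as exc:
        # Failures are reported too, so the orchestrator can update job status.
        result = {"status": "failed", "error": repr(exc)}
    report_result(task_id, result)
    return result
```

Because nothing is kept between runs, the process can exit as soon as `report_result` returns, after which Kubernetes removes the pod.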
Frontend
A lightweight React/TypeScript single-page application for managing Mimir resources.
- Communicates exclusively with the orchestrator REST API
- Served by a minimal Go HTTP server
- Runs on port 3000 by default
- Optional component (not required for MCP or programmatic use)
Data Flow
Ingestion Pipeline Flow
- Pipeline worker connects to data source (API, database, file)
- Raw data is converted to CIR format
- CIR is stored in configured storage backend
- Metadata is reported back to orchestrator
- Orchestrator updates job status and notifies watchers
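A minimal sketch of the ingestion steps above follows. The CIR layout used here (`source`, `data`, `metadata` keys) is a placeholder assumption, not the documented CIR schema, and the source reader, storage writer, and reporter are injected callables rather than real Mimir AIP interfaces.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable


def to_cir(record: Dict[str, Any], source_uri: str) -> Dict[str, Any]:
    """Wrap a raw record in a hypothetical CIR-like envelope."""
    return {
        "source": {"uri": source_uri},
        "data": record,
        "metadata": {"ingested_at": datetime.now(timezone.utc).isoformat()},
    }


def run_ingestion(
    read_source: Callable[[], Iterable[Dict[str, Any]]],
    store: Callable[[Dict[str, Any]], None],
    report: Callable[[Dict[str, Any]], None],
    source_uri: str,
) -> Dict[str, Any]:
    """Read raw records, convert each to CIR, store it, then report metadata."""
    count = 0
    for record in read_source():
        store(to_cir(record, source_uri))  # data goes to the storage backend
        count += 1
    summary = {"source": source_uri, "records": count}
    report(summary)  # only metadata goes back to the orchestrator
    return summary
```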
ML Training Flow
- Orchestrator spawns training worker with model definition
- Worker retrieves training data as CIR from storage
- Ontology constraints are applied for feature engineering
- Model is trained with specified hyperparameters
- Model artifact is saved to persistent storage
- Training metrics are reported to orchestrator
Digital Twin Sync Flow
- Sync worker fetches CIR data from storage
- Ontology is loaded to define entity types and relationships
- Entities are created or updated in digital twin
- Relationships are established based on ontology rules
- Digital twin state is persisted
- Sync status is reported to orchestrator
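The sync steps above can be sketched with plain dictionaries. The ontology layout (`entity_types` mapping a type to its required fields, `relationships` as rules with `from`, `field`, `to`, `name`) and the twin structure are assumptions chosen for illustration, not the real data model.

```python
from typing import Any, Dict, Iterable


def sync_twin(twin: Dict[str, Any], cir_objects: Iterable[Dict[str, Any]],
              ontology: Dict[str, Any]) -> Dict[str, int]:
    """Upsert entities from CIR data, then derive relationships from rules."""
    created = updated = skipped = 0
    for obj in cir_objects:
        entity = obj["data"]
        required = ontology["entity_types"].get(entity.get("type"))
        # Skip entities whose type is unknown or that lack required fields.
        if required is None or any(f not in entity for f in ["id", *required]):
            skipped += 1
            continue
        if entity["id"] in twin["entities"]:
            updated += 1
        else:
            created += 1
        twin["entities"][entity["id"]] = entity
    for rule in ontology["relationships"]:
        for entity in twin["entities"].values():
            if entity["type"] != rule["from"]:
                continue
            target = twin["entities"].get(entity.get(rule["field"]))
            # Link only when the referenced entity exists and has the right type.
            if target is not None and target["type"] == rule["to"]:
                twin["relationships"].add(
                    (entity["id"], rule["name"], target["id"])
                )
    return {"created": created, "updated": updated, "skipped": skipped}


ontology = {
    "entity_types": {"Site": [], "Pump": ["site_id"]},
    "relationships": [
        {"from": "Pump", "field": "site_id", "to": "Site", "name": "located_at"}
    ],
}
twin = {"entities": {}, "relationships": set()}
cir = [
    {"data": {"type": "Site", "id": "s1"}},
    {"data": {"type": "Pump", "id": "p1", "site_id": "s1"}},
    {"data": {"type": "Valve", "id": "v1"}},  # not in the ontology
]
first = sync_twin(twin, cir, ontology)
second = sync_twin(twin, cir, ontology)
```

Running the same sync twice reports the entities as updated rather than created, which is the behaviour a repeatable sync job would need.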
Deployment Modes
- Docker Compose
- Kubernetes (Single Cluster)
- Multi-Cluster
Docker Compose is the local development mode: it runs the orchestrator and frontend in containers. Suitable for:
- Local development and testing
- API exploration
- MCP integration testing
Network Communication
| Source | Destination | Protocol | Port | Purpose |
|---|---|---|---|---|
| Frontend | Orchestrator | HTTP | 8080 | REST API calls |
| MCP Client | Orchestrator | SSE | 8080 | MCP tool requests |
| Worker | Orchestrator | HTTP | 8080 | Config fetch, result reporting |
| Worker | Storage Backend | TCP | varies | Data read/write operations |
| Orchestrator | Kubernetes API | HTTPS | 6443 | Job creation, status monitoring |
All inbound orchestrator traffic (frontend, MCP clients, and workers) uses plain HTTP on port 8080. For production, use an ingress controller with TLS termination.
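Because every client reaches the orchestrator over HTTP, a client can confirm the service is reachable before sending requests by polling the /health endpoint described under Observability. The sketch below uses only the Python standard library; the function name, timeout, and retry interval are arbitrary choices.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 30.0,
                     interval_s: float = 1.0) -> bool:
    """Poll {base_url}/health until it returns 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health",
                                        timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable or not ready yet; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://localhost:8080")` would block for up to 30 seconds, assuming the orchestrator is exposed locally on its default port.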
Configuration
See Configuration Reference for complete environment variable documentation. Key configuration areas:
- Orchestrator: Port, storage directory, log level
- Workers: Min/max pool size, queue threshold, namespace
- Kubernetes: Service account, cluster credentials
- Storage: Backend plugins, connection strings
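As an illustration of environment-driven configuration, the sketch below loads the three orchestrator settings listed above. Only `LOG_LEVEL` is named on this page; `PORT` and `STORAGE_DIR` are hypothetical variable names, and the `info` default is an assumption. Consult the Configuration Reference for the real names and defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OrchestratorConfig:
    port: int
    storage_dir: str
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> OrchestratorConfig:
    """Build the config from environment variables, applying defaults."""
    port = int(env.get("PORT", "8080"))  # hypothetical variable name
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return OrchestratorConfig(
        port=port,
        storage_dir=env.get("STORAGE_DIR", "/app/data"),  # hypothetical name
        log_level=env.get("LOG_LEVEL", "info").lower(),   # default is assumed
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the process environment.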
Observability
Logs
The orchestrator and workers emit structured JSON logs. Configure the log level via the LOG_LEVEL environment variable.
Metrics
Worker job status and duration are tracked in the orchestrator database. Query them via the REST API or MCP tools.
Health Checks
The orchestrator exposes a /health endpoint that returns 200 OK when the service is ready.
Job Status
Real-time job status is available via the /api/tasks endpoint. Use it to monitor pipeline, training, and sync operations.
Next Steps
Terminology
Learn key terms and concepts used throughout Mimir AIP.
Data Model
Understand the core data structures and relationships.
CIR Format
Deep dive into the Common Internal Representation format.
Quick Start
Get Mimir AIP running in your environment.