Core Concepts
Project
Definition
A Project is the top-level organizational unit in Mimir AIP. It groups all related resources for a specific use case or domain.
- Pipelines for data ingestion and processing
- Ontologies defining domain structure
- ML models for predictions and recommendations
- Digital twins for real-time simulation
- Storage configurations for data persistence
Example projects:
- E-commerce analytics platform
- IoT sensor monitoring system
- Supply chain optimization
Pipeline
Definition
A Pipeline is a named, ordered sequence of processing steps executed asynchronously by workers.
- Ingestion: Fetch data from external sources (APIs, databases, files)
- Processing: Transform, enrich, or analyze data
- Output: Write results to storage backends or external systems
Each step defines:
- Plugin: The execution plugin (e.g., `http`, `postgres`, `transform`)
- Action: The operation to perform (e.g., `GET`, `query`, `filter`)
- Parameters: Configuration for the action
- Output: Where to store results for subsequent steps
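The step fields above can be sketched as a small data structure. This is an illustration only, not Mimir's actual schema: the `Step` and `Pipeline` classes, the `run` function, and the two toy handlers (`static`/`load` and `transform`/`filter`) are hypothetical. Only the four field names come from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    plugin: str                      # execution plugin, e.g. "http" or "transform"
    action: str                      # operation to perform, e.g. "GET" or "filter"
    parameters: Dict[str, Any] = field(default_factory=dict)
    output: str = ""                 # key under which the result is stored


@dataclass
class Pipeline:
    name: str
    steps: List[Step]


# Hypothetical handlers keyed by (plugin, action); real plugins would do I/O.
Handler = Callable[[Dict[str, Any], Dict[str, Any]], Any]
HANDLERS: Dict[Tuple[str, str], Handler] = {
    ("static", "load"): lambda params, ctx: list(params["rows"]),
    ("transform", "filter"): lambda params, ctx: [
        row for row in ctx[params["input"]] if row[params["field"]] >= params["min"]
    ],
}


def run(pipeline: Pipeline) -> Dict[str, Any]:
    """Execute steps in order; each step can read earlier outputs from the context."""
    context: Dict[str, Any] = {}
    for step in pipeline.steps:
        handler = HANDLERS[(step.plugin, step.action)]
        context[step.output] = handler(step.parameters, context)
    return context


pipeline = Pipeline(
    name="filter-readings",
    steps=[
        Step("static", "load", {"rows": [{"temp": 18}, {"temp": 25}]}, output="raw"),
        Step("transform", "filter", {"input": "raw", "field": "temp", "min": 20}, output="hot"),
    ],
)
result = run(pipeline)
```

Storing each result under its `output` key is what lets a later step name an earlier one as its input.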
Schedule
Definition
A Schedule is a cron-based trigger that enqueues one or more pipelines on a recurring basis.
- Automated data ingestion (e.g., daily API pulls)
- Periodic model retraining
- Regular digital twin synchronization
- `0 0 * * *`: Daily at midnight
- `*/15 * * * *`: Every 15 minutes
- `0 9 * * 1-5`: Weekdays at 9 AM
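To make the cron expressions concrete, here is a minimal sketch of how a five-field expression can be checked against a timestamp. It assumes the standard minute, hour, day-of-month, month, day-of-week layout and supports only `*`, `*/n`, ranges, single values, and comma lists. It also requires all five fields to match, which differs from some cron implementations when both day fields are restricted. The function names are hypothetical and say nothing about how Mimir's scheduler is implemented.

```python
from datetime import datetime


def _field_matches(expr: str, value: int, lo: int, hi: int) -> bool:
    """Check one cron field: '*', '*/n', 'a-b', 'a-b/n', 'a', or a comma list of these."""
    for part in expr.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute selected by the expression."""
    minute, hour, dom, month, dow = expression.split()
    cron_dow = (when.weekday() + 1) % 7   # cron: 0 = Sunday; Python: 0 = Monday
    return (
        _field_matches(minute, when.minute, 0, 59)
        and _field_matches(hour, when.hour, 0, 23)
        and _field_matches(dom, when.day, 1, 31)
        and _field_matches(month, when.month, 1, 12)
        and _field_matches(dow, cron_dow, 0, 6)
    )


monday_9am = datetime(2024, 1, 1, 9, 0)      # 1 January 2024 was a Monday
saturday_9am = datetime(2024, 1, 6, 9, 0)
weekday_hit = cron_matches("0 9 * * 1-5", monday_9am)
weekend_hit = cron_matches("0 9 * * 1-5", saturday_9am)
```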
Ontology
Definition
An Ontology is an OWL/Turtle vocabulary that defines entity types, properties, and relationships for a project domain.
- Structure storage schemas across backends
- Constrain ML model training features
- Define digital twin entity types and relationships
- Validate data quality and consistency
Ontologies can be created manually or auto-generated from existing data using the ontology extraction feature.
Storage Config
Definition
A Storage Config is a connection definition for a storage backend where CIR data is persisted.
- Filesystem: Local or network-mounted directories
- PostgreSQL: Relational data storage
- MySQL: Relational data storage
- MongoDB: Document-oriented storage
- S3: Object storage (AWS, MinIO, compatible)
- Redis: In-memory key-value store
- Elasticsearch: Search and analytics engine
- Neo4j: Graph database
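A storage config can be pictured as a backend type plus connection settings. The sketch below is an assumption-laden illustration: the `StorageConfig` class, its field names, the lowercase backend identifiers, and the connection keys are all hypothetical and not taken from Mimir's API. Only the set of eight backends comes from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Lowercase identifiers are an assumption; the backend names come from the list above.
SUPPORTED_BACKENDS = {
    "filesystem", "postgresql", "mysql", "mongodb",
    "s3", "redis", "elasticsearch", "neo4j",
}


@dataclass
class StorageConfig:
    name: str
    backend: str
    connection: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject unknown backends early instead of failing at connection time.
        if self.backend not in SUPPORTED_BACKENDS:
            raise ValueError(f"unsupported backend: {self.backend!r}")


config = StorageConfig(
    name="orders-db",
    backend="postgresql",
    connection={"host": "localhost", "port": 5432, "database": "orders"},
)
```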
CIR (Common Internal Representation)
Definition
CIR is the normalized record format used across all storage backends in Mimir AIP.
- Source block: Provenance information (type, URI, timestamp, format)
- Data block: The actual payload (JSON, CSV, text, binary)
- Metadata block: Size, encoding, quality metrics, schema inference
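The three blocks above can be sketched as a nested record. The class and field names below (`CIRRecord`, `Source`, `make_record`, and the `size` and `encoding` metadata keys) are assumptions chosen to mirror the list, not the actual CIR schema, and the URI is a placeholder.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class Source:
    type: str        # kind of origin, e.g. "api" or "file"
    uri: str         # where the data came from
    timestamp: str   # when it was ingested (ISO 8601)
    format: str      # original format, e.g. "json" or "csv"


@dataclass
class CIRRecord:
    source: Source
    data: Any                    # the actual payload
    metadata: Dict[str, Any]     # size, encoding, and similar descriptors


def make_record(source_type: str, uri: str, fmt: str, payload: Any) -> CIRRecord:
    """Wrap a JSON-serializable payload with provenance and basic metadata."""
    encoded = json.dumps(payload).encode("utf-8")
    return CIRRecord(
        source=Source(
            type=source_type,
            uri=uri,
            timestamp=datetime.now(timezone.utc).isoformat(),
            format=fmt,
        ),
        data=payload,
        metadata={"size": len(encoded), "encoding": "utf-8"},
    )


record = make_record("api", "https://example.com/orders", "json", {"order_id": 1, "total": 9.5})
serialized = json.dumps(asdict(record))   # same shape regardless of the target backend
```

Keeping provenance and metadata alongside the payload is what allows one record shape to be written to any of the backends.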
ML Model
Definition
An ML Model is a machine learning model definition linked to an ontology, trained and executed by workers.
- Decision Tree: Fast, interpretable classification
- Random Forest: Ensemble method for robust predictions
- Regression: Linear or polynomial regression
- Neural Network: Deep learning models
Model lifecycle:
- Create: Define model type and link to ontology
- Train: Worker fetches CIR data and trains model
- Validate: Performance metrics calculated on test set
- Infer: Run predictions against new data
- Monitor: Track degradation over time
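The lifecycle stages can be walked through with a deliberately tiny model. The sketch below uses ordinary least-squares linear regression written in plain Python as a stand-in. The function names, the sample data, and the degradation threshold are illustrative assumptions, and Mimir's workers may train and monitor models quite differently.

```python
from typing import List, Tuple

Model = Tuple[float, float]   # (slope, intercept)


def train(xs: List[float], ys: List[float]) -> Model:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def infer(model: Model, x: float) -> float:
    slope, intercept = model
    return slope * x + intercept


def mean_abs_error(model: Model, xs: List[float], ys: List[float]) -> float:
    return sum(abs(infer(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Train: fit on the training split.
model = train([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Validate: measure error on a held-out test set.
baseline_error = mean_abs_error(model, [5.0, 6.0], [10.0, 12.0])

# Infer: predict for a new input.
prediction = infer(model, 7.0)

# Monitor: compare error on newer data against a threshold (hypothetical value).
DEGRADATION_THRESHOLD = 1.0
recent_error = mean_abs_error(model, [8.0, 9.0], [20.0, 23.0])
status = "degraded" if recent_error > DEGRADATION_THRESHOLD else "trained"
```

Here the newer data no longer follows the relationship the model learned, so the monitoring step flags it.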
Digital Twin
Definition
A Digital Twin is a live in-memory graph of entities and relationships, initialized from an ontology and synchronized from storage.
- Real-time queries via SPARQL
- What-if scenario modeling
- ML-powered predictions on entities
- Automated actions based on conditions
Entity Management
Digital twins store entities (instances of ontology classes) with attributes and relationships.
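As a rough picture of that structure, the sketch below keeps entities in a dictionary and relationships as (source, predicate, target) tuples. `Entity`, `TwinGraph`, and their methods are hypothetical names used for illustration and are not part of Mimir's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Entity:
    id: str
    type: str                                        # ontology class name
    attributes: Dict[str, Any] = field(default_factory=dict)


class TwinGraph:
    """In-memory store of entities and the relationships between them."""

    def __init__(self) -> None:
        self.entities: Dict[str, Entity] = {}
        self.relationships: List[Tuple[str, str, str]] = []   # (source, predicate, target)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.id] = entity

    def relate(self, source_id: str, predicate: str, target_id: str) -> None:
        # Refuse dangling edges: both endpoints must already exist.
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both endpoints must exist before relating them")
        self.relationships.append((source_id, predicate, target_id))

    def neighbors(self, source_id: str, predicate: str) -> List[Entity]:
        return [
            self.entities[target]
            for source, pred, target in self.relationships
            if source == source_id and pred == predicate
        ]


twin = TwinGraph()
twin.add_entity(Entity("room-a", "Room", {"floor": 2}))
twin.add_entity(Entity("sensor-1", "Sensor", {"temperature": 21.5}))
twin.relate("sensor-1", "locatedIn", "room-a")
rooms = twin.neighbors("sensor-1", "locatedIn")
```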
SPARQL Queries
Digital twin data can be queried using standard SPARQL syntax.
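At its core, a SPARQL `SELECT` over a basic graph pattern matches triple patterns containing variables against the graph. The sketch below implements only that matching step over a list of triples. It is not a SPARQL engine and not how Mimir evaluates queries, and the entity and predicate names are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one pattern with one triple, extending the binding, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable binds on first use and must agree on later uses.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all patterns."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


triples = [
    ("sensor-1", "rdf:type", "Sensor"),
    ("sensor-1", "locatedIn", "room-a"),
    ("sensor-2", "rdf:type", "Sensor"),
    ("sensor-2", "locatedIn", "room-b"),
    ("room-a", "rdf:type", "Room"),
]

# Roughly: SELECT ?s WHERE { ?s rdf:type Sensor . ?s locatedIn room-a }
result = select([("?s", "rdf:type", "Sensor"), ("?s", "locatedIn", "room-a")], triples)
```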
What-If Scenarios
Create hypothetical scenarios that apply modifications to the twin and test predictions against them.
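One way to picture a what-if scenario is as a set of attribute overrides applied to a copy of the twin's state, so the baseline stays available for comparison. The sketch below assumes this copy-based approach. `apply_scenario`, `predict_energy`, and the pump data are hypothetical, and the predictor is a simple formula standing in for an ML model.

```python
import copy
from typing import Any, Dict

Entities = Dict[str, Dict[str, Any]]


def apply_scenario(base: Entities, modifications: Dict[str, Dict[str, Any]]) -> Entities:
    """Return a modified deep copy; the base state is left untouched."""
    scenario = copy.deepcopy(base)
    for entity_id, changes in modifications.items():
        scenario[entity_id].update(changes)
    return scenario


def predict_energy(entities: Entities) -> float:
    """Stand-in predictor: total energy in kWh as load times running hours."""
    return sum(e["load_kw"] * e["hours"] for e in entities.values())


base = {
    "pump-1": {"load_kw": 5.0, "hours": 10},
    "pump-2": {"load_kw": 3.0, "hours": 8},
}

# What if pump-1 ran only 6 hours instead of 10?
scenario = apply_scenario(base, {"pump-1": {"hours": 6}})
baseline = predict_energy(base)
what_if = predict_energy(scenario)
saving = baseline - what_if
```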
Automated Actions
Pipelines can be triggered automatically when conditions are met.
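A condition-to-pipeline rule can be sketched as a predicate over an entity's attributes plus the identifier of the pipeline to enqueue. Everything named below (`ActionRule`, `evaluate_rules`, the rule names, and the pipeline identifiers) is a hypothetical illustration. The real rule format and queueing mechanism are not described in this section.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, List


@dataclass
class ActionRule:
    name: str
    condition: Callable[[Dict[str, Any]], bool]   # evaluated against entity attributes
    pipeline_id: str                              # pipeline to enqueue when it holds


def evaluate_rules(
    entity: Dict[str, Any], rules: List[ActionRule], queue: Deque[str]
) -> List[str]:
    """Enqueue the pipeline of every rule whose condition holds; return the rule names."""
    triggered: List[str] = []
    for rule in rules:
        if rule.condition(entity):
            queue.append(rule.pipeline_id)
            triggered.append(rule.name)
    return triggered


queue: Deque[str] = deque()
rules = [
    ActionRule("overheat", lambda e: e.get("temperature", 0) > 80, "cooling-pipeline"),
    ActionRule("offline", lambda e: e.get("status") == "offline", "alert-pipeline"),
]
triggered = evaluate_rules(
    {"id": "sensor-1", "temperature": 92, "status": "online"}, rules, queue
)
```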
MCP (Model Context Protocol)
Definition
MCP is an open standard for exposing tools to AI agents. Mimir exposes 55 MCP tools covering all platform resources.
- Natural language interaction with the platform
- Agent-driven pipeline creation and execution
- Automated model training and deployment
- Dynamic digital twin queries and scenarios
| Category | Count | Examples |
|---|---|---|
| Projects | 8 | Create, update, delete, clone |
| Pipelines | 6 | Create, execute, get status |
| Schedules | 5 | Create, update, list |
| ML Models | 7 | Train, infer, recommend type |
| Digital Twins | 7 | Sync, query, create scenario |
| Ontologies | 6 | Create, generate, extract |
| Storage | 8 | Store, retrieve, update, delete |
| Tasks | 3 | List, get, cancel |
| System | 1 | Health check |
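MCP messages are JSON-RPC 2.0, and an agent invokes a tool with the `tools/call` method, passing the tool's name and arguments. The sketch below builds such a request. The tool name `create_pipeline` and its arguments are placeholders, since the actual Mimir tool names and parameters are not listed in this section.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


# Hypothetical tool name and arguments, for illustration only.
message = build_tool_call(1, "create_pipeline", {"project_id": "demo", "name": "daily-ingest"})
```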
Status Values
Project Status
- `active`: Project is operational
- `archived`: Project is read-only, hidden from listings
- `draft`: Project is being configured
Pipeline Status
- `active`: Pipeline can be executed
- `inactive`: Pipeline is disabled
- `draft`: Pipeline is being configured
Model Status
- `draft`: Model created but not trained
- `training`: Training job in progress
- `trained`: Training completed successfully
- `failed`: Training failed
- `degraded`: Performance below threshold
- `deprecated`: Manually marked as obsolete
- `archived`: Removed from active use
Ontology Status
- `draft`: Ontology is being edited
- `active`: Ontology is in use by models/twins
- `archived`: Ontology is no longer in use
Digital Twin Status
- `active`: Digital twin is operational
- `syncing`: Synchronization job in progress
- `error`: Sync failed or twin is inconsistent
Next Steps
- Architecture: Understand how components interact in the system.
- Data Model: Learn about core data structures and relationships.
- CIR Format: Deep dive into the Common Internal Representation.