Overview

The Common Internal Representation (CIR) is the normalized data format used throughout Mimir AIP. Every piece of data flowing through the platform — whether ingested from APIs, databases, files, or streams — is converted to CIR format before storage and processing. CIR provides:
  • Provenance tracking: Know where data came from and when
  • Format independence: Work with data regardless of original format
  • Schema inference: Automatically detect data structure
  • Quality metrics: Track data quality indicators
  • Consistency: Unified interface across all storage backends

Structure

A CIR object consists of three main blocks:

Source

Provenance information — where the data came from, when it was ingested, and in what format.

Data

The actual payload — can be JSON objects, arrays, CSV records, text, or binary data.

Metadata

Size, encoding, record count, inferred schema, and quality metrics.

Type Definition

pkg/models/cir.go:35
type CIR struct {
    Version  string      `json:"version"`  // CIR schema version (currently "1.0")
    Source   CIRSource   `json:"source"`   // Provenance block
    Data     interface{} `json:"data"`     // Payload block
    Metadata CIRMetadata `json:"metadata"` // Metadata block
}
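As a rough illustration of how the three blocks fit together, the sketch below builds a CIR value by hand and prints it as JSON. The types are local copies of the structs documented on this page so the example compiles on its own; treating `SourceType` and `DataFormat` as string types, and the `buildExample` helper itself, are assumptions rather than part of the Mimir codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Local stand-ins for the documented types (assumed to be string-based).
type SourceType string
type DataFormat string

type CIRSource struct {
	Type       SourceType             `json:"type"`
	URI        string                 `json:"uri"`
	Timestamp  time.Time              `json:"timestamp"`
	Format     DataFormat             `json:"format"`
	Parameters map[string]interface{} `json:"parameters"`
}

type CIRMetadata struct {
	Size        int64  `json:"size"`
	Encoding    string `json:"encoding,omitempty"`
	RecordCount int    `json:"record_count,omitempty"`
}

type CIR struct {
	Version  string      `json:"version"`
	Source   CIRSource   `json:"source"`
	Data     interface{} `json:"data"`
	Metadata CIRMetadata `json:"metadata"`
}

// buildExample is a hypothetical helper that fills in each block by hand.
func buildExample() CIR {
	return CIR{
		Version: "1.0",
		Source: CIRSource{
			Type:      "api",
			URI:       "https://api.example.com/customer/123",
			Timestamp: time.Date(2026, 3, 1, 12, 0, 0, 0, time.UTC),
			Format:    "json",
		},
		Data:     map[string]interface{}{"id": "123", "name": "Acme Corp"},
		Metadata: CIRMetadata{Encoding: "utf-8", RecordCount: 1},
	}
}

func main() {
	out, err := json.MarshalIndent(buildExample(), "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

In real code the documented `models.NewCIR` constructor would normally be used instead of a struct literal.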

Source Block

The Source block captures provenance information for data lineage and debugging.
pkg/models/cir.go:43
type CIRSource struct {
    Type       SourceType             `json:"type"`       // api, file, database, stream
    URI        string                 `json:"uri"`        // Source identifier
    Timestamp  time.Time              `json:"timestamp"`  // Ingestion timestamp
    Format     DataFormat             `json:"format"`     // csv, json, xml, text, binary
    Parameters map[string]interface{} `json:"parameters"` // Optional parameters
}

Source Types

Data fetched from REST APIs or HTTP endpoints. URI format: full URL including query parameters.
{
  "type": "api",
  "uri": "https://api.example.com/v1/customers?page=1",
  "timestamp": "2026-03-01T12:34:56Z",
  "format": "json",
  "parameters": {
    "method": "GET",
    "headers": {"Authorization": "Bearer ***"},
    "status_code": 200
  }
}
Data loaded from local or remote files. URI format: file path or file:// URL.
{
  "type": "file",
  "uri": "/data/imports/customers_2026-03-01.csv",
  "timestamp": "2026-03-01T09:15:00Z",
  "format": "csv",
  "parameters": {
    "file_size": 1048576,
    "mime_type": "text/csv"
  }
}
Data queried from SQL or NoSQL databases. URI format: connection string or database identifier.
{
  "type": "database",
  "uri": "postgres://localhost:5432/production",
  "timestamp": "2026-03-01T14:22:30Z",
  "format": "json",
  "parameters": {
    "query": "SELECT * FROM customers WHERE active = true",
    "rows_affected": 1523
  }
}
Data ingested from real-time streams (Kafka, webhooks, etc.). URI format: stream endpoint or topic name.
{
  "type": "stream",
  "uri": "kafka://broker:9092/events",
  "timestamp": "2026-03-01T18:45:12Z",
  "format": "json",
  "parameters": {
    "partition": 3,
    "offset": 98234
  }
}

Data Formats

| Format | Description | Example Use Cases |
| --- | --- | --- |
| csv | Comma-separated values | Export files, spreadsheets |
| json | JSON objects or arrays | API responses, config files |
| xml | XML documents | SOAP APIs, legacy systems |
| text | Plain text | Logs, notes, unstructured data |
| binary | Binary data | Images, PDFs, arbitrary files |
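When ingesting files, the format value usually has to be chosen from something observable such as the file extension. The sketch below shows one possible mapping. The `formatFromPath` helper, the extension list, and every constant name except the string values are assumptions made for illustration; only `DataFormatJSON` is named elsewhere on this page.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Local stand-in for the documented DataFormat type (assumed string-based).
type DataFormat string

const (
	DataFormatCSV    DataFormat = "csv"
	DataFormatJSON   DataFormat = "json"
	DataFormatXML    DataFormat = "xml"
	DataFormatText   DataFormat = "text"
	DataFormatBinary DataFormat = "binary"
)

// formatFromPath is a hypothetical helper: it guesses the format from the
// file extension and falls back to binary for anything it does not know.
func formatFromPath(path string) DataFormat {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".csv":
		return DataFormatCSV
	case ".json":
		return DataFormatJSON
	case ".xml":
		return DataFormatXML
	case ".txt", ".log", ".md":
		return DataFormatText
	default:
		return DataFormatBinary
	}
}

func main() {
	paths := []string{"/data/imports/customers_2026-03-01.csv", "orders.JSON", "scan.pdf"}
	for _, p := range paths {
		fmt.Printf("%s -> %s\n", p, formatFromPath(p))
	}
}
```

Falling back to binary keeps unknown files ingestible without claiming a structure they may not have.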

Data Block

The Data block contains the actual payload in its native structure.
Single entity as a JSON object:
{
  "version": "1.0",
  "source": {
    "type": "api",
    "uri": "https://api.example.com/customer/123",
    "timestamp": "2026-03-01T12:00:00Z",
    "format": "json"
  },
  "data": {
    "id": "123",
    "name": "Acme Corp",
    "email": "contact@example.com",
    "created_at": "2025-01-15"
  },
  "metadata": {
    "size": 256,
    "encoding": "utf-8",
    "record_count": 1
  }
}
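Because `Data` is declared as `interface{}`, consumers generally need a type switch to find out what shape the payload has. The following sketch shows one way to do that for payloads decoded with `encoding/json`. The `describePayload` and `decode` helpers are hypothetical, and counting text or binary payloads as a single record is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describePayload (hypothetical) reports the payload shape and a record count.
// Arrays count one record per element; everything else counts as one record.
func describePayload(data interface{}) (kind string, records int) {
	switch v := data.(type) {
	case map[string]interface{}:
		return "object", 1
	case []interface{}:
		return "array", len(v)
	case string:
		return "text", 1
	case []byte:
		return "binary", 1
	default:
		return "unknown", 0
	}
}

// decode parses raw JSON into the generic types produced by encoding/json.
func decode(raw string) (interface{}, error) {
	var data interface{}
	if err := json.Unmarshal([]byte(raw), &data); err != nil {
		return nil, fmt.Errorf("failed to parse JSON: %w", err)
	}
	return data, nil
}

func main() {
	samples := []string{`{"id":"123"}`, `[{"id":"1"},{"id":"2"}]`, `"plain text"`}
	for _, raw := range samples {
		data, err := decode(raw)
		if err != nil {
			fmt.Println("decode failed:", err)
			continue
		}
		kind, n := describePayload(data)
		fmt.Printf("%-6s records=%d\n", kind, n)
	}
}
```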

Metadata Block

The Metadata block provides information about the data itself.
pkg/models/cir.go:52
type CIRMetadata struct {
    Size            int64                  `json:"size"`                       // Bytes
    Encoding        string                 `json:"encoding,omitempty"`         // e.g., "utf-8"
    RecordCount     int                    `json:"record_count,omitempty"`     // Number of records
    SchemaInference map[string]interface{} `json:"schema_inference,omitempty"` // Inferred structure
    QualityMetrics  map[string]interface{} `json:"quality_metrics,omitempty"`  // Quality indicators
}

Schema Inference

Automatically detected structure of the data payload.
For CSV data:
{
  "schema_inference": {
    "columns": ["id", "name", "email", "age"],
    "types": ["string", "string", "string", "number"]
  }
}
For JSON objects:
{
  "schema_inference": {
    "fields": [
      {"name": "id", "type": "string", "nullable": false},
      {"name": "name", "type": "string", "nullable": false},
      {"name": "metadata", "type": "object", "nullable": true}
    ]
  }
}
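The page does not describe how inference is implemented, so the following is only a minimal sketch of how a field list like the one above could be derived from decoded JSON records. The `Field` type, `jsonType`, and `inferFields` are hypothetical names, and the rule that a field is nullable when it is null or missing in at least one record is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Field mirrors one entry of the "fields" list shown above.
type Field struct {
	Name     string `json:"name"`
	Type     string `json:"type"`
	Nullable bool   `json:"nullable"`
}

// jsonType names the JSON type of a value decoded by encoding/json.
func jsonType(v interface{}) string {
	switch v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case float64:
		return "number"
	case string:
		return "string"
	case []interface{}:
		return "array"
	case map[string]interface{}:
		return "object"
	default:
		return "unknown"
	}
}

// inferFields (hypothetical) takes the type of the first non-null value seen
// for each field and marks fields that are null or absent in any record.
func inferFields(records []map[string]interface{}) []Field {
	types := map[string]string{}
	seen := map[string]int{}
	hasNull := map[string]bool{}
	for _, rec := range records {
		for name, v := range rec {
			seen[name]++
			if v == nil {
				hasNull[name] = true
				continue
			}
			if _, ok := types[name]; !ok {
				types[name] = jsonType(v)
			}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names) // stable output order
	fields := make([]Field, 0, len(names))
	for _, name := range names {
		t, ok := types[name]
		if !ok {
			t = "null" // only null values were observed
		}
		nullable := hasNull[name] || seen[name] < len(records)
		fields = append(fields, Field{Name: name, Type: t, Nullable: nullable})
	}
	return fields
}

func main() {
	records := []map[string]interface{}{
		{"id": "1", "name": "Acme Corp", "metadata": map[string]interface{}{"tier": "gold"}},
		{"id": "2", "name": "Globex", "metadata": nil},
	}
	for _, f := range inferFields(records) {
		fmt.Printf("%+v\n", f)
	}
}
```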

Quality Metrics

Data quality indicators calculated during ingestion.
For structured data, completeness is the fraction of non-null values (0.0 to 1.0); the remaining fields count duplicate records, statistical outliers, and schema validation errors:
{
  "quality_metrics": {
    "completeness": 0.98,
    "duplicate_count": 3,
    "outlier_count": 5,
    "validation_errors": 0
  }
}
For text data:
{
  "quality_metrics": {
    "word_count": 1523,
    "sentence_count": 87,
    "character_count": 9842,
    "readability_score": 62.3
  }
}
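To make two of the structured-data metrics concrete, here is a small sketch that computes a completeness ratio and a duplicate count over decoded records. The exact formulas Mimir uses are not given on this page, so both definitions are assumptions: completeness is taken as non-null cells divided by total expected cells, and duplicates are counted by a single key attribute. The function names are hypothetical.

```go
package main

import "fmt"

// completeness returns the share of (record, column) cells that are present
// and non-null, as a value between 0 and 1.
func completeness(records []map[string]interface{}, columns []string) float64 {
	total := len(records) * len(columns)
	if total == 0 {
		return 0
	}
	nonNull := 0
	for _, rec := range records {
		for _, col := range columns {
			if v, ok := rec[col]; ok && v != nil {
				nonNull++
			}
		}
	}
	return float64(nonNull) / float64(total)
}

// duplicateCount counts records whose key value was already seen earlier.
// Values are compared by their printed form so any JSON value can be a key.
func duplicateCount(records []map[string]interface{}, key string) int {
	seen := map[string]bool{}
	dups := 0
	for _, rec := range records {
		k := fmt.Sprint(rec[key])
		if seen[k] {
			dups++
		} else {
			seen[k] = true
		}
	}
	return dups
}

func main() {
	records := []map[string]interface{}{
		{"id": "1", "email": "a@example.com"},
		{"id": "2", "email": nil},
		{"id": "2", "email": "b@example.com"},
		{"id": "3", "email": "c@example.com"},
	}
	metrics := map[string]interface{}{
		"completeness":    completeness(records, []string{"id", "email"}),
		"duplicate_count": duplicateCount(records, "id"),
	}
	fmt.Println(metrics)
}
```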

Helper Functions

Mimir provides utility functions for working with CIR objects.

Creating CIR Objects

// pkg/storage/cir.go:64
func CreateCIRFromJSON(jsonData string, sourceURI string, sourceType models.SourceType) (*models.CIR, error) {
    var data interface{}
    if err := json.Unmarshal([]byte(jsonData), &data); err != nil {
        return nil, fmt.Errorf("failed to parse JSON: %w", err)
    }
    
    cir := models.NewCIR(sourceType, sourceURI, models.DataFormatJSON, data)
    cir.Metadata.Encoding = "utf-8"
    
    if arr, ok := data.([]interface{}); ok {
        cir.Metadata.RecordCount = len(arr)
    } else {
        cir.Metadata.RecordCount = 1
    }
    
    cir.UpdateSize()
    return cir, nil
}

Accessing CIR Data

// pkg/models/cir.go:122
func (c *CIR) GetDataAsMap() (map[string]interface{}, error) {
    if m, ok := c.Data.(map[string]interface{}); ok {
        return m, nil
    }
    
    // Try JSON marshal/unmarshal
    data, err := json.Marshal(c.Data)
    if err != nil {
        return nil, err
    }
    
    var m map[string]interface{}
    if err := json.Unmarshal(data, &m); err != nil {
        return nil, err
    }
    
    return m, nil
}
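The marshal/unmarshal fallback matters when `Data` holds something other than a plain map, for example a typed struct. The sketch below copies the method onto a cut-down local `CIR` type (only the `Data` field) to show both paths and the failure case for array payloads. The `customer` struct is a hypothetical payload type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Cut-down local copy of CIR; only the field the method needs.
type CIR struct {
	Data interface{}
}

// GetDataAsMap is copied from the listing above.
func (c *CIR) GetDataAsMap() (map[string]interface{}, error) {
	if m, ok := c.Data.(map[string]interface{}); ok {
		return m, nil
	}
	data, err := json.Marshal(c.Data)
	if err != nil {
		return nil, err
	}
	var m map[string]interface{}
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m, nil
}

// customer is a hypothetical typed payload.
type customer struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

func main() {
	// Fast path: the payload is already a map.
	direct := &CIR{Data: map[string]interface{}{"id": "123"}}
	m, err := direct.GetDataAsMap()
	fmt.Println(m, err)

	// Fallback path: a struct is round-tripped through JSON.
	typed := &CIR{Data: customer{ID: "123", Name: "Acme Corp"}}
	m, err = typed.GetDataAsMap()
	fmt.Println(m, err)

	// Arrays cannot be represented as a single map, so an error is returned.
	list := &CIR{Data: []interface{}{"a", "b"}}
	_, err = list.GetDataAsMap()
	fmt.Println("array payload error:", err != nil)
}
```

Callers that may receive array payloads should check the error rather than assume a map is always available.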

Validation

pkg/models/cir.go:78
func (c *CIR) Validate() error {
    if c.Version == "" {
        return fmt.Errorf("CIR version is required")
    }
    if c.Source.Type == "" {
        return fmt.Errorf("CIR source type is required")
    }
    if c.Source.URI == "" {
        return fmt.Errorf("CIR source URI is required")
    }
    if c.Source.Format == "" {
        return fmt.Errorf("CIR source format is required")
    }
    if c.Data == nil {
        return fmt.Errorf("CIR data cannot be nil")
    }
    return nil
}
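Note that `Validate` stops at the first failed check, so a CIR with several problems reports them one at a time. The sketch below demonstrates this with cut-down local copies of the types (only the fields that `Validate` reads); treating `SourceType` and `DataFormat` as string types is an assumption.

```go
package main

import "fmt"

// Local stand-ins with only the fields Validate inspects.
type SourceType string
type DataFormat string

type CIRSource struct {
	Type   SourceType
	URI    string
	Format DataFormat
}

type CIR struct {
	Version string
	Source  CIRSource
	Data    interface{}
}

// Validate is copied from the listing above.
func (c *CIR) Validate() error {
	if c.Version == "" {
		return fmt.Errorf("CIR version is required")
	}
	if c.Source.Type == "" {
		return fmt.Errorf("CIR source type is required")
	}
	if c.Source.URI == "" {
		return fmt.Errorf("CIR source URI is required")
	}
	if c.Source.Format == "" {
		return fmt.Errorf("CIR source format is required")
	}
	if c.Data == nil {
		return fmt.Errorf("CIR data cannot be nil")
	}
	return nil
}

func main() {
	c := &CIR{}
	fmt.Println("empty:", c.Validate())

	c.Version = "1.0"
	c.Source = CIRSource{
		Type:   "file",
		URI:    "/data/imports/customers_2026-03-01.csv",
		Format: "csv",
	}
	fmt.Println("no data:", c.Validate())

	c.Data = "id,name\n123,Acme Corp\n"
	fmt.Println("complete:", c.Validate())
}
```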

Querying CIR Data

Storage plugins use CIRQuery to retrieve data from backends.
pkg/models/storage.go:91
type CIRQuery struct {
    EntityType string          `json:"entity_type"`
    Filters    []CIRCondition  `json:"filters"`
    OrderBy    []OrderByClause `json:"order_by"`
    Limit      int             `json:"limit"`
    Offset     int             `json:"offset"`
}

type CIRCondition struct {
    Attribute string      `json:"attribute"`
    Operator  string      `json:"operator"` // eq, neq, gt, gte, lt, lte, in, like
    Value     interface{} `json:"value"`
}
{
  "entity_type": "Customer",
  "filters": [
    {
      "attribute": "status",
      "operator": "eq",
      "value": "active"
    }
  ],
  "order_by": [
    {"attribute": "created_at", "direction": "desc"}
  ],
  "limit": 100
}
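How each backend translates conditions is plugin-specific and not described here. As a rough mental model, the sketch below evaluates conditions against in-memory records. It covers only `eq`, `neq`, and the numeric comparison operators; `in` and `like` are left out. Combining multiple filters with AND is an assumption, and `matches` and `filter` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"reflect"
)

// Local copy of the documented condition type.
type CIRCondition struct {
	Attribute string      `json:"attribute"`
	Operator  string      `json:"operator"`
	Value     interface{} `json:"value"`
}

// matches reports whether one record satisfies one condition.
func matches(record map[string]interface{}, c CIRCondition) (bool, error) {
	got, present := record[c.Attribute]
	switch c.Operator {
	case "eq":
		return present && reflect.DeepEqual(got, c.Value), nil
	case "neq":
		return !present || !reflect.DeepEqual(got, c.Value), nil
	case "gt", "gte", "lt", "lte":
		// JSON numbers decode to float64, so only that type is handled.
		a, aok := got.(float64)
		b, bok := c.Value.(float64)
		if !aok || !bok {
			return false, fmt.Errorf("operator %q needs numeric operands", c.Operator)
		}
		switch c.Operator {
		case "gt":
			return a > b, nil
		case "gte":
			return a >= b, nil
		case "lt":
			return a < b, nil
		default:
			return a <= b, nil
		}
	default:
		return false, fmt.Errorf("unsupported operator %q", c.Operator)
	}
}

// filter keeps the records that satisfy every condition (assumed AND).
func filter(records []map[string]interface{}, conds []CIRCondition) ([]map[string]interface{}, error) {
	var out []map[string]interface{}
	for _, rec := range records {
		keep := true
		for _, c := range conds {
			ok, err := matches(rec, c)
			if err != nil {
				return nil, err
			}
			if !ok {
				keep = false
				break
			}
		}
		if keep {
			out = append(out, rec)
		}
	}
	return out, nil
}

func main() {
	records := []map[string]interface{}{
		{"id": "1", "status": "active", "total": 149.99},
		{"id": "2", "status": "inactive", "total": 79.50},
	}
	conds := []CIRCondition{
		{Attribute: "status", Operator: "eq", Value: "active"},
		{Attribute: "total", Operator: "gt", Value: 100.0},
	}
	out, err := filter(records, conds)
	fmt.Println(out, err)
}
```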

Storage Operations

All storage plugins implement the StoragePlugin interface:
pkg/models/storage.go:28
type StoragePlugin interface {
    Initialize(config *PluginConfig) error
    CreateSchema(ontology *OntologyDefinition) error
    Store(cir *CIR) (*StorageResult, error)
    Retrieve(query *CIRQuery) ([]*CIR, error)
    Update(query *CIRQuery, updates *CIRUpdate) (*StorageResult, error)
    Delete(query *CIRQuery) (*StorageResult, error)
    GetMetadata() (*StorageMetadata, error)
    HealthCheck() (bool, error)
}
Store a CIR object in the backend:
result, err := plugin.Store(cir)
if err != nil {
    return fmt.Errorf("storage failed: %w", err)
}
Query and retrieve CIR objects:
query := &models.CIRQuery{
    EntityType: "Customer",
    Filters: []models.CIRCondition{
        {Attribute: "status", Operator: "eq", Value: "active"},
    },
    Limit: 100,
}

cirs, err := plugin.Retrieve(query)
if err != nil {
    return fmt.Errorf("retrieve failed: %w", err)
}
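Update existing CIR data:
// The query selects the records to update; the update carries the new values.
query := &models.CIRQuery{
    Filters: []models.CIRCondition{
        {Attribute: "id", Operator: "eq", Value: "123"},
    },
}

update := &models.CIRUpdate{
    Filters: []models.CIRCondition{
        {Attribute: "id", Operator: "eq", Value: "123"},
    },
    Updates: map[string]interface{}{
        "status": "inactive",
        "updated_at": time.Now(),
    },
}

result, err := plugin.Update(query, update)
if err != nil {
    return fmt.Errorf("update failed: %w", err)
}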
Delete CIR data matching query:
query := &models.CIRQuery{
    Filters: []models.CIRCondition{
        {Attribute: "created_at", Operator: "lt", Value: "2025-01-01"},
    },
}

result, err := plugin.Delete(query)
if err != nil {
    return fmt.Errorf("delete failed: %w", err)
}
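For result sets larger than a single `Limit`, one option is to page through `Retrieve` with `Offset`. The sketch below assumes a backend applies `Limit` and `Offset` the way SQL does, which this page does not confirm. The `Retriever` interface is a one-method subset of `StoragePlugin`, and `retrieveAll` and `sliceStore` are hypothetical, with the latter acting as a stand-in backend so the example runs.

```go
package main

import (
	"errors"
	"fmt"
)

// Cut-down local copies of the documented types.
type CIR struct {
	Data interface{}
}

type CIRQuery struct {
	EntityType string
	Limit      int
	Offset     int
}

// Retriever is the single StoragePlugin method this sketch relies on.
type Retriever interface {
	Retrieve(query *CIRQuery) ([]*CIR, error)
}

// retrieveAll pages through the backend until a short page is returned.
func retrieveAll(r Retriever, base CIRQuery, pageSize int) ([]*CIR, error) {
	if pageSize <= 0 {
		return nil, errors.New("pageSize must be positive")
	}
	var all []*CIR
	q := base // copy so the caller's query is not modified
	q.Limit = pageSize
	for {
		page, err := r.Retrieve(&q)
		if err != nil {
			return nil, fmt.Errorf("retrieve at offset %d: %w", q.Offset, err)
		}
		all = append(all, page...)
		if len(page) < pageSize {
			return all, nil
		}
		q.Offset += pageSize
	}
}

// sliceStore is a fake backend that serves pages from a slice.
type sliceStore struct {
	items []*CIR
}

func (s *sliceStore) Retrieve(q *CIRQuery) ([]*CIR, error) {
	if q.Offset >= len(s.items) {
		return nil, nil
	}
	end := q.Offset + q.Limit
	if end > len(s.items) {
		end = len(s.items)
	}
	return s.items[q.Offset:end], nil
}

func main() {
	store := &sliceStore{}
	for i := 0; i < 5; i++ {
		store.items = append(store.items, &CIR{Data: i})
	}
	all, err := retrieveAll(store, CIRQuery{EntityType: "Customer"}, 2)
	fmt.Println(len(all), err)
}
```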

Best Practices

Always Set Source URI

Include a descriptive, unique URI for data lineage tracking. Use full URLs for APIs, absolute paths for files.

Preserve Original Format

Store the original format in Source.Format even after conversion. This aids in debugging and re-processing.

Include Parameters

Store ingestion parameters (HTTP headers, query filters, pagination state) for reproducibility.

Calculate Quality Metrics

Add quality metrics during ingestion to enable data quality monitoring and alerting.

Update Size

Call cir.UpdateSize() after modifying data to keep metadata accurate.

Validate Before Storage

Call cir.Validate() before passing to storage plugins to catch errors early.

Complete Example

{
  "version": "1.0",
  "source": {
    "type": "api",
    "uri": "https://api.ecommerce.example/v1/orders?page=1&limit=50",
    "timestamp": "2026-03-01T14:30:00Z",
    "format": "json",
    "parameters": {
      "method": "GET",
      "headers": {
        "Authorization": "Bearer ***",
        "User-Agent": "MimirAIP/1.0"
      },
      "status_code": 200,
      "response_time_ms": 342
    }
  },
  "data": [
    {
      "order_id": "ORD-2026-001",
      "customer_id": "CUST-789",
      "total": 149.99,
      "status": "shipped",
      "created_at": "2026-02-28T10:15:00Z"
    },
    {
      "order_id": "ORD-2026-002",
      "customer_id": "CUST-456",
      "total": 79.50,
      "status": "processing",
      "created_at": "2026-03-01T09:22:00Z"
    }
  ],
  "metadata": {
    "size": 1024,
    "encoding": "utf-8",
    "record_count": 2,
    "schema_inference": {
      "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "customer_id", "type": "string"},
        {"name": "total", "type": "number"},
        {"name": "status", "type": "string"},
        {"name": "created_at", "type": "datetime"}
      ]
    },
    "quality_metrics": {
      "completeness": 1.0,
      "duplicate_count": 0,
      "validation_errors": 0
    }
  }
}
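A consumer reading a document like this one may want to confirm that `record_count` agrees with the payload before trusting the metadata. The sketch below decodes a shortened version of the example above and performs that check. The `envelope` type and `checkRecordCount` are hypothetical and only mirror the fields the check needs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// envelope mirrors just the fields of a CIR document used by the check.
type envelope struct {
	Version string `json:"version"`
	Source  struct {
		Type      string    `json:"type"`
		URI       string    `json:"uri"`
		Timestamp time.Time `json:"timestamp"` // parsed from RFC 3339
		Format    string    `json:"format"`
	} `json:"source"`
	Data     interface{} `json:"data"`
	Metadata struct {
		RecordCount int `json:"record_count"`
	} `json:"metadata"`
}

// sample is a shortened version of the complete example.
const sample = `{
  "version": "1.0",
  "source": {
    "type": "api",
    "uri": "https://api.ecommerce.example/v1/orders?page=1&limit=50",
    "timestamp": "2026-03-01T14:30:00Z",
    "format": "json"
  },
  "data": [
    {"order_id": "ORD-2026-001", "total": 149.99},
    {"order_id": "ORD-2026-002", "total": 79.50}
  ],
  "metadata": {"size": 1024, "encoding": "utf-8", "record_count": 2}
}`

// checkRecordCount decodes a CIR document holding an array payload and
// verifies that metadata.record_count matches the number of elements.
func checkRecordCount(raw string) (int, error) {
	var e envelope
	if err := json.Unmarshal([]byte(raw), &e); err != nil {
		return 0, fmt.Errorf("decode: %w", err)
	}
	records, ok := e.Data.([]interface{})
	if !ok {
		return 0, fmt.Errorf("data is %T, expected an array", e.Data)
	}
	if len(records) != e.Metadata.RecordCount {
		return 0, fmt.Errorf("record_count is %d but data holds %d records",
			e.Metadata.RecordCount, len(records))
	}
	return len(records), nil
}

func main() {
	n, err := checkRecordCount(sample)
	if err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("records:", n)
}
```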

Next Steps

Storage Plugins

Learn how to implement custom storage plugins.

Pipeline Development

Create pipelines that produce and consume CIR data.

API Reference

Explore storage API endpoints.
