Overview

The /arckit.data-model command creates a comprehensive data model comprising an Entity-Relationship Diagram (ERD), a GDPR compliance assessment, and a data governance framework.

When to Use

  • Phase: 5.5 - Data Modeling
  • Timing: After requirements are captured (the DR-xxx data requirements must already exist)
  • Purpose: Create a data model to guide database design, API specs, and compliance

Command Usage

/arckit.data-model <project ID or domain>

Examples

# Create data model for existing project
/arckit.data-model 001

# Focus on specific domain
/arckit.data-model patient records

# New project
/arckit.data-model Create data model for payment gateway

What It Creates

A. Executive Summary

  • Total number of entities identified
  • Data classification summary (Public, Internal, Confidential, Restricted)
  • PII/sensitive data identified (Yes/No)
  • GDPR/DPA 2018 compliance status
  • Key data governance stakeholders

B. Visual Entity-Relationship Diagram (ERD)

Mermaid ERD syntax showing:
  • All entities (E-001, E-002, etc.)
  • Relationships (one-to-one, one-to-many, many-to-many)
  • Cardinality notation
  • Organized by logical domain/bounded context
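
As a rough illustration of the relationship and cardinality notation listed above, the sketch below renders a few relationships as Mermaid `erDiagram` text. The `Relationship` class, the `to_mermaid_erd` helper, and the entity names are hypothetical and chosen only for this example; the command's actual output may be organized differently.

```python
from dataclasses import dataclass

# Mermaid crow's-foot tokens for the three relationship kinds listed above.
CARDINALITY = {
    "1:1": "||--||",
    "1:N": "||--o{",
    "M:N": "}o--o{",
}


@dataclass(frozen=True)
class Relationship:
    left: str    # entity on the left of the line, e.g. "CUSTOMER"
    right: str   # related entity, e.g. "TRANSACTION"
    kind: str    # one of the CARDINALITY keys
    label: str   # verb phrase shown on the connecting line


def to_mermaid_erd(relationships: list[Relationship]) -> str:
    """Render relationships as Mermaid erDiagram source text."""
    lines = ["erDiagram"]
    for rel in relationships:
        lines.append(
            f"    {rel.left} {CARDINALITY[rel.kind]} {rel.right} : {rel.label}"
        )
    return "\n".join(lines)


erd = to_mermaid_erd([
    Relationship("CUSTOMER", "TRANSACTION", "1:N", "places"),
    Relationship("CUSTOMER", "PAYMENT-METHOD", "1:N", "has"),
])
print(erd)
```

The printed text can be pasted into any Mermaid renderer to check that the diagram parses.
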

C. Entity Catalog (E-001, E-002, etc.)

Each entity gets comprehensive documentation.

Example: E-001: Customer
#### E-001: Customer

**Description**: Represents a customer who can place orders and make payments.

**Source Requirement**: DR-001 (Customer Identity), DR-002 (Contact Information)

**Business Owner**: Marketing Director (from RACI matrix)  
**Technical Owner**: Customer Data Team  
**Data Classification**: Confidential (contains PII)

**Estimated Volume**: 
- Initial: 50,000 customers
- Growth Rate: +10,000/year
- 5-Year Projection: 100,000 customers

**Retention Period**: 
- Active customers: Indefinite (while account active)
- Inactive customers: 7 years after last transaction (GDPR Article 5(1)(e) - Storage Limitation)
- Deleted on request: 30 days (GDPR Article 17 - Right to Erasure)

**Attributes**:

| Attribute | Type | Required | PII | Description | Validation | Source Req |
|-----------|------|----------|-----|-------------|------------|------------|
| customer_id | UUID | Yes | No | Unique identifier | UUID v4 | DR-001 |
| email | String(255) | Yes | Yes | Contact email | RFC 5322, unique | DR-002 |
| name | String(100) | Yes | Yes | Full name | Non-empty | DR-002 |
| phone | String(20) | No | Yes | Phone number | E.164 format | DR-002 |
| created_at | DateTime | Yes | No | Account creation | ISO 8601 | DR-001 |
| updated_at | DateTime | Yes | No | Last modification | ISO 8601 | DR-001 |

**Relationships**:
- **1:N** with Transaction (E-002): One customer places many transactions
- **1:N** with PaymentMethod (E-003): One customer has many payment methods

**Indexes**:
- Primary Key: customer_id (clustered)
- Unique Index: email (unique constraint + fast lookup)
- Index: created_at (for reporting queries)

**Privacy Notes**:
- **PII Attributes**: email, name, phone (GDPR applies)
- **Lawful Basis**: Contract (Article 6(1)(b)) - necessary for service delivery
- **Data Subject Rights**: Access (SAR), Rectification, Erasure, Portability
- **Encryption**: AES-256 at rest, TLS 1.3 in transit
- **Access Logging**: All access to PII logged for 2 years
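
The Validation column in the attribute table above could be enforced at write time along these lines. This is a minimal sketch: `validate_customer` is a hypothetical helper, and the email pattern is a deliberate simplification rather than a full RFC 5322 check.

```python
import re
import uuid

# E.164: "+" followed by up to 15 digits, the first of which is non-zero.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")
# Simplified email shape check (assumption: NOT a complete RFC 5322 validator).
SIMPLE_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_customer(record: dict) -> list[str]:
    """Return a list of validation errors for one Customer record."""
    errors = []
    try:
        if uuid.UUID(str(record.get("customer_id"))).version != 4:
            errors.append("customer_id: not a UUID v4")
    except ValueError:
        errors.append("customer_id: not a valid UUID")

    email = record.get("email") or ""
    if len(email) > 255 or not SIMPLE_EMAIL.match(email):
        errors.append("email: invalid format")

    name = record.get("name") or ""
    if not name.strip() or len(name) > 100:
        errors.append("name: must be non-empty, max 100 characters")

    phone = record.get("phone")  # optional attribute
    if phone is not None and not E164.match(phone):
        errors.append("phone: not in E.164 format")
    return errors
```

The email uniqueness rule is not covered here; it would normally be left to the unique index listed under Indexes.
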

D. Data Governance Matrix

For each entity:
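
| Entity | Data Owner | Data Steward | Data Custodian | Sensitivity | Compliance | Quality SLA |
|--------|------------|--------------|----------------|-------------|------------|-------------|
| E-001: Customer | Marketing Director | CRM Manager | IT Operations | Confidential | GDPR, DPA 2018 | 99% accuracy |
| E-002: Transaction | CFO | Finance Manager | IT Operations | Restricted | PCI-DSS, GDPR | 99.9% accuracy |
| E-003: PaymentMethod | CFO | Payments Team | Payment Processor | Restricted | PCI-DSS Level 1 | 100% accuracy |
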
Roles:
  • Data Owner: Business stakeholder accountable (from RACI matrix)
  • Data Steward: Person responsible for quality and compliance
  • Data Custodian: Technical team managing storage/backups

E. CRUD Matrix (Create, Read, Update, Delete)

Map which components can perform which operations:
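
| Entity | Payment API | Admin Portal | Reporting Service | CRM Integration |
|--------|-------------|--------------|-------------------|-----------------|
| E-001: Customer | CR-- | CRUD | -R-- | -R-- |
| E-002: Transaction | CR-- | -R-- | -R-- | ---- |
| E-003: PaymentMethod | CRU- | CRUD | ---- | ---- |
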
Purpose: Identify unauthorized access patterns and data flows
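
One way to put the matrix to work is to compare observed calls against it. The sketch below assumes each cell in the example matrix is a four-character C/R/U/D mask; `CRUD_MATRIX`, `is_allowed`, and `find_violations` are hypothetical names, not part of the command.

```python
# One reading of the example CRUD matrix: component -> entity -> allowed operations.
# Letters: C=Create, R=Read, U=Update, D=Delete; an empty string means no access.
CRUD_MATRIX = {
    "Payment API": {"Customer": "CR", "Transaction": "CR", "PaymentMethod": "CRU"},
    "Admin Portal": {"Customer": "CRUD", "Transaction": "R", "PaymentMethod": "CRUD"},
    "Reporting Service": {"Customer": "R", "Transaction": "R", "PaymentMethod": ""},
    "CRM Integration": {"Customer": "R", "Transaction": "", "PaymentMethod": ""},
}


def is_allowed(component: str, entity: str, operation: str) -> bool:
    """True if the matrix grants `operation` (a single letter: C, R, U, or D)."""
    allowed = CRUD_MATRIX.get(component, {}).get(entity, "")
    return len(operation) == 1 and operation in allowed


def find_violations(observed: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return the observed (component, entity, operation) calls the matrix does not allow."""
    return [call for call in observed if not is_allowed(*call)]


observed_calls = [
    ("Payment API", "Customer", "C"),
    ("Reporting Service", "Customer", "U"),
    ("CRM Integration", "Transaction", "R"),
]
print(find_violations(observed_calls))
```

Unknown components or entities fall through to "no access", which keeps the check conservative.
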

F. Data Integration Mapping

Upstream Systems (Data Sources)

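| System | Entity Mapping | Update Frequency | Data Quality SLA | Integration Method |
|--------|----------------|------------------|------------------|--------------------|
| Salesforce CRM | E-001: Customer | Real-time | 99% accuracy | REST API, OAuth 2.0 |
| Legacy ERP | E-004: Product | Daily batch | 95% completeness | SFTP CSV export |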

Downstream Systems (Data Consumers)

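| System | Entity Mapping | Sync Method | Latency SLA | Integration Method |
|--------|----------------|-------------|-------------|--------------------|
| Analytics Warehouse | All entities | Event stream | < 5 minutes | Kafka, Avro schema |
| Email Service | E-001: Customer (email only) | API call | < 1 second | REST API, API key |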

Master Data Management

Source of Truth:
  • Customer data: Payment system is master (this system)
  • Product catalog: Legacy ERP is master (sync from ERP)
  • Transaction data: Payment system is master (this system)

G. Privacy & Compliance

GDPR/DPA 2018 Compliance

PII Inventory:

| Entity | PII Attributes | Legal Basis | Retention Period | Cross-Border Transfer |
|--------|----------------|-------------|------------------|-----------------------|
| E-001: Customer | email, name, phone | Contract (Art 6(1)(b)) | 7 years after last transaction | UK only (no transfer) |
| E-002: Transaction | None (pseudonymized) | Contract (Art 6(1)(b)) | 7 years (financial records) | UK only |

Data Subject Rights Implementation:

| Right | Implementation | Response Time |
|-------|----------------|---------------|
| Access (SAR) | Admin portal export function | 30 days (GDPR Art 15) |
| Rectification | Admin portal edit function | Immediate |
| Erasure | Soft delete + 30-day purge job | 30 days (GDPR Art 17) |
| Portability | JSON export via API | 30 days (GDPR Art 20) |

Data Retention Schedules:

| Entity | Active Retention | Archival | Deletion Trigger |
|--------|------------------|----------|------------------|
| E-001: Customer | Indefinite (while active) | 7 years after last transaction | Data subject erasure request |
| E-002: Transaction | 2 years (hot storage) | 5 years (cold storage) | 7 years elapsed |

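
The Transaction retention schedule above (2 years hot, then 5 years cold, deletion once 7 years have elapsed) can be expressed as a small classification function. This is only a sketch: `transaction_storage_tier` is a hypothetical name, and age is approximated with 365.25-day years rather than exact calendar arithmetic.

```python
from datetime import date

HOT_YEARS = 2      # active retention in hot storage
TOTAL_YEARS = 7    # hot + cold; deletion is due after this


def transaction_storage_tier(created: date, today: date) -> str:
    """Classify a transaction as 'hot', 'cold', or 'delete' by its age."""
    age_years = (today - created).days / 365.25  # approximation, see lead-in
    if age_years < HOT_YEARS:
        return "hot"
    if age_years < TOTAL_YEARS:
        return "cold"
    return "delete"


print(transaction_storage_tier(date(2021, 1, 1), date(2025, 1, 1)))
```

A production job would likely need exact date boundaries and an audit trail for each deletion.
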
Cross-Border Data Transfer:
  • UK → EU: Adequacy decision applies (no additional safeguards needed)
  • UK → US: NOT PERMITTED (no adequacy, SCCs not implemented)

Sector-Specific Compliance

PCI-DSS (Payment Card Industry Data Security Standard):
  • Requirement 3.4: Payment card data encrypted at rest (AES-256)
  • Requirement 3.5: Keys stored in HSM (Hardware Security Module)
  • Requirement 3.2: No storage of CVV/CVC after authorization (token-only approach)
Government Security Classifications (UK Gov only):
  • Customer data: OFFICIAL (baseline protective marking)
  • Transaction data: OFFICIAL-SENSITIVE (financial records)

Data Protection Impact Assessment (DPIA)

DPIA Required?: YES (high-risk processing of PII at scale)
Key Privacy Risks:
  1. Risk R-DPIA-001: Unauthorized access to customer PII (MEDIUM)
  2. Risk R-DPIA-002: Data breach during payment processing (HIGH)
Mitigation Measures:
  • Encryption at rest and in transit
  • Role-based access control (RBAC)
  • Access logging and monitoring
  • Annual penetration testing
ICO Notification: Not required (mitigations reduce residual risk below ICO threshold)

H. Data Quality Framework

Quality Dimensions:

| Dimension | Definition | Target | Measurement Method |
|-----------|------------|--------|--------------------|
| Accuracy | Data is correct | > 99% | Weekly data validation checks |
| Completeness | Required fields populated | > 95% | Daily completeness reports |
| Consistency | Same data across systems | > 98% | Monthly reconciliation with CRM |
| Timeliness | Data is current | < 5 min lag | Real-time sync monitoring |
| Uniqueness | No duplicates | 100% | Weekly deduplication checks |
| Validity | Conforms to format | > 99% | Validation rules enforced at write |

Data Quality Metrics per Entity:
E-001: Customer:
  • Email accuracy: > 99% (validated via double opt-in)
  • Name completeness: > 95% (required field)
  • Phone format validity: > 98% (E.164 validation)
Data Quality Monitoring:
  • Daily automated data quality checks
  • Weekly reports to Data Stewards
  • Monthly review with Data Owners
Issue Resolution Process:
  1. Automated detection (validation rules)
  2. Alert to Data Steward
  3. Investigation within 48 hours
  4. Correction within 5 business days
  5. Root cause analysis for recurring issues
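
Three of the quality dimensions above (completeness, uniqueness, validity) are straightforward to measure over a batch of records. The following is a minimal sketch with hypothetical function names and made-up sample data; accuracy, consistency, and timeliness need external reference data and are not covered.

```python
from typing import Callable


def _present(records: list[dict], field: str) -> list:
    """Values of `field` that are neither missing nor empty."""
    return [r[field] for r in records if r.get(field) not in (None, "")]


def completeness(records: list[dict], field: str) -> float:
    """Share of records where `field` is populated."""
    return len(_present(records, field)) / len(records) if records else 1.0


def uniqueness(records: list[dict], field: str) -> float:
    """Share of populated values that are distinct."""
    values = _present(records, field)
    return len(set(values)) / len(values) if values else 1.0


def validity(records: list[dict], field: str, is_valid: Callable[[str], bool]) -> float:
    """Share of populated values accepted by the `is_valid` predicate."""
    values = _present(records, field)
    return sum(1 for v in values if is_valid(v)) / len(values) if values else 1.0


sample = [
    {"email": "a@example.com"},
    {"email": "a@example.com"},   # duplicate
    {"email": ""},                # missing
    {"email": "not-an-email"},    # invalid format
]
scores = {
    "completeness": completeness(sample, "email"),
    "uniqueness": uniqueness(sample, "email"),
    "validity": validity(sample, "email", lambda v: "@" in v),
}
print(scores)
```

Each score can then be compared against the target in the Quality Dimensions table to decide whether to alert the Data Steward.
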

I. Requirements Traceability

| Requirement | Entity | Attributes | Rationale |
|-------------|--------|------------|-----------|
| DR-001 | E-001: Customer | customer_id, email, name | Store customer identity |
| DR-002 | E-002: Transaction | transaction_id, amount, status | Track payments |
| NFR-SEC-003 | E-001: Customer | password_hash (encrypted) | Secure authentication |
| DR-005 | E-002: Transaction | created_at, updated_at | 7-year retention (financial records) |

Gap Analysis:
  • DR-001 through DR-008: All mapped to entities ✓
  • No unmapped data requirements ✓
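
The gap analysis amounts to a set difference between the data requirements that exist and those that appear in the traceability matrix. A minimal sketch, using a hypothetical `unmapped_requirements` helper and a deliberately incomplete, made-up mapping so that the gaps are visible:

```python
def unmapped_requirements(required_ids: list[str], traceability: list[dict]) -> list[str]:
    """Return requirement IDs that no traceability row maps to an entity."""
    mapped = {row["requirement"] for row in traceability}
    return sorted(set(required_ids) - mapped)


required = [f"DR-{n:03d}" for n in range(1, 9)]   # DR-001 through DR-008
rows = [
    {"requirement": "DR-001", "entity": "E-001"},
    {"requirement": "DR-002", "entity": "E-002"},
    {"requirement": "DR-005", "entity": "E-002"},
]
print(unmapped_requirements(required, rows))
```

An empty result corresponds to the "No unmapped data requirements" check.
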

J. Implementation Guidance

Database Technology Recommendation:

| Use Case | Recommended Technology | Rationale |
|----------|------------------------|-----------|
| Transactional data | PostgreSQL 15+ | ACID compliance, strong relational support |
| Document storage | MongoDB 6+ | Flexible schemas, horizontal scaling |
| Time-series data | TimescaleDB | Optimized for metrics/events |
| Graph relationships | Neo4j | Highly connected data (social networks) |

For this project: PostgreSQL 15+ (transactional payment data requires ACID compliance)
Schema Migration Strategy:
  • Tool: Flyway (version-controlled SQL migrations)
  • Process: Test in dev → UAT → Prod
  • Rollback: Every migration has a rollback script
Backup and Recovery:
  • RPO (Recovery Point Objective): 15 minutes (point-in-time recovery)
  • RTO (Recovery Time Objective): 1 hour (database restore time)
  • Backup Frequency: Continuous WAL archiving + daily full backups
  • Backup Retention: 30 days
Data Archival:
  • Move transactions > 2 years old to cold storage (S3 Glacier)
  • Archive process runs monthly
  • Archived data accessible via batch query (24-hour SLA)
Testing Data:
  • Anonymization: Hash email, replace name with “Test User”, randomize phone
  • Pseudonymization: Replace customer_id with synthetic IDs
  • Synthetic Data: Generate fake transactions for load testing
  • Tool: PostgreSQL Anonymizer extension
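
For illustration, the anonymization rule above (hash email, replace name, randomize phone) might look like this outside the database. The sketch is not the PostgreSQL Anonymizer extension's API; `anonymize_customer`, the `salt` parameter, and the placeholder phone prefix are all assumptions made for the example.

```python
import hashlib
import random


def anonymize_customer(record: dict, rng: random.Random, salt: str) -> dict:
    """Return a copy of a Customer record that is safer to use as test data."""
    # Salted hash: an unsalted hash of an email is open to dictionary lookup.
    digest = hashlib.sha256((salt + record["email"].lower()).encode("utf-8")).hexdigest()
    return {
        **record,
        "email": f"{digest[:16]}@example.invalid",
        "name": "Test User",
        # Placeholder E.164-shaped number (assumption: swap in a range that is
        # reserved for testing in your jurisdiction).
        "phone": f"+447700900{rng.randrange(1000):03d}",
    }


source = {
    "customer_id": "c-1",
    "email": "Jane@Example.com",
    "name": "Jane Doe",
    "phone": "+441632960001",
}
print(anonymize_customer(source, random.Random(0), salt="test-salt"))
```

The original record is left untouched, and non-PII fields such as customer_id pass through unchanged; replacing those with synthetic IDs is the separate pseudonymization step.
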

Output File

Creates: projects/{project}/ARC-{PROJECT_ID}-DATA-v1.0.md
Contains:
  • Executive Summary
  • Visual ERD (Mermaid)
  • Entity Catalog (all entities with attributes, PII flags, retention)
  • GDPR Compliance Matrix
  • Data Governance Framework (ownership, CRUD matrix)
  • Data Quality Metrics
  • Data Retention Policy
  • Encryption and Security Requirements
  • Requirements Traceability Matrix
  • Implementation Guidance
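
The output path follows a fixed template, which a script consuming the file could rebuild as below. This is a sketch only: `data_model_path` is a hypothetical helper, and the example values for the project directory and project ID are assumptions, since the exact format of both placeholders is not specified here.

```python
from pathlib import PurePosixPath


def data_model_path(project_dir: str, project_id: str, version: str = "1.0") -> PurePosixPath:
    """Fill in the projects/{project}/ARC-{PROJECT_ID}-DATA-v1.0.md template."""
    return PurePosixPath("projects") / project_dir / f"ARC-{project_id}-DATA-v{version}.md"


# Hypothetical values for illustration only.
print(data_model_path("001-payment-gateway", "001"))
```
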

Prerequisites

MANDATORY (command will warn if missing):
  • REQ (Requirements) - Must extract DR-xxx data requirements
If requirements don’t exist:
  • STOP and warn user: “Data model requires requirements document with DR-xxx data requirements. Please run /arckit.requirements first.”
RECOMMENDED (read if available):
  • STKE (Stakeholder Analysis) - Data owners from RACI matrix
  • PRIN (Architecture Principles) - Data governance standards
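
The mandatory prerequisite can be pictured as a guard that extracts DR-xxx identifiers and stops when none are found. This is an illustrative sketch rather than the command's implementation: `extract_data_requirements` is a hypothetical name, and the three-digit ID pattern is an assumption based on the examples on this page.

```python
import re
from typing import Optional

DR_PATTERN = re.compile(r"\bDR-\d{3}\b")  # assumption: IDs look like DR-001
WARNING = (
    "Data model requires requirements document with DR-xxx data requirements. "
    "Please run /arckit.requirements first."
)


def extract_data_requirements(requirements_text: Optional[str]) -> list[str]:
    """Return the sorted, de-duplicated DR-xxx IDs, or raise if none exist."""
    ids = sorted(set(DR_PATTERN.findall(requirements_text or "")))
    if not ids:
        raise ValueError(WARNING)
    return ids


print(extract_data_requirements("DR-002 Contact Information\nDR-001 Customer Identity"))
```
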

UK Government Specifics

Government Security Classifications:
  • OFFICIAL: Baseline (routine business)
  • SECRET: Very sensitive (national security)
  • TOP SECRET: Highest sensitivity
Data Standards:
ICO Data Protection:
  • Reference ICO guidance for public sector
  • DPIA mandatory for high-risk processing
National Cyber Security Centre (NCSC):

Quality Checks

Before delivery, the command verifies:
  • Mermaid ERD syntax is valid
  • All entities trace to DR-xxx requirements
  • All PII attributes identified and flagged
  • GDPR lawful basis documented for all PII
  • Data retention periods specified
  • Data owners assigned from RACI matrix
  • CRUD matrix complete
  • Data quality metrics are measurable
  • No unmapped DR-xxx requirements (gap analysis)
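
Two of these checks (lawful basis documented for all PII, retention periods specified) lend themselves to a mechanical pass over the entity catalog. The sketch below assumes a hypothetical in-memory representation of entities as dictionaries; the field names and the E-009 entity are illustrative and are not the format of the generated file.

```python
def pii_compliance_gaps(entities: list[dict]) -> list[str]:
    """List entities with PII but no lawful basis, or with no retention period."""
    gaps = []
    for entity in entities:
        has_pii = any(attr.get("pii") for attr in entity.get("attributes", []))
        if has_pii and not entity.get("lawful_basis"):
            gaps.append(f"{entity['id']}: PII present but no lawful basis")
        if not entity.get("retention"):
            gaps.append(f"{entity['id']}: retention period missing")
    return gaps


catalog = [
    {
        "id": "E-001",
        "attributes": [{"name": "email", "pii": True}],
        "lawful_basis": "Contract (Art 6(1)(b))",
        "retention": "7 years after last transaction",
    },
    {"id": "E-009", "attributes": [{"name": "phone", "pii": True}]},  # made-up entity
]
print(pii_compliance_gaps(catalog))
```
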

Next Steps

After creating the data model:
  1. Review with data protection officer (DPO)
  2. Validate with data owners and stakeholders
  3. Run /arckit.research to research database technologies
  4. Run /arckit.hld-review after HLD is created
  5. Run /arckit.dld-review to validate schema design

Requirements

Mandatory prerequisite: supplies the DR-xxx data requirements

DPIA

Assess data protection impact for PII

HLD Review

Validate database technology choices

Traceability

Map DR-xxx to entities and HLD components

Example Outputs

Key References
