Overview
The /arckit.data-model command creates a comprehensive data model with an Entity-Relationship Diagram (ERD), GDPR compliance documentation, and a data governance framework.
When to Use
- Phase: Phase 5.5 - Data Modeling
- Timing: AFTER requirements (requires DR-xxx data requirements)
- Purpose: Create data model to guide database design, API specs, and compliance
Command Usage
Examples
What It Creates
A. Executive Summary
- Total number of entities identified
- Data classification summary (Public, Internal, Confidential, Restricted)
- PII/sensitive data identified (Yes/No)
- GDPR/DPA 2018 compliance status
- Key data governance stakeholders
B. Visual Entity-Relationship Diagram (ERD)
Mermaid ERD syntax showing:
- All entities (E-001, E-002, etc.)
- Relationships (one-to-one, one-to-many, many-to-many)
- Cardinality notation
- Organized by logical domain/bounded context
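As an illustration of the Mermaid ERD syntax described above, the following minimal sketch renders a small entity list as `erDiagram` text. The `Relationship` class, the `render_erd` helper, and the sample attributes are hypothetical names chosen for this example; they are not part of the command's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    left: str         # entity on the left side of the relationship
    right: str        # entity on the right side of the relationship
    cardinality: str  # Mermaid crow's-foot token, e.g. "||--o{" (one to zero-or-many)
    label: str        # verb phrase describing the relationship


def render_erd(entities: dict[str, list[tuple[str, str]]],
               relationships: list[Relationship]) -> str:
    """Render entities and relationships as Mermaid erDiagram text."""
    lines = ["erDiagram"]
    for name, attributes in entities.items():
        lines.append(f"    {name} {{")
        for attr_type, attr_name in attributes:
            lines.append(f"        {attr_type} {attr_name}")
        lines.append("    }")
    for rel in relationships:
        lines.append(f"    {rel.left} {rel.cardinality} {rel.right} : {rel.label}")
    return "\n".join(lines)


# Hypothetical sample data loosely based on E-001 and E-002.
entities = {
    "CUSTOMER": [("uuid", "customer_id"), ("string", "email"), ("string", "name")],
    "TRANSACTION": [("uuid", "transaction_id"), ("decimal", "amount"), ("string", "status")],
}
relationships = [Relationship("CUSTOMER", "TRANSACTION", "||--o{", "places")]

erd_text = render_erd(entities, relationships)
print(erd_text)
```

The printed text can be pasted into any Mermaid renderer to check that the syntax is valid before review.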
C. Entity Catalog (E-001, E-002, etc.)
For each entity, comprehensive documentation.
Example: E-001: Customer
D. Data Governance Matrix
For each entity:
| Entity | Data Owner | Data Steward | Data Custodian | Sensitivity | Compliance | Quality SLA |
|---|---|---|---|---|---|---|
| E-001: Customer | Marketing Director | CRM Manager | IT Operations | Confidential | GDPR, DPA 2018 | 99% accuracy |
| E-002: Transaction | CFO | Finance Manager | IT Operations | Restricted | PCI-DSS, GDPR | 99.9% accuracy |
| E-003: PaymentMethod | CFO | Payments Team | Payment Processor | Restricted | PCI-DSS Level 1 | 100% accuracy |
- Data Owner: Business stakeholder accountable (from RACI matrix)
- Data Steward: Person responsible for quality and compliance
- Data Custodian: Technical team managing storage/backups
E. CRUD Matrix (Create, Read, Update, Delete)
Map which components can perform which operations:
| Entity | Payment API | Admin Portal | Reporting Service | CRM Integration |
|---|---|---|---|---|
| E-001: Customer | CR-- | CRUD | -R-- | -R-- |
| E-002: Transaction | CR-- | -R-- | -R-- | ---- |
| E-003: PaymentMethod | CRU- | CRUD | ---- | ---- |
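The matrix above can be treated as a permission lookup. The sketch below assumes each cell is a four-character flag string in C, R, U, D order, where a letter grants the operation and a hyphen denies it. The `CRUD_MATRIX` dictionary and the `allowed_operations` and `can` helpers are hypothetical names for illustration only.

```python
OPERATIONS = "CRUD"

# Cells copied from the example matrix, written as four-character flags.
CRUD_MATRIX = {
    "E-001: Customer": {"Payment API": "CR--", "Admin Portal": "CRUD",
                        "Reporting Service": "-R--", "CRM Integration": "-R--"},
    "E-002: Transaction": {"Payment API": "CR--", "Admin Portal": "-R--",
                           "Reporting Service": "-R--", "CRM Integration": "----"},
    "E-003: PaymentMethod": {"Payment API": "CRU-", "Admin Portal": "CRUD",
                             "Reporting Service": "----", "CRM Integration": "----"},
}


def allowed_operations(flags: str) -> set[str]:
    """Parse a flag string such as 'CR--' into the set of granted operations."""
    if len(flags) != len(OPERATIONS):
        raise ValueError(f"expected 4 flags, got {flags!r}")
    granted = set()
    for expected, actual in zip(OPERATIONS, flags):
        if actual == expected:
            granted.add(expected)
        elif actual != "-":
            raise ValueError(f"unexpected flag {actual!r} in {flags!r}")
    return granted


def can(component: str, operation: str, entity: str) -> bool:
    """Return True if the component may perform the operation on the entity."""
    flags = CRUD_MATRIX[entity].get(component, "----")  # unknown components get no access
    return operation in allowed_operations(flags)


print(can("Payment API", "C", "E-001: Customer"))        # True
print(can("Reporting Service", "U", "E-001: Customer"))  # False
```

Rejecting malformed flag strings early makes an incomplete or mistyped matrix cell visible during review instead of silently granting or denying access.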
F. Data Integration Mapping
Upstream Systems (Data Sources)
| System | Entity Mapping | Update Frequency | Data Quality SLA | Integration Method |
|---|---|---|---|---|
| Salesforce CRM | E-001: Customer | Real-time | 99% accuracy | REST API, OAuth 2.0 |
| Legacy ERP | E-004: Product | Daily batch | 95% completeness | SFTP CSV export |
Downstream Systems (Data Consumers)
| System | Entity Mapping | Sync Method | Latency SLA | Integration Method |
|---|---|---|---|---|
| Analytics Warehouse | All entities | Event stream | < 5 minutes | Kafka, Avro schema |
| Email Service | E-001: Customer (email only) | API call | < 1 second | REST API, API key |
Master Data Management
Source of Truth:
- Customer data: Payment system is master (this system)
- Product catalog: Legacy ERP is master (sync from ERP)
- Transaction data: Payment system is master (this system)
G. Privacy & Compliance
GDPR/DPA 2018 Compliance
PII Inventory:
| Entity | PII Attributes | Legal Basis | Retention Period | Cross-Border Transfer |
|---|---|---|---|---|
| E-001: Customer | email, name, phone | Contract (Art 6(1)(b)) | 7 years after last transaction | UK only (no transfer) |
| E-002: Transaction | None (pseudonymized) | Contract (Art 6(1)(b)) | 7 years (financial records) | UK only |
Data Subject Rights:
| Right | Implementation | Response Time |
|---|---|---|
| Access (SAR) | Admin portal export function | 30 days (GDPR Art 15) |
| Rectification | Admin portal edit function | Immediate |
| Erasure | Soft delete + 30-day purge job | 30 days (GDPR Art 17) |
| Portability | JSON export via API | 30 days (GDPR Art 20) |
Data Retention:
| Entity | Active Retention | Archival | Deletion Trigger |
|---|---|---|---|
| E-001: Customer | Indefinite (while active) | 7 years after last transaction | Data subject erasure request |
| E-002: Transaction | 2 years (hot storage) | 5 years (cold storage) | 7 years elapsed |
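The E-002: Transaction retention row above (2 years in hot storage, 5 years in cold storage, deletion once 7 years have elapsed) can be expressed as a simple date calculation. This is a sketch under those stated figures; the `add_years` and `transaction_tier` helpers are hypothetical and not part of the command's output.

```python
from datetime import date

HOT_YEARS = 2    # active retention in hot storage
TOTAL_YEARS = 7  # hot + cold storage; deletion is due after this


def add_years(d: date, years: int) -> date:
    """Add whole years to a date, mapping 29 Feb to 28 Feb in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def transaction_tier(created: date, today: date) -> str:
    """Classify a transaction as 'hot', 'cold', or 'delete' by age."""
    if today >= add_years(created, TOTAL_YEARS):
        return "delete"
    if today >= add_years(created, HOT_YEARS):
        return "cold"
    return "hot"


created = date(2020, 1, 15)
print(transaction_tier(created, date(2021, 6, 1)))   # hot
print(transaction_tier(created, date(2023, 1, 15)))  # cold
print(transaction_tier(created, date(2027, 1, 15)))  # delete
```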
Cross-Border Transfers:
- UK → EU: Adequacy decision applies (no additional safeguards needed)
- UK → US: NOT PERMITTED (no adequacy, SCCs not implemented)
Sector-Specific Compliance
PCI-DSS (Payment Card Industry Data Security Standard):
- Requirement 3.4: Payment card data encrypted at rest (AES-256)
- Requirement 3.5: Keys stored in HSM (Hardware Security Module)
- Requirement 3.2: No storage of CVV/CVC (token-only approach)
Government Security Classifications:
- Customer data: OFFICIAL (baseline protective marking)
- Transaction data: OFFICIAL-SENSITIVE (financial records)
Data Protection Impact Assessment (DPIA)
DPIA Required?: YES (high-risk processing of PII at scale)
Key Privacy Risks:
- Risk R-DPIA-001: Unauthorized access to customer PII (MEDIUM)
- Risk R-DPIA-002: Data breach during payment processing (HIGH)
Mitigations:
- Encryption at rest and in transit
- Role-based access control (RBAC)
- Access logging and monitoring
- Annual penetration testing
H. Data Quality Framework
Quality Dimensions:
| Dimension | Definition | Target | Measurement Method |
|---|---|---|---|
| Accuracy | Data is correct | > 99% | Weekly data validation checks |
| Completeness | Required fields populated | > 95% | Daily completeness reports |
| Consistency | Same data across systems | > 98% | Monthly reconciliation with CRM |
| Timeliness | Data is current | < 5 min lag | Real-time sync monitoring |
| Uniqueness | No duplicates | 100% | Weekly deduplication checks |
| Validity | Conforms to format | > 99% | Validation rules enforced at write |
Quality Metrics (E-001: Customer):
- Email accuracy: > 99% (validated via double opt-in)
- Name completeness: > 95% (required field)
- Phone format validity: > 98% (E.164 validation)
Quality Monitoring:
- Daily automated data quality checks
- Weekly reports to Data Stewards
- Monthly review with Data Owners
Issue Resolution Process:
- Automated detection (validation rules)
- Alert to Data Steward
- Investigation within 48 hours
- Correction within 5 business days
- Root cause analysis for recurring issues
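To make dimensions such as completeness, validity, and uniqueness measurable, each can be computed as a ratio over a batch of records. The sketch below assumes records are plain dictionaries and uses a commonly used E.164 pattern (a plus sign, a non-zero leading digit, at most 15 digits in total). The function names and sample records are hypothetical.

```python
import re

# E.164: "+" followed by up to 15 digits, the first of which is non-zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def completeness(records: list[dict], field: str) -> float:
    """Share of records in which the field is populated."""
    if not records:
        return 1.0
    return sum(1 for r in records if r.get(field)) / len(records)


def validity(records: list[dict], field: str, pattern: re.Pattern) -> float:
    """Share of populated values that fully match the pattern."""
    values = [r[field] for r in records if r.get(field)]
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


def uniqueness(records: list[dict], field: str) -> float:
    """Share of populated values that are distinct."""
    values = [r[field] for r in records if r.get(field)]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


# Hypothetical customer records for illustration.
customers = [
    {"email": "a@example.com", "name": "Ada", "phone": "+447700900123"},
    {"email": "b@example.com", "name": "", "phone": "07700 900456"},
    {"email": "a@example.com", "name": "Alan", "phone": "+447700900789"},
    {"email": "c@example.com", "name": "Grace", "phone": "+12025550100"},
]

print(completeness(customers, "name"))            # 0.75
print(validity(customers, "phone", E164_PATTERN))  # 0.75
print(uniqueness(customers, "email"))             # 0.75
```

Each ratio can then be compared against the target in the table, for example flagging name completeness below 0.95 for the Data Steward.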
I. Requirements Traceability
| Requirement | Entity | Attributes | Rationale |
|---|---|---|---|
| DR-001 | E-001: Customer | customer_id, email, name | Store customer identity |
| DR-002 | E-002: Transaction | transaction_id, amount, status | Track payments |
| NFR-SEC-003 | E-001: Customer | password_hash (encrypted) | Secure authentication |
| DR-005 | E-002: Transaction | created_at, updated_at | 7-year retention (financial records) |
- DR-001 through DR-008: All mapped to entities ✓
- No unmapped data requirements ✓
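The gap analysis behind these checks amounts to a set difference between the DR-xxx identifiers in the requirements document and those that appear in the traceability table. A minimal sketch follows, assuming the table rows are available as dictionaries. The `unmapped_requirements` helper and the sample identifier list are hypothetical, and DR-003 is deliberately left unmapped to show a reported gap.

```python
def unmapped_requirements(required_ids: list[str],
                          traceability_rows: list[dict]) -> list[str]:
    """Return required IDs that no traceability row maps to an entity."""
    mapped = {row["requirement"] for row in traceability_rows}
    return sorted(set(required_ids) - mapped)


# Rows mirroring the example traceability table.
rows = [
    {"requirement": "DR-001", "entity": "E-001: Customer"},
    {"requirement": "DR-002", "entity": "E-002: Transaction"},
    {"requirement": "NFR-SEC-003", "entity": "E-001: Customer"},
    {"requirement": "DR-005", "entity": "E-002: Transaction"},
]

# Hypothetical list of data requirements extracted from the requirements document.
required = ["DR-001", "DR-002", "DR-003", "DR-005"]

gaps = unmapped_requirements(required, rows)
print(gaps)  # ['DR-003']
```

An empty result corresponds to the "No unmapped data requirements" check passing.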
J. Implementation Guidance
Database Technology Recommendation:
| Use Case | Recommended Technology | Rationale |
|---|---|---|
| Transactional data | PostgreSQL 15+ | ACID compliance, strong relational support |
| Document storage | MongoDB 6+ | Flexible schemas, horizontal scaling |
| Time-series data | TimescaleDB | Optimized for metrics/events |
| Graph relationships | Neo4j | Highly connected data (social networks) |
Schema Migration Strategy:
- Tool: Flyway (version-controlled SQL migrations)
- Process: Test in dev → UAT → Prod
- Rollback: Every migration has a rollback script
Backup and Recovery:
- RPO (Recovery Point Objective): 15 minutes (point-in-time recovery)
- RTO (Recovery Time Objective): 1 hour (database restore time)
- Backup Frequency: Continuous WAL archiving + daily full backups
- Backup Retention: 30 days
Data Archival:
- Move transactions > 2 years old to cold storage (S3 Glacier)
- Archive process runs monthly
- Archived data accessible via batch query (24-hour SLA)
Test Data:
- Anonymization: Hash email, replace name with “Test User”, randomize phone
- Pseudonymization: Replace customer_id with synthetic IDs
- Synthetic Data: Generate fake transactions for load testing
- Tool: PostgreSQL Anonymizer extension
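The anonymization rule above (hash the email, replace the name with "Test User", randomize the phone) can be sketched in plain Python. This example does not use the PostgreSQL Anonymizer extension; it only illustrates the transformation. The `anonymize_customer` helper, the salt value, the `example.invalid` domain, and the `+447700900` phone prefix are assumptions made for this sketch.

```python
import hashlib
import random


def anonymize_customer(record: dict, salt: str, rng: random.Random) -> dict:
    """Return a copy of the record with email hashed, name replaced, phone randomized."""
    normalized_email = record["email"].strip().lower()
    digest = hashlib.sha256((salt + normalized_email).encode("utf-8")).hexdigest()
    random_digits = "".join(rng.choice("0123456789") for _ in range(3))
    return {
        **record,                                   # non-PII fields pass through unchanged
        "email": f"{digest[:16]}@example.invalid",  # salted hash, not reversible by lookup alone
        "name": "Test User",
        "phone": "+447700900" + random_digits,      # assumed fictional prefix for test data
    }


rng = random.Random(42)  # seeded only so the example is repeatable
original = {
    "customer_id": "c-001",
    "email": "Ada@Example.com",
    "name": "Ada Lovelace",
    "phone": "+447911123456",
}
masked = anonymize_customer(original, salt="test-env-salt", rng=rng)
print(masked["name"])   # Test User
print(masked["email"])
print(masked["phone"])
```

Using the same salt keeps the hashed email stable across runs, so joins between anonymized test tables still line up. Note that customer_id is left unchanged here; replacing it with a synthetic ID is the separate pseudonymization step listed above.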
Output File
Creates: projects/{project}/ARC-{PROJECT_ID}-DATA-v1.0.md
Contains:
- Executive Summary
- Visual ERD (Mermaid)
- Entity Catalog (all entities with attributes, PII flags, retention)
- GDPR Compliance Matrix
- Data Governance Framework (ownership, CRUD matrix)
- Data Quality Metrics
- Data Retention Policy
- Encryption and Security Requirements
- Requirements Traceability Matrix
- Implementation Guidance
Prerequisites
MANDATORY (command will warn if missing):
- REQ (Requirements) - Must extract DR-xxx data requirements
- STOP and warn user: “Data model requires requirements document with DR-xxx data requirements. Please run /arckit.requirements first.”
- STKE (Stakeholder Analysis) - Data owners from RACI matrix
- PRIN (Architecture Principles) - Data governance standards
UK Government Specifics
Government Security Classifications:
- OFFICIAL: Baseline (routine business)
- SECRET: Very sensitive (national security)
- TOP SECRET: Highest sensitivity
Data Standards and Guidance:
- Use GDS Data Standards Catalogue where applicable
- Preference for open data formats (JSON, CSV, OData)
- Reference ICO guidance for public sector
- DPIA mandatory for high-risk processing
- Follow NCSC data security patterns
Quality Checks
Before delivery, the command verifies:
- Mermaid ERD syntax is valid
- All entities trace to DR-xxx requirements
- All PII attributes identified and flagged
- GDPR lawful basis documented for all PII
- Data retention periods specified
- Data owners assigned from RACI matrix
- CRUD matrix complete
- Data quality metrics are measurable
- No unmapped DR-xxx requirements (gap analysis)
Next Steps
After creating the data model:
- Review with data protection officer (DPO)
- Validate with data owners and stakeholders
- Run /arckit.research to research database technologies
- Run /arckit.hld-review after HLD is created
- Run /arckit.dld-review to validate schema design
Related Commands
- Requirements: MANDATORY prerequisite for DR-xxx data requirements
- DPIA: Assess data protection impact for PII
- HLD Review: Validate database technology choices
- Traceability: Map DR-xxx to entities and HLD components
Example Outputs
NHS Appointment Booking (v7)
Cabinet Office GenAI (v9)
Patent Application System (v6)
Key References
- UK GDPR - Data protection law
- DPA 2018 - Data Protection Act
- PCI-DSS 4.0 - Payment card data security
- National Data Strategy - UK Government data strategy
- Government Data Quality Framework - Data quality dimensions