Overview
The data-model command creates comprehensive data architecture documentation, including Entity-Relationship Diagrams (ERDs), entity catalogs with PII identification, GDPR compliance matrices, data governance frameworks, and requirements traceability. Its output drives database design, API specifications, and data protection assessments.
Command Syntax
Prerequisites
Mandatory
- Requirements Document (REQ): Must contain Data Requirements (DR-xxx)
  - Command: arckit requirements <project>
  - The tool will STOP and warn if missing
  - Why: Data model MUST be based on DR-xxx requirements to ensure traceability
Recommended
- Stakeholder Analysis (STKE): Identifies data owners from RACI matrix
- Architecture Principles (PRIN): Provides data governance standards and privacy-by-design principles
Workflow
1. Generate Requirements First
2. Create Data Model
3. Review Output
The command creates:
- File: projects/001-payment-gateway-modernization/ARC-001-DATA-v1.0.md
- Summary: Shows entity counts, PII identification, GDPR status, requirements coverage
4. Next Steps (Handoffs)
- HLD Review: Validate database technology choices against the data model
- DLD Review: Validate schema design, indexes, query patterns
- SOW: Include data migration and governance in vendor RFP
- Traceability: Map DR-xxx → Entities → Attributes → HLD components
Key Features
Entity-Relationship Diagram (ERD)
Generated using Mermaid syntax for GitHub-renderable diagrams.
Entity Catalog
Detailed documentation for each entity (E-001, E-002, etc.).
Example: E-001 Customer Entity
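A catalog entry of this kind could be modeled roughly as follows. This is a minimal sketch, not ArcKit's actual output format: the `Attribute` and `Entity` classes and the type labels are assumptions made for illustration, while the attribute names and requirement IDs are the ones listed for E-001 in the Requirements Traceability table.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str          # illustrative type label, not a real schema type
    is_pii: bool = False    # flagged for GDPR handling
    requirement: str = ""   # DR-xxx / NFR-xxx the attribute traces back to


@dataclass
class Entity:
    entity_id: str
    name: str
    attributes: list[Attribute] = field(default_factory=list)

    def pii_attributes(self) -> list[str]:
        """Return the names of attributes flagged as PII."""
        return [a.name for a in self.attributes if a.is_pii]


# Hypothetical catalog entry for E-001; types are assumed.
customer = Entity(
    entity_id="E-001",
    name="Customer",
    attributes=[
        Attribute("customer_id", "uuid", requirement="DR-001"),
        Attribute("email", "string", is_pii=True, requirement="DR-001"),
        Attribute("name", "string", is_pii=True, requirement="DR-001"),
        Attribute("password_hash", "string", requirement="NFR-SEC-003"),
    ],
)

print(customer.entity_id, customer.pii_attributes())
```

Keeping the PII flag on the attribute rather than on the entity lets the same structure feed both the catalog and the PII summary.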
PII Identification
Automatically flags Personally Identifiable Information (PII) across all entities.
Data Governance Matrix
Defines ownership, stewardship, and access control:
| Entity | Data Owner | Data Steward | Data Custodian | Access Control | Sensitivity | Compliance |
|---|---|---|---|---|---|---|
| E-001: Customer | CFO | Customer Success Manager | IT Ops | Role-based (Customer, Admin, Finance) | Confidential | GDPR, PCI-DSS |
| E-002: Transaction | CFO | Finance Director | IT Ops | Role-based (Customer read-only, Finance full) | Restricted | PCI-DSS, FCA |
| E-003: PaymentMethod | CTO | Security Officer | IT Ops | Tokenized (PCI compliance) | Restricted | PCI-DSS Level 1 |
CRUD Matrix
Maps which components can Create, Read, Update, Delete each entity:
| Entity | Payment API | Admin Portal | Reporting Service | CRM Integration |
|---|---|---|---|---|
| E-001: Customer | CR-- | CRUD | -R-- | -R-- |
| E-002: Transaction | CR-- | -R-- | -R-- | ---- |
| E-003: PaymentMethod | CRU- | -RUD | ---- | ---- |
| E-004: RefundRequest | CR-- | CRUD | -R-- | -R-- |
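One way to make a matrix like this checkable in code is to treat each cell as four positional flags. The sketch below assumes every cell is a four-character string in C, R, U, D order with `-` marking a denied operation; the `allowed_operations` and `can` helpers are hypothetical names, not part of ArcKit.

```python
CRUD_ORDER = "CRUD"

# Cells copied from the CRUD matrix, keyed by entity and then component.
CRUD_MATRIX = {
    "E-001: Customer": {
        "Payment API": "CR--", "Admin Portal": "CRUD",
        "Reporting Service": "-R--", "CRM Integration": "-R--",
    },
    "E-002: Transaction": {
        "Payment API": "CR--", "Admin Portal": "-R--",
        "Reporting Service": "-R--", "CRM Integration": "----",
    },
    "E-003: PaymentMethod": {
        "Payment API": "CRU-", "Admin Portal": "-RUD",
        "Reporting Service": "----", "CRM Integration": "----",
    },
    "E-004: RefundRequest": {
        "Payment API": "CR--", "Admin Portal": "CRUD",
        "Reporting Service": "-R--", "CRM Integration": "-R--",
    },
}


def allowed_operations(cell: str) -> set[str]:
    """Parse a cell such as 'CR--' into the set of permitted operations."""
    if len(cell) != len(CRUD_ORDER):
        raise ValueError(f"expected {len(CRUD_ORDER)} flags, got {cell!r}")
    allowed = set()
    for operation, flag in zip(CRUD_ORDER, cell):
        if flag == operation:
            allowed.add(operation)
        elif flag != "-":
            raise ValueError(f"unexpected flag {flag!r} in {cell!r}")
    return allowed


def can(component: str, operation: str, entity: str) -> bool:
    """Check whether a component may perform an operation on an entity."""
    return operation in allowed_operations(CRUD_MATRIX[entity][component])


print(can("Admin Portal", "D", "E-001: Customer"))
print(can("Payment API", "U", "E-001: Customer"))
```

Rejecting malformed cells early is deliberate: a typo in the matrix should fail loudly rather than silently grant or deny access.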
Data Quality Framework
Defines measurable quality targets:
| Dimension | Definition | Target | Measurement |
|---|---|---|---|
| Accuracy | Data is correct and error-free | >99% | Email validation pass rate |
| Completeness | Required fields are populated | 100% | Non-null check on required fields |
| Consistency | Same data across systems | >98% | Reconciliation with CRM (daily) |
| Timeliness | Data is up-to-date | <1 hour | Transaction to reporting latency |
| Uniqueness | No duplicate records | 100% | Deduplication on email (unique index) |
| Validity | Conforms to format/rules | >99% | Regex/enum validation pass rate |
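Three of these dimensions (completeness, uniqueness, validity) reduce to simple ratios over a batch of records. The following is a rough sketch under stated assumptions: records are plain dictionaries, the sample data is invented for illustration, and the email pattern is deliberately simplistic rather than a full address validator.

```python
import re

# Simplified pattern for illustration only; real email validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(records: list[dict], required: list[str]) -> float:
    """Share of records in which every required field is populated."""
    if not records:
        return 1.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in required) for r in records
    )
    return complete / len(records)


def uniqueness(records: list[dict], key: str) -> float:
    """Share of populated values for `key` that are distinct."""
    values = [r[key] for r in records if r.get(key)]
    return len(set(values)) / len(values) if values else 1.0


def validity(records: list[dict], key: str, pattern: re.Pattern) -> float:
    """Share of populated values for `key` that match the pattern."""
    values = [r[key] for r in records if r.get(key)]
    if not values:
        return 1.0
    return sum(bool(pattern.match(v)) for v in values) / len(values)


# Hypothetical sample batch with one defect per dimension.
sample = [
    {"customer_id": "c1", "email": "a@example.com", "name": "Ada"},
    {"customer_id": "c2", "email": "b@example.com", "name": ""},
    {"customer_id": "c3", "email": "a@example.com", "name": "Alan"},
    {"customer_id": "c4", "email": "not-an-email", "name": "Grace"},
]

print(completeness(sample, ["customer_id", "email", "name"]))
print(uniqueness(sample, "email"))
print(validity(sample, "email", EMAIL_RE))
```

Each function returns a ratio between 0 and 1 that can be compared directly against the targets in the table, for example `validity(...) > 0.99`.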
Requirements Traceability
Every entity and attribute traces back to DR-xxx requirements:
| Requirement | Entity | Attributes | Rationale |
|---|---|---|---|
| DR-001 | E-001: Customer | customer_id, email, name | Store customer identity for authentication |
| DR-002 | E-002: Transaction | transaction_id, amount, currency, status | Track payment transactions for reconciliation |
| DR-003 | E-003: PaymentMethod | payment_method_id, card_token, expiry | Securely store tokenized payment methods (PCI-DSS) |
| NFR-SEC-003 | E-001: Customer | password_hash (bcrypt) | Secure authentication (bcrypt cost 12) |
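A coverage check over this table can be sketched in a few lines. The rows below are copied from the table; the `unmapped_requirements` and `requirements_by_entity` helpers are hypothetical names, and how ArcKit performs this check internally is not described here.

```python
# (requirement, entity, attributes) rows from the traceability table.
TRACE_ROWS = [
    ("DR-001", "E-001: Customer", ["customer_id", "email", "name"]),
    ("DR-002", "E-002: Transaction",
     ["transaction_id", "amount", "currency", "status"]),
    ("DR-003", "E-003: PaymentMethod",
     ["payment_method_id", "card_token", "expiry"]),
    ("NFR-SEC-003", "E-001: Customer", ["password_hash"]),
]


def unmapped_requirements(required_ids: list[str], rows: list[tuple]) -> list[str]:
    """Return requirement IDs that no row in the table covers."""
    mapped = {requirement for requirement, _entity, _attrs in rows}
    return sorted(set(required_ids) - mapped)


def requirements_by_entity(rows: list[tuple]) -> dict[str, list[str]]:
    """Group requirement IDs under the entity that implements them."""
    grouped: dict[str, list[str]] = {}
    for requirement, entity, _attrs in rows:
        grouped.setdefault(entity, []).append(requirement)
    return grouped


print(unmapped_requirements(["DR-001", "DR-002", "DR-003"], TRACE_ROWS))
print(unmapped_requirements(["DR-001", "DR-004"], TRACE_ROWS))
print(requirements_by_entity(TRACE_ROWS)["E-001: Customer"])
```

An empty list from `unmapped_requirements` corresponds to a report of zero unmapped requirements; `DR-004` in the second call is a made-up ID used only to show a gap being detected.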
GDPR Compliance
The data model includes comprehensive GDPR/DPA 2018 compliance documentation:
Legal Basis for Processing
For each entity containing PII, document the legal basis:
- Article 6(1)(a) - Consent: User explicitly consents (e.g., marketing emails)
- Article 6(1)(b) - Contract: Necessary for contract performance (e.g., payment processing)
- Article 6(1)(c) - Legal Obligation: Required by law (e.g., tax records retention)
- Article 6(1)(f) - Legitimate Interest: Necessary for legitimate interests (e.g., fraud detection)
Special Category Data (Article 9)
If processing health, biometric, ethnic, political, religious, or genetic data:
- Document Article 9 conditions (explicit consent, employment, vital interests, etc.)
- Flag for Data Protection Impact Assessment (DPIA) requirement
Data Subject Rights Implementation
DPIA Requirement
When is a DPIA Required?
Under UK GDPR Article 35, a DPIA is required if processing is likely to result in high risk to individuals, including:
- Systematic monitoring (e.g., CCTV, tracking)
- Large-scale processing of special category data
- Automated decision-making with legal/significant effects
- Processing children’s data
- Innovative technology (AI, biometrics, blockchain)
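The criteria above lend themselves to a simple screening checklist. The sketch below is an illustrative aid only, assuming one boolean per listed criterion; the `ProcessingProfile` class and its field names are hypothetical, and the result does not replace the judgement of a Data Protection Officer.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProcessingProfile:
    """One flag per high-risk indicator listed above (names are assumed)."""
    systematic_monitoring: bool = False
    large_scale_special_category: bool = False
    automated_decisions_with_significant_effects: bool = False
    processes_childrens_data: bool = False
    innovative_technology: bool = False


def dpia_triggers(profile: ProcessingProfile) -> list[str]:
    """Return the names of the indicators that are set, in field order."""
    return [name for name, flagged in asdict(profile).items() if flagged]


def dpia_recommended(profile: ProcessingProfile) -> bool:
    """Treat any single indicator as enough to recommend a DPIA."""
    return bool(dpia_triggers(profile))


# Hypothetical profile: a system using automated scoring on new technology.
profile = ProcessingProfile(
    automated_decisions_with_significant_effects=True,
    innovative_technology=True,
)
print(dpia_recommended(profile), dpia_triggers(profile))
```

Returning the list of triggers, not just a yes/no answer, keeps the reason for the recommendation visible in the generated documentation.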
If any of these apply, generate the assessment with:
arckit dpia <project>
Data Integration Mapping
Upstream Systems (Data Sources)
| System | Entity Mapping | Update Frequency | Data Quality SLA | Authentication |
|---|---|---|---|---|
| Salesforce CRM | Customer → Account | Real-time (webhook) | 99% accuracy | OAuth 2.0 |
| Stripe Payments | Transaction → Charge | Real-time (API polling, 1min) | 99.9% accuracy | API key (secret) |
| SendGrid | Customer → Contact | Batch (daily, 2am) | 95% accuracy | API key |
Downstream Systems (Data Consumers)
| System | Entity Mapping | Sync Method | Latency SLA | Data Format |
|---|---|---|---|---|
| Reporting Warehouse | All entities | Batch (hourly, CDC) | <1 hour | Parquet |
| Customer Portal | Customer, Transaction | API (real-time) | <2 seconds | JSON |
| CRM Analytics | Customer, Transaction | Batch (nightly) | <12 hours | CSV |
Master Data Management (MDM)
| Entity | Source of Truth | Rationale |
|---|---|---|
| Customer | Payment API | Customers created during checkout |
| Transaction | Payment API | Transactions originate in payment service |
| Product | Product Catalog Service | External product database (upstream) |
Database Technology Recommendations
Based on data model characteristics:
Relational (PostgreSQL, MySQL)
- Best for: Transactional data, strong consistency, ACID guarantees
- Use when: Complex relationships, financial data, PCI-DSS compliance
- Example: Payment transactions, customer accounts
Document (MongoDB, DynamoDB)
- Best for: Flexible schemas, rapid iteration, nested data
- Use when: Product catalogs, user profiles, CMS content
- Example: Customer preferences, product metadata
Graph (Neo4j, Amazon Neptune)
- Best for: Highly connected data, social graphs, recommendations
- Use when: Friend networks, fraud detection, knowledge graphs
- Example: Customer relationships, recommendation engines
Time-Series (InfluxDB, TimescaleDB)
- Best for: Metrics, events, IoT data, logs
- Use when: High write throughput, time-based queries, retention policies
- Example: Transaction metrics, API logs, sensor data
UK Government Data Compliance
For public sector projects:
Government Security Classifications
- OFFICIAL: Routine business data (default)
- OFFICIAL-SENSITIVE: Handling marking within OFFICIAL (not a separate tier) for personal data, policy development
- SECRET: Very sensitive information (requires accreditation)
- TOP SECRET: Highest sensitivity (rare)
Data Standards
- Use GDS Data Standards Catalogue where applicable
- Prefer open data formats (JSON, CSV, OData)
- Reference ICO Data Protection guidance for public sector
- Follow NCSC data security patterns
National Data Strategy Alignment
Document Structure
The generated document includes:
Real-World Example
Payment Gateway - Data Model Summary
Project: Payment Gateway Modernization (Project 001)
Entities: 8 entities modeled
- Core Entities: Customer, Transaction, PaymentMethod
- Supporting Entities: RefundRequest, AuditLog
- Lookup/Reference Data: Currency, PaymentStatus, TransactionType
Relationships:
- One-to-Many: 8 (e.g., Customer → Transactions)
- Many-to-Many: 2 (e.g., Transaction ↔ PaymentMethod via junction table)
- One-to-One: 2 (e.g., Transaction → AuditLog)
Attributes:
- PII Attributes: 12 (email, name, phone, address)
- Encrypted Attributes: 15 (PII + sensitive financial data)
- Indexed Attributes: 22 (primary keys, foreign keys, performance)
GDPR Compliance:
- PII Entities: Customer (name, email, phone), Transaction (billing address)
- Legal Basis: Contract (Article 6(1)(b)) - payment processing
- DPIA Required: YES (large-scale payment card processing, PCI-DSS Level 1)
- Retention Periods: 7 years (PCI-DSS), 6 years (UK tax law)
Governance:
- Data Owners: CFO (financial data), CTO (technical data), DPO (PII)
- CRUD Matrix: 4 roles defined (Customer, Admin, Finance, Support)
- Access Controls: Role-based + attribute-based (customers see own data only)
Compliance Frameworks:
- PCI-DSS Level 1 (payment card tokenization, no plaintext storage)
- GDPR/DPA 2018 (PII encryption, SAR, erasure, portability)
- FCA regulations (financial services, UK)
Requirements Traceability:
- Data Requirements Mapped: 8 DR-xxx requirements
- Unmapped Requirements: 0
Next Steps:
- Run /arckit dpia 001 for Data Protection Impact Assessment
- Run /arckit research database-technologies for technology selection
- Run /arckit hld-review after HLD is created
Tips & Best Practices
Retention Policies Must Be Enforced
Define retention periods per entity and implement automated deletion:
- Marketing data: 2 years after last interaction
- Transaction records: 7 years (PCI-DSS, UK tax law)
- Audit logs: 1 year (security compliance)
- Test data: Anonymize after 30 days
Quality Checks
Before generating the document, ArcKit validates:
Related Commands
- Requirements: Prerequisite. Run before data-model to create DR-xxx
- DPIA: Next step. Assess data protection impact if PII is present
- DataScout: Discovery. Find external data sources to integrate
- HLD Review: Downstream. Validate database technology choices
- DLD Review: Downstream. Validate schema design and indexes
- Traceability: Downstream. Map DR-xxx → Entity → HLD component