Overview

The data-model command creates comprehensive data architecture documentation including Entity-Relationship Diagrams (ERDs), entity catalogs with PII identification, GDPR compliance matrices, data governance frameworks, and requirements traceability. It drives database design, API specifications, and data protection assessments.

Command Syntax

arckit data-model <project-id-or-domain>
Examples:
arckit data-model 001
arckit data-model "payment gateway modernization"
arckit data-model "patient records"

Prerequisites

Mandatory

  • Requirements Document (REQ): Must contain Data Requirements (DR-xxx)
    • Command: arckit requirements <project>
    • The tool will STOP and warn if missing
    • Why: Data model MUST be based on DR-xxx requirements to ensure traceability
  • Stakeholder Analysis (STKE): Identifies data owners from RACI matrix
  • Architecture Principles (PRIN): Provides data governance standards, privacy by design principles
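The mandatory check on the requirements document can be pictured as a scan for DR-xxx identifiers. The following is a minimal sketch, not ArcKit's actual implementation: it assumes IDs take the form `DR-` followed by three digits, and the function name `find_data_requirements` is hypothetical.

```python
import re

# Assumption: data requirement IDs look like "DR-001" (DR- plus three digits).
DR_PATTERN = re.compile(r"\bDR-\d{3}\b")


def find_data_requirements(requirements_text: str) -> list[str]:
    """Return the unique DR-xxx IDs found in the text, sorted."""
    return sorted(set(DR_PATTERN.findall(requirements_text)))


sample = (
    "DR-001: Store customer identity.\n"
    "DR-002: Store contact details.\n"
    "FR-010: Checkout flow.\n"
)
ids = find_data_requirements(sample)
if not ids:
    # Mirrors the documented behaviour: stop and warn when no DR-xxx exist.
    raise SystemExit("No DR-xxx requirements found; run the requirements command first.")
print(ids)
```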

Workflow

1. Generate Requirements First

arckit requirements "payment gateway modernization"

2. Create Data Model

arckit data-model "payment gateway modernization"

3. Review Output

The command creates:
  • File: projects/001-payment-gateway-modernization/ARC-001-DATA-v1.0.md
  • Summary: Shows entity counts, PII identification, GDPR status, requirements coverage

4. Next Steps (Handoffs)

  • HLD Review: Validate database technology choices against the data model
  • DLD Review: Validate schema design, indexes, and query patterns
  • SOW: Include data migration and governance in the vendor RFP
  • Traceability: Map DR-xxx → Entities → Attributes → HLD components

Key Features

Entity-Relationship Diagram (ERD)

Generated in Mermaid syntax, so the diagram renders directly on GitHub.
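To illustrate what Mermaid ERD output looks like, here is a minimal sketch that renders a few entities as Mermaid `erDiagram` text. The function `to_mermaid_erd` and its input shapes are hypothetical and are not part of ArcKit; only one-to-many relationships are handled.

```python
def to_mermaid_erd(
    entities: dict[str, list[tuple[str, str]]],
    relationships: list[tuple[str, str, str]],
) -> str:
    """Render entities and one-to-many relationships as Mermaid erDiagram text."""
    lines = ["erDiagram"]
    for parent, child, label in relationships:
        # "||--o{" is Mermaid notation for exactly-one to zero-or-many.
        lines.append(f"    {parent} ||--o{{ {child} : {label}")
    for name, attributes in entities.items():
        lines.append(f"    {name} {{")
        for attr_type, attr_name in attributes:
            lines.append(f"        {attr_type} {attr_name}")
        lines.append("    }")
    return "\n".join(lines)


erd = to_mermaid_erd(
    entities={
        "CUSTOMER": [("uuid", "customer_id"), ("string", "email")],
        "TRANSACTION": [("uuid", "transaction_id")],
    },
    relationships=[("CUSTOMER", "TRANSACTION", "places")],
)
print(erd)
```

Pasting the printed text into a Mermaid code fence on GitHub should render the diagram.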

Entity Catalog

Detailed documentation for each entity (E-001, E-002, etc.):
### E-001: Customer

**Description:** Represents a registered customer who can place transactions.

**Source Requirement:** DR-001, DR-002 (Customer identity and contact storage)

**Business Owner:** CFO (Customer data governance)

**Technical Owner:** Data Platform Team

**Data Classification:** Confidential (contains PII)

**Estimated Volume:** 100,000 initial records, +10,000/month growth

**Retention Period:** 7 years after last transaction (PCI-DSS requirement)

**Attributes:**

| Attribute | Type | Required | PII | Description | Validation | Source Req |
|-----------|------|----------|-----|-------------|------------|------------|
| customer_id | UUID | Yes | No | Unique identifier | UUID v4 | DR-001 |
| email | String(255) | Yes | Yes | Contact email | RFC 5322, unique | DR-002 |
| name | String(255) | Yes | Yes | Full name | Min 2 chars | DR-002 |
| phone | String(20) | No | Yes | Contact number | E.164 format | DR-003 |
| password_hash | String(255) | Yes | No | Bcrypt hash | Bcrypt (cost 12) | DR-004 |
| created_at | Timestamp | Yes | No | Account creation | ISO 8601 | DR-001 |
| last_login | Timestamp | No | No | Last login time | ISO 8601 | DR-005 |

**Relationships:**
- One-to-Many: Customer → Transactions
- One-to-Many: Customer → PaymentMethods

**Indexes:**
- Primary Key: customer_id (clustered)
- Unique Index: email
- Index: created_at (for reporting)

**Privacy Notes (GDPR):**
- **Legal Basis:** Contract (Article 6(1)(b)) - necessary for payment processing
- **Data Subject Rights:** SAR (export all customer data), erasure (after 7-year retention), rectification
- **Encryption:** Email, name, phone encrypted at rest (AES-256)
- **Access Controls:** Customer can view/edit own data; Admins read-only; Finance read-only for disputes

PII Identification

Automatically flags Personally Identifiable Information (PII) across all entities.

GDPR Compliance is Mandatory
Any entity containing PII requires:
  • Legal basis for processing (GDPR Article 6)
  • Special category conditions if applicable (GDPR Article 9)
  • Data subject rights implementation (SAR, erasure, portability)
  • Encryption at rest and in transit
  • Access controls and audit logging
  • Retention limits and deletion policies
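PII flagging can be thought of as grouping attributes by entity wherever the PII column says "Yes". The sketch below assumes a simple in-memory attribute list; the `Attribute` class and `pii_by_entity` function are hypothetical names, and the sample rows are taken from the Customer entity catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    entity: str
    name: str
    pii: bool


def pii_by_entity(attributes: list[Attribute]) -> dict[str, list[str]]:
    """Group the names of PII-flagged attributes under their entity."""
    flagged: dict[str, list[str]] = {}
    for attr in attributes:
        if attr.pii:
            flagged.setdefault(attr.entity, []).append(attr.name)
    return flagged


catalog = [
    Attribute("E-001: Customer", "customer_id", False),
    Attribute("E-001: Customer", "email", True),
    Attribute("E-001: Customer", "name", True),
    Attribute("E-001: Customer", "phone", True),
    Attribute("E-001: Customer", "password_hash", False),
]
print(pii_by_entity(catalog))
```

Every entity that appears in the result would then need the controls listed above.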

Data Governance Matrix

Defines ownership, stewardship, and access control:
| Entity | Data Owner | Data Steward | Data Custodian | Access Control | Sensitivity | Compliance |
|--------|------------|--------------|----------------|----------------|-------------|------------|
| E-001: Customer | CFO | Customer Success Manager | IT Ops | Role-based (Customer, Admin, Finance) | Confidential | GDPR, PCI-DSS |
| E-002: Transaction | CFO | Finance Director | IT Ops | Role-based (Customer read-only, Finance full) | Restricted | PCI-DSS, FCA |
| E-003: PaymentMethod | CTO | Security Officer | IT Ops | Tokenized (PCI compliance) | Restricted | PCI-DSS Level 1 |

CRUD Matrix

Maps which components can Create, Read, Update, Delete each entity:
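| Entity | Payment API | Admin Portal | Reporting Service | CRM Integration |
|--------|-------------|--------------|-------------------|-----------------|
| E-001: Customer | CR-- | CRUD | -R-- | -R-- |
| E-002: Transaction | CR-- | -R-- | -R-- | ---- |
| E-003: PaymentMethod | CRU- | -RUD | ---- | ---- |
| E-004: RefundRequest | CR-- | CRUD | -R-- | -R-- |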
Legend: C=Create, R=Read, U=Update, D=Delete, -=No access
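A CRUD matrix like this one can be checked programmatically. The sketch below encodes two rows of the matrix as four-character permission strings and tests a single operation; the dictionary layout and the `is_allowed` function are assumptions made for illustration, not an ArcKit API.

```python
# Permission strings follow the legend: C, R, U, D, or "-" for no access.
CRUD_MATRIX = {
    "Customer": {
        "Payment API": "CR--",
        "Admin Portal": "CRUD",
        "Reporting Service": "-R--",
        "CRM Integration": "-R--",
    },
    "Transaction": {
        "Payment API": "CR--",
        "Admin Portal": "-R--",
        "Reporting Service": "-R--",
        "CRM Integration": "----",
    },
}


def is_allowed(entity: str, component: str, operation: str) -> bool:
    """Return True if the component may perform the operation on the entity."""
    if operation not in ("C", "R", "U", "D"):
        raise ValueError(f"Unknown operation: {operation!r}")
    # Unknown entities or components default to no access.
    return operation in CRUD_MATRIX.get(entity, {}).get(component, "----")


print(is_allowed("Customer", "Admin Portal", "D"))
print(is_allowed("Transaction", "CRM Integration", "R"))
```

Defaulting unknown entities and components to no access keeps the check fail-closed.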

Data Quality Framework

Defines measurable quality targets:
| Dimension | Definition | Target | Measurement |
|-----------|------------|--------|-------------|
| Accuracy | Data is correct and error-free | >99% | Email validation pass rate |
| Completeness | Required fields are populated | 100% | Non-null check on required fields |
| Consistency | Same data across systems | >98% | Reconciliation with CRM (daily) |
| Timeliness | Data is up-to-date | <1 hour | Transaction to reporting latency |
| Uniqueness | No duplicate records | 100% | Deduplication on email (unique index) |
| Validity | Conforms to format/rules | >99% | Regex/enum validation pass rate |
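Two of these dimensions, completeness and uniqueness, are straightforward to measure. The sketch below computes both as fractions over a list of record dictionaries; the function names and sample records are hypothetical, and a real pipeline would more likely run equivalent checks in SQL against the database.

```python
def completeness(records: list[dict], required_fields: list[str]) -> float:
    """Fraction of records in which every required field is non-empty."""
    if not records:
        return 1.0
    complete = sum(
        all(record.get(field) not in (None, "") for field in required_fields)
        for record in records
    )
    return complete / len(records)


def uniqueness(records: list[dict], key: str) -> float:
    """Fraction of non-empty values for `key` that are distinct."""
    values = [r[key] for r in records if r.get(key) not in (None, "")]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


records = [
    {"email": "a@example.com", "name": "Ann"},
    {"email": "b@example.com", "name": "Bob"},
    {"email": "a@example.com", "name": "Ann Again"},  # duplicate email
    {"email": "c@example.com", "name": ""},           # missing required name
]
print(completeness(records, ["email", "name"]))  # 3 of 4 records complete
print(uniqueness(records, "email"))              # 3 distinct of 4 values
```

Both results here fall short of the 100% targets in the table, which is the kind of gap the framework is meant to surface.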

Requirements Traceability

Every entity and attribute traces back to DR-xxx requirements:
| Requirement | Entity | Attributes | Rationale |
|-------------|--------|------------|-----------|
| DR-001 | E-001: Customer | customer_id, email, name | Store customer identity for authentication |
| DR-002 | E-002: Transaction | transaction_id, amount, currency, status | Track payment transactions for reconciliation |
| DR-003 | E-003: PaymentMethod | payment_method_id, card_token, expiry | Securely store tokenized payment methods (PCI-DSS) |
| NFR-SEC-003 | E-001: Customer | password_hash (bcrypt) | Secure authentication (bcrypt cost 12) |
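A coverage check over a traceability table reduces to a set difference between the known requirement IDs and those that appear in at least one row. A minimal sketch follows, assuming rows are `(requirement, entity)` pairs; `unmapped_requirements` is a hypothetical helper and the sample data is illustrative only.

```python
def unmapped_requirements(
    requirement_ids: list[str],
    trace_rows: list[tuple[str, str]],
) -> list[str]:
    """Return requirement IDs that no traceability row maps to an entity."""
    mapped = {requirement for requirement, _entity in trace_rows}
    return sorted(set(requirement_ids) - mapped)


requirement_ids = ["DR-001", "DR-002", "DR-003", "DR-004"]
trace_rows = [
    ("DR-001", "E-001: Customer"),
    ("DR-002", "E-002: Transaction"),
    ("DR-003", "E-003: PaymentMethod"),
]
print(unmapped_requirements(requirement_ids, trace_rows))
```

An empty result corresponds to the "Unmapped Requirements: 0" figure reported in the summary.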

GDPR Compliance

The data model includes comprehensive GDPR/DPA 2018 compliance documentation.

For each entity containing PII, document the legal basis:
  • Article 6(1)(a) - Consent: User explicitly consents (e.g., marketing emails)
  • Article 6(1)(b) - Contract: Necessary for contract performance (e.g., payment processing)
  • Article 6(1)(c) - Legal Obligation: Required by law (e.g., tax records retention)
  • Article 6(1)(f) - Legitimate Interest: Necessary for legitimate interests (e.g., fraud detection)

Special Category Data (Article 9)

If processing health, biometric, ethnic, political, religious, or genetic data:
  • Document Article 9 conditions (explicit consent, employment, vital interests, etc.)
  • Flag for Data Protection Impact Assessment (DPIA) requirement

Data Subject Rights Implementation

DPIA Requirement

When is a DPIA Required?
Under UK GDPR Article 35, a DPIA is required if processing is likely to result in high risk to individuals, including:
  • Systematic monitoring (e.g., CCTV, tracking)
  • Large-scale processing of special category data
  • Automated decision-making with legal/significant effects
  • Processing children’s data
  • Innovative technology (AI, biometrics, blockchain)
If your data model contains PII, run: arckit dpia <project>
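The high-risk criteria listed above can be turned into a simple screening aid. The sketch below is only an illustration: the trait identifiers and the function `dpia_triggers_present` are hypothetical, it does not reflect how ArcKit decides, and a match indicates that a DPIA should be considered, not a legal determination.

```python
# Hypothetical identifiers for the high-risk criteria listed above.
DPIA_TRIGGERS = {
    "systematic_monitoring",
    "large_scale_special_category",
    "automated_decision_making",
    "childrens_data",
    "innovative_technology",
}


def dpia_triggers_present(processing_traits: set[str]) -> list[str]:
    """Return the high-risk criteria that apply to this processing, sorted."""
    return sorted(DPIA_TRIGGERS & processing_traits)


traits = {"childrens_data", "automated_decision_making", "batch_reporting"}
matches = dpia_triggers_present(traits)
if matches:
    print("DPIA likely required because of:", ", ".join(matches))
```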

Data Integration Mapping

Upstream Systems (Data Sources)

| System | Entity Mapping | Update Frequency | Data Quality SLA | Authentication |
|--------|----------------|------------------|------------------|----------------|
| Salesforce CRM | Customer → Account | Real-time (webhook) | 99% accuracy | OAuth 2.0 |
| Stripe Payments | Transaction → Charge | Real-time (API polling, 1 min) | 99.9% accuracy | API key (secret) |
| SendGrid | Customer → Contact | Batch (daily, 2am) | 95% accuracy | API key |

Downstream Systems (Data Consumers)

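| System | Entity Mapping | Sync Method | Latency SLA | Data Format |
|--------|----------------|-------------|-------------|-------------|
| Reporting Warehouse | All entities | Batch (hourly, CDC) | <1 hour | Parquet |
| Customer Portal | Customer, Transaction | API (real-time) | <2 seconds | JSON |
| CRM Analytics | Customer, Transaction | Batch (nightly) | <12 hours | CSV |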

Master Data Management (MDM)

| Entity | Source of Truth | Rationale |
|--------|-----------------|-----------|
| Customer | Payment API | Customers created during checkout |
| Transaction | Payment API | Transactions originate in payment service |
| Product | Product Catalog Service | External product database (upstream) |

Database Technology Recommendations

Based on data model characteristics:

Relational (PostgreSQL, MySQL)

  • Best for: Transactional data, strong consistency, ACID guarantees
  • Use when: Complex relationships, financial data, PCI-DSS compliance
  • Example: Payment transactions, customer accounts

Document (MongoDB, DynamoDB)

  • Best for: Flexible schemas, rapid iteration, nested data
  • Use when: Product catalogs, user profiles, CMS content
  • Example: Customer preferences, product metadata

Graph (Neo4j, Amazon Neptune)

  • Best for: Highly connected data, social graphs, recommendations
  • Use when: Friend networks, fraud detection, knowledge graphs
  • Example: Customer relationships, recommendation engines

Time-Series (InfluxDB, TimescaleDB)

  • Best for: Metrics, events, IoT data, logs
  • Use when: High write throughput, time-based queries, retention policies
  • Example: Transaction metrics, API logs, sensor data

UK Government Data Compliance

For public sector projects:

Government Security Classifications

  • OFFICIAL: Routine business data (default)
  • OFFICIAL-SENSITIVE: Personal data, policy development
  • SECRET: Very sensitive information (requires accreditation)
  • TOP SECRET: Highest sensitivity (rare)

Data Standards

  • Use GDS Data Standards Catalogue where applicable
  • Prefer open data formats (JSON, CSV, OData)
  • Reference ICO Data Protection guidance for public sector
  • Follow NCSC data security patterns

National Data Strategy Alignment

UK National Data Strategy
The data model supports:
  • Data Foundations pillar: Metadata standards, data quality, data cataloging
  • Data Availability pillar: Data access controls, sharing agreements, open data
See National Data Strategy Guide for full mapping.

Document Structure

The generated document includes:
# Data Model Document

## Document Control
- Document ID: ARC-001-DATA-v1.0
- Version: 1.0
- Status: DRAFT

## Executive Summary
- Total Entities: 12
- PII Entities: 4
- GDPR Compliance: DPIA Required

## Visual ERD (Mermaid)
[Entity-relationship diagram]

## Entity Catalog
### E-001: Customer
### E-002: Transaction
[...]

## Data Governance Matrix
[Ownership, stewardship, access controls]

## CRUD Matrix
[Component access permissions]

## Data Integration Mapping
[Upstream/downstream systems]

## GDPR Compliance
[Legal basis, special category data, data subject rights]

## Data Quality Framework
[Quality dimensions, metrics, targets]

## Requirements Traceability
[DR-xxx → Entity → Attribute mapping]

## Implementation Guidance
[Database technology, schema migration, backup/recovery]

Real-World Example

Project: Payment Gateway Modernization (Project 001)
Entities: 8 entities modeled
  • Core Entities: Customer, Transaction, PaymentMethod
  • Supporting Entities: RefundRequest, AuditLog
  • Lookup/Reference Data: Currency, PaymentStatus, TransactionType
Relationships: 12 relationships defined
  • One-to-Many: 8 (e.g., Customer → Transactions)
  • Many-to-Many: 2 (e.g., Transaction ↔ PaymentMethod via junction table)
  • One-to-One: 2 (e.g., Transaction → AuditLog)
Attributes: 67 total attributes
  • PII Attributes: 12 (email, name, phone, address)
  • Encrypted Attributes: 15 (PII + sensitive financial data)
  • Indexed Attributes: 22 (primary keys, foreign keys, performance)
GDPR Compliance:
  • PII Entities: Customer (name, email, phone), Transaction (billing address)
  • Legal Basis: Contract (Article 6(1)(b)) - payment processing
  • DPIA Required: YES (large-scale payment card processing, PCI-DSS Level 1)
  • Retention Periods: 7 years (PCI-DSS), 6 years (UK tax law)
Data Governance:
  • Data Owners: CFO (financial data), CTO (technical data), DPO (PII)
  • CRUD Matrix: 4 roles defined (Customer, Admin, Finance, Support)
  • Access Controls: Role-based + attribute-based (customers see own data only)
Compliance:
  • PCI-DSS Level 1 (payment card tokenization, no plaintext storage)
  • GDPR/DPA 2018 (PII encryption, SAR, erasure, portability)
  • FCA regulations (financial services, UK)
Requirements Traceability:
  • Data Requirements Mapped: 8 DR-xxx requirements
  • Unmapped Requirements: 0
Next Steps:
  • Run /arckit dpia 001 for Data Protection Impact Assessment
  • Run /arckit research database-technologies for technology selection
  • Run /arckit hld-review after HLD is created

Tips & Best Practices

Data Minimization (Privacy by Design)
Collect only the minimum data necessary for the purpose:
  • ❌ Store full credit card numbers
  • ✅ Store tokenized payment method IDs
  • ❌ Store date of birth for age verification
  • ✅ Store age bracket (18-25, 26-35, etc.)
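The age-bracket approach can be sketched as a small mapping function applied before anything is stored. The 18-25 and 26-35 brackets come from the tip above; continuing in ten-year steps beyond 35 and the "under 18" label are assumptions, as is the function name `age_bracket`.

```python
def age_bracket(age: int) -> str:
    """Map an exact age to a coarse bracket so the exact value is never stored."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "under 18"
    if age <= 25:
        return "18-25"
    # Assumption: brackets continue in ten-year steps (26-35, 36-45, ...).
    lower = 26 + ((age - 26) // 10) * 10
    return f"{lower}-{lower + 9}"


print(age_bracket(22))
print(age_bracket(35))
print(age_bracket(40))
```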
PCI-DSS Compliance for Payment Data
If storing payment card data:
  • NEVER store CVV/CVC security codes
  • NEVER store full magnetic stripe data
  • ALWAYS tokenize card numbers (use payment processor tokens)
  • ALWAYS encrypt cardholder data at rest (AES-256)
  • ALWAYS use TLS 1.2+ for transmission
Retention Policies Must Be Enforced
Define retention periods per entity and implement automated deletion:
  • Marketing data: 2 years after last interaction
  • Transaction records: 7 years (PCI-DSS, UK tax law)
  • Audit logs: 1 year (security compliance)
  • Test data: Anonymize after 30 days
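Automated deletion starts with a check of whether a record has outlived its retention period. The sketch below uses the periods listed above; the category keys and the function `is_past_retention` are hypothetical, and years are approximated as 365 days, which a production job would replace with calendar-aware arithmetic.

```python
from datetime import date, timedelta

# Retention periods from the list above; years approximated as 365 days.
RETENTION = {
    "marketing": timedelta(days=2 * 365),
    "transaction": timedelta(days=7 * 365),
    "audit_log": timedelta(days=365),
}


def is_past_retention(category: str, reference_date: date, today: date) -> bool:
    """Return True if the record is older than its category's retention period.

    reference_date is the event the period counts from, for example the
    last interaction for marketing data.
    """
    return today - reference_date > RETENTION[category]


print(is_past_retention("audit_log", date(2023, 1, 1), date(2024, 6, 1)))
print(is_past_retention("transaction", date(2020, 1, 1), date(2024, 1, 1)))
```

A scheduled job could run this check per record and delete or anonymize whatever it flags.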

Quality Checks

Before generating the document, ArcKit validates:

Requirements

Prerequisite: Run before data-model to create DR-xxx

DPIA

Next step: Assess data protection impact if PII

DataScout

Discovery: Find external data sources to integrate

HLD Review

Downstream: Validate database technology choices

DLD Review

Downstream: Validate schema design and indexes

Traceability

Downstream: Map DR-xxx → Entity → HLD component
