Overview
The datascout command discovers external data sources (APIs, datasets, open data portals, and commercial data providers) that can fulfill your project’s data and integration requirements. It prioritizes UK Government open data (data.gov.uk, api.gov.uk) and evaluates sources using weighted scoring across requirements fit, data quality, licensing, API quality, compliance, and reliability.
Command Syntax
Prerequisites
Mandatory
- Requirements Document (REQ): Must contain Data Requirements (DR-xxx) or Integration Requirements (INT-xxx)
  - Command: arckit requirements <project>
  - The tool will warn if this document is missing
Recommended
- Data Model (DATA): Identifies entities and attributes needing external data
- Stakeholder Analysis (STKE): Identifies data ownership and governance needs
- Architecture Principles (PRIN): Data standards, open data preferences
Workflow
1. Generate Requirements and Data Model
2. Discover Data Sources
- Extract data needs from requirements (DR-xxx, INT-xxx, FR-xxx, NFR-xxx)
- Check UK Government sources FIRST (api.gov.uk, data.gov.uk)
- Research commercial APIs (Google, AWS, Azure)
- Research free/freemium APIs (OpenWeather, HERE)
- Research open datasets (Kaggle, data.world)
- Evaluate sources with weighted scoring
- Identify gaps and data utility opportunities
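The extraction step above, together with the prerequisite that a REQ document must contain DR-xxx or INT-xxx entries, can be pictured as a simple pattern match over the requirements text. The sketch below is a hypothetical illustration, not ArcKit's implementation: it assumes IDs follow the `PREFIX-digits` form used in this page (e.g. `DR-003`), and the function names are invented for the example.

```python
import re
from collections import defaultdict

# Assumed ID format: DR, INT, NFR or FR, a hyphen, then digits (e.g. "DR-003").
# NFR is listed before FR, and \b stops "FR" from matching inside "NFR-001".
REQ_ID = re.compile(r"\b(DR|INT|NFR|FR)-(\d+)\b")


def extract_requirement_ids(text: str) -> dict[str, list[str]]:
    """Group unique requirement IDs by prefix, in order of first appearance."""
    groups: dict[str, list[str]] = defaultdict(list)
    for match in REQ_ID.finditer(text):
        prefix, number = match.groups()
        req_id = f"{prefix}-{number}"
        if req_id not in groups[prefix]:
            groups[prefix].append(req_id)
    return dict(groups)


def has_data_needs(groups: dict[str, list[str]]) -> bool:
    """Mirror the prerequisite: at least one DR-xxx or INT-xxx requirement."""
    return bool(groups.get("DR") or groups.get("INT"))


if __name__ == "__main__":
    sample = "DR-003: Real-time traffic flow\nINT-002: Weather data\nSee DR-003 again."
    ids = extract_requirement_ids(sample)
    print(ids)  # {'DR': ['DR-003'], 'INT': ['INT-002']}
    if not has_data_needs(ids):
        print("Warning: no DR-xxx or INT-xxx requirements found")
```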
3. Review Output
The command creates:
- File: projects/001-traffic-monitoring/research/ARC-001-DSCT-v1.0.md
- Summary: Categories researched, sources discovered, UK Gov sources, top recommendations, gaps
4. Next Steps (Handoffs)
- Data Model: Add discovered sources as new entities/attributes
- Research: Research data source pricing and vendor selection
- ADR: Record data source selection decisions
- DPIA: Assess third-party sources containing personal data
- Diagram: Create data flow diagrams showing integrations
- Traceability: Map DR-xxx requirements to discovered sources
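The Traceability handoff amounts to a mapping from each requirement to the sources that fulfil it; any requirement left with no source is a gap. The sketch below is a hypothetical illustration using the requirement IDs and source names from the Real-World Example on this page; the function name and input shapes are assumptions, not ArcKit's data format.

```python
def build_traceability(
    requirements: list[str], source_coverage: dict[str, list[str]]
) -> tuple[dict[str, list[str]], list[str]]:
    """Map each requirement ID to the sources covering it; unmapped IDs are gaps."""
    matrix: dict[str, list[str]] = {req: [] for req in requirements}
    for source, covered in source_coverage.items():
        for req in covered:
            if req in matrix:  # ignore IDs that are not in the requirement list
                matrix[req].append(source)
    gaps = [req for req, sources in matrix.items() if not sources]
    return matrix, gaps


if __name__ == "__main__":
    reqs = ["DR-003", "DR-004", "INT-001", "INT-002"]
    # Road Safety Data deliberately omitted to show how a gap surfaces.
    coverage = {
        "DfT Real-Time Traffic API": ["DR-003"],
        "Ordnance Survey Maps API": ["INT-001"],
        "Met Office DataPoint API": ["INT-002"],
    }
    matrix, gaps = build_traceability(reqs, coverage)
    print(gaps)  # ['DR-004']
```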
Data Source Categories
1. UK Government Open Data (Priority)
Key UK Government Data Sources:

| Source | Description | Example Use Cases |
|---|---|---|
| api.gov.uk | Central API catalog for government services | Companies House, DVLA, Land Registry, HMRC |
| data.gov.uk | 50,000+ open datasets from government departments | Census data, crime statistics, health data |
| Companies House API | Company information (free) | Business intelligence, compliance checks |
| Ordnance Survey | Geospatial data, maps (free tier + premium) | Location services, mapping, routing |
| NHS Digital | Healthcare data, hospital statistics (anonymized) | Health analytics, service planning |
| DfT Traffic API | Real-time traffic flow, congestion (free) | Traffic monitoring, route optimization |
| Met Office DataPoint | Weather forecasts, observations (free tier) | Weather-based planning, alerts |
| ONS Open Data | Census, population, economic statistics (free) | Demographics, market research |
| Police UK API | Crime data by location (free) | Safety analytics, risk assessment |
| Land Registry | Property ownership, prices (premium) | Property valuations, due diligence |
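data.gov.uk is built on CKAN, so a catalogue search can be scripted. The sketch below assumes the standard CKAN Action API `package_search` endpoint is reachable at the base URL shown; confirm the current endpoint in the data.gov.uk API guidance before relying on it. The function names are hypothetical, and only `search()` touches the network.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: data.gov.uk exposes the standard CKAN Action API at this URL.
CKAN_SEARCH = "https://data.gov.uk/api/action/package_search"


def build_search_url(query: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL (q = free-text query, rows = page size)."""
    return f"{CKAN_SEARCH}?{urlencode({'q': query, 'rows': rows})}"


def summarise_results(payload: dict) -> list[tuple[str, str]]:
    """Return (title, licence) pairs from a CKAN package_search response body."""
    if not payload.get("success"):
        raise ValueError("CKAN request was not successful")
    return [
        (pkg.get("title", ""), pkg.get("license_title") or "unknown")
        for pkg in payload["result"]["results"]
    ]


def search(query: str, rows: int = 10, timeout: float = 10.0) -> list[tuple[str, str]]:
    """Fetch and summarise results (network call; not run at import time)."""
    with urllib.request.urlopen(build_search_url(query, rows), timeout=timeout) as resp:
        return summarise_results(json.load(resp))
```

Listing the licence next to each title supports the License & Cost criterion, since Open Government Licence datasets score highest there.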
2. Commercial APIs
Paid APIs from vendors (Google, AWS, Azure, Stripe, Twilio).
Pros: Enterprise SLAs, global coverage, advanced features, dedicated support
Cons: Licensing costs, vendor lock-in, compliance complexity (data residency, GDPR)
Examples:
- Google Maps API: Geocoding, routing, Places (pay-per-request)
- Stripe API: Payment processing (transaction fees)
- Twilio API: SMS, voice, WhatsApp (pay-per-message)
- AWS Data Exchange: 3,500+ third-party datasets (subscription)
3. Free/Freemium APIs
Free tiers with usage limits; upgrade to a paid plan to scale.
Pros: Low cost for prototyping, easy to test
Cons: Rate limits, no SLA, limited support, free tier may be discontinued
Examples:
- OpenWeather API: Weather data (60 calls/min free, then paid)
- REST Countries API: Country data (unlimited free)
- CurrencyLayer API: Exchange rates (1,000 requests/month free)
- HERE Maps API: Geocoding, routing (250,000 transactions/month free)
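Whether a free tier is enough usually comes down to simple arithmetic on polling frequency. The sketch below is a generic helper (names are hypothetical) that assumes fixed-interval polling and a worst-case 31-day month; it checks the quotas quoted above, e.g. CurrencyLayer's 1,000 requests/month.

```python
import math


def monthly_calls(poll_interval_minutes: float, endpoints: int = 1, days: int = 31) -> int:
    """Worst-case calls per month when each endpoint is polled at a fixed interval."""
    if poll_interval_minutes <= 0:
        raise ValueError("poll interval must be positive")
    calls_per_day = math.ceil(24 * 60 / poll_interval_minutes)
    return calls_per_day * endpoints * days


def fits_free_tier(
    poll_interval_minutes: float, monthly_quota: int, endpoints: int = 1, days: int = 31
) -> bool:
    """True if the polling plan stays within a monthly request quota."""
    return monthly_calls(poll_interval_minutes, endpoints, days) <= monthly_quota


if __name__ == "__main__":
    print(monthly_calls(60))              # 744: hourly polling for 31 days
    print(fits_free_tier(60, 1_000))      # True: fits a 1,000/month tier
    print(fits_free_tier(30, 1_000))      # False: 1,488 calls needed
    print(monthly_calls(5, endpoints=10)) # 89280: well under 250,000/month
```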
4. Open Datasets
Public datasets (CSV, JSON, Parquet) for download.
Pros: Free, no API rate limits, full data ownership
Cons: No real-time updates, requires ETL pipeline, data quality varies
Examples:
- Kaggle: 50,000+ datasets (machine learning, analytics)
- data.world: Collaborative data catalog (government, research, business)
- AWS Open Data: Climate, genomics, satellite imagery (S3 buckets)
- GitHub Awesome Public Datasets: Curated list of 1,000+ datasets
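Because open datasets arrive as files of varying quality, even a small ETL step should discard incomplete rows before aggregating. The sketch below uses only the Python standard library; the column names (`accident_id`, `local_authority`, `year`) are hypothetical and do not reflect the schema of any particular dataset.

```python
import csv
import io
from collections import Counter


def load_counts(csv_text: str, key: str) -> Counter:
    """Extract CSV rows, drop rows with an empty `key`, and count rows per value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts: Counter = Counter()
    for row in reader:
        value = (row.get(key) or "").strip()
        if value:  # skip incomplete rows, since quality varies between datasets
            counts[value] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical extract; real datasets have their own column names.
    sample = (
        "accident_id,local_authority,year\n"
        "1,Leeds,2022\n"
        "2,Leeds,2023\n"
        "3,,2023\n"
        "4,York,2023\n"
    )
    print(load_counts(sample, "local_authority").most_common())
    # [('Leeds', 2), ('York', 1)]
```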
Evaluation Criteria (Weighted Scoring)
DataScout evaluates sources using weighted scoring:

| Criterion | Weight | Description | Scoring |
|---|---|---|---|
| Requirements Fit | 30% | How well does it meet DR-xxx/INT-xxx requirements? | 0-10 (10 = perfect match) |
| Data Quality | 25% | Accuracy, completeness, timeliness, freshness | 0-10 (10 = official government data) |
| License & Cost | 20% | Open license (OGL, CC-BY), free vs paid, cost predictability | 0-10 (10 = free, open license) |
| API Quality | 15% | REST/GraphQL, documentation, rate limits, authentication | 0-10 (10 = RESTful, well-documented) |
| Compliance | 5% | GDPR, data residency (UK/EU), security certifications | 0-10 (10 = UK-hosted, GDPR-compliant) |
| Reliability | 5% | Uptime SLA, vendor reputation, longevity | 0-10 (10 = 99.9% SLA, government source) |
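The weights above sum to 100%, so one natural reading is total = Σ (weight × criterion score) × 10, which maps six 0-10 scores onto the 0-100 scale used in the example scores later on this page. The sketch below implements that reading; it is an assumption about how the numbers combine, not ArcKit's code, and it does not claim to reproduce the example scores.

```python
WEIGHTS = {
    "requirements_fit": 0.30,
    "data_quality": 0.25,
    "license_cost": 0.20,
    "api_quality": 0.15,
    "compliance": 0.05,
    "reliability": 0.05,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%


def weighted_score(scores: dict[str, float]) -> float:
    """Combine 0-10 criterion scores into a 0-100 total using the table's weights."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return round(sum(WEIGHTS[c] * scores[c] for c in WEIGHTS) * 10, 1)


if __name__ == "__main__":
    # Hypothetical criterion scores for an illustrative source.
    example = {
        "requirements_fit": 10,
        "data_quality": 9,
        "license_cost": 10,
        "api_quality": 8,
        "compliance": 10,
        "reliability": 8,
    }
    print(weighted_score(example))  # 93.5
```

Because Requirements Fit and Data Quality together carry 55% of the weight, a cheap but poorly matching source cannot outscore a well-matched official one.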
Gap Analysis
DataScout identifies gaps where requirements cannot be fulfilled by any discovered source.
Data Utility Analysis
DataScout identifies data utility beyond the primary requirements: additional value that discovered sources can provide.
Document Structure
The generated document includes:
Real-World Example
Traffic Monitoring Dashboard - DataScout Summary
Project: Traffic Monitoring Dashboard (Project 002)
Data Needs Extracted:
- DR-003: Real-time traffic flow (speed, volume, congestion)
- DR-004: Historical traffic patterns (for trend analysis)
- INT-001: Geolocation API (map visualization)
- INT-002: Weather data (correlate traffic with weather)
Sources Discovered:
- UK Government Open Data: 3 sources
- Commercial APIs: 2 sources
- Free/Freemium APIs: 1 source
- Open Datasets: 1 source
Top Recommendations:
1. DfT Real-Time Traffic API (UK Government) - Score: 92/100
   - Category: UK Gov Open Data
   - Cost: Free (Open Government Licence)
   - Coverage: Major roads, motorways (England)
   - Update Frequency: Every 5 minutes
   - API Quality: RESTful, JSON, 600 requests/hour
   - Requirements Fulfilled: DR-003 (real-time traffic)
   - Recommendation: ✅ PRIMARY SOURCE
2. Ordnance Survey Maps API (UK Government) - Score: 88/100
   - Category: UK Gov Open Data (Premium tier)
   - Cost: Free tier (10,000 requests/month), then £500/month
   - Coverage: UK-wide maps, geolocation, routing
   - API Quality: RESTful, excellent docs, 600 requests/min
   - Requirements Fulfilled: INT-001 (geolocation)
   - Recommendation: ✅ RECOMMENDED (free tier sufficient for MVP)
3. Met Office DataPoint API (UK Government) - Score: 85/100
   - Category: UK Gov Open Data
   - Cost: Free tier (5,000 requests/day), then £200/month
   - Coverage: UK-wide weather forecasts, observations
   - Update Frequency: Hourly
   - Requirements Fulfilled: INT-002 (weather data)
   - Recommendation: ✅ RECOMMENDED
4. DfT Road Safety Data (UK Government Open Dataset) - Score: 78/100
   - Category: UK Gov Open Dataset (CSV download)
   - Cost: Free
   - Coverage: Accident data (2010-2023)
   - Update Frequency: Annual
   - Requirements Fulfilled: DR-004 (historical patterns)
   - Data Utility: Identify high-accident zones for route optimization
   - Recommendation: ✅ ADD TO DATA MODEL
Data Utility Opportunities:
- Road Safety Data: Can identify high-accident zones → recommend safer routes (beyond primary requirement)
- Census Demographics: Link traffic patterns to demographics (commuter vs leisure traffic)
Data Model Updates:
- Add entity: E-005: TrafficFlow (source: DfT API)
- Add entity: E-006: WeatherCondition (source: Met Office API)
- Add entity: E-007: AccidentHotspot (source: DfT Road Safety Dataset)
- Add attribute to E-001: Location: census_area_id (for demographics)
Next Steps:
- Run /arckit data-model 002 to add new entities
- Run /arckit adr dft-traffic-api to document the API selection decision
- Run /arckit dpia 002 (Met Office API may contain location PII)
- Register for a DfT API key: https://www.api.gov.uk/dft/traffic
- Implement the API integration (rate limit: 600 requests/hour)
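Staying under a quota such as 600 requests/hour is easiest to enforce on the client side. The sketch below is a generic sliding-window limiter, not part of ArcKit or any DfT client library; the class name and the injectable `clock` parameter are hypothetical (the clock makes the limiter testable without waiting an hour).

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any rolling window of `period` seconds."""

    def __init__(
        self, max_calls: int, period: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls: deque = deque()  # timestamps of calls inside the window

    def _prune(self, now: float) -> None:
        # Drop timestamps that have left the rolling window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def try_acquire(self) -> bool:
        """Record and allow a call if the window has room; otherwise refuse it."""
        now = self.clock()
        self._prune(now)
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False

    def acquire(self) -> None:
        """Block (with real sleeps) until a call is allowed."""
        while not self.try_acquire():
            wait = self.period - (self.clock() - self._calls[0])
            time.sleep(max(wait, 0.0))


# Usage: one limiter shared by every request to the same API key.
traffic_limiter = SlidingWindowLimiter(max_calls=600, period=3600)
```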
Tips & Best Practices
Free Tier Limits
Free/freemium APIs often have:
- Rate limits: 100-1,000 requests/hour (may not scale to production)
- No SLA: Service may be down without notice
- Feature limitations: Premium features (geocoding, historical data) locked behind paid tiers
- Discontinuation risk: Free tier may be discontinued (e.g., Google Maps pricing changes)
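One way to soften the "No SLA" risk is to keep the last successful response and serve it, flagged as stale, while a source is unavailable. The sketch below is a generic pattern with hypothetical names; it assumes network failures surface as `OSError` (which covers `urllib.error.URLError` and `TimeoutError` in the standard library) and that stale data up to `max_age` seconds old is acceptable to the dashboard.

```python
import time
from typing import Any, Callable, Optional


class FallbackCache:
    """Serve the last good response, marked stale, when a fetch fails."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        max_age: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch
        self.max_age = max_age  # oldest cached value (seconds) we are willing to serve
        self.clock = clock
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> tuple[Any, bool]:
        """Return (value, is_stale); re-raise if no usable cached value exists."""
        try:
            value = self.fetch()
        except OSError:
            if self._fetched_at is not None and self.clock() - self._fetched_at <= self.max_age:
                return self._value, True
            raise
        self._value = value
        self._fetched_at = self.clock()
        return value, False
```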
Quality Checks
Before generating the document, ArcKit validates:
Related Commands
- Requirements (Prerequisite): Run before datascout to identify DR-xxx/INT-xxx
- Data Model (Next step): Add discovered sources as entities
- DPIA (Next step): Assess third-party sources with PII
- Research (Integration): Research vendor pricing and selection
- ADR (Integration): Document data source selection decisions
- Diagram (Downstream): Create data flow diagrams
Additional Resources
- api.gov.uk - UK Government API Catalog
- data.gov.uk - UK Government Open Data Portal
- Companies House API
- Ordnance Survey APIs
- Met Office DataPoint
- DfT Traffic Data
- ONS Open Data
- Open Government Licence