
Overview

Generates a comprehensive Operational Readiness Pack that prepares a service for production operation. It bridges the gap between development completion and live service, so that operations teams have everything they need to support the service.

When to Use

Use /arckit.operationalize before go-live to ensure operational readiness. Run after:
  1. /arckit.requirements - for SLA targets
  2. /arckit.diagram - for component inventory
  3. /arckit.hld-review or /arckit.dld-review - for technical details
  4. /arckit.data-model - for data dependencies
This command complements /arckit.servicenow, which designs the ITSM tooling. /arckit.operationalize focuses on operational practices and documentation.

Usage

/arckit.operationalize Payments API - Critical tier, 24/7 support
/arckit.operationalize 001 - Standard tier, business hours support

What It Creates

File: projects/{project}/ARC-{PID}-OPER-v1.0.md

Sections:
  1. Service Overview - Purpose, criticality, SLA summary
  2. Support Model - Support tiers, escalation paths, on-call schedule
  3. SLIs and SLOs - Service Level Indicators, Objectives, Error Budgets
  4. Runbooks - Standard operating procedures for common tasks
  5. Incident Response - Playbooks for incident categories
  6. Monitoring and Alerting - Metrics, dashboards, alert thresholds
  7. Disaster Recovery - Backup/restore procedures, RTO/RPO
  8. Business Continuity - Failover plans, degraded mode operation
  9. Change Management - Deployment procedures, rollback plans
  10. Capacity Management - Scaling triggers, resource planning
  11. Handover Documentation - Onboarding guides, knowledge transfer
  12. Toil Analysis - Repetitive tasks, automation opportunities
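The output location and section order above can be sketched as a small Python helper that builds the file path and an empty heading skeleton. This is an illustration only, not ArcKit's implementation. The function names `oper_pack_path` and `section_skeleton`, the `## N. Title` heading style, and the example folder name `payments-api` are all hypothetical.

```python
from pathlib import PurePosixPath

# Section titles in the order listed for the Operational Readiness Pack.
SECTIONS = [
    "Service Overview",
    "Support Model",
    "SLIs and SLOs",
    "Runbooks",
    "Incident Response",
    "Monitoring and Alerting",
    "Disaster Recovery",
    "Business Continuity",
    "Change Management",
    "Capacity Management",
    "Handover Documentation",
    "Toil Analysis",
]


def oper_pack_path(project: str, pid: str, version: str = "1.0") -> PurePosixPath:
    """Hypothetical helper: build projects/{project}/ARC-{PID}-OPER-v{version}.md."""
    return PurePosixPath("projects") / project / f"ARC-{pid}-OPER-v{version}.md"


def section_skeleton() -> str:
    """Hypothetical helper: one numbered heading per section, in document order."""
    return "\n".join(f"## {i}. {title}" for i, title in enumerate(SECTIONS, start=1))


if __name__ == "__main__":
    print(oper_pack_path("payments-api", "001"))
    print(section_skeleton())
```

With these assumptions, `oper_pack_path("payments-api", "001")` yields `projects/payments-api/ARC-001-OPER-v1.0.md`.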

Service Tiers

Tier        Availability Target   Support Hours   Max Incident Response
Critical    99.95%                24/7/365        15 minutes
Important   99.5%                 16/5            1 hour
Standard    99.0%                 9/5             4 hours
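To make the availability targets concrete, the following Python sketch converts each tier's target into the downtime it allows. It assumes that availability is measured as a fraction of calendar time over a 30-day window. If a tier's availability is counted only during its support hours, the figures would differ. The names `TIER_AVAILABILITY` and `allowed_downtime_minutes` are illustrative.

```python
# Availability targets from the Service Tiers table.
TIER_AVAILABILITY = {
    "Critical": 0.9995,
    "Important": 0.995,
    "Standard": 0.990,
}


def allowed_downtime_minutes(availability: float, window_days: int = 30) -> float:
    """Minutes of downtime permitted in the window before the target is missed.

    Assumes availability is a fraction of total calendar time in the window.
    """
    window_minutes = window_days * 24 * 60
    return (1 - availability) * window_minutes


if __name__ == "__main__":
    for tier, target in TIER_AVAILABILITY.items():
        print(f"{tier}: {allowed_downtime_minutes(target):.1f} min per 30 days")
```

Over 30 days this gives roughly 21.6 minutes for Critical, 216 minutes (3.6 hours) for Important, and 432 minutes (7.2 hours) for Standard.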

SRE Principles

Follows Site Reliability Engineering best practices:
  • SLIs - Quantitative service quality measures (latency, availability, error rate)
  • SLOs - Target values for SLIs (e.g., 99.9% availability)
  • Error Budgets - The unreliability an SLO permits (1 - SLO, e.g. 0.1% of requests for a 99.9% SLO), used to balance reliability work against delivery velocity
  • Toil Reduction - Automate repetitive manual operational work
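The SLI, SLO and error budget relationship above can be sketched in Python for a request-based availability SLI. This sketch assumes that a request is either "good" or "failed" and that counts are collected per SLO window. The class `SloWindow` is hypothetical, and so is the release-pause rule in `should_pause_releases`. That rule is one common way to act on an exhausted budget and is not a rule defined by this command.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO window (e.g. a rolling 30 days)."""
    slo: float             # target fraction of good requests, e.g. 0.999
    total_requests: int
    failed_requests: int

    def sli(self) -> float:
        """Availability SLI: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    def error_budget(self) -> float:
        """Failed requests the SLO tolerates in this window: (1 - SLO) * total."""
        return (1 - self.slo) * self.total_requests

    def budget_consumed(self) -> float:
        """Fraction of the error budget spent; above 1.0 means the SLO is missed."""
        budget = self.error_budget()
        if budget == 0:
            return 0.0 if self.failed_requests == 0 else float("inf")
        return self.failed_requests / budget

    def should_pause_releases(self) -> bool:
        """Hypothetical policy: favour reliability work once the budget is spent."""
        return self.budget_consumed() >= 1.0


if __name__ == "__main__":
    window = SloWindow(slo=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"SLI {window.sli():.4%}, budget used {window.budget_consumed():.0%}")
```

For a 99.9% SLO over one million requests, the budget is 1,000 failed requests. With 400 failures, the window reports an SLI of 99.96% and 40% of the budget used.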

UK Government Compliance

  • GDS Service Standard Point 14 - Operate a reliable service
  • TCoP Point 6 - Make things secure (operational security)
  • ITIL v4 - Incident, Problem, Change, Capacity Management

Related Commands

  • ServiceNow - ITSM tooling design (CMDB, SLAs, incidents)
  • DevOps - CI/CD pipelines and automation
  • FinOps - Cloud cost management and optimization
  • Diagram - Architecture diagrams for runbooks
