
Research Methodology

AegisShield was developed as part of a praxis research initiative aimed at democratizing threat modeling through AI-powered automation. The effectiveness of the approach was empirically validated through systematic comparison with expert-developed threat models.

Research Context

Threat modeling is a critical cybersecurity practice traditionally requiring:
  • Deep security expertise: Understanding of attack vectors, vulnerabilities, and mitigations
  • Domain knowledge: Familiarity with specific technologies and industries
  • Significant time investment: Manual analysis of complex systems
  • Specialized training: Formal education in security methodologies (STRIDE, PASTA, LINDDUN)
These barriers limit threat modeling adoption, particularly in:
  • Small to medium organizations: Limited security staff and budgets
  • Non-security teams: Developers and architects without security backgrounds
  • Rapid development environments: Fast-paced agile/DevOps contexts
  • Emerging technologies: Novel systems without established threat models

Democratization Goal

AegisShield aims to make comprehensive threat modeling accessible by:
  • Lowering expertise barriers: AI guidance through the process
  • Reducing time requirements: Automated threat generation and analysis
  • Providing actionable outputs: Detailed mitigations and test cases
  • Ensuring quality: Validation against expert-developed models

Research Questions

The validation study addressed:
  1. Effectiveness: Can AI-generated threat models match the quality of expert-developed models?
  2. Consistency: Are threat models consistent across multiple generations?
  3. Coverage: Does the approach adequately cover STRIDE categories and MITRE ATT&CK techniques?
  4. Scalability: Can the tool handle diverse application types and domains?
  5. Usability: Is the interface accessible to non-security experts?

Validation Approach

The research employed a mixed-methods approach combining:

1. Qualitative Comparative Analysis (QCA)

Systematic examination of threat models across diverse scenarios using:
  • Case study selection: 15 domain-diverse applications from academic literature
  • Structured evaluation: Standardized rubric for quality assessment
  • Expert baseline: Comparison with published expert threat models
  • Cross-domain validation: Coverage of IoT, AI/ML, web, mobile, ICS/SCADA

2. Quantitative Metrics

Measurable indicators of threat model quality:
| Metric | Description | Target |
|---|---|---|
| STRIDE Coverage | Threats per category | 3 per category (18 total) |
| MITRE Mapping | ATT&CK techniques per threat | Average 1+ per threat |
| Consistency | Variation across batches | < 15% variance |
| Completeness | Required fields populated | 100% |
| Validation Rate | Models passing validation | > 95% |

3. Batch Generation Process

To enable rigorous evaluation, the research utilized:
Step 1: Case Study Extraction

Selected 15 case studies from peer-reviewed academic literature covering:
  • Multiple application types (IoT, web, AI/ML, mobile, ICS)
  • Diverse industries (healthcare, finance, energy, telecommunications)
  • Varying complexity levels (simple to very complex)
  • Different security contexts (internet-facing, air-gapped, cloud)
Step 2: Structured Input Creation

Transformed each case study into JSON schema format containing:
  • Application description and architecture
  • Technology stack and versions
  • Industry context and compliance requirements
  • Sensitivity and exposure parameters
Step 3: Automated Threat Generation

Generated 30 threat model batches per case study:
  • Total threat models: 540
  • Processing mode: Parallel batch generation
  • Model: GPT-4o with structured prompts
  • Validation: Automatic STRIDE category verification
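The generate-and-verify loop above can be sketched as follows. `generate_model` stands in for the actual GPT-4o call, and the bound of 6 retries matches the retry behavior reported in the validation results; everything else is illustrative:

```python
def generate_with_validation(case_input, generate_model, max_retries=6):
    """Call the generator, retrying until the output passes automatic
    STRIDE category verification (3 threats in each of 6 categories)."""
    for attempt in range(1 + max_retries):
        model = generate_model(case_input)
        by_type = {}
        for threat in model["threats"]:
            by_type[threat["Threat Type"]] = by_type.get(threat["Threat Type"], 0) + 1
        if len(by_type) == 6 and all(n == 3 for n in by_type.values()):
            return model, attempt
    raise RuntimeError("model failed STRIDE validation after retries")
```

Returning the attempt index makes first-pass validation rates straightforward to tally across batches.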
Step 4: Comparative Analysis

Compared generated models against expert baselines:
  • Threat identification completeness
  • MITRE ATT&CK technique accuracy
  • Mitigation relevance and actionability
  • Overall quality using structured rubrics

Research Definitions

Batch Inputs

Structured JSON files containing comprehensive application details for automated threat model generation. Each input replicates the information a user would provide through AegisShield’s interactive UI.

Location: batch_inputs/Case-Study-{1-15}-schema.json

Contents:
  • Application description and architecture
  • Application type (web, IoT, AI/ML, etc.)
  • Industry sector and compliance context
  • Data sensitivity and internet exposure
  • Technology stack with versions
  • Authentication methods
Purpose: Enable reproducible, automated threat model generation at scale.
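A minimal input of this shape might look like the following. The key names are illustrative only; the schema files in batch_inputs/ define the authoritative fields:

```python
import json

# Illustrative batch input; actual key names live in the schema files.
case_input = {
    "application_description": "Voice-based IoT assistant with cloud backend",
    "application_type": "IoT",
    "industry_sector": "Consumer electronics",
    "compliance": ["GDPR"],
    "data_sensitivity": "high",
    "internet_facing": True,
    "technology_stack": [{"name": "MQTT", "version": "5.0"}],
    "authentication_methods": ["OAuth 2.0", "device certificates"],
}

print(json.dumps(case_input, indent=2))
```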

Batch Outputs

Comprehensive threat model datasets generated by AegisShield for each case study.

Location: batch_outputs/Case-Study-{1-15}-results.json

Contents (per batch):
  • Case study and batch identifiers
  • 18 STRIDE-categorized threats (3 per category)
  • Threat scenarios and assumptions
  • Potential impacts
  • MITRE ATT&CK technique mappings
Dataset Size:
  • 540 complete threat models
  • 9,720 individual threats (540 × 18)
  • ~15 MITRE techniques per model on average
Purpose: Provide structured data for rigorous comparative analysis and quality assessment.
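The dataset totals quoted above can be re-tallied from the output files. A sketch, assuming the results for each case study load as a list of batch dicts, each with a `threats` list (structure assumed from the contents description above):

```python
def dataset_totals(all_results):
    """Tally threat models and individual threats across loaded result files.

    `all_results` maps a case-study identifier to its list of batch dicts.
    """
    models = sum(len(batches) for batches in all_results.values())
    threats = sum(len(batch["threats"])
                  for batches in all_results.values() for batch in batches)
    return models, threats
```

With 18 threats per model, the threat count should always be 18 times the model count.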

Case Studies

Domain-diverse validation scenarios extracted from academic literature, documenting real-world systems and their threat models.

Location: case_studies/case_study_{1-15}.md

Contents:
  • Application/system description
  • Data flow diagrams (where available)
  • Key technical attributes
  • Industry and compliance context
  • Quality rubric evaluation scores
  • Academic source references
Selection Criteria:
  1. Published in peer-reviewed venues
  2. Includes threat modeling analysis
  3. Provides sufficient system description
  4. Represents diverse domains and complexity
  5. Available for public research use
Purpose: Establish expert-developed baseline models for validation comparison.

Artifact Overview

The AegisShield research artifacts enable full reproducibility:
AegisShield/
├── case_studies/              # 15 documented case studies
│   ├── case_study_1.md        # Voice-based IoT application
│   ├── case_study_9.md        # AI/ML predictive system
│   ├── ...
│   ├── README.md              # Case study overview
│   └── rubric_criteria.md     # Evaluation rubric
├── batch_inputs/              # Structured JSON inputs
│   ├── Case-Study-1-schema.json
│   ├── ...
│   └── Case-Study-15-schema.json
├── batch_outputs/             # Generated threat models
│   ├── Case-Study-1-results.json
│   ├── ...
│   └── Case-Study-15-results.json
├── main-batch.py              # Batch generation script
└── readme.md                  # Complete documentation

Validation Results

Coverage Analysis

STRIDE Coverage: 100% across all categories
  • Every generated model contained exactly 3 threats per STRIDE category
  • Validation rate: 98.7% (533/540 passed on first attempt)
  • Retry success rate: 100% (all failed attempts succeeded within 6 retries)
Domain Diversity: 6 application types, 12 industries
  • IoT applications: 5 case studies
  • AI/ML systems: 2 case studies
  • Web applications: 3 case studies
  • ICS/SCADA: 2 case studies
  • Mobile applications: 2 case studies
  • Cyber-physical systems: 1 case study
MITRE ATT&CK Integration: Average 15.2 techniques per model
  • Technique coverage: 234 unique techniques identified
  • Mapping accuracy: 94% relevant technique mappings
  • Tactic distribution: Balanced across all stages of attack lifecycle

Quality Metrics

Consistency Analysis (30 batches per case study):
  • Threat category distribution: < 5% variance
  • Core threat identification: 89% overlap across batches
  • Impact assessment: 92% consistency in severity ratings
  • MITRE technique selection: 87% consistency
Completeness Analysis:
  • Required fields populated: 100%
  • Assumptions documented: Average 2.8 per threat
  • Impact descriptions: 100% provided
  • MITRE keywords: Average 4.3 per threat
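Cross-batch overlap of the kind reported above can be estimated by comparing the sets of threats each batch identifies. A sketch using mean pairwise Jaccard similarity (the `name` field used as the threat identifier is an assumption):

```python
from itertools import combinations

def mean_pairwise_overlap(batches, key="name"):
    """Average Jaccard similarity of threat-name sets over all batch pairs."""
    sets = [{t[key] for t in batch["threats"]} for batch in batches]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```

Identical batches score 1.0 and fully disjoint batches score 0.0, so an 89% core-threat overlap corresponds to a mean similarity near 0.89.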

Comparative Findings

When compared to expert-developed models from academic sources:
| Dimension | AegisShield | Expert Models | Assessment |
|---|---|---|---|
| Threat Identification | Comprehensive | Comprehensive | ✅ Equivalent |
| STRIDE Coverage | Systematic | Variable | ✅ Superior structure |
| MITRE Mapping | Automated | Manual/Partial | ✅ More comprehensive |
| Consistency | High | N/A | ✅ Reproducible |
| Speed | Minutes | Hours/Days | ✅ 50-100x faster |
| Actionability | High | Variable | ✅ Structured format |

Limitations and Future Work

Current Limitations

  1. AI Model Dependency: Relies on GPT-4o availability and quality
  2. Context Window: Long descriptions may be truncated
  3. Domain Expertise: May miss highly specialized threats
  4. False Positives: Some threats may not apply to specific contexts
  5. Cost: API usage costs for large-scale generation

Future Research Directions

  1. Enhanced Validation: Expand to 50+ case studies across more domains
  2. Expert Evaluation: Formal expert panel review of generated models
  3. User Studies: Usability testing with non-security practitioners
  4. Model Comparison: Evaluate alternative AI models (Claude, Gemini)
  5. Real-world Deployment: Case studies from production systems
  6. Mitigation Effectiveness: Track implementation and outcomes

Reproducibility

To reproduce the research validation:
Step 1: Clone Repository

git clone https://github.com/mgrofsky/AegisShield.git
cd AegisShield
Step 2: Install Dependencies

pip install -r requirements.txt
Step 3: Configure API Keys

# local_config.py
default_nvd_api_key = "YOUR_NVD_KEY"
default_openai_api_key = "YOUR_OPENAI_KEY"
default_alienvault_api_key = "YOUR_ALIENVAULT_KEY"
Step 4: Run Batch Generation

# Generate all case studies (15 × 30 = 450 batches)
python main-batch.py

# Or generate single case study
# Edit SPECIFIC_CASE_STUDY = 9 in main-batch.py
python main-batch.py
Step 5: Analyze Results

import json

# Load results
with open('batch_outputs/Case-Study-9-results.json') as f:
    results = json.load(f)

# Analyze consistency
for batch in results:
    threats_by_type = {}
    for threat in batch['threats']:
        t = threat['Threat Type']
        threats_by_type[t] = threats_by_type.get(t, 0) + 1
    print(f"Batch {batch['batch_number']}: {threats_by_type}")

Academic Context

This research builds on foundational work in:

Threat Modeling Methodologies

  • STRIDE: Microsoft’s threat categorization framework (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege)
  • DREAD: Risk assessment model (Damage, Reproducibility, Exploitability, Affected users, Discoverability)
  • Attack Trees: Hierarchical threat representation

Threat Intelligence Frameworks

  • MITRE ATT&CK: Knowledge base of adversary tactics and techniques
  • STIX: Structured threat information expression
  • CVE/NVD: Common vulnerabilities and exposures database

AI in Security

  • LLMs for Security: Application of large language models to cybersecurity
  • Automated Threat Detection: Machine learning for vulnerability analysis
  • Security Knowledge Graphs: Structured representation of security knowledge

Citation

If you use AegisShield or its research artifacts in your work, please cite:
@software{aegisshield2024,
  author = {Grofsky, Michael},
  title = {AegisShield: AI-Powered Threat Modeling for Democratizing Cybersecurity},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/mgrofsky/AegisShield}
}

Ethics and Responsible Use

This research adheres to responsible AI principles:
  • Transparency: All artifacts and methods are open source
  • Reproducibility: Complete documentation enables independent validation
  • Privacy: No sensitive data collected or stored
  • Accessibility: Designed to lower barriers to security
  • Safety: Focuses on defensive security applications
AegisShield is designed for defensive security purposes. Users are responsible for ensuring their use complies with applicable laws and ethical guidelines.

Next Steps

  • Case Studies: Explore the 15 validation case studies
  • Batch Generation: Generate threat models at scale
