Batch Threat Model Generation
The main-batch.py script automates the generation of multiple threat models for research, validation studies, and comparative analysis. It was developed to support large-scale empirical validation of AegisShield’s threat modeling capabilities.
Overview
Batch generation allows you to:
- Automate creation of multiple threat models for case studies
- Generate structured outputs for rigorous comparative analysis
- Conduct research validation across diverse scenarios
- Produce consistent threat models at scale
How It Works
The batch script programmatically replicates the interactive UI workflow:
Load Application Details
Reads structured JSON input files from the batch_inputs/ directory containing:
- Application description
- Technology stack and versions
- Industry sector and compliance requirements
- Authentication methods
- Sensitivity and exposure parameters
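The loading step above can be sketched as follows. This is a minimal illustration, not the code in main-batch.py: the function name load_batch_inputs and the choice to key each result by file stem are assumptions; only the batch_inputs/ directory name and the JSON format come from this page.

```python
import json
from pathlib import Path


def load_batch_inputs(input_dir="batch_inputs"):
    """Return {file stem: parsed JSON} for every *.json file in input_dir.

    Hypothetical helper: the real script may name or organise this differently.
    """
    inputs = {}
    for path in sorted(Path(input_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            inputs[path.stem] = json.load(handle)
    return inputs
```

Calling load_batch_inputs() from the project root would return one dictionary per case-study file, ready to be passed to the later steps.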
Fetch Threat Intelligence
Automatically retrieves data from:
- AlienVault OTX: Industry-specific threat intelligence
- NVD: Technology-specific vulnerabilities
- MITRE ATT&CK: Tactics and techniques (fetched during threat generation)
Generate Threat Models
For each batch iteration:
- Creates threat model prompt with context
- Calls GPT-4o to generate 18 STRIDE-based threats (3 per category)
- Validates threat model structure
- Fetches and processes MITRE ATT&CK mappings
Batch Input Structure
Input files are JSON files located in the batch_inputs/ directory.
Key Fields
| Field | Description | Example |
|---|---|---|
| app_input | Comprehensive application description | "The system is a voice-based application with IoT integration…" |
| app_type | Type of application | "Web Application", "IoT Application", "AI/ML Application" |
| industry_sector | Industry domain | "Healthcare", "Finance", "Telecommunications" |
| sensitive_data | Data sensitivity level | "High", "Medium", "Low" |
| internet_facing | Internet exposure | "Yes", "No" |
| technical_ability | Organization’s security maturity | "High", "Medium", "Low" |
| selected_technologies | Technology CPE mappings | {"PostgreSQL": "cpe:2.3:a:postgresql:postgresql"} |
| selected_versions | Technology versions | {"PostgreSQL": "14.0"} |
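As an illustration of the fields in the table, the sketch below assembles one input from the Example column and prints it as JSON. The particular combination of values is made up for the example, and real input files may contain fields not listed here. The helper to_cpe_name is hypothetical: it shows one plausible way to combine a selected_technologies prefix with its selected_versions entry into a full 13-field CPE 2.3 name; how main-batch.py actually combines them is not stated on this page.

```python
import json

# Example input assembled from the Example column of the table above.
example_input = {
    "app_input": "The system is a voice-based application with IoT integration…",
    "app_type": "IoT Application",
    "industry_sector": "Healthcare",
    "sensitive_data": "High",
    "internet_facing": "Yes",
    "technical_ability": "Medium",
    "selected_technologies": {"PostgreSQL": "cpe:2.3:a:postgresql:postgresql"},
    "selected_versions": {"PostgreSQL": "14.0"},
}


def to_cpe_name(cpe_prefix, version):
    """Pad a 'cpe:2.3:part:vendor:product' prefix to a 13-field CPE 2.3 name.

    Hypothetical helper: appends the version, then fills the remaining
    fields with the '*' wildcard.
    """
    fields = cpe_prefix.split(":") + [version]
    fields += ["*"] * (13 - len(fields))
    return ":".join(fields)


cpe_names = {
    name: to_cpe_name(prefix, example_input["selected_versions"][name])
    for name, prefix in example_input["selected_technologies"].items()
}

print(json.dumps(example_input, indent=2, ensure_ascii=False))
print(cpe_names["PostgreSQL"])
# → cpe:2.3:a:postgresql:postgresql:14.0:*:*:*:*:*:*:*
```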
Batch Output Structure
Output files are saved in batch_outputs/ as JSON arrays.
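A quick way to inspect the outputs is to load each file and count its top-level elements. The sketch below assumes only what is stated above (JSON arrays in batch_outputs/); the function name summarize_outputs is hypothetical, and the contents of each array element are not documented on this page, so they are left uninterpreted.

```python
import json
from pathlib import Path


def summarize_outputs(output_dir="batch_outputs"):
    """Return {file name: number of top-level array elements} per output file."""
    summary = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if not isinstance(data, list):
            # The page says outputs are JSON arrays, so anything else is flagged.
            raise ValueError(f"{path.name}: expected a JSON array at the top level")
        summary[path.name] = len(data)
    return summary
```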
Configuration
Edit the global variables in main-batch.py.
Running Batch Generation
Research Use Cases
1. Validation Studies
Generate multiple threat models for the same application to:
- Assess consistency of AI-generated threats
- Compare against expert-developed models
- Measure quality metrics across iterations
2. Comparative Analysis
Analyze threat models across:
- Different application types (web, IoT, AI/ML)
- Various industry sectors (healthcare, finance, government)
- Multiple complexity levels
3. Data Collection
Build datasets for:
- Machine learning model training
- Threat pattern analysis
- Security metric development
4. Performance Testing
Evaluate:
- API response times
- Error rates and retry logic
- Parallel processing efficiency
Validation Logic
The script validates each generated threat model. A model that fails validation is regenerated, up to retries times.
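Given the structure described under How It Works (18 STRIDE-based threats, 3 per category), a validation check could look like the sketch below. It is not the script's actual logic. The assumptions are that a threat model is a list of dictionaries, that each dictionary names its STRIDE category under a "Threat Type" key, and that the retry count defaults to 3; the function names are hypothetical as well.

```python
from collections import Counter

STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information Disclosure",
    "Denial of Service",
    "Elevation of Privilege",
)


def is_valid_threat_model(threats, per_category=3, type_key="Threat Type"):
    """Check for exactly per_category threats in each STRIDE category.

    type_key is an assumed field name, not taken from the real output schema.
    """
    if not isinstance(threats, list):
        return False
    if not all(isinstance(threat, dict) for threat in threats):
        return False
    counts = Counter(threat.get(type_key) for threat in threats)
    right_total = len(threats) == per_category * len(STRIDE)
    return right_total and all(counts[cat] == per_category for cat in STRIDE)


def generate_valid_model(generate, retries=3):
    """Call generate() until it returns a valid model, at most `retries` times.

    Returns None if every attempt fails validation.
    """
    for _ in range(retries):
        threats = generate()
        if is_valid_threat_model(threats):
            return threats
    return None
```

Here generate stands in for whatever function wraps the GPT-4o call; passing it as a parameter keeps the validation logic testable without network access.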
Error Handling
Errors are logged to error_log.txt.
Common Issues
API Rate Limiting
Problem: OpenAI API rate limits exceeded
Solution:
- Reduce workers to 1 or 2
- Increase retry delays in exponential backoff
- Use a higher-tier API key with increased limits
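To make the retry-delay advice concrete, here is a minimal sketch of exponential backoff. The parameter names (attempts, base, factor, cap) and their defaults are illustrative assumptions, not values from main-batch.py, and the small random jitter is an addition that the script may not use.

```python
import random
import time


def backoff_delays(attempts, base=1.0, factor=2.0, cap=60.0):
    """Return the delay (in seconds) to wait after each failed attempt."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def call_with_backoff(func, attempts=5, base=1.0, factor=2.0, cap=60.0,
                      sleep=time.sleep):
    """Call func(), retrying with growing delays; re-raise on the last failure.

    Catches every Exception for brevity; real code would more likely catch
    only the rate-limit error raised by the API client.
    """
    delays = backoff_delays(attempts, base, factor, cap)
    for attempt, delay in enumerate(delays):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Jitter of up to 10% keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


print(backoff_delays(4))
# → [1.0, 2.0, 4.0, 8.0]
```

Raising base or factor lengthens the waits between attempts, which is the "increase retry delays" option listed above.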
Invalid Threat Models
Problem: Generated models don’t meet validation criteria
Solution:
- Check API key and model availability
- Verify input JSON structure is correct
- Review threat model prompt for clarity
- Increase retries for more attempts
Missing MITRE Data
Problem: MITRE ATT&CK data not fetched
Solution:
Memory Issues
Problem: Out of memory with parallel processing
Solution:
- Reduce workers to 1-2
- Process case studies individually
- Increase system swap space
Performance Optimization
Parallel Processing
The script uses ThreadPoolExecutor for concurrent batch generation:
- workers=3: Balanced performance for most systems
- workers=1: Maximum reliability, slower processing
- workers=5+: Faster but may hit API rate limits
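The effect of the workers setting can be seen in a small sketch built on concurrent.futures.ThreadPoolExecutor from the standard library. The function run_batches and its generate_one callback are hypothetical stand-ins for the script's own batch logic; only the use of ThreadPoolExecutor and the workers=3 default come from this page.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batches(case_studies, generate_one, workers=3):
    """Run generate_one(name) for each case study on a pool of worker threads.

    Returns (results, errors): two dicts keyed by case-study name, so that
    one failing batch does not abort the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(generate_one, name): name for name in case_studies}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                errors[name] = exc
    return results, errors
```

Threads suit this workload because each batch spends most of its time waiting on API responses rather than computing; with workers=1 the same code runs the batches one at a time.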
Retry Strategy
Failed operations are retried with exponential backoff.
Output Analysis
After batch generation, analyze the results.
Research Context
For the AegisShield validation study, batch generation was used to:
- Generate 30 batches across 15 case studies
- Produce 540 total threats (30 batches × 18 threats each)
- Enable systematic comparison with expert-developed models
- Support Qualitative Comparative Analysis (QCA)
- Validate AI threat modeling effectiveness
Next Steps
Case Studies
Explore the 15 case studies used for validation
Research Methodology
Learn about the empirical validation approach