Overview
RAPTOR’s static analysis engine combines local security rules with Semgrep’s community packs for comprehensive code scanning. The scanner executes rules in parallel for improved performance and supports policy-based rule selection.
Architecture
The scanner is located at packages/static-analysis/scanner.py and orchestrates:
- Parallel rule execution with configurable worker pools
- Policy group selection for targeted scanning
- SARIF output format for standardized reporting
- Automatic deduplication across multiple rule sources
- Repository validation with safe git cloning
Policy Groups
Available Groups
RAPTOR organizes rules into policy groups that map to both local rules and Semgrep registry packs:

| Group | Local Rules | Registry Pack | Focus |
|---|---|---|---|
| crypto | Custom cryptography rules | category/crypto | Weak algorithms, key management |
| secrets | Secret detection patterns | p/secrets | API keys, credentials, tokens |
| injection | Injection vulnerability rules | p/command-injection | Command, SQL, LDAP injection |
| auth | Authentication patterns | p/jwt | JWT issues, session handling |
| ssrf | SSRF detection | p/ssrf | Server-side request forgery |
| deserialisation | Unsafe deserialization | p/insecure-deserialization | Pickle, YAML, JSON issues |
| logging | Logging security | p/logging | Log injection, sensitive data |
| filesystem | Path traversal | p/path-traversal | Directory traversal |
| flows | Dataflow analysis | p/default | Taint tracking |
| sinks | Dangerous sinks | p/xss | XSS, dangerous functions |
| all | All groups | All packs | Comprehensive scan |
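The group-to-pack mapping in the table can be expressed as a small lookup. The following is a minimal sketch, not RAPTOR's actual code: the pack names are copied from the table, while `POLICY_GROUP_PACKS` and `resolve_packs` are hypothetical names introduced only for illustration.

```python
# Registry packs copied from the table above.
# The dict name and the function below are hypothetical, not RAPTOR's API.
POLICY_GROUP_PACKS = {
    "crypto": "category/crypto",
    "secrets": "p/secrets",
    "injection": "p/command-injection",
    "auth": "p/jwt",
    "ssrf": "p/ssrf",
    "deserialisation": "p/insecure-deserialization",
    "logging": "p/logging",
    "filesystem": "p/path-traversal",
    "flows": "p/default",
    "sinks": "p/xss",
}


def resolve_packs(groups):
    """Return the registry packs for the requested groups, in order, without repeats."""
    if "all" in groups:
        # "all" expands to every known group.
        groups = list(POLICY_GROUP_PACKS)
    packs = []
    for group in groups:
        if group not in POLICY_GROUP_PACKS:
            raise ValueError(f"unknown policy group: {group}")
        pack = POLICY_GROUP_PACKS[group]
        if pack not in packs:
            packs.append(pack)
    return packs
```

Rejecting unknown group names early, rather than silently skipping them, keeps a typo in a group name from producing a scan that quietly covers less than intended.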
Baseline Packs
These packs are always included regardless of policy group selection:
CLI Usage
Basic Scan
Scan a repository with default crypto rules:
Git Repository Clone
Scan a remote repository (clones automatically):
Multiple Policy Groups
Combine multiple policy groups:
Comprehensive Scan
Run all available policy groups:
Sequential Mode
Disable parallel scanning (useful for debugging):
Preserve Working Directory
Keep temporary clone directory for inspection:
Parallel Execution
Worker Pool Configuration
The scanner uses a configurable thread pool:
Performance Benefits
Parallel execution provides significant speedup:
- 4 workers: 3-4x faster than sequential
- Per-rule timeout: 120 seconds (configurable)
- Total timeout: 900 seconds (15 minutes)
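A worker pool with the timeouts listed above can be sketched with Python's standard `concurrent.futures` module. This is an illustrative sketch, not the scanner's implementation: `run_rule` and `run_all` are hypothetical names, and only the 4-worker count and the 120-second and 900-second limits come from this page.

```python
import concurrent.futures
import subprocess

PER_RULE_TIMEOUT = 120  # seconds, per-rule limit from the list above
TOTAL_TIMEOUT = 900     # seconds, overall limit from the list above


def run_rule(command, timeout=PER_RULE_TIMEOUT):
    """Run one scan command; return (returncode, stdout), or (None, "") on timeout."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return None, ""


def run_all(commands, max_workers=4, sequential=False, total_timeout=TOTAL_TIMEOUT):
    """Run every command and return results keyed by the command's index."""
    if sequential:
        # Debugging path: one command at a time, in order.
        return {i: run_rule(cmd) for i, cmd in enumerate(commands)}
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_rule, cmd): i for i, cmd in enumerate(commands)}
        # as_completed raises TimeoutError if the whole batch exceeds total_timeout;
        # leaving the "with" block still waits for threads that are already running.
        for future in concurrent.futures.as_completed(futures, timeout=total_timeout):
            results[futures[future]] = future.result()
    return results
```

Threads suit this workload because each worker mostly waits on an external process. In practice each command would be a scanner invocation, for example a Semgrep call using its `--config` and `--sarif` options, though the exact command line RAPTOR builds is not shown on this page.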
SARIF Output Format
Output Structure
Each scan produces multiple SARIF files:
SARIF Schema
RAPTOR validates all SARIF output against the official schema:
Merged Output
The scanner automatically merges and deduplicates findings:
Scan Metrics
Generated Metrics
Every scan produces comprehensive metrics:
Accessing Metrics
Repository Validation
URL Validation
Only trusted repository patterns are allowed:
Safe Git Clone
Cloning uses a restricted environment and timeouts:
Configuration Examples
Custom Rule Directory
Add your own Semgrep rules:
Environment Configuration
Integration with RAPTOR Pipeline
Automatic Invocation
Static analysis runs automatically in /agentic mode:
Phase Integration
The scanner is Phase 1 of the autonomous pipeline:
1. Static Analysis (scanner.py) → SARIF findings
2. Exploitability Validation → Confirmed vulnerabilities
3. LLM Analysis → Root cause analysis
4. Exploit Generation → Proof-of-concept code
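The hand-off from static analysis to the later phases amounts to reading SARIF results. The sketch below assumes only the standard SARIF 2.1.0 layout (`runs[].results[]` with `ruleId`, `message.text`, and `locations[].physicalLocation`). The function names and the deduplication key of rule, file, and start line are assumptions made for illustration, not RAPTOR's documented behaviour.

```python
import json


def load_sarif(path):
    """Load a SARIF log from disk as a plain dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def iter_findings(sarif):
    """Yield (rule_id, uri, start_line, message) for each result in a SARIF log."""
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result may have no location; fall back to an empty one.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri")
            line = physical.get("region", {}).get("startLine")
            message = result.get("message", {}).get("text", "")
            yield result.get("ruleId"), uri, line, message


def dedupe(findings):
    """Keep the first finding for each (rule_id, uri, start_line) key (assumed key)."""
    seen = set()
    unique = []
    for finding in findings:
        key = finding[:3]
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique
```

A downstream phase could then call `dedupe(iter_findings(load_sarif(path)))` on whichever SARIF file the scan produced and work through the resulting tuples.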
Output Consumption
SARIF output feeds downstream tools:
Troubleshooting
Empty SARIF Output
If a scan produces no results:
Timeout Issues
Increase timeouts for large codebases:
Validation Failures
If SARIF validation fails:
Best Practices
Parallel vs Sequential: Use --sequential only for debugging. Parallel mode is 3-4x faster with no loss of accuracy.
See Also
- CodeQL Analysis - Deep semantic analysis
- Exploitability Validation - Verify findings are exploitable
- Binary Fuzzing - Dynamic testing with AFL++