Overview
The Semgrep Rule Variant Creator plugin takes an existing Semgrep rule and one or more target languages, then generates independent rule variants for each applicable language. Each variant goes through a complete 4-phase cycle ensuring quality and correctness.
Key capabilities:
- Applicability analysis before porting
- Independent 4-phase cycle per language
- Test-first methodology for each variant
- Language-specific idiom adaptation
- Proper validation before proceeding
Installation
Prerequisites
- Semgrep installed and available in PATH
- Existing Semgrep rule to port (in YAML format)
- Target languages specified
When to Use
Use this plugin when you need to:
- Port an existing Semgrep rule to one or more target languages
- Create language-specific variants of a universal vulnerability pattern
- Expand rule coverage across a polyglot codebase
- Translate rules between languages with equivalent constructs
When NOT to Use
Do NOT use this plugin for:
- Creating a new Semgrep rule from scratch (use semgrep-rule-creator instead)
- Running existing rules against code
- Languages where the vulnerability pattern fundamentally doesn’t apply
- Minor syntax variations within the same language
Input Specification
This plugin requires:
- Existing Semgrep rule - YAML file path or YAML rule content
- Target languages - One or more languages to port to
Output Structure
For each applicable target language, the plugin produces a directory containing the ported rule and its test file.
Example Output
Input:
- Rule: python-command-injection.yaml
- Target languages: Go, Java
Four-Phase Workflow
Each target language goes through an independent 4-phase cycle.
Phase 1: Applicability Analysis
Determine if the vulnerability pattern applies to the target language before proceeding.
Analysis criteria:
- Does the vulnerability class exist in the target language?
- Does an equivalent construct exist (function, pattern, library)?
- Are the semantics similar enough for meaningful detection?
Possible verdicts:
- APPLICABLE → Proceed with variant creation
- APPLICABLE_WITH_ADAPTATION → Proceed, but significant changes are needed
- NOT_APPLICABLE → Skip this language and document why
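The three criteria and the three verdicts can be sketched as a small decision helper. This is an illustration only: the `Verdict` enum and the `classify` and `should_port` functions are hypothetical names, and the mapping from yes/no answers to a verdict is an assumption. In practice the analysis is a judgment call rather than a boolean formula.

```python
from enum import Enum


class Verdict(Enum):
    APPLICABLE = "APPLICABLE"
    APPLICABLE_WITH_ADAPTATION = "APPLICABLE_WITH_ADAPTATION"
    NOT_APPLICABLE = "NOT_APPLICABLE"


def classify(class_exists: bool, has_equivalent: bool, semantics_close: bool) -> Verdict:
    """Map answers to the three analysis criteria to a verdict (assumed mapping)."""
    # No vulnerability class or no equivalent construct: nothing meaningful to detect.
    if not class_exists or not has_equivalent:
        return Verdict.NOT_APPLICABLE
    # An equivalent exists, but differing semantics call for significant changes.
    if not semantics_close:
        return Verdict.APPLICABLE_WITH_ADAPTATION
    return Verdict.APPLICABLE


def should_port(verdict: Verdict) -> bool:
    """Only NOT_APPLICABLE stops the cycle for a language."""
    return verdict is not Verdict.NOT_APPLICABLE
```

Whatever the verdict, recording the reasoning alongside it keeps a skipped language documented.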
Phase 2: Test Creation
Write tests BEFORE the rule using target language idioms.
Create test file with:
- Minimum 2 vulnerable cases (ruleid:)
- Minimum 2 safe cases (ok:)
- Language-specific edge cases
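As a rough check of the minimums above, the annotations in a test file can be counted before any rule exists. The sketch below assumes annotations appear in `#` or `//` comments. The rule ID `go-sql-injection`, the Go snippet, and the helper names are hypothetical.

```python
import re

# Matches annotations such as "# ruleid: my-rule" or "// ok: my-rule".
ANNOTATION = re.compile(r"(?:#|//)\s*(ruleid|ok):\s*\S+")


def count_annotations(test_source: str) -> dict:
    """Count ruleid: and ok: annotations in the text of a test file."""
    counts = {"ruleid": 0, "ok": 0}
    for line in test_source.splitlines():
        match = ANNOTATION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def meets_minimum(test_source: str, minimum: int = 2) -> bool:
    """True when both vulnerable and safe cases reach the minimum count."""
    counts = count_annotations(test_source)
    return counts["ruleid"] >= minimum and counts["ok"] >= minimum


# Hypothetical Go test file for a ported SQL injection rule.
GO_TEST = '''
package main

func handler(db *sql.DB, name string) {
    // ruleid: go-sql-injection
    db.Query("SELECT * FROM users WHERE name = '" + name + "'")
    // ruleid: go-sql-injection
    db.Query(fmt.Sprintf("SELECT * FROM users WHERE id = %s", name))
    // ok: go-sql-injection
    db.Query("SELECT * FROM users WHERE name = ?", name)
    // ok: go-sql-injection
    db.Query("SELECT 1")
}
'''

print(count_annotations(GO_TEST))
```

A count like this only confirms that the cases exist. Whether they exercise the right idioms still needs review.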
Phase 3: Rule Creation
Translate the original rule to the target language.
- Analyze AST: semgrep --dump-ast -l <lang> test-file
- Translate patterns to target language syntax
- Update metadata: language key, message, rule ID
- Adapt for idioms: Handle language-specific constructs
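The metadata step can be sketched on a rule that has already been parsed from YAML into a dictionary. The rule contents, the new ID `go-command-injection`, and the `retarget_metadata` helper are hypothetical. Patterns are deliberately left untouched because they need manual translation against the target language's AST.

```python
import copy


def retarget_metadata(rule: dict, language: str, rule_id: str, message: str) -> dict:
    """Return a copy of a parsed rule with id, languages, and message updated."""
    variant = copy.deepcopy(rule)  # never mutate the original rule
    variant["id"] = rule_id
    variant["languages"] = [language]
    variant["message"] = message
    return variant


# Hypothetical original rule, as it might look after YAML parsing.
original = {
    "id": "python-command-injection",
    "languages": ["python"],
    "severity": "ERROR",
    "message": "User input reaches os.system",
    "pattern": "os.system(...)",
}

go_variant = retarget_metadata(
    original, "go", "go-command-injection", "User input reaches exec.Command"
)
# The pattern is still the Python one here and must be rewritten by hand.
print(go_variant["id"], go_variant["languages"])
```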
Complete the full 4-phase cycle for each language before moving to the next. Do not batch languages together.
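A minimal sketch of the "one language at a time" ordering, assuming each phase is a callback that returns True on success. The `run_cycle` and `port_rule` names and the generic phase labels are hypothetical.

```python
def run_cycle(language, phases):
    """Run the phases in order for one language; stop at the first failure."""
    for name, phase in phases:
        if not phase(language):
            return f"stopped at {name}"
    return "complete"


def port_rule(languages, phases):
    """Finish the whole cycle for a language before starting the next one."""
    results = {}
    for language in languages:
        results[language] = run_cycle(language, phases)
    return results


# Usage with stand-in phases that only record the order of execution.
log = []


def make_phase(name):
    def phase(language):
        log.append((language, name))
        return True
    return phase


phases = [(f"phase {n}", make_phase(f"phase {n}")) for n in range(1, 5)]
results = port_rule(["go", "java"], phases)
print(results)
```

Because the outer loop is over languages, a failure in one variant surfaces before any work starts on the next.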
Applicability Analysis Details
Before porting, perform thorough analysis:
Does the Vulnerability Class Exist?
Examples:
- Buffer overflow: Applies to C/C++, may apply to Rust (in unsafe blocks), does NOT apply to Python/Java
- SQL injection: Applies to any language with database access
- XSS: Applies to any language generating HTML output
Does an Equivalent Construct Exist?
Parse the original rule to identify:
- Sinks: What dangerous functions/methods does it detect?
- Sources: Where does tainted data originate?
- Pattern type: Is it taint-mode or pattern-matching?
For the target language, determine:
- What are the equivalent dangerous functions?
- What are the common source patterns?
- Are there language-specific idioms to consider?
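The first half of this analysis, reading the original rule, can be sketched on a rule parsed into a dictionary. The sketch relies on Semgrep's `mode: taint`, `pattern-sources`, and `pattern-sinks` keys. The example rule and the `summarize_rule` helper are hypothetical.

```python
def summarize_rule(rule: dict) -> dict:
    """Extract pattern type, sources, and sinks from a parsed Semgrep rule."""
    is_taint = rule.get("mode") == "taint"
    return {
        "pattern_type": "taint" if is_taint else "pattern-matching",
        # For pattern-matching rules the sinks live inside the patterns
        # themselves, so these lists stay empty and need manual reading.
        "sources": rule.get("pattern-sources", []) if is_taint else [],
        "sinks": rule.get("pattern-sinks", []) if is_taint else [],
    }


# Hypothetical taint-mode rule, as it might look after YAML parsing.
rule = {
    "id": "python-sql-injection",
    "mode": "taint",
    "languages": ["python"],
    "pattern-sources": [{"pattern": "flask.request.args.get(...)"}],
    "pattern-sinks": [{"pattern": "$CURSOR.execute(...)"}],
}

summary = summarize_rule(rule)
print(summary["pattern_type"], len(summary["sources"]), len(summary["sinks"]))
```

Each source and sink found this way becomes a question for the target language: what is the equivalent function, and does it behave the same way?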
Example Analysis
Example Variant Creation
Let’s port a Python SQL injection rule to Go:
Key Differences from Rule Creator
| Aspect | semgrep-rule-creator | semgrep-rule-variant-creator |
|---|---|---|
| Input | Bug pattern description | Existing rule + target languages |
| Output | Single rule+test | Multiple rule+test directories |
| Workflow | Single creation cycle | Independent cycle per language |
| Phase 1 | Problem analysis | Applicability analysis |
Rationalizations to Reject
When porting Semgrep rules, reject these common shortcuts:
| Rationalization | Why It Fails | Correct Approach |
|---|---|---|
| “Pattern structure is identical” | Different ASTs across languages | Always dump AST for target language |
| “Same vulnerability, same detection” | Data flow differs between languages | Analyze target language idioms |
| “Rule doesn’t need tests since original worked” | Language edge cases differ | Write NEW test cases for target |
| “Skip applicability - it obviously applies” | Some patterns are language-specific | Complete applicability analysis first |
| “I’ll create all variants then test” | Errors compound, hard to debug | Complete full cycle per language |
| “Library equivalent is close enough” | Surface similarity hides differences | Verify API semantics match |
| “Just translate the syntax 1:1” | Languages have different idioms | Research target language patterns |
Strictness Principles
Commands
| Task | Command |
|---|---|
| Run tests | semgrep --test --config rule.yaml test-file |
| Validate YAML | semgrep --validate --config rule.yaml |
| Dump AST | semgrep --dump-ast -l <lang> <file> |
| Debug taint flow | semgrep --dataflow-traces -f rule.yaml file |
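For scripting, the commands in the table can be built as argument lists. This is a sketch: the `semgrep_argv` helper and its task names are hypothetical, and the default file names simply mirror the placeholders in the table.

```python
import shlex
from typing import List, Optional


def semgrep_argv(task: str, rule: str = "rule.yaml", target: str = "test-file",
                 lang: Optional[str] = None) -> List[str]:
    """Build the argument list for one of the commands in the table."""
    if task == "test":
        return ["semgrep", "--test", "--config", rule, target]
    if task == "validate":
        return ["semgrep", "--validate", "--config", rule]
    if task == "dump-ast":
        if lang is None:
            raise ValueError("dump-ast needs a language")
        return ["semgrep", "--dump-ast", "-l", lang, target]
    if task == "taint-debug":
        return ["semgrep", "--dataflow-traces", "-f", rule, target]
    raise ValueError(f"unknown task: {task}")


# Print the command line instead of running it.
print(shlex.join(semgrep_argv("dump-ast", target="test.go", lang="go")))
```

The returned list can be passed to `subprocess.run(argv, check=True)` on a machine where Semgrep is installed.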
Foundational Knowledge
The semgrep-rule-creator plugin is the authoritative reference for Semgrep rule creation fundamentals. Consult it for guidance on:
- When to use taint mode vs pattern matching
- Test-first methodology
- Anti-patterns to avoid
- Iterating until tests pass
- Rule optimization
Related Plugins
- semgrep-rule-creator - Create new Semgrep rules from scratch
- static-analysis - Run existing Semgrep rules against code
- variant-analysis - Find similar vulnerabilities across codebases
Additional Resources
- Semgrep Pattern Examples - Per-language pattern references
- Semgrep Testing Rules - Testing annotations
- Trail of Bits Testing Handbook - Advanced patterns
Author
Maciej Domanski