Overview
The Semgrep Rule Creator plugin guides you through creating production-quality Semgrep rules with proper testing and validation. It enforces a strict test-first methodology to ensure rules are accurate, maintainable, and free from false positives.
Key capabilities:
- Test-driven rule development (write tests first, then iterate)
- AST analysis to craft precise patterns
- Support for both taint mode (data flow) and pattern matching
- Comprehensive reference documentation from Semgrep docs
- Common vulnerability patterns by language
Installation
Prerequisites
- Semgrep installed (pip install semgrep or brew install semgrep)
When to Use
Use this plugin when you need to:
- Create custom Semgrep rules for detecting specific bug patterns
- Write rules for security vulnerability detection
- Build taint mode rules for data flow analysis
- Develop pattern matching rules for code quality checks
- Enforce coding standards with custom detections
When NOT to Use
Do NOT use this plugin for:
- Running existing Semgrep rulesets (use semgrep scan instead)
- General static analysis without custom rules (use the static-analysis plugin)
Core Workflow
The plugin enforces a strict 7-step workflow:
Analyze the Problem
Understand the bug pattern, target language, and determine whether to use taint mode or pattern matching.
Write Tests First
Create a test file with vulnerable cases (ruleid:) and safe cases (ok:) before writing any rule code.
Taint Mode vs Pattern Matching
When to Use Taint Mode (Prioritize)
Use taint mode for data flow issues where untrusted input reaches dangerous sinks. Pattern matching alone is often too coarse for these cases: the pattern eval($X) matches both eval(user_input) (vulnerable) and eval("safe_literal") (safe). Taint mode tracks data flow, so it only alerts when untrusted data actually reaches the sink.
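To make the eval distinction concrete, a test file for a taint-mode rule might look like the sketch below. The rule ID eval-untrusted-input is invented for this illustration, and the sketch assumes the rule declares the user_input parameter as a taint source and eval(...) as the sink; only the ruleid: and ok: annotations come from the workflow described here.

```python
# Test file sketch for a hypothetical taint-mode rule "eval-untrusted-input".
# Assumption: the rule treats the user_input parameter as a taint source
# and eval(...) as the sink. Neither is defined by this document.


def evaluate_untrusted(user_input):
    # ruleid: eval-untrusted-input
    return eval(user_input)  # untrusted data reaches the sink


def evaluate_literal():
    # ok: eval-untrusted-input
    return eval("1 + 1")  # constant string, no tainted data involved
```

A plain eval($X) pattern would flag both calls, so the ok: case would fail under semgrep --test; a taint rule written under the assumptions above should flag only the first.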
When to Use Pattern Matching
Use pattern matching for simple syntactic patterns without data flow requirements.
Output Structure
Each rule produces exactly 2 files in a directory named after the rule ID: a rule YAML file and a test file.
Example Rule
Here’s a complete example for detecting SQL injection in Python:
Key Commands
| Command | Purpose |
|---|---|
| semgrep --dump-ast -l <lang> <file> | View AST structure |
| semgrep --validate --config <rule>.yaml | Validate YAML syntax |
| semgrep --test --config <rule>.yaml <test-file> | Run tests |
| semgrep --dataflow-traces -f <rule>.yaml <file> | Debug taint flow |
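For scripting these commands, one option is a small helper that assembles the argument lists from the table. This is only a sketch: the function name semgrep_commands and the example paths are hypothetical, and the flags are copied from the table as-is rather than checked against any particular Semgrep version.

```python
import shlex


def semgrep_commands(rule_path, test_path, lang):
    """Build argv lists for the four commands in the table above."""
    return {
        "dump_ast": ["semgrep", "--dump-ast", "-l", lang, test_path],
        "validate": ["semgrep", "--validate", "--config", rule_path],
        "test": ["semgrep", "--test", "--config", rule_path, test_path],
        "trace": ["semgrep", "--dataflow-traces", "-f", rule_path, test_path],
    }


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    commands = semgrep_commands("my-rule/my-rule.yaml", "my-rule/my-rule.py", "python")
    for name, argv in commands.items():
        # Print shell-quoted command lines instead of running them.
        print(f"{name}: {shlex.join(argv)}")
```

Each list can be passed to subprocess.run(argv, check=True) if Semgrep is installed; printing first makes it easy to review what would run.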
Strictness Principles
The plugin enforces strict quality standards.
Anti-Patterns to Avoid
Too Broad
Matches everything, so it is useless for detection.
Missing Safe Cases
Leads to undetected false positives.
Overly Specific
Misses variations.
Rationalizations to Reject
When writing Semgrep rules, reject these common shortcuts:
| Rationalization | Why It Fails |
|---|---|
| “The pattern looks complete” | Still run semgrep --test to verify. Untested rules have hidden false positives/negatives. |
| “It matches the vulnerable case” | Matching vulnerabilities is half the job. Verify safe cases don’t match. |
| “Taint mode is overkill for this” | If data flows from user input to a dangerous sink, taint mode gives better precision. |
| “One test is enough” | Include edge cases: different coding styles, sanitized inputs, safe alternatives. |
| “I’ll optimize the patterns first” | Write correct patterns first, optimize after all tests pass. |
| “The AST dump is too complex” | The AST reveals exactly how Semgrep sees code. Skipping it leads to missed variations. |
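As a sketch of what "more than one test" can look like, the test file below covers two coding styles of the same SQL injection bug plus two safe alternatives. The rule ID sql-injection-string-build is hypothetical, and the file assumes a rule that treats the username parameter as tainted and conn.execute(...) as the sink.

```python
import sqlite3  # conn below is assumed to be a sqlite3.Connection

# Hypothetical rule ID "sql-injection-string-build". Assumption: the rule
# treats the username parameter as tainted and conn.execute(...) as the sink.


def lookup_concat(conn, username):
    query = "SELECT id FROM users WHERE name = '" + username + "'"
    # ruleid: sql-injection-string-build
    return conn.execute(query).fetchall()


def lookup_fstring(conn, username):
    # Different coding style, same bug.
    # ruleid: sql-injection-string-build
    return conn.execute(f"SELECT id FROM users WHERE name = '{username}'").fetchall()


def lookup_parameterized(conn, username):
    # Safe alternative: the driver binds the value as a parameter.
    # ok: sql-injection-string-build
    return conn.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchall()


def lookup_constant(conn):
    # No untrusted data is involved in this query.
    # ok: sql-injection-string-build
    return conn.execute("SELECT id FROM users WHERE name = 'admin'").fetchall()
```

The ok: cases matter as much as the ruleid: ones: a rule that flags lookup_parameterized or lookup_constant would pass a vulnerable-only test file while still producing false positives.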
Required Documentation
Before writing any rule, the plugin requires reading these Semgrep resources using WebFetch:
Related Plugins
- semgrep-rule-variant-creator - Port existing Semgrep rules to new target languages
- static-analysis - General static analysis toolkit with Semgrep, CodeQL, and SARIF parsing
- variant-analysis - Find similar vulnerabilities across codebases
Additional Resources
Author
Maciej Domanski