A behavior-driven plugin for authoring high-quality YARA-X detection rules, teaching you to think and act like an expert YARA author.

Overview

The YARA Authoring plugin provides expert-level guidance for writing YARA-X detection rules that catch malware without drowning in false positives. It focuses on decision trees, expert heuristics, and production-tested patterns rather than dumping YARA syntax documentation.
YARA-X Focus: This plugin targets YARA-X, the Rust-based successor to legacy YARA. YARA-X powers VirusTotal’s Livehunt/Retrohunt production systems and is 5-10x faster for regex-heavy rules. Legacy YARA (C implementation) is in maintenance mode.
Key capabilities:
  • Decision trees for common judgment calls
  • Expert heuristics from experienced YARA authors
  • Naming conventions (CATEGORY_PLATFORM_FAMILY_VARIANT_DATE format)
  • Performance optimization (atom quality, short-circuit conditions)
  • Testing workflow with goodware corpus validation
  • YARA-X migration guide for converting legacy rules
  • Chrome extension analysis with crx module
  • Android DEX analysis with dex module

Installation

YARA-X CLI

# macOS
brew install yara-x

# Or from source
cargo install yara-x

# Verify installation
yr --version

Python Package (for scripts)

pip install yara-x
# or with uv
uv pip install yara-x

Plugin

/plugin install trailofbits/skills/plugins/yara-authoring

When to Use

Use this plugin when:
  • Writing new YARA-X rules for malware detection
  • Reviewing existing rules for quality or performance issues
  • Optimizing slow-running rulesets
  • Converting IOCs or threat intel into detection signatures
  • Debugging false positive issues
  • Preparing rules for production deployment
  • Migrating legacy YARA rules to YARA-X
  • Analyzing Chrome extensions (crx module)
  • Analyzing Android apps (dex module)

When NOT to Use

Do NOT use this plugin for:
  • Static analysis requiring disassembly → use Ghidra/IDA skills
  • Dynamic malware analysis → use sandbox analysis skills
  • Network-based detection → use Suricata/Snort skills
  • Memory forensics with Volatility → use memory forensics skills
  • Simple hash-based detection → just use hash lists

Core Principles

Good Atoms

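Strings must generate good atoms. YARA extracts short subsequences of up to 4 bytes, called atoms, from each string and uses them as a fast pre-filter before full verification. Strings shorter than 4 bytes, or made of repeated bytes, yield weak atoms that match almost everywhere and force slow verification at every hit.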
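As a rough illustration of why repeated or filler bytes hurt, the sketch below scores every 4-byte window of a string and keeps the best one. The scoring rule and the COMMON_FILLER set are assumptions made for this example; this is not YARA-X's actual atom-selection algorithm.

```python
# Hypothetical set of bytes that occur so often they make poor atoms.
COMMON_FILLER = {0x00, 0x20, 0x90, 0xFF}

def atom_score(window: bytes) -> int:
    """Count distinct bytes in the window that are not common filler."""
    return len(set(window) - COMMON_FILLER)

def best_atom(data: bytes, size: int = 4) -> tuple[bytes, int]:
    """Return the highest-scoring window of `size` bytes and its score."""
    if len(data) < size:
        return data, 0  # too short to yield a full-size atom
    windows = [data[i:i + size] for i in range(len(data) - size + 1)]
    best = max(windows, key=atom_score)  # first window wins ties
    return best, atom_score(best)

print(best_atom(b"Global\\M4884"))     # (b'Glob', 4)
print(best_atom(b"\x90\x90\x90\x90"))  # (b'\x90\x90\x90\x90', 0)
```

A score of 0 means every candidate window is filler or the string is too short, which is the case the principle above warns about.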

Specific Families

Target specific families, not categories. “Detects ransomware” catches everything and nothing. “Detects LockBit 3.0 config extraction” is precise.

Test Against Goodware

A rule that fires on Windows system files is useless. Validate against VirusTotal’s goodware corpus or your own clean file set.

Short-Circuit First

Put cheap checks first: filesize < 10MB and uint16(0) == 0x5A4D before expensive string searches or module calls.

Essential Toolkit

An expert uses 5 tools. Everything else is noise.
| Tool | Purpose | Usage |
| --- | --- | --- |
| yarGen | Extract candidate strings | yarGen.py -m samples/ --excludegood → validate with yr check |
| FLOSS | Extract obfuscated/stack strings | floss sample.exe (when yarGen fails) |
| yr CLI | Validate, scan, inspect | yr check, yr scan -s, yr dump -m pe |
| signature-base | Study quality examples | Learn from 17,000+ production rules |
| YARA-CI | Goodware corpus testing | Test before deployment |

Rule Structure

Every YARA-X rule follows this format:
rule MAL_Win_Emotet_Loader_Jan25
{
    meta:
        description = "Detects Emotet loader via unique mutex and C2 path"
        author = "Your Name <[email protected]>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"
        score = 85

    strings:
        // Mutex names are gold - unique to this malware family
        $mutex = "Global\\M4884" ascii wide
        
        // C2 path pattern - silver tier indicator
        $c2_path = /\/api\/[a-z]{8}\/bot\.php/ ascii
        
        // Configuration marker - bronze tier
        $cfg = { 43 4F 4E 46 49 47 [0-10] 45 4E 44 }

    condition:
        filesize < 2MB and
        uint16(0) == 0x5A4D and  // PE magic bytes
        2 of them
}

Naming Convention

{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}
Common prefixes:
  • MAL_ - Malware
  • HKTL_ - Hacking tool
  • WEBSHELL_ - Web shell
  • EXPL_ - Exploit
  • SUSP_ - Suspicious (not definitively malicious)
  • GEN_ - Generic detection
Platforms: Win_, Lnx_, Mac_, Android_, CRX_
Example: MAL_Win_Emotet_Loader_Jan25
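The convention can be checked mechanically. The following is a minimal sketch, assuming the category prefixes listed above and a month-plus-two-digit-year date such as Jan25 or Sept25. The name_problems helper is hypothetical and is not part of the plugin's linter. Platform tokens are deliberately left unchecked because rule names in this guide also use platforms outside the short list (for example NPM).

```python
import re

CATEGORIES = {"MAL", "HKTL", "WEBSHELL", "EXPL", "SUSP", "GEN"}
# Assumed date shape: capitalised month abbreviation plus two-digit year.
DATE_RE = re.compile(r"^[A-Z][a-z]{2,3}\d{2}$")

def name_problems(rule_name: str) -> list[str]:
    """Return a list of naming-convention problems (empty if none found)."""
    parts = rule_name.split("_")
    problems = []
    if parts[0] not in CATEGORIES:
        problems.append(f"unknown category prefix: {parts[0]}")
    if len(parts) < 4:
        problems.append("expected at least CATEGORY_PLATFORM_FAMILY_DATE")
    if not DATE_RE.match(parts[-1]):
        problems.append(f"last component is not a date like Jan25: {parts[-1]}")
    return problems

print(name_problems("MAL_Win_Emotet_Loader_Jan25"))  # []
print(name_problems("emotet_rule"))                  # three problems reported
```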

Required Metadata

Every rule needs these fields:
meta:
    description = "Detects Example malware via unique mutex and C2 path"
    author = "Your Name <[email protected]>"
    reference = "https://example.com/analysis"
    date = "2025-01-29"
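A quick way to enforce these fields before review is a small text check. This sketch assumes simple key = "value" lines with no escaped quotes and an ISO-style date. It is an illustration only and does not reproduce what the plugin's yara_lint.py does; the meta_problems helper is a hypothetical name.

```python
import re

REQUIRED = ("description", "author", "reference", "date")
# Matches lines such as: author = "Someone". String definitions start with "$"
# and are therefore not picked up by \w+.
FIELD_RE = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"', re.MULTILINE)

def meta_problems(rule_source: str) -> list[str]:
    """Report missing or malformed required metadata in one rule's source."""
    fields = dict(FIELD_RE.findall(rule_source))
    problems = [f"missing meta field: {key}" for key in REQUIRED if key not in fields]
    description = fields.get("description", "")
    if description and not description.startswith("Detects"):
        problems.append('description should start with "Detects"')
    date = fields.get("date", "")
    if date and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        problems.append("date should look like 2025-01-29")
    return problems

good = '''rule X {
    meta:
        description = "Detects Example malware via unique mutex"
        author = "Analyst"
        reference = "https://example.com/analysis"
        date = "2025-01-29"
    condition:
        true
}'''
print(meta_problems(good))  # []
```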

Platform-Specific Patterns

YARA works on any file type. Adapt patterns to your target:

Windows PE

condition:
    uint16(0) == 0x5A4D and  // PE magic bytes
    filesize < 10MB and
    // Good: Mutex names, PDB paths, C2 paths
    // Bad: API names, Windows paths
    $mutex and $pdb

macOS Mach-O

condition:
    // Mach-O magic bytes
    (uint32(0) == 0xFEEDFACE or   // 32-bit
     uint32(0) == 0xFEEDFACF or   // 64-bit
     uint32be(0) == 0xCAFEBABE) and // Universal binary (header is big-endian on disk)
    filesize < 10MB and
    // Good: Keylogger strings, persistence paths, credential theft
    any of ($behav*)
Good indicators for macOS:
  • Keylogger: CGEventTapCreate, kCGEventKeyDown
  • SSH tunneling: ssh -D, tunnel, socks
  • Persistence: ~/Library/LaunchAgents, /Library/LaunchDaemons
  • Credentials: security find-generic-password, keychain
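Byte order matters when comparing magic values: YARA's uint32() is a little-endian read, while uint32be() is a big-endian read. The sketch below mimics both reads in Python to show which constant each on-disk header compares equal to. The helper names are made up for this example.

```python
def yara_uint32(data: bytes, offset: int = 0) -> int:
    """Mimic YARA's uint32(offset): little-endian 32-bit read."""
    return int.from_bytes(data[offset:offset + 4], "little")

def yara_uint32be(data: bytes, offset: int = 0) -> int:
    """Mimic YARA's uint32be(offset): big-endian 32-bit read."""
    return int.from_bytes(data[offset:offset + 4], "big")

thin64 = bytes.fromhex("cffaedfe")  # 64-bit Mach-O header as written on a little-endian host
fat = bytes.fromhex("cafebabe")     # universal (fat) header, stored big-endian

print(hex(yara_uint32(thin64)))  # 0xfeedfacf
print(hex(yara_uint32(fat)))     # 0xbebafeca
print(hex(yara_uint32be(fat)))   # 0xcafebabe
```

Java class files also begin with the bytes CA FE BA BE, so a universal-binary magic check benefits from an additional indicator.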

npm Supply Chain Attacks

condition:
    filesize < 5MB and
    // ERC-20 function selectors for wallet draining
    2 of ($erc20_*) and
    // Confirm npm/JS context
    any of ($npm_context*, $js_context)
Good strings for JavaScript:
  • Ethereum selectors: { 70 a0 82 31 } (balanceOf)
  • Zero-width steganography: { E2 80 8B E2 80 8C }
  • Obfuscator signatures: _0x, var _0x
  • C2 patterns: domain names, webhook URLs
Bad strings:
  • require, fetch, axios - too common
  • Buffer, crypto - legitimate uses everywhere
  • process.env alone - need specific env var names
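The hex patterns in the good-strings list can be derived rather than typed by hand. This sketch uses a hypothetical to_yara_hex helper to show where the zero-width pattern comes from, and how a selector looks as raw bytes versus as the ASCII text that typically appears in JavaScript source.

```python
def to_yara_hex(data: bytes) -> str:
    """Format raw bytes as a YARA hex string body."""
    return "{ " + data.hex(" ").upper() + " }"

# U+200B (zero-width space) followed by U+200C (zero-width non-joiner), UTF-8 encoded.
print(to_yara_hex("\u200b\u200c".encode("utf-8")))  # { E2 80 8B E2 80 8C }

# approve(address,uint256) selector as 4 raw bytes.
print(to_yara_hex(bytes.fromhex("095ea7b3")))       # { 09 5E A7 B3 }

# The same selector as it would appear as text inside a script.
print(to_yara_hex(b"0x095ea7b3"))
```

Which form to match depends on the sample: compiled or packed payloads may carry the raw bytes, while plain script files usually carry the text form.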

Chrome Extensions (crx module)

import "crx"

rule SUSP_CRX_HighRiskPerms {
    condition:
        crx.is_crx and
        for any perm in crx.permissions : (perm == "debugger")
}
Red flags: nativeMessaging + downloads, debugger permission, content scripts on <all_urls>

Android DEX

import "dex"

rule SUSP_DEX_DynamicLoading {
    condition:
        dex.is_dex and
        dex.contains_class("Ldalvik/system/DexClassLoader;")
}
Red flags: Single-letter class names (obfuscation), DexClassLoader reflection, encrypted assets

Decision Trees

Is This String Good Enough?

Is this string good enough?
├─ Less than 4 bytes?
│  └─ YES → reject: find a longer string
├─ Contains repeated bytes (0000, 9090)?
│  └─ YES → reject: add surrounding context
├─ Is an API name (VirtualAlloc, CreateRemoteThread)?
│  └─ YES → reject: use a hex pattern of the call site instead
├─ Appears in Windows system files?
│  └─ YES → reject: too generic, find something unique
└─ Unique to this malware family?
   └─ YES → use it

When to Use “all of” vs “any of”

Should I require all strings or allow any?
├─ Strings are individually unique to malware?
│  └─ any of them (each alone is suspicious)
├─ Strings are common but combination is suspicious?
│  └─ all of them (require the full pattern)
├─ Seeing many false positives?
│  └─ Tighten: switch any → all, add more required strings

When to Abandon a Rule Approach

Stop and pivot when:
  • yarGen returns only API names and paths → Pivot to PE structure, entropy, or imphash
  • Can’t find 3 unique strings → Probably packed. Target the unpacked version or detect the packer
  • Rule matches goodware files
    • 1-2 matches = investigate and tighten
    • 3-5 matches = find different indicators
    • 6+ matches = start over
  • Performance is terrible → Split into multiple focused rules or add strict pre-filters
  • Description is hard to write → Rule is too vague. If you can’t explain what it catches, it catches too much
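The goodware thresholds above translate directly into a triage helper for a test pipeline. This is a minimal sketch; the function name and the wording of the zero-match case are assumptions.

```python
def goodware_verdict(matches: int) -> str:
    """Map a goodware match count to the action suggested by the thresholds above."""
    if matches < 0:
        raise ValueError("match count cannot be negative")
    if matches == 0:
        return "no goodware matches"
    if matches <= 2:
        return "investigate and tighten"
    if matches <= 5:
        return "find different indicators"
    return "start over"

for count in (0, 2, 4, 9):
    print(count, "->", goodware_verdict(count))
```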

Real-World Example

Here’s a production-quality rule detecting npm supply chain attacks:
rule MAL_NPM_ChalkDebug_Sept25
{
    meta:
        description = "Detects malicious wallet-drainer code from chalk/debug npm supply-chain compromise"
        author = "Stairwell Threat Research (adapted)"
        reference = "https://stairwell.com/resources/npm-supply-chain-attacks-yara/"
        date = "2025-09-11"
        score = 95

    strings:
        // Unique function names from the malicious payload
        $s1 = "runmask" ascii
        $s2 = "checkethereumw" ascii

        // Ethereum function selector for approve(address,uint256)
        // This ERC-20 method grants token spending permission
        $function_selector = "0x095ea7b3" ascii

    condition:
        filesize < 5MB and
        all of them
}
Why this works:
  • Function names (runmask, checkethereumw) are unique to the attack
  • Ethereum function selector adds context
  • all of them means no single string, such as the selector on its own, can trigger a match
  • Small filesize pre-filter improves performance
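To see why the condition matters, the sketch below emulates "any of them" and "all of them" over the three strings from the rule using plain substring checks. Both sample buffers are invented for illustration; the benign one stands in for legitimate code that merely references the approve selector.

```python
STRINGS = [b"runmask", b"checkethereumw", b"0x095ea7b3"]

def hits(data: bytes) -> list[bool]:
    """One boolean per rule string: does it occur in the data?"""
    return [s in data for s in STRINGS]

# Hypothetical file contents, not taken from real samples.
malicious = b"function runmask(){} checkethereumw(); var sel='0x095ea7b3';"
benign = b"const APPROVE_SELECTOR = '0x095ea7b3'; // token allowance helper"

print("any, benign:   ", any(hits(benign)))     # True  (false positive)
print("all, benign:   ", all(hits(benign)))     # False
print("all, malicious:", all(hits(malicious)))  # True
```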

Expert Heuristics

  • Gold tier: Mutex names, PDB paths, stack strings (almost always unique)
  • Silver tier: C2 paths, configuration markers, error messages
  • Bronze tier: API sequences, unusual imports
  • Garbage tier: Single API names, common paths, format specifiers
If you need >6 strings, you’re over-fitting.
Never use nocase or wide speculatively — only when you have confirmed evidence the case/encoding varies in samples.
  • nocase doubles atom generation
  • wide doubles string matching
  • Both have real performance costs
“If you don’t have a clear reason for using those modifiers, don’t do it” — Kaspersky Applied YARA
Regex without a 4+ byte literal substring evaluates at every file offset — catastrophic performance.
// BAD: the only literal ("http://") is ubiquitous and .+ is unbounded
/http:\/\/.+/

// GOOD: Anchored to distinctive literal
/mshta\.exe http:\/\/.+/
If you can’t anchor, consider hex pattern with wildcards instead.
Always bound loops with filesize:
filesize < 100KB and for all i in (1..#a) : ...
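An unbounded #a can reach thousands of matches in a large file. The loop body runs once per match, so the cost grows with the match count and multiplies for nested loops.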

Rationalizations to Reject

When you catch yourself thinking these, stop and reconsider:
| Rationalization | Expert Response |
| --- | --- |
| “This generic string is unique enough” | Test against goodware first. Your intuition is wrong. |
| “yarGen gave me these strings” | yarGen suggests, you validate. Check each one manually. |
| “It works on my 10 samples” | 10 samples ≠ production. Use VirusTotal goodware corpus. |
| “One rule to catch all variants” | Causes FP floods. Target specific families. |
| “I’ll make it more specific if we get FPs” | Write tight rules upfront. FPs burn trust. |
| “Performance doesn’t matter” | One slow rule slows entire ruleset. Optimize atoms. |
| “any of them is fine for these common strings” | Common strings + any = FP flood. Use any of only for individually unique strings. |
| “This regex is specific enough” | /fetch.*token/ matches all auth code. Add exfil destination requirement. |
| “I’ll use .* for flexibility” | Unbounded regex = performance disaster. Use .{0,30}. |

Performance Optimization

Quick Wins

  1. Put filesize first — instant check
  2. Avoid nocase — doubles atom generation
  3. Bound regex quantifiers: use .{1,100}, not .*
  4. Prefer hex over regex — faster matching
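Unbounded quantifiers are easy to spot automatically. The following is a rough sketch that flags unescaped .* and .+ as well as open-ended {n,} repeats in a regex body. It is a text heuristic under simple assumptions, not a regex parser, and it will miss edge cases such as a dot that follows an escaped backslash.

```python
import re

# Unescaped dot followed by * or +, or an open-ended repeat such as {2,}.
UNBOUNDED = re.compile(r"(?<!\\)\.[*+]|\{\d+,\}")

def unbounded_quantifiers(pattern: str) -> list[str]:
    """Return every unbounded quantifier found in the regex body."""
    return UNBOUNDED.findall(pattern)

print(unbounded_quantifiers(r"fetch.*token"))               # ['.*']
print(unbounded_quantifiers(r"fetch.{0,30}token"))          # []
print(unbounded_quantifiers(r"\/api\/[a-z]{8}\/bot\.php"))  # []
```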

Red Flags

  • Strings less than 4 bytes
  • Unbounded regex (.*)
  • Modules without file-type filter
  • any of with common strings

Condition Ordering

Order conditions for short-circuit evaluation:
condition:
    filesize < 10MB and           // 1. Instant
    uint16(0) == 0x5A4D and       // 2. Nearly instant
    2 of ($string*) and           // 3. String matches (cheap)
    pe.imphash() == "..." and     // 4. Module checks (expensive)
    for all section in pe.sections : (...)  // 5. Loops (most expensive)
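The effect of ordering can be modelled as a chain of checks that stops at the first failure. This sketch is only a model of an "and" chain, not a description of YARA-X engine internals. The check names and the evil_marker string are hypothetical.

```python
def evaluate(checks, data: bytes):
    """Run checks in order and stop at the first failure.

    Returns (matched, names_of_checks_that_ran).
    """
    ran = []
    for name, check in checks:
        ran.append(name)
        if not check(data):
            return False, ran
    return True, ran

checks = [
    ("filesize", lambda d: len(d) < 10 * 1024 * 1024),  # cheapest first
    ("magic", lambda d: d[:2] == b"MZ"),
    ("strings", lambda d: b"evil_marker" in d),         # costlier check last
]

# An ELF file fails the magic check, so the string check never runs.
print(evaluate(checks, b"\x7fELF" + b"\x00" * 64))  # (False, ['filesize', 'magic'])
print(evaluate(checks, b"MZ...evil_marker..."))     # (True, ['filesize', 'magic', 'strings'])
```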

Migrating from Legacy YARA

YARA-X has 99% rule compatibility, but enforces stricter validation. Quick migration:
yr check --relaxed-re-syntax rules/  # Identify issues
# Fix each issue, then:
yr check rules/  # Verify without relaxed mode
Common fixes:
| Issue | Legacy | YARA-X Fix |
| --- | --- | --- |
| Literal { in regex | /{/ | /\{/ |
| Invalid escapes | \R silently literal | \\R or R |
| Base64 strings | Any length | 3+ chars required |
| Negative indexing | @a[-1] | @a[#a - 1] |
| Duplicate modifiers | Allowed | Remove duplicates |
Use --relaxed-re-syntax only as a diagnostic tool. Fix issues rather than relying on relaxed mode permanently.

Included Scripts

The plugin includes two Python scripts with PEP 723 inline metadata (dependencies auto-resolved by uv run):

yara_lint.py

Validates YARA-X rules for style, metadata, compatibility issues, and anti-patterns:
uv run yara_lint.py rule.yar
uv run yara_lint.py --json rules/
uv run yara_lint.py --strict rule.yar

atom_analyzer.py

Evaluates string quality for efficient atom extraction:
uv run atom_analyzer.py rule.yar
uv run atom_analyzer.py --verbose rule.yar

Workflow

  1. Gather samples. Multiple samples are required; single-sample rules are brittle.
  2. Extract candidates. Run yarGen.py -m samples/ --excludegood.
  3. Validate quality. Apply the decision trees and expect to filter out roughly 80% of yarGen's suggestions.
  4. Write initial rule. Follow the template with proper metadata.
  5. Lint and test. Run yr check, yr fmt, and the linter script.
  6. Goodware validation. Test against the VirusTotal corpus or local clean files.
  7. Deploy. Add to the repo with full metadata and monitor for FPs.

Quality Checklist

Before deploying any rule:
  • Name follows {CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE} format
  • Description starts with “Detects” and explains what/how
  • All required metadata present (author, reference, date)
  • Strings are unique (not API names, common paths, or format strings)
  • All strings have 4+ bytes with good atom potential
  • Base64 modifier only on strings with 3+ characters
  • Regex patterns have escaped { and valid escape sequences
  • Condition starts with cheap checks (filesize, magic bytes)
  • Rule matches all target samples
  • Rule produces zero matches on goodware corpus
  • yr check passes with no errors
  • yr fmt --check passes (consistent formatting)
  • Linter passes with no errors
  • Peer review completed

Additional Resources

Quality YARA Rule Repositories

| Repository | Focus | Maintainer |
| --- | --- | --- |
| Neo23x0/signature-base | 17,000+ production rules, multi-platform | Florian Roth |
| Elastic/protections-artifacts | 1,000+ endpoint-tested rules | Elastic Security |
| reversinglabs/reversinglabs-yara-rules | Threat research rules | ReversingLabs |
| imp0rtp3/js-yara-rules | JavaScript/browser malware | imp0rtp3 |

Guides

| Guide | Purpose |
| --- | --- |
| YARA Style Guide | Naming conventions, metadata |
| YARA Performance Guidelines | Atom optimization, regex bounds |
| Kaspersky Applied YARA Training | Expert techniques |

Official Documentation

Author

Trail of Bits ([email protected])
