Overview
Hedis generates three types of hashes for each function in a Hermes bytecode file. These hashes enable both exact matching (via SHA256) and approximate similarity matching (via MinHash/LSH) to detect vulnerable packages in React Native apps.
The Three Hash Types
1. Structural Hash
Captures the control-flow shape of the function by hashing its opcode sequence.
What it includes:
- Parameter count prefix: pc=N|
- Opcode mnemonic sequence: LoadParam|GetById|Ret|
- Opcode bigrams for fuzzy matching: LoadParam→GetById, GetById→Ret
What it excludes:
- Operand values (register numbers, constants)
- String literals and identifiers
- Object references
Structural hashes are resilient to minification and variable renaming, making them ideal for detecting obfuscated or bundled code.
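The structural inputs described above can be sketched as follows. This is a minimal illustration, not the Hedis implementation: the function names are hypothetical, and concatenating the pc=N| prefix directly with the opcode sequence is an assumption based on the two formats shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// structuralIR builds "pc=N|Op1|Op2|...|" from a parameter count and a list
// of opcode mnemonics. Operands, strings, and object references are ignored.
// (Hypothetical helper; the exact layout is an assumption.)
func structuralIR(paramCount int, opcodes []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "pc=%d|", paramCount)
	for _, op := range opcodes {
		b.WriteString(op)
		b.WriteByte('|')
	}
	return b.String()
}

// opcodeBigrams returns each pair of adjacent opcodes joined by "→".
func opcodeBigrams(opcodes []string) []string {
	if len(opcodes) < 2 {
		return nil
	}
	out := make([]string, 0, len(opcodes)-1)
	for i := 0; i+1 < len(opcodes); i++ {
		out = append(out, opcodes[i]+"→"+opcodes[i+1])
	}
	return out
}

// exactHash returns the hex-encoded SHA-256 digest of an IR string.
func exactHash(ir string) string {
	sum := sha256.Sum256([]byte(ir))
	return hex.EncodeToString(sum[:])
}

func main() {
	ops := []string{"LoadParam", "GetById", "Ret"}
	ir := structuralIR(1, ops)
	fmt.Println(ir)                 // pc=1|LoadParam|GetById|Ret|
	fmt.Println(opcodeBigrams(ops)) // [LoadParam→GetById GetById→Ret]
	fmt.Println(exactHash(ir))
}
```

Because only mnemonics enter the IR, two functions that differ in register numbers or string constants produce the same structural string.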
2. Content IR1 Hash (Non-Identifier Strings)
Captures string literal content that is not an identifier.
What it includes:
- String literals that are NOT identifiers (e.g. "Error: Invalid input", "https://api.example.com")
- All values lowercased and sorted alphabetically
- Trigram shingles (3-character substrings) for fuzzy matching
What it excludes:
- Identifier names (variable/property names)
- Object references
- Opcode structure
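The normalization steps listed for this hash (lowercase, sort, trigram shingles) might look like the sketch below. The helper names and the "|" separator between values are assumptions made for illustration; the source does not show how values are joined.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// contentIR lowercases every value, sorts the results alphabetically, and
// joins them with "|" (the separator is an assumption).
func contentIR(values []string) string {
	norm := make([]string, len(values))
	for i, v := range values {
		norm[i] = strings.ToLower(v)
	}
	sort.Strings(norm)
	return strings.Join(norm, "|")
}

// trigrams returns the distinct 3-character shingles of s in first-seen
// order. It works on runes so multi-byte characters are not split.
func trigrams(s string) []string {
	r := []rune(s)
	seen := make(map[string]bool)
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		g := string(r[i : i+3])
		if !seen[g] {
			seen[g] = true
			out = append(out, g)
		}
	}
	return out
}

func main() {
	ir := contentIR([]string{"https://api.example.com", "Error: Invalid input"})
	fmt.Println(ir) // error: invalid input|https://api.example.com
	fmt.Println(trigrams("abcd")) // [abc bcd]
}
```

Sorting before joining makes the IR independent of the order in which strings appear in the function, which is consistent with the hash being resilient to code reordering.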
3. Content IR2 Hash (Identifiers and Objects)
Captures identifier names and object references used by the function.
What it includes:
- Identifier strings (variable names, property names)
- Object references (object literal keys/values)
- All values lowercased and sorted alphabetically
- Trigram shingles for fuzzy matching
What it excludes:
- Non-identifier string literals
- Opcode structure
How Hashes Are Computed
The hashing logic is in the source code at pkg/analyzer/compute.go:27.
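The listing below is not the code from that file. It is an independent sketch of the MinHash idea mentioned in the overview, assuming a simple hash family built from FNV-1a with the slot index as a seed; the function names and the signature size are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"math"
)

// minHash computes a k-slot MinHash signature over a set of shingles.
// Slot i keeps the minimum of FNV-1a(seed_i || shingle) across all shingles.
// (The seeded-FNV hash family is an assumption, not the Hedis choice.)
func minHash(shingles []string, k int) []uint64 {
	sig := make([]uint64, k)
	for i := range sig {
		sig[i] = math.MaxUint64
	}
	var seed [4]byte
	for _, s := range shingles {
		for i := 0; i < k; i++ {
			binary.LittleEndian.PutUint32(seed[:], uint32(i))
			h := fnv.New64a()
			h.Write(seed[:])
			h.Write([]byte(s))
			if v := h.Sum64(); v < sig[i] {
				sig[i] = v
			}
		}
	}
	return sig
}

// similarity estimates Jaccard similarity as the fraction of equal slots.
func similarity(a, b []uint64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	eq := 0
	for i := range a {
		if a[i] == b[i] {
			eq++
		}
	}
	return float64(eq) / float64(len(a))
}

func main() {
	a := minHash([]string{"LoadParam→GetById", "GetById→Ret"}, 64)
	b := minHash([]string{"LoadParam→GetById", "GetById→Call", "Call→Ret"}, 64)
	fmt.Printf("estimated similarity: %.2f\n", similarity(a, b))
}
```

The exact SHA256 hash answers "is this the same IR?", while a signature like this one lets two IRs that share most of their shingles still be recognized as similar.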
Storage Format
Hashes are stored in MongoDB with both exact-match and fuzzy-match representations.
Complementary Matching Strategy
The three hash types work together to maximize detection:

| Hash Type | Detects | Resilient To | Vulnerable To |
|---|---|---|---|
| Structural | Control flow patterns | Renaming, string changes | Code restructuring, compiler optimization |
| Content IR1 | Unique string literals | Code reordering | String obfuscation, encryption |
| Content IR2 | API usage patterns | Code reordering | Identifier renaming, obfuscation |
Example Scenario
A vulnerable function in [email protected]:
- Structural: Captures the LoadParam|JStrictEqual|JmpTrue|LoadConstString|Throw|GetById|Call|Ret pattern
- Content IR1: Captures the "document_picker_canceled" error message
- Content IR2: Captures the identifiers opts, type, NativeModules, RNDocumentPicker, pick
Because the three hashes fail independently, a match can survive common transformations:
- Even if minified → Structural hash still matches
- Even if identifiers renamed → Content IR1 hash matches on error string
- If error string is changed → Structural or Content IR2 may still match
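One way to picture the scenario is a comparison that reports which hash types still agree. The source does not specify how Hedis combines the three results, so the struct, its field names, and the "report every agreeing type" rule below are all assumptions.

```go
package main

import "fmt"

// Fingerprint holds the three per-function hashes.
// (Hypothetical type; field names are not taken from Hedis.)
type Fingerprint struct {
	Structural string
	ContentIR1 string
	ContentIR2 string
}

// matchedTypes lists which hash types agree between a and b.
// An empty hash never counts as a match.
func matchedTypes(a, b Fingerprint) []string {
	var hits []string
	if a.Structural != "" && a.Structural == b.Structural {
		hits = append(hits, "structural")
	}
	if a.ContentIR1 != "" && a.ContentIR1 == b.ContentIR1 {
		hits = append(hits, "content_ir1")
	}
	if a.ContentIR2 != "" && a.ContentIR2 == b.ContentIR2 {
		hits = append(hits, "content_ir2")
	}
	return hits
}

func main() {
	known := Fingerprint{Structural: "s1", ContentIR1: "c1", ContentIR2: "i1"}
	// Same function after identifier renaming: only Content IR2 changes.
	renamed := Fingerprint{Structural: "s1", ContentIR1: "c1", ContentIR2: "i9"}
	fmt.Println(matchedTypes(known, renamed)) // [structural content_ir1]
}
```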
Hedis requires at least a 10-character IR string before computing hashes. Functions with fewer than 10 characters in a given IR will have an empty hash for that type.
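The length rule can be illustrated with a small guard. This is a sketch with a hypothetical function name; counting Unicode characters rather than bytes is an assumption, since the source only says "10-character".

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"unicode/utf8"
)

// minIRChars mirrors the 10-character minimum stated in the text.
const minIRChars = 10

// hashIfLongEnough returns the hex SHA-256 of ir, or "" when ir is shorter
// than minIRChars characters (counted as runes, which is an assumption).
func hashIfLongEnough(ir string) string {
	if utf8.RuneCountInString(ir) < minIRChars {
		return ""
	}
	sum := sha256.Sum256([]byte(ir))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Printf("%q\n", hashIfLongEnough("pc=0|Ret|"))          // 9 characters, prints ""
	fmt.Println(len(hashIfLongEnough("pc=1|LoadParam|Ret|"))) // 64
}
```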
Baseline Filtering
To reduce false positives, Hedis maintains baseline fingerprints for each React Native version (an empty app with no packages). Functions matching baseline hashes are excluded from results, as they represent framework code rather than third-party packages. See Database Schema for details on the baselines_v3 collection.
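Conceptually, baseline filtering is a set difference. The sketch below assumes the baseline hashes for one React Native version have already been loaded into an in-memory set; the function name is hypothetical and no database access is shown.

```go
package main

import "fmt"

// filterBaseline returns the hashes that do not appear in the baseline set,
// preserving their original order. (Hypothetical helper for illustration.)
func filterBaseline(hashes []string, baseline map[string]struct{}) []string {
	kept := make([]string, 0, len(hashes))
	for _, h := range hashes {
		if _, isFramework := baseline[h]; !isFramework {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	baseline := map[string]struct{}{"aaa": {}, "bbb": {}}
	appHashes := []string{"aaa", "ccc", "bbb", "ddd"}
	fmt.Println(filterBaseline(appHashes, baseline)) // [ccc ddd]
}
```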
Related Sections
- Hermes Bytecode — Understanding the bytecode format
- Fuzzy Matching — How MinHash and LSH enable approximate matching
- Database Schema — MongoDB collections storing fingerprints