Hedis generates three distinct hash types for each function in a Hermes bytecode bundle. These hashes work together to enable both exact matching and fuzzy similarity detection of vulnerable package code.

Overview

Each function is fingerprinted using three complementary approaches:
| Hash Type | Focus | Algorithm | Use Case |
| --- | --- | --- | --- |
| Structural Hash | Instruction flow | SHA256 + MinHash | Control-flow matching |
| Content IR1 Hash | String literals | SHA256 + MinHash | Literal value matching |
| Content IR2 Hash | Identifiers & objects | SHA256 + MinHash | API structure matching |
All three hashes are generated from intermediate representations (IRs) created by the ToIR() method. See IR Normalization for details on how IRs are generated.

Hash Type 1: Structural Hash

Captures the control-flow shape of a function independent of concrete values.

Input

Structural IR: pipe-delimited sequence of instruction names with parameter count prefix.
pc=2|LoadParam|LoadParam|Add|Ret|

Exact Match Hash

Algorithm (SHA256): Computes SHA256 of the structural IR string
Output (hex string): 64-character hexadecimal hash
Database Field (string): structural_hash
Example:
structuralHash := sha256.Sum256([]byte(structuralIR))
structuralHashHex := hex.EncodeToString(structuralHash[:])

Fuzzy Match Hash

Algorithm (MinHash): Generates 128 hash permutations from instruction bigrams
Tokenization (bigrams): Creates tokens like LoadParam→Add
Output ([]uint64): Array of 128 hash signatures
Database Field (array): structural_minhash
Implementation (source: pkg/analyzer/minhasher.go):
tokens := functionObject.TokenizeStructuralIR()
minhash := GenerateMinHash(tokens, 128)
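As a rough illustration of the bigram tokenization described above, the sketch below splits a structural IR string on `|` and pairs adjacent instruction names. The helper name `structuralBigrams`, the decision to drop the `pc=` prefix, and the `→` separator are assumptions made for this example; the actual behaviour of `TokenizeStructuralIR()` may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// structuralBigrams is a hypothetical helper, not the Hedis implementation.
// It splits a pipe-delimited structural IR into instruction names and joins
// each adjacent pair into a single token such as "LoadParam→Add".
func structuralBigrams(structuralIR string) []string {
	var instrs []string
	for _, part := range strings.Split(structuralIR, "|") {
		// Assumption: skip empty fields (trailing "|") and the "pc=N" prefix.
		if part == "" || strings.HasPrefix(part, "pc=") {
			continue
		}
		instrs = append(instrs, part)
	}

	var tokens []string
	for i := 0; i+1 < len(instrs); i++ {
		tokens = append(tokens, instrs[i]+"→"+instrs[i+1])
	}
	return tokens
}

func main() {
	// Prints three bigrams: LoadParam→LoadParam, LoadParam→Add, Add→Ret.
	for _, tok := range structuralBigrams("pc=2|LoadParam|LoadParam|Add|Ret|") {
		fmt.Println(tok)
	}
}
```

Because each token encodes an ordered pair, two functions that use the same instructions in a different order produce different token sets.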
Similarity Calculation: Jaccard similarity is estimated as the fraction of signature positions where the two MinHash signatures agree:
similarity := float64(matchingHashes) / float64(totalHashes)
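The ratio above can be sketched as a small function over two signatures. The name `estimateJaccard` and the choice to return 0 for empty or mismatched lengths are assumptions for illustration, not the signature of the project's own similarity function.

```go
package main

import "fmt"

// estimateJaccard is an illustrative helper (hypothetical name). It returns
// the fraction of signature slots in which two MinHash signatures agree.
// Assumption: empty or differently sized signatures yield 0.
func estimateJaccard(a, b []uint64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	matching := 0
	for i := range a {
		if a[i] == b[i] {
			matching++
		}
	}
	return float64(matching) / float64(len(a))
}

func main() {
	a := []uint64{11, 22, 33, 44}
	b := []uint64{11, 22, 99, 44}
	fmt.Println(estimateJaccard(a, b)) // 3 of 4 slots agree, prints 0.75
}
```

With 128 slots the estimate moves in steps of 1/128, so a threshold is effectively rounded to that granularity.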

Hash Type 2: Content IR1 Hash

Captures string literals (non-identifier strings) used in the function.

Input

Content IR1: pipe-delimited, sorted list of lowercased string literals.
error: connection failed|timeout occurred|warning
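A minimal sketch of how such an IR string could be assembled from raw literals, assuming plain lexicographic sorting and no de-duplication (neither detail is confirmed here). `buildContentIR1` is a hypothetical name; the real IR is produced by `ToIR()`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildContentIR1 is a hypothetical helper: it lowercases each literal,
// sorts the results lexicographically, and joins them with "|".
// Assumption: duplicates are kept as-is.
func buildContentIR1(literals []string) string {
	normalized := make([]string, 0, len(literals))
	for _, lit := range literals {
		normalized = append(normalized, strings.ToLower(lit))
	}
	sort.Strings(normalized)
	return strings.Join(normalized, "|")
}

func main() {
	ir := buildContentIR1([]string{"Warning", "Timeout occurred", "Error: connection failed"})
	fmt.Println(ir) // error: connection failed|timeout occurred|warning
}
```

Sorting makes the IR independent of the order in which literals appear in the bytecode, so the exact hash only changes when the set of literals changes.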

Exact Match Hash

Algorithm (SHA256): Computes SHA256 of the content IR1 string
Output (hex string): 64-character hexadecimal hash (empty string if no literals)
Database Field (string): content_ir1_hash
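The "empty string if no literals" rule means the hash step needs a guard, because SHA256 of an empty input is still a 64-character digest. One way to express that, with `exactHash` as a hypothetical helper name:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// exactHash is a hypothetical helper. It returns the hex-encoded SHA256 of
// ir, or an empty string when ir itself is empty (no literals/identifiers).
func exactHash(ir string) string {
	if ir == "" {
		return ""
	}
	sum := sha256.Sum256([]byte(ir))
	return hex.EncodeToString(sum[:])
}

func main() {
	h := exactHash("error: connection failed|timeout occurred|warning")
	fmt.Println(len(h))               // 64 hex characters
	fmt.Printf("%q\n", exactHash("")) // ""
}
```

Leaving the field empty keeps functions without literals from all matching each other on the digest of the empty string.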

Fuzzy Match Hash

Algorithm (MinHash): Generates 128 hash permutations from string tokens and trigrams
Tokenization (trigrams): Each string ≥3 chars generates shingled tokens: error → ["err", "rro", "ror"]
Output ([]uint64): Array of 128 hash signatures
Database Field (array): content_ir1_minhash
Why Trigrams? Trigram shingling enables partial string matching, crucial for detecting modified or obfuscated error messages and string constants.
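A sketch of trigram shingling, to make the partial-matching argument concrete. The function name `trigrams` and the rune-based slicing are assumptions; the project's tokenizer may shingle on bytes or emit additional whole-string tokens.

```go
package main

import "fmt"

// trigrams is a hypothetical helper that returns every 3-character window
// of s. Strings shorter than 3 characters yield no shingles.
func trigrams(s string) []string {
	r := []rune(s) // operate on runes so multi-byte characters stay intact
	if len(r) < 3 {
		return nil
	}
	out := make([]string, 0, len(r)-2)
	for i := 0; i+3 <= len(r); i++ {
		out = append(out, string(r[i:i+3]))
	}
	return out
}

func main() {
	fmt.Println(trigrams("error"))  // [err rro ror]
	fmt.Println(trigrams("errors")) // [err rro ror ors]
}
```

"error" and "errors" share three of their four distinct trigrams, so a small edit to a literal only changes a small part of the token set.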

Hash Type 3: Content IR2 Hash

Captures identifiers and object structures referenced in the function.

Input

Content IR2: pipe-delimited, sorted list of identifiers and object references.
apiendpoint|fetch|userconfig|{apikey: string, timeout: number}

Exact Match Hash

Algorithm (SHA256): Computes SHA256 of the content IR2 string
Output (hex string): 64-character hexadecimal hash (empty string if no identifiers)
Database Field (string): content_ir2_hash

Fuzzy Match Hash

Algorithm (MinHash): Generates 128 hash permutations from identifier tokens and trigrams
Tokenization (trigrams): Each identifier ≥3 chars generates shingled tokens
Output ([]uint64): Array of 128 hash signatures
Database Field (array): content_ir2_minhash
Use Case: Matches functions based on API usage patterns, even if string literals differ (e.g., internationalization).

Hash Generation Pipeline

1. Disassemble Bytecode

Parse HBC file and create FunctionObject representations:
functionObjects, err := CreateFunctionObjects(hbcReader)
Source: pkg/hbc/normalizer.go:20

2. Generate IRs

Extract structural and content IRs:
structuralIR, contentIR1, contentIR2 := functionObject.ToIR()
Source: pkg/hbc/types/functionobject.go:144

3. Compute Hashes

Generate both exact and fuzzy hashes:
// Exact hashes (SHA256)
structuralHash := sha256.Sum256([]byte(structuralIR))
contentIR1Hash := sha256.Sum256([]byte(contentIR1))
contentIR2Hash := sha256.Sum256([]byte(contentIR2))

// Fuzzy hashes (MinHash)
structuralTokens := functionObject.TokenizeStructuralIR()
cir1Tokens, cir2Tokens := functionObject.TokenizeContentIRs()

structuralMinHash := GenerateMinHash(structuralTokens, 128)
cir1MinHash := GenerateMinHash(cir1Tokens, 128)
cir2MinHash := GenerateMinHash(cir2Tokens, 128)
Source: pkg/analyzer/compute.go

4. Store in Database

Hashes are stored per package per React Native version:
packageHash := &models.PackageHash{
    PackageUniqueId:      pkg.PackageUniqueId,
    ReactNativeVersion:   rnVersion,
    FunctionHashes:       functionHashes,
}
Source: pkg/database/models/package_hashes_model.go

Database Schema

Each function hash document contains:
| Field | Type | Description |
| --- | --- | --- |
| function_name | string | Function identifier |
| bytecode_size | int | Size in bytes |
| param_count | int | Number of parameters |
| structural_hash | string | SHA256 of structural IR |
| structural_minhash | []uint64 | MinHash signatures (128) |
| content_ir1_hash | string | SHA256 of content IR1 |
| content_ir1_minhash | []uint64 | MinHash signatures (128) |
| content_ir2_hash | string | SHA256 of content IR2 |
| content_ir2_minhash | []uint64 | MinHash signatures (128) |
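For orientation, the fields above could map onto a Go struct along these lines. The struct name `FunctionHash`, the Go field names, and the use of `bson` struct tags are assumptions; only the document field names and types come from the schema. The actual model lives in the project's models package and may look different.

```go
package main

import (
	"fmt"
	"reflect"
)

// FunctionHash is a hypothetical struct mirroring the documented fields.
// The bson tags are an assumption about how the documents are mapped.
type FunctionHash struct {
	FunctionName      string   `bson:"function_name"`
	BytecodeSize      int      `bson:"bytecode_size"`
	ParamCount        int      `bson:"param_count"`
	StructuralHash    string   `bson:"structural_hash"`
	StructuralMinHash []uint64 `bson:"structural_minhash"`
	ContentIR1Hash    string   `bson:"content_ir1_hash"`
	ContentIR1MinHash []uint64 `bson:"content_ir1_minhash"`
	ContentIR2Hash    string   `bson:"content_ir2_hash"`
	ContentIR2MinHash []uint64 `bson:"content_ir2_minhash"`
}

// bsonFieldNames lists the bson tag of every struct field, in order.
func bsonFieldNames() []string {
	t := reflect.TypeOf(FunctionHash{})
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		names = append(names, t.Field(i).Tag.Get("bson"))
	}
	return names
}

func main() {
	for _, name := range bsonFieldNames() {
		fmt.Println(name)
	}
}
```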

Matching Strategy

During analysis, functions are matched using a cascading approach:

1. Length Pre-filtering

if targetSize < minSize || targetSize > maxSize {
    continue // Skip if bytecode size differs by >20%
}
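The snippet above leaves `minSize` and `maxSize` undefined. A minimal sketch of how a 20% window might be derived, assuming the window is centred on the database entry's size and computed with integer arithmetic (both are assumptions, as is the name `withinSizeTolerance`):

```go
package main

import "fmt"

// withinSizeTolerance is a hypothetical helper. It reports whether
// targetSize lies within pct percent of dbSize, bounds inclusive.
// Integer division truncates, so the bounds round toward zero.
func withinSizeTolerance(targetSize, dbSize, pct int) bool {
	minSize := dbSize * (100 - pct) / 100
	maxSize := dbSize * (100 + pct) / 100
	return targetSize >= minSize && targetSize <= maxSize
}

func main() {
	fmt.Println(withinSizeTolerance(80, 100, 20))  // true: on the lower bound
	fmt.Println(withinSizeTolerance(121, 100, 20)) // false: just above the upper bound
}
```

The check needs only one integer comparison per candidate, which is why it runs before any hash comparison.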

2. Exact Match (SHA256)

if targetStructuralHash == dbStructuralHash {
    return MatchResult{Type: "exact", Confidence: 1.0}
}

3. Fuzzy Match (MinHash)

Only if fuzzy matching is enabled (-f flag):
similarity := ComputeJaccardSimilarity(targetMinHash, dbMinHash)
if similarity >= threshold {
    return MatchResult{Type: "fuzzy", Confidence: similarity}
}
Source: pkg/cmd/analyze.go
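Put together, the three steps might look like the sketch below. The `candidate` type, the `matchFunction` and `signatureOverlap` names, and the policy of keeping the best fuzzy score are assumptions for illustration; only `MatchResult` with its `Type` and `Confidence` fields appears in the snippets above.

```go
package main

import "fmt"

// candidate is a hypothetical, trimmed-down view of a stored function hash.
type candidate struct {
	BytecodeSize      int
	StructuralHash    string
	StructuralMinHash []uint64
}

type MatchResult struct {
	Type       string
	Confidence float64
}

// signatureOverlap returns the fraction of slots where a and b agree.
func signatureOverlap(a, b []uint64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	n := 0
	for i := range a {
		if a[i] == b[i] {
			n++
		}
	}
	return float64(n) / float64(len(a))
}

// matchFunction applies the cascade: size pre-filter, exact match, then
// (only if fuzzy is true) the best MinHash match at or above threshold.
func matchFunction(target candidate, db []candidate, fuzzy bool, threshold float64) (MatchResult, bool) {
	var best MatchResult
	found := false
	for _, c := range db {
		// Step 1: skip candidates whose size differs by more than 20%.
		if target.BytecodeSize < c.BytecodeSize*80/100 || target.BytecodeSize > c.BytecodeSize*120/100 {
			continue
		}
		// Step 2: an exact hash match ends the search immediately.
		if target.StructuralHash == c.StructuralHash {
			return MatchResult{Type: "exact", Confidence: 1.0}, true
		}
		// Step 3: fuzzy comparison, only when enabled.
		if !fuzzy {
			continue
		}
		sim := signatureOverlap(target.StructuralMinHash, c.StructuralMinHash)
		if sim >= threshold && sim > best.Confidence {
			best, found = MatchResult{Type: "fuzzy", Confidence: sim}, true
		}
	}
	return best, found
}

func main() {
	target := candidate{100, "aaa", []uint64{1, 2, 3, 4}}
	db := []candidate{
		{300, "ccc", []uint64{1, 2, 3, 4}}, // rejected by the size filter
		{110, "bbb", []uint64{1, 2, 3, 9}}, // fuzzy match, 3 of 4 slots agree
	}
	res, ok := matchFunction(target, db, true, 0.7)
	fmt.Println(ok, res.Type, res.Confidence) // true fuzzy 0.75
}
```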

Performance Characteristics

Exact Matching: O(1), MongoDB indexed lookup on SHA256 hash fields
Fuzzy Matching: O(n), linear scan; length pre-filtering reduces the comparison space by ~80%
Storage: ~500 bytes/function, 6 hash fields (3 SHA256 + 3 MinHash arrays)

MinHash Implementation

Source: pkg/analyzer/minhasher.go

Parameters

Hash Functions (128): Number of permutations for signature generation
Hash Algorithm (FNV-1a): Fast non-cryptographic hash for token processing

Algorithm

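// GenerateMinHash returns a MinHash signature with numHashes slots.
// It requires the math package; hash and seed are not shown here.
func GenerateMinHash(tokens []string, numHashes int) []uint64 {
    // Start every slot at the maximum value so any token hash replaces it.
    signature := make([]uint64, numHashes)
    for i := 0; i < numHashes; i++ {
        signature[i] = math.MaxUint64
    }

    // For each seeded hash function, keep the smallest value seen.
    for _, token := range tokens {
        for i := 0; i < numHashes; i++ {
            h := hash(token, seed[i]) // named h to avoid shadowing hash
            if h < signature[i] {
                signature[i] = h
            }
        }
    }

    return signature
}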
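The listing above depends on a `hash` function and a `seed` table that are not shown. A self-contained sketch that fills those gaps with 64-bit FNV-1a from the standard library follows; the names `seededHash` and `minHash`, the use of the slot index as the seed, and the seed-prefix mixing scheme are all assumptions rather than the project's actual choices.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"math"
)

// seededHash hashes token with 64-bit FNV-1a after feeding in the seed.
// Assumption: the seed is mixed in by prefixing its 8 little-endian bytes.
func seededHash(token string, seed uint64) uint64 {
	h := fnv.New64a()
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], seed)
	h.Write(buf[:]) // Write on a hash.Hash never returns an error
	h.Write([]byte(token))
	return h.Sum64()
}

// minHash keeps, for each of numHashes seeded hash functions, the smallest
// hash value over all tokens. Assumption: slot i uses seed i.
func minHash(tokens []string, numHashes int) []uint64 {
	sig := make([]uint64, numHashes)
	for i := range sig {
		sig[i] = math.MaxUint64
	}
	for _, tok := range tokens {
		for i := range sig {
			if h := seededHash(tok, uint64(i)); h < sig[i] {
				sig[i] = h
			}
		}
	}
	return sig
}

func main() {
	a := minHash([]string{"err", "rro", "ror"}, 128)
	b := minHash([]string{"ror", "err", "rro", "err"}, 128) // reordered, one duplicate
	same := true
	for i := range a {
		if a[i] != b[i] {
			same = false
		}
	}
	fmt.Println(len(a), same) // 128 true
}
```

Because only the minimum per slot is kept, the signature depends on the set of tokens, not on their order or multiplicity.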

Use Cases by Hash Type

| Scenario | Primary Hash | Fallback Hash |
| --- | --- | --- |
| Exact code match | Structural | - |
| Renamed variables | Structural | Content IR1 |
| Internationalized strings | Structural | Content IR2 |
| Obfuscated code | Content IR1/IR2 | Structural |
| Refactored code | Content IR2 | Content IR1 |
