Architecture Overview

Hedis (Hermes Decompiler/Disassembler) is a Go-based supply chain security tool for analyzing React Native mobile applications. It identifies vulnerable npm packages by fingerprinting Hermes bytecode and matching against a database of known vulnerable package signatures.

System Components

Hedis consists of four main architectural components that form a complete analysis pipeline:

1. HBC Reader (pkg/hbc/)

The bytecode parser that reads and disassembles Hermes .hbc files into intermediate representations. Key responsibilities:
  • Parse HBC file headers and sections
  • Support 27 bytecode versions in the range v61-v96
  • Disassemble instructions with version-specific opcode definitions
  • Generate normalized function objects
Learn more: HBC Reader Architecture
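As a rough illustration of the first two responsibilities, the sketch below reads only the magic number and bytecode version from the start of an .hbc buffer. The magic constant and the little-endian layout (8-byte magic followed by a 4-byte version) are assumed from the public Hermes source and should be checked against it. `readHBCVersion`, the error values, and the coarse range check are hypothetical and not taken from pkg/hbc/. A real reader would check the version against the 27 supported versions rather than the whole v61-v96 range.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// hermesMagic is the 64-bit value assumed to open every Hermes bytecode file.
const hermesMagic uint64 = 0x1F1903C103BC1FC6

// Version bounds taken from the range quoted above (v61-v96).
const (
	minVersion = 61
	maxVersion = 96
)

var (
	errTooShort    = errors.New("hbc: buffer shorter than 12 bytes")
	errBadMagic    = errors.New("hbc: magic number mismatch")
	errUnsupported = errors.New("hbc: bytecode version outside v61-v96")
)

// readHBCVersion (hypothetical helper) validates the magic and returns the
// bytecode version stored in the following four bytes, little-endian.
func readHBCVersion(buf []byte) (uint32, error) {
	if len(buf) < 12 {
		return 0, errTooShort
	}
	if binary.LittleEndian.Uint64(buf[0:8]) != hermesMagic {
		return 0, errBadMagic
	}
	v := binary.LittleEndian.Uint32(buf[8:12])
	if v < minVersion || v > maxVersion {
		return v, errUnsupported
	}
	return v, nil
}

func main() {
	// Synthetic 12-byte header: magic followed by bytecode version 84.
	buf := make([]byte, 12)
	binary.LittleEndian.PutUint64(buf[0:8], hermesMagic)
	binary.LittleEndian.PutUint32(buf[8:12], 84)

	v, err := readHBCVersion(buf)
	fmt.Println(v, err)
}
```

Reading the version first matters because the opcode table used for disassembly depends on it.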

2. Pipeline (pkg/pipeline/)

The package processing engine that fingerprints npm packages across multiple React Native versions. Key responsibilities:
  • Parallel processing across 11 RN environments (0.69-0.79)
  • Metro bundling and Hermes compilation orchestration
  • Batch database operations for efficiency
  • Resume capability via progress tracking
Learn more: Pipeline Architecture

3. Analyzer (pkg/analyzer/)

The fuzzy matching engine that identifies vulnerable packages using MinHash and Levenshtein distance. Key responsibilities:
  • Generate multi-level function fingerprints
  • Compute MinHash signatures for approximate matching
  • LSH (Locality-Sensitive Hashing) for candidate pair retrieval
  • Levenshtein distance for final similarity scoring
Learn more: Analyzer Architecture
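The final scoring step can be sketched as a normalized Levenshtein similarity over opcode sequences. This is a minimal illustration, assuming sequences are compared token by token and the distance is normalized by the longer sequence. How pkg/analyzer/ actually tokenizes and normalizes is not stated here, and `levenshtein` and `similarity` are hypothetical names.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between two token sequences using
// the two-row dynamic programming formulation.
func levenshtein(a, b []string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// similarity maps the distance into [0, 1], where 1 means identical.
func similarity(a, b []string) float64 {
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	if longest == 0 {
		return 1
	}
	return 1 - float64(levenshtein(a, b))/float64(longest)
}

func main() {
	// Illustrative opcode sequences; the second has one inserted instruction.
	known := []string{"LoadParam", "GetById", "Call", "Ret"}
	seen := []string{"LoadParam", "GetById", "Mov", "Call", "Ret"}
	fmt.Printf("distance=%d similarity=%.2f\n", levenshtein(known, seen), similarity(known, seen))
}
```

Because edit distance is quadratic in sequence length, it is only practical on the small candidate set that LSH returns, which is why it comes last.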

4. Database Layer (pkg/database/)

MongoDB integration for storing and querying package fingerprints and vulnerability data. Collections:
  • packages — npm package metadata
  • hashes / hashes_ghsa — function fingerprints per package per RN version
  • baselines_v3 — empty RN app fingerprints for filtering framework functions
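The role of the baseline collection can be sketched as a set difference: any function hash that also appears in an empty RN app is treated as framework code and dropped before matching. The sketch below works on in-memory values only. `filterBaseline` and the sample hashes are hypothetical, and the real query against baselines_v3 is not shown.

```go
package main

import "fmt"

// filterBaseline returns the hashes from appHashes that are absent from the
// baseline set, preserving input order.
func filterBaseline(appHashes []string, baseline map[string]struct{}) []string {
	kept := make([]string, 0, len(appHashes))
	for _, h := range appHashes {
		if _, isFramework := baseline[h]; !isFramework {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	// Placeholder hashes standing in for fingerprints of an empty RN app.
	baseline := map[string]struct{}{"aa11": {}, "bb22": {}}
	app := []string{"aa11", "cc33", "bb22", "dd44"}
	fmt.Println(filterBaseline(app, baseline)) // prints [cc33 dd44]
}
```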

Data Flow

Fingerprint Generation Pipeline

Analysis Pipeline

Multi-Version Strategy

Hedis processes 11 React Native versions (0.69-0.79) in parallel, each with its own Hermes compiler version.
Different React Native versions use different Hermes bytecode versions (bcv). A package compiled under RN 0.72 (bcv 84) produces different bytecode than the same package under RN 0.76 (bcv 90). To match packages accurately, Hedis must fingerprint each package under every RN version that an application might use. This creates an m × n matrix where:
  • m = number of vulnerable npm packages (~300-500)
  • n = number of RN versions (11)
  • Total fingerprint sets = m × n ≈ 3,300-5,500 (one per package per RN version)
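The matrix can be pictured as a flat list of jobs, one per package per RN version. The sketch below expands it in memory; with 300 to 500 packages and 11 versions this gives 300 × 11 = 3,300 to 500 × 11 = 5,500 jobs. The `job` type, `buildJobs`, and the package names are placeholders, not Hedis identifiers.

```go
package main

import "fmt"

// job is one cell of the m × n matrix: a package fingerprinted under one
// React Native version.
type job struct {
	Package   string
	RNVersion string
}

// rnVersions lists the 11 minor releases in the range 0.69-0.79.
func rnVersions() []string {
	out := make([]string, 0, 11)
	for minor := 69; minor <= 79; minor++ {
		out = append(out, fmt.Sprintf("0.%d", minor))
	}
	return out
}

// buildJobs expands packages × versions into a flat job list.
func buildJobs(packages, versions []string) []job {
	jobs := make([]job, 0, len(packages)*len(versions))
	for _, p := range packages {
		for _, v := range versions {
			jobs = append(jobs, job{Package: p, RNVersion: v})
		}
	}
	return jobs
}

func main() {
	pkgs := []string{"example-pkg-a@1.0.0", "example-pkg-b@2.3.1"}
	jobs := buildJobs(pkgs, rnVersions())
	fmt.Println(len(jobs)) // 2 packages × 11 versions = 22
}
```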

Fingerprint Strategy

Each function is fingerprinted at three levels:
| IR Level | Content | Use Case |
|---|---|---|
| Structural | Opcode bigrams (e.g. LoadParam→GetById) | Control flow matching, resistant to variable renaming |
| Content IR1 | Non-identifier string literals | API endpoints, error messages, constants |
| Content IR2 | Identifiers + object references | Variable names, imports, object shapes |
  • Exact matching: SHA256 hash of each IR
  • Fuzzy matching: MinHash signature (128 dimensions) + LSH (32 bands × 4 rows)
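The fuzzy path can be sketched as follows: a 128-value MinHash signature is computed over a function's tokens, then split into 32 bands of 4 rows, and two functions become a candidate pair if any band hashes to the same key. This is an illustrative sketch only. The choice of FNV-1a for token hashing, the splitmix64-style mixer used to simulate 128 hash functions, and all function names are assumptions rather than the Hedis implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

const (
	numHashes = 128 // signature dimensions
	numBands  = 32
	bandRows  = 4 // numBands * bandRows == numHashes
)

// mix64 is a splitmix64-style finalizer; it is a bijection on uint64.
func mix64(z uint64) uint64 {
	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
	z = (z ^ (z >> 27)) * 0x94D049BB133111EB
	return z ^ (z >> 31)
}

// tokenHash maps a token to 64 bits with FNV-1a.
func tokenHash(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// minHash keeps, for each of the 128 simulated hash functions, the smallest
// value seen over all tokens.
func minHash(tokens []string) [numHashes]uint64 {
	var seeds, sig [numHashes]uint64
	for i := range sig {
		seeds[i] = mix64(uint64(i) + 1)
		sig[i] = ^uint64(0)
	}
	for _, t := range tokens {
		x := tokenHash(t)
		for i := range sig {
			if v := mix64(x ^ seeds[i]); v < sig[i] {
				sig[i] = v
			}
		}
	}
	return sig
}

// bandKeys hashes each group of 4 consecutive signature values into one key.
func bandKeys(sig [numHashes]uint64) [numBands]uint64 {
	var keys [numBands]uint64
	var buf [8]byte
	for b := 0; b < numBands; b++ {
		h := fnv.New64a()
		for r := 0; r < bandRows; r++ {
			binary.LittleEndian.PutUint64(buf[:], sig[b*bandRows+r])
			h.Write(buf[:])
		}
		keys[b] = h.Sum64()
	}
	return keys
}

// isCandidate reports whether two functions share a key in the same band.
func isCandidate(x, y [numBands]uint64) bool {
	for b := range x {
		if x[b] == y[b] {
			return true
		}
	}
	return false
}

// estimateJaccard is the fraction of signature positions that agree.
func estimateJaccard(x, y [numHashes]uint64) float64 {
	equal := 0
	for i := range x {
		if x[i] == y[i] {
			equal++
		}
	}
	return float64(equal) / numHashes
}

func main() {
	a := minHash([]string{"LoadParam→GetById", "GetById→Call", "Call→Ret"})
	b := minHash([]string{"LoadParam→GetById", "GetById→Call", "Call→Mov", "Mov→Ret"})
	fmt.Println(estimateJaccard(a, b), isCandidate(bandKeys(a), bandKeys(b)))
}
```

In a store-backed setup, the band keys would be indexed so that candidate retrieval is a lookup rather than a pairwise scan.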
Different obfuscation techniques affect different IR levels:
  • String encryption breaks Content IR1 but leaves Structural and Content IR2 intact
  • Variable renaming (minification) breaks Content IR2 but leaves Structural and Content IR1 intact
  • Dead code injection reduces similarity across all IRs but doesn’t eliminate matches
By using three complementary IRs, Hedis can still identify packages even when some obfuscation is present.
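To make the renaming case concrete, the sketch below builds a Structural IR from opcode bigrams and hashes it with SHA256. Two versions of a function that differ only in operand names produce the same hash, while a changed opcode does not. The `instruction` type, the newline-joined serialization, and the sample opcodes are assumptions for illustration; the exact IR encoding Hedis hashes is not described here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// instruction is a simplified stand-in for a disassembled Hermes instruction.
type instruction struct {
	Opcode  string
	Operand string // identifier or literal; ignored by the structural IR
}

// opcodeBigrams returns adjacent opcode pairs such as "LoadParam→GetById".
func opcodeBigrams(fn []instruction) []string {
	if len(fn) < 2 {
		return nil
	}
	out := make([]string, 0, len(fn)-1)
	for i := 0; i+1 < len(fn); i++ {
		out = append(out, fn[i].Opcode+"→"+fn[i+1].Opcode)
	}
	return out
}

// structuralHash is an exact-match key: SHA256 over the joined bigrams.
func structuralHash(fn []instruction) string {
	sum := sha256.Sum256([]byte(strings.Join(opcodeBigrams(fn), "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	original := []instruction{
		{"LoadParam", "userInput"}, {"GetById", "length"}, {"Ret", ""},
	}
	// Same opcodes, operands renamed as a minifier might do.
	minified := []instruction{
		{"LoadParam", "a"}, {"GetById", "b"}, {"Ret", ""},
	}
	fmt.Println(structuralHash(original) == structuralHash(minified)) // prints true
}
```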

Parallel Processing Design

The pipeline uses goroutines with semaphore-based concurrency control:
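// Process 11 RN versions in parallel (max 4 concurrent).
// Requires: import "sync"
var wg sync.WaitGroup
semaphore := make(chan struct{}, 4) // buffered channel used as a counting semaphore
for _, rnEnv := range rnEnvironments {
    semaphore <- struct{}{} // acquire a slot; blocks while 4 workers are running
    wg.Add(1)
    go func(env RNEnvironment) {
        defer wg.Done()
        defer func() { <-semaphore }() // release the slot
        ProcessPackagesForRNVersion(env, packages)
    }(rnEnv)
}
wg.Wait() // block until every RN environment has finished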
Benefits:
  • Up to 4x parallelism for I/O-bound operations (npm install, Metro bundling)
  • Prevents resource exhaustion (each RN environment needs ~2GB RAM)
  • Database batching (100 ops/batch) reduces network round-trips
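The batching idea can be sketched independently of any database driver: operations accumulate in memory and are handed to a flush function once 100 are pending, with a final flush for the remainder. In this sketch the flush callback stands in for a bulk write to MongoDB. `writeOp`, `batcher`, and `newBatcher` are hypothetical names, and error handling and retries are left out.

```go
package main

import "fmt"

// writeOp is a placeholder for one pending database write.
type writeOp struct {
	Package   string
	RNVersion string
	Hash      string
}

// batcher buffers operations and flushes them in fixed-size groups.
type batcher struct {
	size    int
	pending []writeOp
	flush   func([]writeOp) error
}

func newBatcher(size int, flush func([]writeOp) error) *batcher {
	return &batcher{size: size, pending: make([]writeOp, 0, size), flush: flush}
}

// Add queues one operation and flushes when the batch is full.
func (b *batcher) Add(op writeOp) error {
	b.pending = append(b.pending, op)
	if len(b.pending) >= b.size {
		return b.Flush()
	}
	return nil
}

// Flush sends any pending operations. A fresh slice is allocated so the
// callback may safely retain the batch it was given.
func (b *batcher) Flush() error {
	if len(b.pending) == 0 {
		return nil
	}
	batch := b.pending
	b.pending = make([]writeOp, 0, b.size)
	return b.flush(batch)
}

func main() {
	roundTrips := 0
	b := newBatcher(100, func(ops []writeOp) error {
		roundTrips++ // a real implementation would issue one bulk write here
		return nil
	})
	for i := 0; i < 250; i++ {
		if err := b.Add(writeOp{Package: "example-pkg", RNVersion: "0.72", Hash: fmt.Sprint(i)}); err != nil {
			panic(err)
		}
	}
	if err := b.Flush(); err != nil {
		panic(err)
	}
	fmt.Println(roundTrips) // prints 3: two full batches plus the remaining 50
}
```

With 250 operations this makes 3 round-trips instead of 250.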

Next Steps

Explore each component in detail:

  • HBC Reader: Bytecode parsing and version support
  • Pipeline: Package processing and fingerprinting
  • Analyzer: Fuzzy matching algorithms
  • Database Schema: MongoDB collections and models
