Overview
Hedis uses MongoDB to store fingerprints for vulnerable packages and React Native baselines. The database name is configurable via MONGO_DB_NAME (default: hedis).
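As a rough illustration of the configuration rule above, the following Python sketch resolves the database name from the environment. The function name resolve_db_name and the choice to treat an empty value as unset are assumptions of this sketch, not part of Hedis itself.

```python
import os

DEFAULT_DB_NAME = "hedis"  # default stated in the text above


def resolve_db_name(env=None):
    """Return the database name from MONGO_DB_NAME, or the default.

    `env` is any mapping of environment variables; it defaults to os.environ.
    Treating an empty value as unset is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    return env.get("MONGO_DB_NAME") or DEFAULT_DB_NAME
```

Calling resolve_db_name({}) returns "hedis", while resolve_db_name({"MONGO_DB_NAME": "hedis_test"}) returns "hedis_test".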
Core Collections
Hedis maintains four primary collections:
- packages: npm package metadata and vulnerability information
- hashes / hashes_ghsa: function fingerprints for each package per React Native version
- baselines_v3: baseline fingerprints for empty React Native apps
Collection: packages
Stores npm package information and tracks processing status.
Schema
Purpose
- Package tracking — Maintains list of packages to fingerprint from GitHub Security Advisories
- Error handling — Tracks which React Native versions failed during processing
- Resume capability — Pipeline can skip packages with known errors
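The resume behaviour can be sketched as follows, assuming each package document records failures in a map keyed by React Native version string. The field name "errors" and both helper functions are hypothetical; the actual schema is not reproduced here.

```python
def has_known_error(package_doc, rn_version):
    """True if this package already failed under the given RN version.

    "errors" is a hypothetical field name for the per-version error map.
    """
    errors = package_doc.get("errors") or {}
    return rn_version in errors


def pending_versions(package_doc, supported_versions):
    """Return the RN versions that still need processing, in input order."""
    return [v for v in supported_versions if not has_known_error(package_doc, v)]
```

For a document whose error map contains only "0.72", pending_versions(doc, ["0.71", "0.72", "0.73"]) returns ["0.71", "0.73"], so a rerun does not retry the version that is known to fail.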
Example Document
The Error map uses React Native version strings as keys, allowing fine-grained error tracking across different environments.
Collection: hashes / hashes_ghsa
Stores function-level fingerprints for each package compiled under each React Native version.
Schema
Purpose
- Exact matching — SHA256 hashes indexed for fast lookups
- Fuzzy matching — Raw IR strings stored for Levenshtein distance comparison
- Version specificity — Each package × RN version combination generates unique fingerprints
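The two matching modes above can be illustrated with a minimal Python sketch: a SHA256 digest for exact matching and a standard dynamic-programming Levenshtein distance for fuzzy matching. This is not Hedis's implementation; the function names are chosen for illustration, and how Hedis normalizes IR before hashing is not covered here.

```python
import hashlib


def sha256_hex(ir: str) -> str:
    """Hex-encoded SHA256 digest of an IR string (64 hex characters)."""
    return hashlib.sha256(ir.encode("utf-8")).hexdigest()


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```

Two identical IR strings always produce the same digest, so exact matching reduces to an indexed equality lookup. Fuzzy matching tolerates small differences: levenshtein("kitten", "sitting") is 3.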
Example Document
The omitempty BSON tag on the Content IR1/IR2 fields means they are absent from the document if the function has no string literals (IR1) or identifiers/objects (IR2).
Indexes
For performance, these fields are indexed:
Collection: baselines_v3
Stores fingerprints for empty React Native apps (no third-party packages) to filter out framework functions.
Schema
Purpose
- False positive reduction — Excludes React Native framework code from matches
- Version-specific baselines — Each RN version has a unique baseline
- Bytecode tracking — Links baseline to specific Hermes compiler version
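Conceptually, the false positive reduction is a set difference: any function hash that also appears in the baseline for the same React Native version is treated as framework code and dropped. A minimal sketch, with a hypothetical function name and plain lists of hash strings standing in for the stored documents:

```python
def filter_framework_functions(app_hashes, baseline_hashes):
    """Return the hashes from app_hashes that are not in the baseline.

    Order is preserved; the baseline is converted to a set so each
    membership check is O(1) on average.
    """
    baseline = set(baseline_hashes)
    return [h for h in app_hashes if h not in baseline]
```

Only the hashes that survive this filter need to be compared against package fingerprints.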
Example Document
Generation
The baseline generation step:
- Creates an empty React Native app for each supported version (0.69-0.79)
- Bundles with Metro and compiles with Hermes
- Extracts all function fingerprints
- Stores them in the baselines_v3 collection
Data Flow
Query Patterns
Exact Hash Lookup
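The original query is not reproduced here. As a hedged sketch, an exact lookup can be expressed as a MongoDB equality filter on the two fields named under Storage Considerations (react_native_version and hash.structural_hash). Whether Hedis's actual query also filters on other fields is not known; the helper name is hypothetical.

```python
def exact_lookup_filter(rn_version, structural_hash):
    """Build an equality filter for one structural hash under one RN version."""
    return {
        "react_native_version": rn_version,
        "hash.structural_hash": structural_hash,
    }


# Key specification for a compound index that serves the filter above,
# in the (field, direction) form accepted by PyMongo's create_index.
COMPOUND_INDEX_KEYS = [("react_native_version", 1), ("hash.structural_hash", 1)]
```

With a driver such as PyMongo, the filter would be passed to find or find_one on the hashes_ghsa collection. Because both fields are matched by equality, the compound index can satisfy the query without scanning the collection.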
Fuzzy Match Pre-Filter (Length-Based)
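A length-based pre-filter relies on a simple property: the Levenshtein distance between two strings is never smaller than the difference of their lengths. Candidates whose length differs from the query by more than the allowed distance can therefore be discarded before any distance is computed. The sketch below applies this in memory; whether Hedis applies it inside the database query or in application code is not shown here, and the max_distance parameter is hypothetical.

```python
def length_prefilter(query_ir, candidates, max_distance):
    """Keep candidates that could still be within max_distance edits.

    Since distance >= |len(a) - len(b)|, any candidate whose length differs
    by more than max_distance cannot match and is skipped.
    """
    query_len = len(query_ir)
    return [c for c in candidates if abs(len(c) - query_len) <= max_distance]
```

For example, length_prefilter("abcd", ["ab", "abcde", "abcdefgh"], 2) keeps "ab" and "abcde" and drops "abcdefgh", whose length differs by 4.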
Package-Specific Lookup
Baseline Filtering
Database Maintenance
The maintain-database command manages database operations.
Storage Considerations
Typical collection sizes:
- packages: ~10,000 documents (one per vulnerable package version)
- hashes_ghsa: ~5,000,000 documents (functions × packages × RN versions)
- baselines_v3: ~11 documents (one per RN version, each with ~500-2000 function hashes)
Approximate storage footprint:
- Raw IR strings: ~200-500 bytes per function
- SHA256 hashes: 64 bytes per hash type (as a hex string)
- Total: ~500 MB to 2 GB for a typical fingerprint database
Consider using MongoDB compression and proper indexing to optimize storage and query performance. The hashes_ghsa collection benefits significantly from compound indexes on (react_native_version, hash.structural_hash).
Related Sections
- Fingerprinting — How hashes are generated
- Fuzzy Matching — How raw IR strings are compared
- Hermes Bytecode — Source of the fingerprint data