Overview
Hermes bytecode (.hbc files) is a compact binary format produced by the Hermes JavaScript engine when compiling JavaScript code for React Native applications. Hedis supports 27 bytecode versions (v61-v96), covering Hermes releases from v0.1.0 to current.
Magic Number
Every valid Hermes bytecode file starts with an 8-byte magic number. The magic number must match exactly. If the first 8 bytes don’t match HeaderMagic, the file is not a valid Hermes bytecode file.
File Header Structure
The HBC header contains metadata about the file’s contents and guides subsequent parsing.
Version-Specific Fields
The header layout varies by bytecode version:
- BigIntCount and BigIntStorageSize — Only present in bytecode version ≥ 87 (Hermes v0.12.0+)
- FunctionSourceCount — Only present in bytecode version ≥ 84 (Hermes v0.8.1+)
- SegmentID — Called CjsModuleOffset before bytecode version 78
After reading the header, Hedis aligns to a 32-byte boundary before reading the first data section.
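The magic check and the alignment step described above can be sketched in Go. This is a minimal sketch, not Hedis's actual parser. It assumes the header begins with the 8-byte magic followed by a little-endian uint32 version, as in upstream Hermes, and it uses the magic constant from the Hermes source. The helper names `readPrefix` and `alignUp` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// HeaderMagic is the 64-bit magic value used by upstream Hermes,
// stored little-endian in the first 8 bytes of the file.
const HeaderMagic uint64 = 0x1F1903C103BC1FC6

// readPrefix (hypothetical helper) validates the magic number and returns
// the bytecode version. ASSUMPTION: the version is the uint32 that
// immediately follows the magic.
func readPrefix(data []byte) (uint32, error) {
	if len(data) < 12 {
		return 0, errors.New("file too short for magic and version")
	}
	if binary.LittleEndian.Uint64(data[:8]) != HeaderMagic {
		return 0, errors.New("not a Hermes bytecode file: magic mismatch")
	}
	return binary.LittleEndian.Uint32(data[8:12]), nil
}

// alignUp rounds offset up to the next multiple of boundary.
// boundary must be a power of two, for example 32.
func alignUp(offset, boundary int) int {
	return (offset + boundary - 1) &^ (boundary - 1)
}

func main() {
	buf := make([]byte, 12)
	binary.LittleEndian.PutUint64(buf[:8], HeaderMagic)
	binary.LittleEndian.PutUint32(buf[8:], 96)

	version, err := readPrefix(buf)
	fmt.Println(version, err)     // 96 <nil>
	fmt.Println(alignUp(100, 32)) // 128
}
```

Keeping the boundary as a parameter avoids hard-coding the 32-byte figure, so the same helper can serve other alignment requirements.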
SHA1 Integrity Check
For bytecode version ≥ 75, Hermes appends a 20-byte SHA1 hash of the file body as a footer. Hedis verifies this hash during header parsing to ensure file integrity.
Bytecode Versions
Hedis maintains opcode definitions for each supported bytecode version in pkg/hbc/types/opcodes/bcvXX/. The version determines:
- Opcode mappings — Instruction set varies between versions
- Header layout — Conditional fields as described above
- String encoding — String table format changed in v56 and v71
- CJS module format — Changed in v77
- Debug info layout — Changed in v91
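To illustrate how a single version number can drive these decisions, the sketch below collects the thresholds quoted in this document into one table and uses one of them to gate the SHA1 footer check. It is a sketch under stated assumptions, not Hedis's implementation. The `Features` type and the helpers `featuresFor`, `appendFooter`, and `verifyFooter` are hypothetical, and the hash is assumed to cover every byte before the 20-byte footer.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
)

// Features lists version-dependent traits using the thresholds quoted in
// this document. The type and its field names are illustrative only.
type Features struct {
	HasSHA1Footer      bool   // version >= 75
	HasFunctionSources bool   // version >= 84
	HasBigInts         bool   // version >= 87
	SegmentFieldName   string // "CjsModuleOffset" before version 78
}

// featuresFor derives the feature set from a bytecode version.
func featuresFor(version uint32) Features {
	name := "SegmentID"
	if version < 78 {
		name = "CjsModuleOffset"
	}
	return Features{
		HasSHA1Footer:      version >= 75,
		HasFunctionSources: version >= 84,
		HasBigInts:         version >= 87,
		SegmentFieldName:   name,
	}
}

// appendFooter returns a copy of body followed by its SHA1 digest.
// It exists only to build demo input.
func appendFooter(body []byte) []byte {
	sum := sha1.Sum(body)
	return append(append([]byte{}, body...), sum[:]...)
}

// verifyFooter checks the trailing 20-byte SHA1 when the version has one.
// ASSUMPTION: the hash covers all bytes that precede the footer.
func verifyFooter(data []byte, version uint32) bool {
	if !featuresFor(version).HasSHA1Footer {
		return true // older versions carry no footer to verify
	}
	if len(data) < sha1.Size {
		return false
	}
	split := len(data) - sha1.Size
	sum := sha1.Sum(data[:split])
	return bytes.Equal(sum[:], data[split:])
}

func main() {
	f := featuresFor(84)
	fmt.Println(f.HasFunctionSources, f.HasBigInts) // true false

	data := appendFooter([]byte("example body"))
	fmt.Println(verifyFooter(data, 96)) // true
	data[0] ^= 0xFF                     // corrupt one byte of the body
	fmt.Println(verifyFooter(data, 96)) // false
}
```

Deriving every flag in one place keeps the thresholds from being scattered across the parser as ad hoc comparisons.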
Version Detection
When parsing a file, Hedis reads the version from the header and selects the appropriate parser.
File Sections
After the header, the HBC file contains these sections in order:
- Function Headers — Small (16-byte) or Large (32-byte) headers for each function
- String Kinds — Run-length encoded classification (String vs Identifier)
- Identifier Hashes — Hash values for string table lookups
- Small String Table — Bit-packed entries with offset/length/encoding
- Overflow String Table — Full offset/length pairs for long strings
- String Storage — Raw string bytes (UTF-16 or Latin-1)
- Arrays — Serialized array literals
- BigInts — Variable-length signed integers (v87+)
- RegExp — Regular expression patterns and storage
- CJS Modules — CommonJS module references
- Function Sources — Function ID to source string mapping (v84+)
- Debug Info — Source maps, scope descriptors, filenames
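The ordering above, together with the two version-gated sections, can be written down as a small table. The following is an illustrative sketch only. The `section` type and the `sectionsFor` function are hypothetical names, and the minimum versions come from the list above.

```go
package main

import "fmt"

// section pairs a section name with the first bytecode version that has it.
// A MinVersion of 0 means the section is present in all supported versions.
type section struct {
	Name       string
	MinVersion uint32
}

// sectionOrder mirrors the on-disk order listed in this document.
var sectionOrder = []section{
	{"Function Headers", 0},
	{"String Kinds", 0},
	{"Identifier Hashes", 0},
	{"Small String Table", 0},
	{"Overflow String Table", 0},
	{"String Storage", 0},
	{"Arrays", 0},
	{"BigInts", 87},
	{"RegExp", 0},
	{"CJS Modules", 0},
	{"Function Sources", 84},
	{"Debug Info", 0},
}

// sectionsFor returns the sections expected for a version, in file order.
func sectionsFor(version uint32) []string {
	var names []string
	for _, s := range sectionOrder {
		if version >= s.MinVersion {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(len(sectionsFor(96))) // 12
	fmt.Println(len(sectionsFor(83))) // 10
}
```

A parser could walk this slice and dispatch to one reader per section, which keeps the order in a single declarative place.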
Function Headers
Functions use a compact 16-byte bit-packed header format. When Overflowed is true, the offset fields point to a 32-byte LargeFunctionHeader with full uint32 fields.
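The overflow rule can be sketched as follows. The bit positions here are an assumption taken from the small function header in upstream Hermes: Offset in the low 25 bits of the first word, InfoOffset in the low 25 bits of the third word, and the Overflowed flag at bit 5 of the final byte. They may not hold for every bytecode version, so verify them against the target version before relying on this. The names `funcLocation`, `decodeSmallHeader`, and `largeHeaderOffset` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// funcLocation holds the subset of a small function header that this
// sketch decodes. It is not the full header.
type funcLocation struct {
	Offset     uint32 // low 25 bits of word 0 (assumed position)
	InfoOffset uint32 // low 25 bits of word 2 (assumed position)
	Overflowed bool   // bit 5 of the last byte (assumed position)
}

// decodeSmallHeader reads the fields above from a 16-byte header.
func decodeSmallHeader(h []byte) (funcLocation, error) {
	if len(h) < 16 {
		return funcLocation{}, fmt.Errorf("need 16 bytes, got %d", len(h))
	}
	w0 := binary.LittleEndian.Uint32(h[0:4])
	w2 := binary.LittleEndian.Uint32(h[8:12])
	return funcLocation{
		Offset:     w0 & 0x1FFFFFF,
		InfoOffset: w2 & 0x1FFFFFF,
		Overflowed: h[15]&(1<<5) != 0,
	}, nil
}

// largeHeaderOffset combines both offset fields into the file offset of
// the large header. Only meaningful when Overflowed is true.
// ASSUMPTION: upstream Hermes stores the low 16 bits in Offset and the
// remaining high bits in InfoOffset.
func (f funcLocation) largeHeaderOffset() uint32 {
	return f.InfoOffset<<16 | f.Offset
}

func main() {
	h := make([]byte, 16)
	h[0], h[1] = 0x34, 0x12 // Offset = 0x1234
	h[8] = 0x02             // InfoOffset = 2
	h[15] = 1 << 5          // Overflowed flag set

	loc, err := decodeSmallHeader(h)
	if err != nil {
		panic(err)
	}
	if loc.Overflowed {
		fmt.Printf("large header at 0x%X\n", loc.largeHeaderOffset()) // large header at 0x21234
	}
}
```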
String Encoding
Strings are stored with two-level indirection:
- Small String Table Entry (4 bytes, bit-packed):
  - 1 bit: IsUTF16 flag
  - 23 bits: Offset into string storage (or overflow table index if Length = 0xFF)
  - 8 bits: Length in characters
- String Storage: Raw bytes in either:
  - UTF-16 little-endian with surrogate pair handling for code points above U+FFFF
  - Latin-1 (one byte per character)
The string table bit layout changed in bytecode version 56. Earlier versions included a 1-bit IsIdentifier flag, reducing the offset field to 22 bits.
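A minimal Go sketch of decoding one entry in the current layout and reading its string is shown below. It assumes the three fields are packed from the least significant bit upward in a little-endian uint32 (IsUTF16 at bit 0, Offset in bits 1 to 23, Length in bits 24 to 31), and that Length counts UTF-16 code units for UTF-16 strings. The names `smallEntry`, `decodeSmallEntry`, and `decodeString` are hypothetical, and bounds checks are omitted for brevity.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// smallEntry is the decoded form of a 4-byte small string table entry.
type smallEntry struct {
	IsUTF16  bool
	Offset   uint32 // string storage offset, or overflow index if Overflow
	Length   uint32
	Overflow bool // Length == 0xFF: look up the overflow string table
}

// decodeSmallEntry unpacks the bit fields (assumed LSB-first packing).
func decodeSmallEntry(raw uint32) smallEntry {
	length := raw >> 24
	return smallEntry{
		IsUTF16:  raw&1 == 1,
		Offset:   (raw >> 1) & 0x7FFFFF,
		Length:   length,
		Overflow: length == 0xFF,
	}
}

// decodeString reads length characters from storage starting at offset.
// The caller must ensure the range lies within storage.
func decodeString(storage []byte, offset, length uint32, isUTF16 bool) string {
	if isUTF16 {
		units := make([]uint16, length)
		for i := range units {
			units[i] = binary.LittleEndian.Uint16(storage[offset+uint32(i)*2:])
		}
		// utf16.Decode merges surrogate pairs into single code points.
		return string(utf16.Decode(units))
	}
	// Latin-1 maps each byte directly to the same Unicode code point.
	runes := make([]rune, length)
	for i := range runes {
		runes[i] = rune(storage[offset+uint32(i)])
	}
	return string(runes)
}

func main() {
	e := decodeSmallEntry(0x0300000A)
	fmt.Println(e.Offset, e.Length, e.IsUTF16) // 5 3 false

	storage := []byte("xxxxxabc")
	fmt.Println(decodeString(storage, e.Offset, e.Length, e.IsUTF16)) // abc
}
```

When `Overflow` is true, a real parser would treat `Offset` as an index into the overflow string table and take the actual offset and length from there.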
Related Sections
- Fingerprinting — How Hedis extracts hashes from bytecode
- Fuzzy Matching — Similarity detection beyond exact matches
- Database Schema — MongoDB collections storing bytecode fingerprints