Overview
Bytecode preprocessing transforms the program’s RISC-V instructions into a structured format optimized for zero-knowledge proving. This includes PC (program counter) mapping for virtual instruction sequences, padding for power-of-two polynomial encoding, and commitment to the bytecode as a public input.Role in the Proving System
Bytecode preprocessing is a one-time setup operation per program:- Stage: Preprocessing (before proving)
- Input: RISC-V ELF binary decoded into
Vec<Instruction> - Processing: PC mapping, padding, commitment
- Output:
BytecodePreprocessingstruct used during proving/verification - Verification: Bytecode commitment is public input (verifier checks match)
Bytecode Structure
Instruction Format
address: Physical address in ELF (must be aligned toALIGNMENT_FACTOR_BYTECODE)opcode: Determines which subtables/constraints applyvirtual_sequence_remaining: Counts down for multi-cycle virtual instructions
Physical vs. Virtual Instructions
Physical instructions:- Standard RISC-V opcodes (one instruction = one cycle)
- Address uniquely identifies instruction
- Multi-cycle operations (e.g., SHA-256 compression function)
- Multiple cycles share same physical address
- Distinguished by
virtual_sequence_remainingcounter
PC Mapping
The program counter (PC) must map physical instruction addresses to sequential virtual PCs:Why PC Mapping?
Problem: Physical addresses are non-contiguous (gaps due to alignment, data sections). Solution: Map to dense virtual PC space [0, num_instructions).BytecodePCMapper
No-Op Padding
Bytecode is prepended with a single no-op instruction at PC=0:Bytecode Padding
For efficient polynomial encoding, bytecode is padded to a power of two:- Enables efficient multilinear polynomial representation
- Simplifies sumcheck round structure (each bit of PC becomes one round)
- Allows binary tree-like evaluation strategies
Bytecode Commitment
The preprocessed bytecode is committed via polynomial commitment:Commitment Process
-
Convert to polynomial:
-
Commit: Use Dory polynomial commitment scheme
-
Public input:
bytecode_commitmentis part of verification key
Encoding Format
Each instruction encoded as field element(s):- Opcode (enumeration)
- Register indices (rs1, rs2, rd)
- Immediate value
- Flags (virtual sequence, etc.)
Bytecode Checking (Stage 3)
During proving, Jolt verifies execution trace matches committed bytecode:Stage 3: Bytecode RAF Checking
Purpose: Prove that every executed instruction matches the committed bytecode. Sumcheck claim:Bytecode vs. RAM Memory
Bytecode is stored in RAM but treated specially:RAM Region
Checking Strategy
RAM checking: Verifies bytecode memory is consistent (reads match initial writes) Bytecode checking: Verifies executed instructions match bytecode polynomial Both are necessary:- RAM checking: Ensures bytecode wasn’t modified during execution
- Bytecode checking: Ensures traced instructions match committed program
RAMPreprocessing
Performance Characteristics
Preprocessing Time
PC mapping:- O(num_instructions) time
- ~1-10ms for typical programs
- O(code_size · log(code_size)) field operations
- ~10-100ms for typical programs (one-time cost)
Proving Time
Bytecode RAF checking (Stage 3):- O(trace_length · log(code_size)) field operations
- ~5-10% of total proving time
Memory Usage
Preprocessing storage:BytecodePreprocessing: ~code_size bytesRAMPreprocessing: ~code_size / 8 · 8 bytes (packed words)- Typical: less than 1MB even for large programs
Verifier Time
- O(log(code_size)) for bytecode commitment check
- O(log(trace_length)) for RAF sumcheck verification
- ~0.1-1ms total
Compressed Instructions
Jolt supports RISC-V compressed (16-bit) instructions:Decompression
ALIGNMENT_FACTOR_BYTECODE typically requires 4-byte alignment after decompression.
Integration with Prover Pipeline
Bytecode preprocessing integrates with the full pipeline:- Decode ELF → Extract instruction bytes
- Preprocess → PC mapping, padding, commitment
- Trace → Emulator uses PC mapper to resolve virtual PCs
- Witness gen → Bytecode polynomial constructed from preprocessed data
- Stage 3 → Bytecode RAF checking proves trace matches bytecode
- Verification → Verifier checks bytecode commitment matches public input
Key Files
jolt-core/src/zkvm/bytecode/mod.rs:BytecodePreprocessingandBytecodePCMapperjolt-core/src/zkvm/bytecode/read_raf_checking.rs: Bytecode RAF sumcheckjolt-core/src/zkvm/ram/mod.rs:RAMPreprocessing(bytecode memory representation)tracer/src/lib.rs: ELF decoding and instruction extraction (decodefunction)
Next Steps
- Emulation: How tracer uses PC mapping during execution
- Instruction Execution: Verifying instruction lookups
- RAM Checking: Memory consistency verification