
Overview

Bytecode preprocessing transforms the program’s RISC-V instructions into a structured format optimized for zero-knowledge proving. This includes PC (program counter) mapping for virtual instruction sequences, padding for power-of-two polynomial encoding, and commitment to the bytecode as a public input.

Role in the Proving System

Bytecode preprocessing is a one-time setup operation per program:
  1. Stage: Preprocessing (before proving)
  2. Input: RISC-V ELF binary decoded into Vec<Instruction>
  3. Processing: PC mapping, padding, commitment
  4. Output: BytecodePreprocessing struct used during proving/verification
  5. Verification: Bytecode commitment is public input (verifier checks match)

Bytecode Structure

Instruction Format

pub struct Instruction {
    pub address: usize,                       // Physical memory address
    pub opcode: RiscvOpcode,                  // ADD, XOR, LOAD, etc.
    pub rs1: u8, pub rs2: u8, pub rd: u8,    // Register indices
    pub imm: i64,                             // Immediate value
    pub virtual_sequence_remaining: Option<u16>,  // For inlined instructions
}
Key fields:
  • address: Physical address in ELF (must be aligned to ALIGNMENT_FACTOR_BYTECODE)
  • opcode: Determines which subtables/constraints apply
  • virtual_sequence_remaining: Counts down for multi-cycle virtual instructions

Physical vs. Virtual Instructions

Physical instructions:
  • Standard RISC-V opcodes (one instruction = one cycle)
  • Address uniquely identifies instruction
Virtual instructions:
  • Multi-cycle operations (e.g., SHA-256 compression function)
  • Multiple cycles share same physical address
  • Distinguished by virtual_sequence_remaining counter

PC Mapping

The program counter (PC) must map physical instruction addresses to sequential virtual PCs:

Why PC Mapping?

Problem: physical addresses are non-contiguous (gaps due to alignment and data sections).
Solution: map them to a dense virtual PC space [0, num_instructions).

BytecodePCMapper

pub struct BytecodePCMapper {
    // Indexed by: (address - RAM_START_ADDRESS) / ALIGNMENT_FACTOR_BYTECODE
    // Each entry stores (base_pc, max_inline_seq) for the instruction at that slot
    indices: Vec<Option<(usize, u16)>>,
}
Mapping logic (returns None for addresses with no instruction):
pub fn get_pc(&self, address: usize, virtual_sequence_remaining: u16) -> Option<usize> {
    let (base_pc, max_inline_seq) = self.indices[get_index(address)]?;
    Some(base_pc + (max_inline_seq - virtual_sequence_remaining) as usize)
}
Example:
Physical address: 0x80001000
Virtual sequence: [5, 4, 3, 2, 1, 0] (6 cycles)

PC mapping:
  0x80001000, seq=5 → PC=10
  0x80001000, seq=4 → PC=11
  0x80001000, seq=3 → PC=12
  ...
  0x80001000, seq=0 → PC=15
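The arithmetic behind this example can be sketched as a standalone helper (a minimal sketch; the real `BytecodePCMapper::get_pc` also performs the address-to-index lookup):

```rust
// Hypothetical helper isolating the PC-mapping arithmetic: within a virtual
// sequence, the counter counts down from max_inline_seq to 0, so the offset
// from the base PC is (max_inline_seq - virtual_sequence_remaining).
fn virtual_pc(base_pc: usize, max_inline_seq: u16, virtual_sequence_remaining: u16) -> usize {
    base_pc + (max_inline_seq - virtual_sequence_remaining) as usize
}

fn main() {
    // Example from the text: base PC 10, a 6-cycle sequence with seq = 5..=0
    assert_eq!(virtual_pc(10, 5, 5), 10); // first cycle
    assert_eq!(virtual_pc(10, 5, 4), 11);
    assert_eq!(virtual_pc(10, 5, 0), 15); // last cycle
    println!("ok");
}
```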

No-Op Padding

Bytecode is prepended with a single no-op instruction at PC=0:
bytecode.insert(0, Instruction::NoOp);
Purpose: Simplifies constraint checks (avoids special-casing PC=0).

Bytecode Padding

For efficient polynomial encoding, bytecode is padded to a power of two:
let code_size = bytecode.len().next_power_of_two().max(2);
bytecode.resize(code_size, Instruction::NoOp);
Why power of two?
  • Enables efficient multilinear polynomial representation
  • Simplifies sumcheck round structure (each bit of PC becomes one round)
  • Allows binary tree-like evaluation strategies
Memory cost: Minimal (bytecode typically less than 1MB even for large programs).
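The padding rule above can be exercised directly (hypothetical helper name, same expression as the snippet):

```rust
// Sketch of the padding rule: round the code size up to the next power of
// two, with a minimum of 2 entries.
fn padded_code_size(len: usize) -> usize {
    len.next_power_of_two().max(2)
}

fn main() {
    assert_eq!(padded_code_size(1), 2);  // minimum enforced
    assert_eq!(padded_code_size(5), 8);  // rounds up
    assert_eq!(padded_code_size(8), 8);  // already a power of two
    assert_eq!(padded_code_size(9), 16);
    println!("ok");
}
```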

Bytecode Commitment

The preprocessed bytecode is committed via polynomial commitment:

Commitment Process

  1. Convert to polynomial:
    Bytecode[i] = encoding of instruction at PC=i
    
  2. Commit: Use Dory polynomial commitment scheme
    bytecode_commitment = Commit(Bytecode_polynomial)
    
  3. Public input: bytecode_commitment is part of verification key

Encoding Format

Each instruction encoded as field element(s):
  • Opcode (enumeration)
  • Register indices (rs1, rs2, rd)
  • Immediate value
  • Flags (virtual sequence, etc.)
Actual encoding uses multiple polynomials (one per field) for efficiency.
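A minimal sketch of this struct-of-arrays layout, with plain integers standing in for field elements (the `Instr` and `Columns` names here are illustrative, not Jolt's actual types):

```rust
// One column (future polynomial) per instruction field, as the text describes.
struct Instr {
    opcode: u8,
    rs1: u8,
    rs2: u8,
    rd: u8,
    imm: i64,
}

#[derive(Default)]
struct Columns {
    opcode: Vec<u64>,
    rs1: Vec<u64>,
    rs2: Vec<u64>,
    rd: Vec<u64>,
    imm: Vec<i64>,
}

// Transpose the bytecode: entry i of each column describes the instruction at PC=i.
fn encode(bytecode: &[Instr]) -> Columns {
    let mut cols = Columns::default();
    for ins in bytecode {
        cols.opcode.push(ins.opcode as u64);
        cols.rs1.push(ins.rs1 as u64);
        cols.rs2.push(ins.rs2 as u64);
        cols.rd.push(ins.rd as u64);
        cols.imm.push(ins.imm);
    }
    cols
}

fn main() {
    let prog = vec![Instr { opcode: 1, rs1: 2, rs2: 3, rd: 4, imm: -5 }];
    let cols = encode(&prog);
    assert_eq!(cols.opcode, vec![1]);
    assert_eq!(cols.imm, vec![-5]);
    println!("ok");
}
```

Keeping each field in its own column lets the prover commit to narrow values separately instead of packing a whole instruction into one large field element.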

Bytecode Checking (Stage 3)

During proving, Jolt verifies execution trace matches committed bytecode:

Stage 3: Bytecode RAF Checking

Purpose: prove that every executed instruction matches the committed bytecode.
Sumcheck claim:
Σ_{pc ∈ executed_PCs} eq(r, pc) · (traced_instr[pc] - bytecode[pc]) = 0
If any traced instruction doesn’t match the bytecode, this check fails.
RAF polynomial:
BytecodeRAF[pc] = 1 if PC is executed, else 0
Combined with read-RAF checking (as for RAM/registers):
Σ_{pc} BytecodeRAF[pc] · (traced_instr[pc] - bytecode[pc]) = 0
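The identity can be illustrated with a toy model over plain integers (a hedged sketch: the real protocol works over a large field, and the `eq(r, pc)` weights are derived from verifier randomness, which is what prevents distinct mismatches from cancelling):

```rust
// Toy Stage-3 identity: sum of weight * (traced - committed) over executed PCs.
fn raf_check(bytecode: &[i64], trace: &[(usize, i64)], weights: &[i64]) -> i64 {
    trace
        .iter()
        .map(|&(pc, instr)| weights[pc] * (instr - bytecode[pc]))
        .sum()
}

fn main() {
    let bytecode = vec![10, 20, 30, 40];
    let weights = vec![7, 11, 13, 17]; // stand-ins for eq(r, pc)
    // Honest trace: every term (traced - committed) vanishes, so the sum is 0.
    let honest = vec![(0, 10), (1, 20), (3, 40)];
    assert_eq!(raf_check(&bytecode, &honest, &weights), 0);
    // Tampered trace: a mismatch at pc=1 makes the sum nonzero.
    let tampered = vec![(0, 10), (1, 99), (3, 40)];
    assert_ne!(raf_check(&bytecode, &tampered, &weights), 0);
    println!("ok");
}
```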

Bytecode vs. RAM Memory

Bytecode is stored in RAM but treated specially:

RAM Region

┌─────────────────────┐ ← 0
│  Bytecode Words     │ ← remap_address(min_bytecode_address)
├─────────────────────┤
│  ...                │
└─────────────────────┘
Bytecode bytes are packed into 64-bit words (little-endian):
for chunk in memory_init.chunk_by(|a, b| a.0 / 8 == b.0 / 8) {
    let mut word = [0u8; 8];
    for (address, byte) in chunk {
        word[(address % 8) as usize] = *byte;
    }
    bytecode_words.push(u64::from_le_bytes(word));
}

Checking Strategy

RAM checking: verifies bytecode memory is consistent (reads match initial writes).
Bytecode checking: verifies executed instructions match the bytecode polynomial.
Both are necessary:
  • RAM checking: Ensures bytecode wasn’t modified during execution
  • Bytecode checking: Ensures traced instructions match committed program

RAMPreprocessing

pub struct RAMPreprocessing {
    pub min_bytecode_address: u64,    // Lowest bytecode address
    pub bytecode_words: Vec<u64>,     // Packed instruction bytes
}
Generation:
impl RAMPreprocessing {
    pub fn preprocess(memory_init: Vec<(u64, u8)>) -> Self {
        // Find the lowest bytecode address (memory_init is assumed non-empty)
        let min_bytecode_address = memory_init
            .iter()
            .map(|(address, _)| *address)
            .min()
            .unwrap_or(0);

        // Pack bytes into 64-bit little-endian words
        let bytecode_words = pack_bytes_to_words(memory_init);

        Self { min_bytecode_address, bytecode_words }
    }
}

Performance Characteristics

Preprocessing Time

PC mapping:
  • O(num_instructions) time
  • ~1-10ms for typical programs
Bytecode commitment:
  • O(code_size · log(code_size)) field operations
  • ~10-100ms for typical programs (one-time cost)

Proving Time

Bytecode RAF checking (Stage 3):
  • O(trace_length · log(code_size)) field operations
  • ~5-10% of total proving time

Memory Usage

Preprocessing storage:
  • BytecodePreprocessing: ~code_size bytes
  • RAMPreprocessing: ~code_size bytes (code_size / 8 packed 64-bit words, 8 bytes each)
  • Typical: less than 1MB even for large programs

Verifier Time

  • O(log(code_size)) for bytecode commitment check
  • O(log(trace_length)) for RAF sumcheck verification
  • ~0.1-1ms total

Compressed Instructions

Jolt supports RISC-V compressed (16-bit) instructions:

Decompression

if (first_halfword & 0b11) != 0b11 {
    // Compressed instruction (16-bit)
    let inst = uncompress_instruction(first_halfword, xlen);
    instructions.push(Instruction::decode(inst, address, true));
    offset += 2;
} else {
    // Standard instruction (32-bit)
    let word = u32::from_le_bytes([...]);
    instructions.push(Instruction::decode(word, address, false));
    offset += 4;
}
Alignment: Compressed instructions can be unaligned (2-byte boundaries), but Jolt’s ALIGNMENT_FACTOR_BYTECODE typically requires 4-byte alignment after decompression.
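The length test in the branch above follows the standard RISC-V encoding rule and can be isolated (hypothetical helper name):

```rust
// RISC-V length rule: an instruction is compressed (16-bit) iff the lowest
// two bits of its first halfword are NOT 0b11.
fn is_compressed(first_halfword: u16) -> bool {
    first_halfword & 0b11 != 0b11
}

fn main() {
    // 0x4501 (c.li a0, 0): low bits 0b01 → compressed
    assert!(is_compressed(0x4501));
    // 0x0513 (low halfword of the 32-bit `addi a0, zero, 0`): low bits 0b11 → not compressed
    assert!(!is_compressed(0x0513));
    println!("ok");
}
```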

Integration with Prover Pipeline

Bytecode preprocessing integrates with the full pipeline:
  1. Decode ELF → Extract instruction bytes
  2. Preprocess → PC mapping, padding, commitment
  3. Trace → Emulator uses PC mapper to resolve virtual PCs
  4. Witness gen → Bytecode polynomial constructed from preprocessed data
  5. Stage 3 → Bytecode RAF checking proves trace matches bytecode
  6. Verification → Verifier checks bytecode commitment matches public input

Key Files

  • jolt-core/src/zkvm/bytecode/mod.rs: BytecodePreprocessing and BytecodePCMapper
  • jolt-core/src/zkvm/bytecode/read_raf_checking.rs: Bytecode RAF sumcheck
  • jolt-core/src/zkvm/ram/mod.rs: RAMPreprocessing (bytecode memory representation)
  • tracer/src/lib.rs: ELF decoding and instruction extraction (decode function)
