Skip to main content
CompactPolynomial is a specialized polynomial type that stores coefficients as small scalars (u8, u16, u32, u64, u128, i64, i128) instead of full field elements. This optimization provides 10-100x memory reduction and significant speedups in the sumcheck inner loop.

Problem: Field Element Overhead

Jolt uses BN254’s scalar field (Fr), where each element is 32 bytes:
// Standard polynomial representation
struct DensePolynomial<F: JoltField> {
    coeffs: Vec<F>,  // Each F is 32 bytes (BN254 Fr)
}

// Memory usage for 2^20 coefficients:
// 1,048,576 * 32 bytes = 33.5 MB per polynomial
But many polynomials in Jolt have small integer coefficients:
  • Instruction flags: 0 or 1 (boolean)
  • Memory addresses: 0 to 2^32 - 1 (u32)
  • Register indices: 0 to 31 (u8)
  • Lookup indices: small integers

Solution: Compact Representation

CompactPolynomial<T, F> stores coefficients as small type T and lazily converts to field elements:
pub struct CompactPolynomial<T: SmallScalar, F: JoltField> {
    num_vars: usize,
    len: usize,
    coeffs: Vec<T>,              // Small scalars (1-16 bytes each)
    bound_coeffs: Vec<F>,        // Field elements (populated after binding)
}

Memory Savings

Coefficient TypeBytes/ElementMemory for 2^20 coeffsvs. Field Element
bool11 MB32x reduction
u811 MB32x reduction
u1622 MB16x reduction
u3244 MB8x reduction
u6488 MB4x reduction
u128 / i1281616 MB2x reduction
Field element F3232 MB1x (baseline)

Implementation Details

Small Scalar Trait

Location: jolt-core/src/utils/small_scalar.rs
pub trait SmallScalar: Copy + Send + Sync + PartialOrd {
    fn to_field<F: JoltField>(&self) -> F;
    
    // Optimized: (b - a) * r without intermediate field conversion
    fn diff_mul_field<F: JoltField>(&self, other: Self, r: F) -> F;
    
    // Optimized: small_scalar * field_element
    fn field_mul<F: JoltField>(&self, r: F) -> F;
}

Two-Phase Representation

CompactPolynomial has two states: 1. Initial State (coeffs only):
let poly = CompactPolynomial::from_coeffs(vec![1u8, 2, 3, 4, ...]);
// coeffs: [1, 2, 3, 4, ...] (u8 vector)
// bound_coeffs: [] (empty)
2. Bound State (after first bind):
poly.bind(r, BindingOrder::LowToHigh);
// coeffs: [1, 2, 3, 4, ...] (unchanged, but no longer used)
// bound_coeffs: [f1, f2, ...] (field elements)
After the first bind(), subsequent operations use bound_coeffs.

Optimized Binding

The first bind() converts small scalars to field elements using optimized arithmetic:
impl<T: SmallScalar, F: JoltField> PolynomialBinding<F> for CompactPolynomial<T, F> {
    fn bind(&mut self, r: F::Challenge, order: BindingOrder) {
        let n = self.len() / 2;
        if self.is_bound() {
            // Already bound: use field arithmetic
            // Standard bind: bound_coeffs[i] = bound_coeffs[2*i] + r * (bound_coeffs[2*i+1] - bound_coeffs[2*i])
        } else {
            // First bind: convert small scalars to field with optimizations
            self.bound_coeffs = (0..n)
                .map(|i| {
                    let a = self.coeffs[2 * i];
                    let b = self.coeffs[2 * i + 1];
                    match a.cmp(&b) {
                        Ordering::Equal => a.to_field(),
                        // a < b: Compute a + r * (b - a) without full field conversion
                        Ordering::Less => {
                            a.to_field::<F>() + b.diff_mul_field::<F>(a, r.into())
                        }
                        // a > b: Compute a - r * (a - b)
                        Ordering::Greater => {
                            a.to_field::<F>() - a.diff_mul_field::<F>(b, r.into())
                        }
                    }
                })
                .collect();
        }
        self.num_vars -= 1;
        self.len = n;
    }
}

Key Optimizations

1. Lazy Conversion

Coefficients stay as small scalars until binding:
// No field conversion during initialization
let poly = CompactPolynomial::from_coeffs(coeffs);

// Conversion happens during first bind
poly.bind(r, order);  // coeffs[T] → bound_coeffs[F]

2. Optimized Arithmetic

// Standard: Two field conversions + one field multiplication
let result = a.to_field() + r * (b.to_field() - a.to_field());

// Optimized: One conversion + one small-scalar multiplication
let result = a.to_field() + b.diff_mul_field(a, r);

3. Special Case Detection

match a.cmp(&b) {
    Ordering::Equal => a.to_field(),  // No multiplication needed!
    Ordering::Less => /* ... */,
    Ordering::Greater => /* ... */,
}
For boolean polynomials (many coefficients are equal), this eliminates ~50% of multiplications.

4. Parallel Binding

fn bind_parallel(&mut self, r: F::Challenge, order: BindingOrder) {
    // Use rayon for parallel processing
    self.bound_coeffs = (0..n)
        .into_par_iter()
        .map(|i| /* optimized bind operation */)
        .collect();
}
Parallelization threshold: polynomials with ≥ 512 * 32 / F::NUM_BYTES elements.

Evaluation Optimizations

Inside-Out Evaluation

Faster polynomial evaluation based on Fast Polynomial Evaluation:
pub fn inside_out_evaluate(&self, r: &[F]) -> F {
    const PARALLEL_THRESHOLD: usize = 16;
    if r.len() < PARALLEL_THRESHOLD {
        self.inside_out_serial(r)
    } else {
        self.inside_out_parallel(r)
    }
}

fn inside_out_serial(&self, r: &[F]) -> F {
    // Convert coeffs to field elements once
    let mut current: Vec<F> = self.coeffs.iter().map(|&c| c.to_field()).collect();
    let m = r.len();
    for i in (0..m).rev() {
        let stride = 1 << i;
        let r_val = r[m - 1 - i];
        for j in 0..stride {
            let f0 = current[j];
            let f1 = current[j + stride];
            let slope = f1 - f0;
            // Optimized: check for zero/one slope
            current[j] = if slope.is_zero() {
                f0
            } else if slope.is_one() {
                f0 + r_val
            } else {
                f0 + slope * r_val
            };
        }
    }
    current[0]
}
Speedup: ~2x vs. standard evaluation

Split EQ Evaluation

Optimized evaluation when combined with EQ polynomial:
pub fn split_eq_evaluate(&self, r_len: usize, eq_one: &[F], eq_two: &[F]) -> F {
    // Computes: Σᵢⱼ coeffs[i*|eq_two| + j] * eq_one[i] * eq_two[j]
    (0..eq_one.len())
        .into_par_iter()
        .map(|x1| {
            let partial_sum = (0..eq_two.len())
                .map(|x2| {
                    let idx = x1 * eq_two.len() + x2;
                    self.coeffs[idx].field_mul(eq_two[x2])  // Optimized small * field
                })
                .sum();
            OptimizedMul::mul_01_optimized(partial_sum, eq_one[x1])
        })
        .sum()
}
This avoids materializing the full EQ polynomial evaluation.

Usage in Jolt

MultilinearPolynomial Dispatch

Location: jolt-core/src/poly/multilinear_polynomial.rs
pub enum MultilinearPolynomial<F: JoltField> {
    LargeScalars(DensePolynomial<F>),      // Full field elements
    BoolScalars(CompactPolynomial<bool, F>),
    U8Scalars(CompactPolynomial<u8, F>),
    U16Scalars(CompactPolynomial<u16, F>),
    U32Scalars(CompactPolynomial<u32, F>),
    U64Scalars(CompactPolynomial<u64, F>),
    U128Scalars(CompactPolynomial<u128, F>),
    I64Scalars(CompactPolynomial<i64, F>),
    I128Scalars(CompactPolynomial<i128, F>),
    S128Scalars(CompactPolynomial<S128, F>),  // Signed 128-bit
    // ... specialized variants
}

Witness Polynomials

Many witness polynomials use compact representation:
// Instruction flags (boolean)
let flags_poly = MultilinearPolynomial::BoolScalars(
    CompactPolynomial::from_coeffs(flags)
);

// Memory addresses (u64)
let addr_poly = MultilinearPolynomial::U64Scalars(
    CompactPolynomial::from_coeffs(addresses)
);

// Register indices (u8)
let reg_poly = MultilinearPolynomial::U8Scalars(
    CompactPolynomial::from_coeffs(register_indices)
);

Performance Impact

Sumcheck Inner Loop

The sumcheck inner loop dominates prover time:
// Without CompactPolynomial:
for i in 0..poly.len() {
    sum += poly.coeffs[i] * eq_table[i];  // 32-byte load + field mul
}

// With CompactPolynomial:
for i in 0..poly.len() {
    sum += poly.coeffs[i].field_mul(eq_table[i]);  // 1-16 byte load + optimized mul
}
Impact:
  • Memory bandwidth: 2-32x reduction in bytes read
  • Cache efficiency: More coefficients fit in L1/L2 cache
  • Arithmetic: Faster small-scalar multiplication

Concrete Measurements

For a program with 2^20 cycles:
PolynomialTypeMemory (Dense)Memory (Compact)Savings
Instruction flagsbool32 MB1 MB32x
Memory addressesu6432 MB8 MB4x
Operand valuesu6432 MB8 MB4x
Total (10 polys)320 MB60 MB5.3x
Prover speedup: 15-30% from reduced memory bandwidth and cache pressure.

Best Practices

When to Use CompactPolynomial

Use CompactPolynomial when:
  • Coefficients are small integers (< 2^64)
  • Polynomial is large (> 2^16 coefficients)
  • Memory is constrained
  • Polynomial is bound multiple times in sumcheck
Use DensePolynomial when:
  • Coefficients are arbitrary field elements
  • Polynomial is very small (< 2^10 coefficients)
  • Polynomial is evaluated but never bound

Choosing Scalar Type

Pick the smallest type that fits your values:
// Booleans (0/1)
CompactPolynomial::<bool, F>

// Small indices (0-255)
CompactPolynomial::<u8, F>

// Memory addresses (0 to 2^32-1)
CompactPolynomial::<u32, F>

// Full 64-bit values
CompactPolynomial::<u64, F>

// Signed values
CompactPolynomial::<i64, F>

Performance Checklist

  • Use smallest scalar type that fits values
  • Pre-allocate vectors for bound_coeffs using unsafe_allocate_zero_vec
  • Avoid clones in hot paths (use references)
  • Profile to verify memory savings translate to speedup

References

Build docs developers (and LLMs) love