
The JIT Compiler

The JIT (Just-In-Time) compiler is CPython’s tier 2 optimization system that compiles hot bytecode sequences into optimized machine code.

Architecture Overview

CPython has a two-tier execution system: tier 1 is the specializing adaptive interpreter, and tier 2 is the micro-op optimizer and JIT described on this page. Because this layer is historically called “tier 2” in the codebase, you’ll see references to tier2 in function and variable names.

When JIT Compilation Occurs

The JIT activates when a JUMP_BACKWARD instruction becomes “hot”:
  1. JUMP_BACKWARD has an inline cache counter
  2. Counter decrements on each execution
  3. When counter reaches zero (threshold exceeded):
    • Call _PyOptimizer_Optimize() in Python/optimizer.c
    • Pass current frame and instruction pointer
    • Create optimized executor for the trace

Backoff Counter

The threshold is determined by backoff_counter_triggers() in Include/internal/pycore_backoff.h.
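The decrement-and-trigger behavior can be modeled in Python. This is an illustrative sketch only: the real counter lives in Include/internal/pycore_backoff.h as a compact value/exponent encoding, and the initial threshold and growth policy below are hypothetical.

```python
# Illustrative model of an exponential backoff counter.
# The real implementation is in Include/internal/pycore_backoff.h;
# the initial threshold and doubling policy here are hypothetical.

class BackoffCounter:
    def __init__(self, initial=16, max_value=4096):
        self.threshold = initial    # current trigger threshold
        self.max_value = max_value  # cap on backoff growth
        self.remaining = initial    # decremented on each execution

    def record_execution(self):
        """Decrement on each JUMP_BACKWARD; return True when hot."""
        self.remaining -= 1
        if self.remaining <= 0:
            # Triggered: back off so a failed optimization attempt
            # is not immediately retried at the same site.
            self.threshold = min(self.threshold * 2, self.max_value)
            self.remaining = self.threshold
            return True
        return False

counter = BackoffCounter(initial=4)
hot_at = [i for i in range(1, 20) if counter.record_execution()]
# With an initial threshold of 4, the counter triggers at execution 4,
# then backs off: the next trigger comes 8 executions later, then 16.
```

The backoff prevents a loop that repeatedly fails to produce a useful trace from paying the optimization cost on every iteration.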

Executors

An executor is an optimized version of a bytecode trace, represented by _PyExecutorObject in Include/internal/pycore_optimizer.h.

Executor Storage

Executors are stored in the code object:
struct PyCodeObject {
    // ...
    _PyExecutorArray *co_executors;  // Array of executors
    // ...
};

ENTER_EXECUTOR Instruction

Once an executor is created:
  1. JUMP_BACKWARD replaced with ENTER_EXECUTOR
  2. oparg contains index into co_executors array
  3. Subsequent iterations use the executor directly
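The rewrite above can be sketched as a small Python model. The instruction and executor representations are hypothetical; real code patches the 16-bit code units of the code object in place and stores the index in the instruction's oparg.

```python
# Simplified model of replacing JUMP_BACKWARD with ENTER_EXECUTOR.
# The (opcode, oparg) tuples here are a toy stand-in for _Py_CODEUNITs.

def install_executor(instructions, offset, executors, executor):
    """Replace the hot JUMP_BACKWARD at `offset` with ENTER_EXECUTOR."""
    op, _old_arg = instructions[offset]
    assert op == "JUMP_BACKWARD"
    executors.append(executor)
    index = len(executors) - 1  # oparg = index into co_executors
    instructions[offset] = ("ENTER_EXECUTOR", index)
    return index

code = [("LOAD_FAST", 0), ("JUMP_BACKWARD", 2)]
executors = []
idx = install_executor(code, 1, executors, object())
# code[1] is now ("ENTER_EXECUTOR", 0)
```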

Executor Exits

When an executor exits, it determines where control transfers:
  • Return to tier 1 interpreter
  • Transfer to another executor
Exit information is stored in the _PyExitData structure.

The Micro-op Optimizer

Its entry point is _PyOptimizer_Optimize() in Python/optimizer.c.

Trace Translation

The optimizer:
  1. Identifies trace - Sequence of bytecode starting from hot jump
  2. Expands to micro-ops - Each bytecode → sequence of micro-ops
  3. Optimizes - Apply optimization passes
  4. Creates executor - Instance of _PyUOpExecutor_Type

Micro-ops (uops)

Micro-operations are lower-level than bytecode:
# Bytecode instruction
LOAD_ATTR  5

# Expands to micro-ops (example):
GUARD_TYPE_VERSION
LOAD_ATTR_SLOT
Macro expansions defined in pycore_opcode_metadata.h, generated from Python/bytecodes.c.
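The expansion step can be sketched with a toy table. The table below is a stand-in for the generated metadata in pycore_opcode_metadata.h, and the micro-op sequences are illustrative, not CPython's real expansions.

```python
# Illustrative model of bytecode -> micro-op expansion. The table is
# a toy stand-in for the generated pycore_opcode_metadata.h.

EXPANSIONS = {
    "LOAD_ATTR": ["GUARD_TYPE_VERSION", "LOAD_ATTR_SLOT"],
}

def translate_trace(bytecode):
    """Expand each bytecode instruction into its micro-op sequence."""
    uops = []
    for instr in bytecode:
        # Instructions without an expansion pass through unchanged.
        uops.extend(EXPANSIONS.get(instr, [instr]))
    return uops

uops = translate_trace(["LOAD_ATTR", "STORE_FAST"])
# -> ["GUARD_TYPE_VERSION", "LOAD_ATTR_SLOT", "STORE_FAST"]
```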

Optimization Pass

_Py_uop_analyze_and_optimize() in Python/optimizer_analysis.c performs:
  • Dead code elimination
  • Redundant guard removal
  • Constant propagation
  • Type specialization

JIT Interpreter

The JIT interpreter is the simpler of the two executor implementations, useful for debugging.

Enabling

Configure with:
./configure --enable-experimental-jit=interpreter

Execution

When ENTER_EXECUTOR runs:
  1. Jump to tier2_dispatch: label in Python/ceval.c
  2. Loop executes micro-ops via switch statement
  3. Switch cases in Python/executor_cases.c.h
  4. Generated by Tools/cases_generator/tier2_generator.py
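The dispatch loop can be modeled as a switch over micro-ops, analogous to the generated cases in Python/executor_cases.c.h. The micro-op set and stack model below are simplified and hypothetical.

```python
# Minimal model of the tier 2 dispatch loop: fetch a micro-op,
# branch on it, repeat until an exit micro-op is reached.

def run_uops(uops, consts):
    stack = []
    pc = 0
    while True:
        op, arg = uops[pc]
        if op == "_LOAD_CONST":
            stack.append(consts[arg])
        elif op == "_BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "_EXIT_TRACE":
            # Planned exit: hand control back to the tier 1
            # interpreter at instruction index `arg`.
            return arg, stack
        pc += 1

target, stack = run_uops(
    [("_LOAD_CONST", 0), ("_LOAD_CONST", 1),
     ("_BINARY_ADD", 0), ("_EXIT_TRACE", 7)],
    consts=[2, 3],
)
# target is 7; stack holds the computed sum, 5
```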

Exit Instructions

  • _EXIT_TRACE - Planned exit, return to tier 1
  • _DEOPT - Deoptimization due to guard failure
Both return control to the adaptive interpreter.

Full JIT (Copy-and-Patch)

The full JIT compiles micro-ops to native machine code.

Enabling

./configure --enable-experimental-jit

Architecture

Uses copy-and-patch compilation:
  1. Pre-compiled stencils for each micro-op
  2. Runtime patching fills in specific values
  3. Efficient compilation without complex codegen
Copy-and-patch technique described in Haoran Xu’s article and the paper “Copy-and-Patch Compilation”.

Stencil Generation

At build time, make regen-jit generates stencils:
  1. Read Python/executor_cases.c.h
  2. For each micro-op, create .c file with template from Tools/jit/template.c
  3. Compile with LLVM to produce object files
  4. Extract machine code into jit_stencils.h

JIT Compilation

_PyJIT_Compile() in Python/jit.c:
  1. Allocate executable memory
  2. For each micro-op:
    • Copy stencil code
    • Patch runtime values (constants, object pointers, etc.)
  3. Set executor->jit_code to point to compiled function
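The copy-then-patch step can be sketched in Python. Real stencils are machine code extracted into jit_stencils.h with relocation-style hole descriptions; the bytearray-with-placeholder-bytes model below is purely illustrative.

```python
# Sketch of copy-and-patch: copy a pre-compiled stencil, then fill
# in "holes" with runtime values (constants, object pointers, etc.).

HOLE = 0xFF  # hypothetical placeholder byte marking a hole

def emit(stencil, patches):
    """Copy the stencil, then patch each hole with a runtime value."""
    code = bytearray(stencil)  # 1. copy
    holes = [i for i, b in enumerate(code) if b == HOLE]
    assert len(holes) == len(patches)
    for i, value in zip(holes, patches):
        code[i] = value        # 2. patch
    return bytes(code)

stencil = bytes([0x48, HOLE, 0x90, HOLE])
compiled = emit(stencil, patches=[0x2A, 0x07])
# -> b"\x48\x2a\x90\x07"
```

Because the expensive work (instruction selection, register allocation) happened at build time inside LLVM, the runtime step is little more than a memcpy plus a handful of writes.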

JIT Function Signature

Defined in pycore_jit.h:
typedef _Py_CODEUNIT *(*jit_func)(
    _PyInterpreterFrame *frame,
    PyObject **stack_pointer,
    PyThreadState *tstate
);
Returns instruction pointer for next tier 1 instruction.

Execution

When ENTER_EXECUTOR encounters JIT code:
  1. Check if executor->jit_code is set
  2. Call JIT function instead of tier 2 interpreter
  3. Function returns next instruction pointer
  4. Continue execution from returned location
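The decision in steps 1-2 can be sketched as follows. The Executor class and function shapes here are hypothetical; in C the check is simply whether the executor's jit_code pointer is set.

```python
# Sketch of the ENTER_EXECUTOR decision: call native code if the
# executor was JIT-compiled, else fall back to the tier 2 interpreter.

class Executor:
    def __init__(self, uops, jit_code=None):
        self.uops = uops
        self.jit_code = jit_code  # compiled function, or None

def interpret_uops(uops, frame):
    # Stand-in for the tier 2 interpreter loop.
    return ("interpreted", len(uops), frame)

def enter_executor(executor, frame):
    if executor.jit_code is not None:
        return executor.jit_code(frame)          # native path
    return interpret_uops(executor.uops, frame)  # tier 2 interpreter

ex = Executor(uops=["_LOAD_FAST", "_EXIT_TRACE"])
result = enter_executor(ex, frame="frame0")
```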

Executor Invalidation

Executors may become invalid when assumptions change.

Executor List

All executors stored in interpreter state:
struct _is {
    // ...
    _PyExecutorObject *executor_list_head;
    // ...
};
Maintains linked list for iteration.

Invalidation Triggers

  • Type modified (method added/removed)
  • Global/builtin modified
  • Module dict modified
  • Code object modified

Invalidation Process

  1. Iterate executor_list_head
  2. Mark affected executors as invalid
  3. Next ENTER_EXECUTOR will recompile or deoptimize
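The walk over the executor list can be modeled like this. The node layout mirrors the executor_list_head linked list but is heavily simplified, and the dependency tracking shown (a set of watched names) is a hypothetical stand-in for CPython's watcher machinery.

```python
# Sketch of executor invalidation: walk the interpreter's linked list
# of executors and mark those whose assumptions were broken.

class ExecutorNode:
    def __init__(self, name, depends_on, next_node=None):
        self.name = name
        self.depends_on = depends_on  # e.g. names this trace assumes stable
        self.valid = True
        self.next = next_node

def invalidate_dependents(head, changed_key):
    """Mark every executor depending on `changed_key` as invalid."""
    node = head
    while node is not None:
        if changed_key in node.depends_on:
            node.valid = False  # next ENTER_EXECUTOR will deoptimize
        node = node.next

head = ExecutorNode("loop_b", {"len"},
                    ExecutorNode("loop_a", {"range"}))
invalidate_dependents(head, "len")
# loop_b is now invalid; loop_a is untouched
```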

Example: JIT in Action

def hot_loop(n):
    total = 0
    for i in range(n):  # JUMP_BACKWARD here
        total += i
    return total

# First iterations: tier 1 interpreter
# Counter in JUMP_BACKWARD decrements

hot_loop(10)  # May still use tier 1

# After threshold iterations:
# - JUMP_BACKWARD triggers JIT compilation
# - Trace compiled to executor  
# - JUMP_BACKWARD replaced with ENTER_EXECUTOR

hot_loop(1000)  # Now uses JIT executor

Trace Example

Simplified micro-op trace for loop body:
GUARD_TYPE_VERSION  # total is int
GUARD_TYPE_VERSION  # i is int  
BINARY_OP_ADD_INT   # total += i
STORE_FAST 0        # store total
LOAD_FAST 1         # load i
LOAD_CONST 1        # load 1
BINARY_OP_ADD_INT   # i += 1
STORE_FAST 1        # store i
LOAD_FAST 1         # load i
LOAD_FAST 2         # load n
COMPARE_OP_INT <    # i < n
POP_JUMP_IF_TRUE -X # back to start or exit
Guards ensure type assumptions hold; deoptimize if violated.
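The guard-and-deoptimize behavior can be sketched in Python. The names are illustrative: real guards compare type version tags rather than calling type(), and deoptimization resumes in the tier 1 interpreter rather than an except block.

```python
# Sketch of a type guard: stay on the fast path while the assumption
# holds, fall back to generic behavior when it is violated.

class Deopt(Exception):
    """Signal that execution must fall back to tier 1."""

def guard_int(value):
    if type(value) is not int:
        raise Deopt("type assumption violated")
    return value

def traced_add(a, b):
    try:
        # Fast path: both operands proven to be exact ints.
        return "fast", guard_int(a) + guard_int(b)
    except Deopt:
        # Deoptimized path: the generic tier 1 behavior.
        return "deopt", a + b

traced_add(2, 3)    # takes the fast path
traced_add(2.5, 3)  # guard fails, takes the deopt path
```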

Performance Benefits

The JIT provides:

Reduced Dispatch Overhead

  • Tier 1: Decode + dispatch for every instruction
  • JIT: Direct machine code execution

Better Register Allocation

  • Tier 1: Stack-based with memory ops
  • JIT: Can keep values in registers across instructions

Inlining Opportunities

  • Micro-ops can inline small operations
  • Eliminates call overhead

Typical Speedup

For hot numeric loops:
  • Noticeably faster than the tier 1 interpreter on tight micro-benchmarks, though overall speedups in current releases remain modest
  • Still slower than compiled languages (C, Rust)
  • Best for tight loops with predictable types

Configuration

Build Options

# JIT interpreter only (debugging)
./configure --enable-experimental-jit=interpreter

# Full JIT (copy-and-patch)
./configure --enable-experimental-jit

# No JIT (default)
./configure

Runtime Control

In builds configured with the JIT, the PYTHON_JIT environment variable controls it at startup (PYTHON_JIT=0 disables it, PYTHON_JIT=1 enables it). Beyond that, there are no runtime flags: the JIT activates automatically for hot code.

Debugging JIT

JIT Stats

Compile with JIT stats:
./configure --enable-experimental-jit=interpreter --enable-pystats
make
View stats:
import sys
sys._stats_on()
# Run code
sys._stats_off() 
sys._stats_dump()

Disabling JIT

For debugging, rebuild without JIT:
./configure
make

Implementation Status

Experimental Feature: The JIT is experimental in Python 3.13+. APIs and behavior may change in future versions.

Supported Platforms

  • x86-64 (Linux, macOS, Windows)
  • ARM64 (Linux, macOS)

Limitations

  • Not all bytecode instructions have micro-op translations
  • Some operations force deoptimization to tier 1
  • Exception handling may deoptimize

Further Reading

Videos

  • Brandt Bucher’s PyCon US 2023 talk – Inside CPython 3.11’s specializing adaptive interpreter
  • PyCon US 2024 – Building a JIT compiler for CPython

Papers

Copy-and-Patch Compilation - Fast compilation algorithm for high-level languages
