
The JIT Compiler

The JIT (Just-In-Time) compiler is CPython’s tier 2 optimization system that compiles hot bytecode sequences into optimized machine code.

Architecture Overview

CPython has a two-tier execution system: tier 1 is the specializing adaptive interpreter, and tier 2 is the micro-op optimizer and JIT described on this page. Because this layer is historically called “tier 2” in the codebase, you’ll see references to tier2 in function and variable names.

When JIT Compilation Occurs

The JIT activates when a JUMP_BACKWARD instruction becomes “hot”:
  1. JUMP_BACKWARD has an inline cache counter
  2. Counter decrements on each execution
  3. When counter reaches zero (threshold exceeded):
    • Call _PyOptimizer_Optimize() in Python/optimizer.c
    • Pass current frame and instruction pointer
    • Create optimized executor for the trace

Backoff Counter

The threshold is determined by backoff_counter_triggers() in Include/internal/pycore_backoff.h.
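The decrement-and-trigger behavior can be modeled in Python. This is an illustrative sketch only: the real counter lives in Include/internal/pycore_backoff.h as a compact value/exponent encoding, and the initial threshold and growth policy below are hypothetical.

```python
# Illustrative model of an exponential backoff counter.
# The real implementation is in Include/internal/pycore_backoff.h;
# the initial threshold and doubling policy here are hypothetical.

class BackoffCounter:
    def __init__(self, initial=16, max_value=4096):
        self.threshold = initial    # current trigger threshold
        self.max_value = max_value  # cap on backoff growth
        self.remaining = initial    # decremented on each execution

    def record_execution(self):
        """Decrement on each JUMP_BACKWARD; return True when hot."""
        self.remaining -= 1
        if self.remaining <= 0:
            # Triggered: back off so a failed optimization attempt
            # is not immediately retried at the same site.
            self.threshold = min(self.threshold * 2, self.max_value)
            self.remaining = self.threshold
            return True
        return False

counter = BackoffCounter(initial=4)
hot_at = [i for i in range(1, 20) if counter.record_execution()]
# With an initial threshold of 4, the counter triggers at execution 4,
# then backs off: the next trigger comes 8 executions later, then 16.
```

The backoff prevents a loop that repeatedly fails to produce a useful trace from paying the optimization cost on every iteration.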

Executors

An executor is an optimized version of a bytecode trace, represented by _PyExecutorObject in Include/internal/pycore_optimizer.h.

Executor Storage

Executors are stored in the code object:
struct PyCodeObject {
    // ...
    _PyExecutorArray *co_executors;  // Array of executors
    // ...
};

ENTER_EXECUTOR Instruction

Once an executor is created:
  1. JUMP_BACKWARD replaced with ENTER_EXECUTOR
  2. oparg contains index into co_executors array
  3. Subsequent iterations use the executor directly
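The rewrite above can be sketched as a small Python model. The instruction and executor representations are hypothetical; real code patches the 16-bit code units of the code object in place and stores the index in the instruction's oparg.

```python
# Simplified model of replacing JUMP_BACKWARD with ENTER_EXECUTOR.
# The (opcode, oparg) tuples here are a toy stand-in for _Py_CODEUNITs.

def install_executor(instructions, offset, executors, executor):
    """Replace the hot JUMP_BACKWARD at `offset` with ENTER_EXECUTOR."""
    op, _old_arg = instructions[offset]
    assert op == "JUMP_BACKWARD"
    executors.append(executor)
    index = len(executors) - 1  # oparg = index into co_executors
    instructions[offset] = ("ENTER_EXECUTOR", index)
    return index

code = [("LOAD_FAST", 0), ("JUMP_BACKWARD", 2)]
executors = []
idx = install_executor(code, 1, executors, object())
# code[1] is now ("ENTER_EXECUTOR", 0)
```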

Executor Exits

When an executor exits, it determines where control transfers:
  • Return to tier 1 interpreter
  • Transfer to another executor
Exit information is stored in the _PyExitData structure.

The Micro-op Optimizer

Its entry point is _PyOptimizer_Optimize() in Python/optimizer.c.

Trace Translation

The optimizer:
  1. Identifies trace - Sequence of bytecode starting from hot jump
  2. Expands to micro-ops - Each bytecode → sequence of micro-ops
  3. Optimizes - Apply optimization passes
  4. Creates executor - Instance of _PyUOpExecutor_Type

Micro-ops (uops)

Micro-operations are lower-level than bytecode:
# Bytecode instruction
LOAD_ATTR  5

# Expands to micro-ops (example):
GUARD_TYPE_VERSION
LOAD_ATTR_SLOT
Macro expansions defined in pycore_opcode_metadata.h, generated from Python/bytecodes.c.
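The expansion step can be sketched with a toy table. The table below is a stand-in for the generated metadata in pycore_opcode_metadata.h, and the micro-op sequences are illustrative, not CPython's real expansions.

```python
# Illustrative model of bytecode -> micro-op expansion. The table is
# a toy stand-in for the generated pycore_opcode_metadata.h.

EXPANSIONS = {
    "LOAD_ATTR": ["GUARD_TYPE_VERSION", "LOAD_ATTR_SLOT"],
}

def translate_trace(bytecode):
    """Expand each bytecode instruction into its micro-op sequence."""
    uops = []
    for instr in bytecode:
        # Instructions without an expansion pass through unchanged.
        uops.extend(EXPANSIONS.get(instr, [instr]))
    return uops

uops = translate_trace(["LOAD_ATTR", "STORE_FAST"])
# -> ["GUARD_TYPE_VERSION", "LOAD_ATTR_SLOT", "STORE_FAST"]
```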

Optimization Pass

_Py_uop_analyze_and_optimize() in Python/optimizer_analysis.c performs:
  • Dead code elimination
  • Redundant guard removal
  • Constant propagation
  • Type specialization

JIT Interpreter

The JIT interpreter is the simpler of the two executor implementations, useful for debugging.

Enabling

Configure with:
./configure --enable-experimental-jit=interpreter

Execution

When ENTER_EXECUTOR runs:
  1. Jump to tier2_dispatch: label in Python/ceval.c
  2. Loop executes micro-ops via switch statement
  3. Switch cases in Python/executor_cases.c.h
  4. Generated by Tools/cases_generator/tier2_generator.py
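The dispatch loop can be modeled as a switch over micro-ops, analogous to the generated cases in Python/executor_cases.c.h. The micro-op set and stack model below are simplified and hypothetical.

```python
# Minimal model of the tier 2 dispatch loop: fetch a micro-op,
# branch on it, repeat until an exit micro-op is reached.

def run_uops(uops, consts):
    stack = []
    pc = 0
    while True:
        op, arg = uops[pc]
        if op == "_LOAD_CONST":
            stack.append(consts[arg])
        elif op == "_BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "_EXIT_TRACE":
            # Planned exit: hand control back to the tier 1
            # interpreter at instruction index `arg`.
            return arg, stack
        pc += 1

target, stack = run_uops(
    [("_LOAD_CONST", 0), ("_LOAD_CONST", 1),
     ("_BINARY_ADD", 0), ("_EXIT_TRACE", 7)],
    consts=[2, 3],
)
# target is 7; stack holds the computed sum, 5
```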

Exit Instructions

  • _EXIT_TRACE - Planned exit, return to tier 1
  • _DEOPT - Deoptimization due to guard failure
Both return control to the adaptive interpreter.

Full JIT (Copy-and-Patch)

The full JIT compiles micro-ops to native machine code.

Enabling

./configure --enable-experimental-jit

Architecture

Uses copy-and-patch compilation:
  1. Pre-compiled stencils for each micro-op
  2. Runtime patching fills in specific values
  3. Efficient compilation without complex codegen
Copy-and-patch technique described in Haoran Xu’s article and the paper “Copy-and-Patch Compilation”.

Stencil Generation

At build time, make regen-jit generates stencils:
  1. Read Python/executor_cases.c.h
  2. For each micro-op, create .c file with template from Tools/jit/template.c
  3. Compile with LLVM to produce object files
  4. Extract machine code into jit_stencils.h

JIT Compilation

_PyJIT_Compile() in Python/jit.c:
  1. Allocate executable memory
  2. For each micro-op:
    • Copy stencil code
    • Patch runtime values (constants, object pointers, etc.)
  3. Set executor->jit_code to point to compiled function
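The copy-then-patch step can be sketched in Python. Real stencils are machine code extracted into jit_stencils.h with relocation-style hole descriptions; the bytearray-with-placeholder-bytes model below is purely illustrative.

```python
# Sketch of copy-and-patch: copy a pre-compiled stencil, then fill
# in "holes" with runtime values (constants, object pointers, etc.).

HOLE = 0xFF  # hypothetical placeholder byte marking a hole

def emit(stencil, patches):
    """Copy the stencil, then patch each hole with a runtime value."""
    code = bytearray(stencil)  # 1. copy
    holes = [i for i, b in enumerate(code) if b == HOLE]
    assert len(holes) == len(patches)
    for i, value in zip(holes, patches):
        code[i] = value        # 2. patch
    return bytes(code)

stencil = bytes([0x48, HOLE, 0x90, HOLE])
compiled = emit(stencil, patches=[0x2A, 0x07])
# -> b"\x48\x2a\x90\x07"
```

Because the expensive work (instruction selection, register allocation) happened at build time inside LLVM, the runtime step is little more than a memcpy plus a handful of writes.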

JIT Function Signature

Defined in pycore_jit.h:
typedef _Py_CODEUNIT *(*jit_func)(
    _PyInterpreterFrame *frame,
    PyObject **stack_pointer,
    PyThreadState *tstate
);
Returns instruction pointer for next tier 1 instruction.

Execution

When ENTER_EXECUTOR encounters JIT code:
  1. Check if executor->jit_code is set
  2. Call JIT function instead of tier 2 interpreter
  3. Function returns next instruction pointer
  4. Continue execution from returned location
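The decision in steps 1-2 can be sketched as follows. The Executor class and function shapes here are hypothetical; in C the check is simply whether the executor's jit_code pointer is set.

```python
# Sketch of the ENTER_EXECUTOR decision: call native code if the
# executor was JIT-compiled, else fall back to the tier 2 interpreter.

class Executor:
    def __init__(self, uops, jit_code=None):
        self.uops = uops
        self.jit_code = jit_code  # compiled function, or None

def interpret_uops(uops, frame):
    # Stand-in for the tier 2 interpreter loop.
    return ("interpreted", len(uops), frame)

def enter_executor(executor, frame):
    if executor.jit_code is not None:
        return executor.jit_code(frame)          # native path
    return interpret_uops(executor.uops, frame)  # tier 2 interpreter

ex = Executor(uops=["_LOAD_FAST", "_EXIT_TRACE"])
result = enter_executor(ex, frame="frame0")
```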

Executor Invalidation

Executors may become invalid when assumptions change.

Executor List

All executors stored in interpreter state:
struct _is {
    // ...
    _PyExecutorObject *executor_list_head;
    // ...
};
Maintains linked list for iteration.

Invalidation Triggers

  • Type modified (method added/removed)
  • Global/builtin modified
  • Module dict modified
  • Code object modified

Invalidation Process

  1. Iterate executor_list_head
  2. Mark affected executors as invalid
  3. Next ENTER_EXECUTOR will recompile or deoptimize
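The walk over the executor list can be modeled like this. The node layout mirrors the executor_list_head linked list but is heavily simplified, and the dependency tracking shown (a set of watched names) is a hypothetical stand-in for CPython's watcher machinery.

```python
# Sketch of executor invalidation: walk the interpreter's linked list
# of executors and mark those whose assumptions were broken.

class ExecutorNode:
    def __init__(self, name, depends_on, next_node=None):
        self.name = name
        self.depends_on = depends_on  # e.g. names this trace assumes stable
        self.valid = True
        self.next = next_node

def invalidate_dependents(head, changed_key):
    """Mark every executor depending on `changed_key` as invalid."""
    node = head
    while node is not None:
        if changed_key in node.depends_on:
            node.valid = False  # next ENTER_EXECUTOR will deoptimize
        node = node.next

head = ExecutorNode("loop_b", {"len"},
                    ExecutorNode("loop_a", {"range"}))
invalidate_dependents(head, "len")
# loop_b is now invalid; loop_a is untouched
```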

Example: JIT in Action

def hot_loop(n):
    total = 0
    for i in range(n):  # JUMP_BACKWARD here
        total += i
    return total

# First iterations: tier 1 interpreter
# Counter in JUMP_BACKWARD decrements

hot_loop(10)  # May still use tier 1

# After threshold iterations:
# - JUMP_BACKWARD triggers JIT compilation
# - Trace compiled to executor  
# - JUMP_BACKWARD replaced with ENTER_EXECUTOR

hot_loop(1000)  # Now uses JIT executor

Trace Example

Simplified micro-op trace for loop body:
GUARD_TYPE_VERSION  # total is int
GUARD_TYPE_VERSION  # i is int  
BINARY_OP_ADD_INT   # total += i
STORE_FAST 0        # store total
LOAD_FAST 1         # load i
LOAD_CONST 1        # load 1
BINARY_OP_ADD_INT   # i += 1
STORE_FAST 1        # store i
LOAD_FAST 1         # load i
LOAD_FAST 2         # load n
COMPARE_OP_INT <    # i < n
POP_JUMP_IF_TRUE -X # back to start or exit
Guards ensure type assumptions hold; deoptimize if violated.
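The guard-and-deoptimize behavior can be sketched in Python. The names are illustrative: real guards compare type version tags rather than calling type(), and deoptimization resumes in the tier 1 interpreter rather than an except block.

```python
# Sketch of a type guard: stay on the fast path while the assumption
# holds, fall back to generic behavior when it is violated.

class Deopt(Exception):
    """Signal that execution must fall back to tier 1."""

def guard_int(value):
    if type(value) is not int:
        raise Deopt("type assumption violated")
    return value

def traced_add(a, b):
    try:
        # Fast path: both operands proven to be exact ints.
        return "fast", guard_int(a) + guard_int(b)
    except Deopt:
        # Deoptimized path: the generic tier 1 behavior.
        return "deopt", a + b

traced_add(2, 3)    # takes the fast path
traced_add(2.5, 3)  # guard fails, takes the deopt path
```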

Performance Benefits

The JIT provides:

Reduced Dispatch Overhead

  • Tier 1: Decode + dispatch for every instruction
  • JIT: Direct machine code execution

Better Register Allocation

  • Tier 1: Stack-based with memory ops
  • JIT: Can keep values in registers across instructions

Inlining Opportunities

  • Micro-ops can inline small operations
  • Eliminates call overhead

Typical Speedup

For hot numeric loops:
  • Noticeably faster than the tier 1 interpreter on tight micro-benchmarks, though overall speedups in current releases remain modest
  • Still slower than compiled languages (C, Rust)
  • Best for tight loops with predictable types

Configuration

Build Options

# JIT interpreter only (debugging)
./configure --enable-experimental-jit=interpreter

# Full JIT (copy-and-patch)
./configure --enable-experimental-jit

# No JIT (default)
./configure

Runtime Control

In builds configured with the JIT, the PYTHON_JIT environment variable controls it at startup (PYTHON_JIT=0 disables it, PYTHON_JIT=1 enables it). Beyond that, there are no runtime flags: the JIT activates automatically for hot code.

Debugging JIT

JIT Stats

Compile with JIT stats:
./configure --enable-experimental-jit=interpreter --enable-pystats
make
View stats:
import sys
sys._stats_on()
# Run code
sys._stats_off() 
sys._stats_dump()

Disabling JIT

For debugging, rebuild without JIT:
./configure
make

Implementation Status

Experimental Feature: The JIT is experimental in Python 3.13+. APIs and behavior may change in future versions.

Supported Platforms

  • x86-64 (Linux, macOS, Windows)
  • ARM64 (Linux, macOS)

Limitations

  • Not all bytecode instructions have micro-op translations
  • Some operations force deoptimization to tier 1
  • Exception handling may deoptimize

Further Reading

Videos

  • Brandt Bucher’s PyCon US 2023 talk – Inside CPython 3.11’s specializing adaptive interpreter
  • PyCon US 2024 – Building a JIT compiler for CPython

Papers

Copy-and-Patch Compilation - Fast compilation algorithm for high-level languages
