The Bytecode Interpreter
The bytecode interpreter is the part of CPython that executes compiled Python code. Its entry point is `_PyEval_EvalFrameDefault()` in `Python/ceval.c`.
High-Level Architecture
At its core, the interpreter is a loop that iterates over bytecode instructions, executing each via a large switch statement. This switch statement is automatically generated from the instruction definitions in `Python/bytecodes.c` using a specialized DSL.
Execution Flow
- `PyEval_EvalCode()` is called with a code object
- A frame is constructed for the code object
- `_PyEval_EvalFrame()` is called to execute the frame
- By default, this calls `_PyEval_EvalFrameDefault()` (configurable via PEP 523)
- The interpreter loop decodes and executes instructions
Thread State
The interpreter receives a `PyThreadState` object (`tstate`) that contains:
- Exception state
- Recursion depth tracking
- Per-interpreter state (`tstate->interp`)
- Per-runtime global state (`tstate->interp->runtime`)
Instruction Decoding
Bytecode is stored as an array of 16-bit code units (`_Py_CODEUNIT`).
Code Unit Format
Each code unit contains:
- 8-bit opcode (first byte)
- 8-bit oparg (second byte, unsigned)
- `_Py_OPCODE(word)` - extracts the opcode
- `_Py_OPARG(word)` - extracts the argument
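As a sketch of the decoding described above, using Python's `dis` module rather than the C macros, each byte pair in a code object's `co_code` splits into an opcode and its oparg:

```python
import dis

def decode_units(code: bytes):
    """Split raw bytecode into (opcode, oparg) pairs.

    Each 16-bit code unit stores the opcode in its first byte
    and the unsigned oparg in its second byte.
    """
    return [(code[i], code[i + 1]) for i in range(0, len(code), 2)]

def identity(x):
    return x

units = decode_units(identity.__code__.co_code)
# Map raw opcodes back to their names for readability
names = [dis.opname[op] for op, _ in units]
```

On Python 3.11+, `names` for this trivial function includes `RETURN_VALUE` alongside the load of `x`.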
Basic Interpreter Loop
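The real loop lives in `_PyEval_EvalFrameDefault()` and dispatches on hundreds of opcodes; the following is a heavily simplified Python model of the decode-dispatch cycle, using a made-up three-instruction set rather than CPython's actual opcode numbering:

```python
# Toy opcodes (hypothetical numbering, not CPython's)
LOAD_CONST, BINARY_ADD, RETURN_VALUE = 0, 1, 2

def run(code, consts):
    """Minimal decode-dispatch loop over 16-bit code units."""
    stack = []
    next_instr = 0
    while True:
        opcode, oparg = code[next_instr], code[next_instr + 1]
        next_instr += 2  # next_instr now points past this instruction
        if opcode == LOAD_CONST:
            stack.append(consts[oparg])
        elif opcode == BINARY_ADD:
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif opcode == RETURN_VALUE:
            return stack.pop()

# "2 + 3": load both constants, add, return
program = bytes([LOAD_CONST, 0, LOAD_CONST, 1, BINARY_ADD, 0, RETURN_VALUE, 0])
```

The structure mirrors the C loop: fetch a code unit, advance `next_instr`, then branch on the opcode.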
Extended Arguments
The 8-bit oparg limits arguments to 0-255. For larger values, the `EXTENDED_ARG` instruction prefixes the main instruction.
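A sketch of how `EXTENDED_ARG` prefixes fold into the final oparg, eight bits at a time, using `dis.opmap` to look up real opcode numbers:

```python
import dis

def effective_args(code: bytes):
    """Fold EXTENDED_ARG prefixes into the following instruction's oparg."""
    EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]
    result, arg = [], 0
    for i in range(0, len(code), 2):
        opcode = code[i]
        arg = (arg << 8) | code[i + 1]  # shift in the next 8 bits
        if opcode != EXTENDED_ARG:
            result.append((opcode, arg))
            arg = 0  # reset accumulator after a real instruction
    return result

ext = dis.opmap["EXTENDED_ARG"]
nop = dis.opmap["NOP"]
# oparg 65538 == 0x1_00_02: two prefix bytes, then the instruction's own byte
encoded = bytes([ext, 0x01, ext, 0x00, nop, 0x02])
```

Decoding `encoded` yields a single `NOP` with an effective oparg of 65538.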
For example, an oparg of 65538 (0x1_00_02) is encoded as `EXTENDED_ARG 0x01`, `EXTENDED_ARG 0x00`, then the instruction itself with argument byte 0x02. Up to three `EXTENDED_ARG` prefixes can be used, allowing 32-bit arguments.
Jump Instructions
When the switch statement is reached, `next_instr` already points to the next instruction. Jumps work by modifying this pointer:
- Forward jump: `next_instr += oparg`
- Backward jump: `next_instr -= oparg`
Inline Cache Entries
Specialized instructions have associated inline caches, stored as additional code units following the instruction.
Cache Structure
- Cache size is fixed per opcode
- All instructions in a specialization family have the same cache size
- Caches are initialized to zeros by the compiler
- Accessed by casting `next_instr` to a struct pointer
Important: the instruction implementation must advance `next_instr` past the cache using `next_instr += n` or the `JUMPBY(n)` macro.
The Evaluation Stack
CPython’s interpreter is a stack machine. Most instructions operate by pushing and popping values from the stack.
Stack Characteristics
- Pre-allocated array of `PyObject *` pointers in the frame
- Size determined by the `co_stacksize` field of the code object
- Grows upward in memory
- Stack pointer (`stack_pointer`) tracks the current top
Stack Operations
Stack Metadata
Stack effects for each instruction are exposed through:
- `_PyOpcode_num_popped()` - items consumed
- `_PyOpcode_num_pushed()` - items produced
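Those two helpers are internal C functions; at the Python level, the net stack effect (items pushed minus items popped) of an opcode is available through `dis.stack_effect()`:

```python
import dis

# POP_TOP consumes one item and produces none: net effect -1
pop_effect = dis.stack_effect(dis.opmap["POP_TOP"])

# LOAD_CONST pushes one item: net effect +1 (an oparg is required here)
load_effect = dis.stack_effect(dis.opmap["LOAD_CONST"], 0)
```

For jump instructions, `dis.stack_effect()` also accepts a `jump=` keyword, since the effect can differ between the taken and not-taken paths.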
Don’t confuse the evaluation stack with the call stack! The evaluation stack holds operands for bytecode operations, while the call stack manages function calls.
Error Handling
When an instruction raises an exception, execution jumps to the `exception_unwind` label in `Python/ceval.c`.
The exception is then handled using the exception table stored in the code object.
Python-to-Python Calls
Since Python 3.11, Python-to-Python function calls are “inlined” for efficiency:
- The `CALL` instruction detects Python function objects
- A new frame is pushed onto the call stack
- The interpreter “jumps” to the callee’s bytecode (no C recursion)
- `RETURN_VALUE` pops the frame and returns to the caller
- The `frame->is_entry` flag distinguishes frames entered from C from inlined frames
Entry Frames
Frames with `is_entry` set return from `_PyEval_EvalFrameDefault()` to C code. Other frames return to Python bytecode.
The Call Stack
Since Python 3.11, frames use the internal `_PyInterpreterFrame` structure instead of full `PyFrameObject` instances.
Frame Allocation
- Most frames are allocated contiguously in a per-thread stack
- Function: `_PyThreadState_PushFrame()` in `Python/pystate.c`
- Fast path: `_PyFrame_PushUnchecked()` when space is available
- Generator/coroutine frames are embedded in the generator object
Frame Objects
Full `PyFrameObject` instances are only created when needed:
- `sys._getframe()` is called
- A debugger accesses the frame
- Extension modules call `PyEval_GetFrame()`
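For example, calling `sys._getframe()` forces CPython to materialize a full frame object for what is otherwise a lightweight `_PyInterpreterFrame`:

```python
import sys

def caller_name():
    # sys._getframe(1) materializes a PyFrameObject for the caller's
    # frame, which was previously just an internal _PyInterpreterFrame
    return sys._getframe(1).f_code.co_name

def outer():
    return caller_name()
```

`outer()` returns `"outer"`; the frame object created here is cached on the internal frame, so repeated lookups don't allocate again.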
Specialization
Introduced in PEP 659, bytecode specialization rewrites instructions at runtime based on observed types.
Adaptive Instructions
Specializable instructions:
- Track an execution counter in their inline cache
- Call `_Py_Specialize_XXX()` when hot (see `Python/specialize.c`)
- Are replaced with a specialized version if applicable
Instruction Families
A family consists of:
- Adaptive instruction - the base implementation with a counter
- Specialized forms - optimized for specific types/values

For example:
- `LOAD_GLOBAL` - adaptive base
- `LOAD_GLOBAL_MODULE` - specialized for module globals
- `LOAD_GLOBAL_BUILTIN` - specialized for builtins
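The adaptive base form is what the compiler emits and what `dis` shows by default; on Python 3.11+, `dis.dis(..., adaptive=True)` will instead show any specialized forms the interpreter has installed after warm-up. A quick check that a builtin lookup really goes through `LOAD_GLOBAL`:

```python
import dis

def use_builtin(seq):
    return len(seq)  # 'len' is resolved via LOAD_GLOBAL

# Collect the opcode names the compiler emitted for this function
ops = {ins.opname for ins in dis.get_instructions(use_builtin)}
```

After the function has run enough times, the interpreter may rewrite that `LOAD_GLOBAL` in place to `LOAD_GLOBAL_BUILTIN`.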
Deoptimization
Specialized instructions include guard checks; if a guard fails, the instruction deoptimizes back to the unspecialized path.
Performance Metric:
Specialization benefit = T_base / T_adaptive
Where:
- T_base = time for the base instruction
- T_adaptive = weighted average time across all forms, including misses
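A worked version of the metric with purely illustrative numbers (the probabilities and timings below are made up, not measurements):

```python
def specialization_benefit(t_base, outcomes):
    """Compute T_base / T_adaptive.

    outcomes: (probability, time) pairs covering every specialized
    form plus the miss/deoptimization path; probabilities sum to 1.
    """
    t_adaptive = sum(p * t for p, t in outcomes)
    return t_base / t_adaptive

# Hypothetical: 90% of executions hit the fast form at 4ns,
# 10% miss and pay 14ns; the base instruction costs 10ns.
benefit = specialization_benefit(10.0, [(0.9, 4.0), (0.1, 14.0)])
```

Here T_adaptive = 0.9·4 + 0.1·14 = 5.0, so the specialization is a 2x win; a high miss rate or a slow miss path can push the ratio below 1, making the specialization a net loss.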
Adding New Bytecode Instructions
To add a new opcode:
- Define the instruction in `Python/bytecodes.c`
- Document it in `Doc/library/dis.rst`
- Run `make regen-cases` to generate the implementation
- Update the magic number in `Lib/importlib/_bootstrap_external.py`
- Run `make regen-importlib`
- Update the compiler in `Python/codegen.c` to emit the new instruction
Changing the magic number invalidates all existing .pyc files, forcing recompilation.
Performance Tips
Specialization Design
- Keep `T_i` (the specialized instruction's execution time) low
- Minimize branches and dependent memory accesses
- Keep inline caches small to reduce memory pressure
- Record statistics with `STAT_INC(BASE_INSTRUCTION, hit)`
Testing Specializations
Instrument specialization functions to gather usage patterns before designing specialized forms.
Related Topics
- Compiler Design - How bytecode is generated
- Code Objects - Structure of compiled code
- Frames - Execution frame structure
- Exception Handling - Exception table format
- JIT Compiler - Tier 2 optimization
