HotSpot’s Just-In-Time (JIT) compilation system transforms bytecode into optimized native machine code at runtime. The VM includes multiple compiler implementations optimized for different scenarios.

Compiler Architecture

HotSpot integrates three main compiler implementations:

C1 Compiler

Fast client compiler for quick startup

C2 Compiler

Optimizing server compiler for peak performance

Graal Compiler

Java-based experimental compiler via JVMCI

C1 Client Compiler

The C1 compiler provides fast compilation with moderate optimization. Located in src/hotspot/share/c1/.

Design Philosophy

From c1_Compiler.hpp:
// There is one instance of the Compiler per CompilerThread.

class Compiler: public AbstractCompiler {
  virtual const char* name() { return "C1"; }
  
  virtual void compile_method(ciEnv* env, 
                             ciMethod* target, 
                             int entry_bci,
                             bool install_code,
                             DirectiveSet* directive);
};
C1 prioritizes:
  • Fast compilation speed - Compiles methods quickly
  • Low overhead - Minimal memory and CPU usage
  • Profiling support - Gathers runtime statistics for C2

Compilation Pipeline

C1 uses a structured compilation pipeline:
Bytecode → HIR → LIR → Machine Code
    ↓        ↓      ↓         ↓
  Parse  Optimize  Lower   Register
                          Allocation

HIR (High-Level IR)

C1's first intermediate representation (c1_Instruction.hpp, c1_IR.hpp):
  • Graph-based - Control flow and data flow graphs
  • SSA form - Static Single Assignment for optimization
  • Type information - Preserves Java type semantics
HIR optimizations:
  • Constant folding and propagation
  • Common subexpression elimination
  • Null check elimination
  • Method inlining (limited)
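To make these optimizations concrete, here is an illustrative Java sketch (class and method names are invented for the example) showing the kind of redundant work C1's HIR passes remove: a duplicated subexpression that common subexpression elimination computes once, and an implicit null check that is eliminated once an earlier access proves the reference non-null.

```java
public class HirExamples {
    static int distanceSquaredTwice(int x, int y) {
        int a = x * x + y * y;  // first occurrence of the expression
        int b = x * x + y * y;  // CSE: the compiler reuses the value of `a`
        return a + b;
    }

    static int sumFirstTwo(int[] arr) {
        int first = arr[0];     // implicit null check + bounds check
        int second = arr[1];    // null check eliminated: arr already proven non-null
        return first + second;
    }

    public static void main(String[] args) {
        System.out.println(distanceSquaredTwice(3, 4)); // prints 50
        System.out.println(sumFirstTwo(new int[]{1, 2})); // prints 3
    }
}
```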

Graph Building

The c1_GraphBuilder class constructs HIR from bytecode:
// From c1_GraphBuilder.hpp:
class GraphBuilder {
  // Parses bytecode and builds HIR graph
  // Handles:
  // - Control flow (branches, loops, exceptions)
  // - Type inference and checking
  // - Inlining decisions
  // - Profile data collection points
};

Frame Maps

C1 maintains frame maps (c1_FrameMap.hpp) for:
  • Local variable locations (stack/register)
  • Spill slot management
  • Calling convention handling
  • Debugger support

Profiling Support

When compiling with profiling (tiered compilation levels 2-3):
  • Method invocation counters - Track call frequency
  • Branch counters - Record branch taken/not-taken
  • Type profiles - Receiver types at call sites
  • Null check profiles - Null/non-null statistics
This data guides C2 optimization decisions.
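As an example of how type profiles help C2, consider a virtual call site that the profile shows is monomorphic (one receiver class in practice). C2 can then devirtualize the call behind a class-check guard and inline the target. This sketch uses invented class names:

```java
interface Shape { double area(); }

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

public class TypeProfileDemo {
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            // Monomorphic in practice: C1's type profile records only Circle
            // here, so C2 can guard on the class and inline Circle.area().
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1), new Circle(2) };
        System.out.println(totalArea(shapes)); // Math.PI * (1 + 4)
    }
}
```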

C2 Server Compiler

The C2 compiler performs aggressive optimization for peak performance. Located in src/hotspot/share/opto/.

Design Philosophy

From c2compiler.hpp:
class C2Compiler : public AbstractCompiler {
  const char *name() { return "C2"; }
  
  // Compilation with aggressive optimization
  void compile_method(ciEnv* env,
                     ciMethod* target,
                     int entry_bci,
                     bool install_code,
                     DirectiveSet* directive);
  
  // Retry mechanisms for optimization failures:
  static const char* retry_no_subsuming_loads();
  static const char* retry_no_escape_analysis();
  static const char* retry_no_iterative_escape_analysis();
  // ...
};
C2 features:
  • Aggressive optimizations - Peak performance focus
  • Sea-of-nodes IR - Flexible graph-based representation
  • Global analysis - Whole-method optimization
  • Speculative optimizations - Profile-guided assumptions
C2 is called “opto” in the source tree because it was originally the “optimizing compiler” contrasted with the simpler C1.

Sea-of-Nodes IR

C2’s intermediate representation is a graph where:
  • Nodes represent operations (addnode.hpp, callnode.hpp, etc.)
  • Edges represent dependencies (data and control)
  • No fixed order - Scheduler determines execution order
  • Ideal transformations - Pattern-based optimization
Node types include:
// From various *node.hpp files:
AddNode      - Integer/FP addition
CallNode     - Method invocations  
LoadNode     - Memory reads
StoreNode    - Memory writes
IfNode       - Conditional branches
LoopNode     - Loop headers
PhiNode      - SSA merge points
// ... hundreds of node types

Optimization Phases

C2 compilation proceeds through multiple phases:
Bytecode → Initial graph (bytecodeInfo.cpp):
  1. Parse bytecodes - Build initial node graph
  2. Inlining - Aggressive method inlining decisions
  3. Type sharpening - Refine types using profiles
  4. Exception handling - Build exception control flow
Inlining decisions based on:
  • Method size and complexity
  • Call frequency (from profiles)
  • Inline depth limits
  • Compilation budget
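A typical beneficiary of these heuristics is a tiny accessor: its bytecode size falls far below the default `-XX:MaxInlineSize` limit, so in a hot caller the JIT inlines it down to a plain field load. An illustrative sketch (names invented):

```java
public class InlineDemo {
    static final class Point {
        private final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int getX() { return x; }  // a handful of bytecodes: trivially inlinable
    }

    static long sumX(Point[] pts) {
        long sum = 0;
        for (Point p : pts) {
            sum += p.getX();      // after inlining: a direct field load, no call
        }
        return sum;
    }

    public static void main(String[] args) {
        Point[] pts = { new Point(1, 2), new Point(3, 4) };
        System.out.println(sumX(pts)); // prints 4
    }
}
```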

Ideal Graph Transformations

C2’s optimization engine applies pattern-based transformations:
// Conceptual example from node optimization:
IdentityNode(AddNode(x, 0)) → x  // x + 0 = x
IdentityNode(MulNode(x, 1)) → x  // x * 1 = x  
AddNode(AddNode(x, c1), c2) → AddNode(x, c1+c2)  // constant folding
Each node type implements:
  • Ideal() - Graph transformations
  • Identity() - Identity simplifications
  • Value() - Constant folding

Deoptimization Support

C2 can speculatively optimize based on profile data. If assumptions are violated:
  1. Uncommon trap triggered
  2. Execution deoptimizes to interpreter
  3. Interpreter continues execution
  4. Method may be recompiled with different assumptions
Deoptimization metadata (buildOopMap.cpp):
  • Maps machine state → interpreter state
  • Reconstructs stack frames
  • Restores Java-visible state
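A classic trigger is class hierarchy analysis (CHA): while only one implementation of a method is loaded, C2 may compile callers with a direct (devirtualized) call. Loading a subclass later invalidates that assumption and deoptimizes the affected compiled code. An illustrative sketch (class names invented):

```java
public class DeoptDemo {
    static class Base              { int value() { return 1; } }
    static class Derived extends Base { @Override int value() { return 2; } }

    static int call(Base b) {
        // While Derived is unloaded, CHA lets C2 treat this as a direct call.
        // Loading Derived triggers an uncommon trap and recompilation with
        // a real virtual dispatch.
        return b.value();
    }

    public static void main(String[] args) {
        System.out.println(call(new Base()));    // 1: single-implementation world
        System.out.println(call(new Derived())); // 2: assumption now violated
    }
}
```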

SuperWord Optimization

C2 includes automatic vectorization (superword.cpp):
  • Identifies parallel operations in loops
  • Combines scalar operations into SIMD instructions
  • Platform-specific vector instruction support
  • Significant speedups for array operations
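The canonical SuperWord candidate is a counted loop over arrays with no cross-iteration dependencies, such as this element-wise add (an illustrative sketch; whether it actually vectorizes depends on platform and array alignment):

```java
public class SuperWordDemo {
    static void add(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];  // independent iterations: SIMD-friendly
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {10f, 20f, 30f, 40f};
        float[] c = new float[4];
        add(a, b, c);
        System.out.println(java.util.Arrays.toString(c)); // [11.0, 22.0, 33.0, 44.0]
    }
}
```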

Graal Compiler

Graal is a Java-based compiler accessible via JVMCI (JVM Compiler Interface). Located in src/jdk.graal.compiler/.

JVMCI Architecture

The JVMCI interface (src/hotspot/share/jvmci/) provides:
// From jvmciCompiler.hpp:
class JVMCICompiler : public AbstractCompiler {
  // Java-based compiler implementation
  // Compilation requests forwarded to Java code
  // Runtime services provided by VM
};
Key JVMCI Components:
  • CompilerToVM - VM services for compiler (metadata access, etc.)
  • VMToCompiler - Compiler callbacks from VM
  • Code Installation - Installing compiled code
  • Metadata Access - Reading VM structures from Java

Graal Benefits

Modern Java

Written in Java, easier to understand and modify

Advanced Optimizations

Partial evaluation, advanced inlining

Language Agnostic

Powers GraalVM polyglot execution

Research Platform

Experimental optimizations and techniques

Graal vs C2

Aspect              C2                  Graal
Language            C++                 Java
Maturity            Decades of tuning   Newer, evolving
Peak Performance    Excellent           Comparable
Compile Time        Fast                Slower
Extensibility       Limited             Excellent
Partial Evaluation  No                  Yes
Graal can be used as a replacement for C2 with -XX:+UseJVMCICompiler but is not the default in standard OpenJDK builds.

Tiered Compilation

Modern HotSpot combines interpreters and compilers in a tiered strategy:

Compilation Levels

Level  Execution Mode  Profiling  Optimizations  Purpose
0      Interpreter     Full       None           Initial execution, gathering data
1      C1              None       Full C1        Trivial methods; no profiling needed
2      C1              Light      Full C1        Faster warmup while the C2 queue is busy
3      C1              Full       Full C1        Standard first tier; profiles feed C2
4      C2              None       Aggressive     Peak performance optimization

Transition Strategy

Typical progression for a hot method:
Interpreter (L0) → C1 + full profiling (L3) → C2 (L4)
     ↓                      ↓                    ↓
  Profiling             Profiling            Peak perf
Alternative paths:
  • L0 → L1 - Quick compilation without profiling
  • L0 → L3 → L1 - Trivial methods settle at level 1 when profiling shows C2 would not help
  • L3 → L4 - Recompile with C2 when very hot
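These transitions can be observed directly by running a hot method under `-XX:+PrintCompilation`. A minimal sketch (iteration counts are rough; exact tier timing varies by build and thresholds):

```java
// Run with: java -XX:+PrintCompilation HotLoop
// The log typically shows hotLoop first compiled by C1 at level 3
// (with profiling) and later by C2 at level 4 as it stays hot.
public class HotLoop {
    static long hotLoop(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 20_000; i++) {  // enough calls to cross tier thresholds
            total += hotLoop(1_000);
        }
        System.out.println(total);
    }
}
```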

Compilation Thresholds

Controlled by invocation and back-edge counters:
// From invocationCounter.hpp:
class InvocationCounter {
  uint _counter;  // Combined counter value
  
  // Methods to update and check thresholds
  void increment();
  bool reached_threshold();
};
Configurable via flags:
  • -XX:Tier0InvokeNotifyFreqLog - Interpreter threshold
  • -XX:Tier3InvocationThreshold - C1 threshold
  • -XX:Tier4InvocationThreshold - C2 threshold
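The counter-and-threshold idea can be modeled in a few lines of Java. This is a hypothetical sketch, not the actual VM encoding: the real `InvocationCounter` packs state bits into a single counter word and combines invocation with back-edge counts.

```java
public class CounterModel {
    private int count;
    private final int threshold;

    CounterModel(int threshold) { this.threshold = threshold; }

    /** Returns true when this increment crosses the compile threshold. */
    boolean incrementAndCheck() {
        count++;
        return count >= threshold;
    }

    public static void main(String[] args) {
        CounterModel c = new CounterModel(3);
        System.out.println(c.incrementAndCheck()); // false
        System.out.println(c.incrementAndCheck()); // false
        System.out.println(c.incrementAndCheck()); // true: request compilation
    }
}
```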

Compilation Queue

Compilation requests are managed by a priority queue:
  1. Method nominated for compilation (threshold reached)
  2. Added to queue with priority (based on hotness)
  3. CompilerThread dequeues and compiles
  4. Code installed in code cache
  5. Future calls use compiled version
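The priority behavior in steps 2-3 can be sketched with an ordinary priority queue: hotter methods are dequeued first. This is a hypothetical model; the real queue lives in the compile broker and tracks much richer task state.

```java
import java.util.PriorityQueue;

public class CompileQueueModel {
    record Task(String method, int hotness) {}

    // Higher hotness first: hot methods jump ahead of warm ones.
    private final PriorityQueue<Task> queue =
        new PriorityQueue<>((a, b) -> Integer.compare(b.hotness(), a.hotness()));

    void enqueue(String method, int hotness) { queue.add(new Task(method, hotness)); }

    Task dequeue() { return queue.poll(); }  // what a compiler thread would pick next

    public static void main(String[] args) {
        CompileQueueModel q = new CompileQueueModel();
        q.enqueue("warm", 100);
        q.enqueue("hot", 10_000);
        System.out.println(q.dequeue().method()); // hot
        System.out.println(q.dequeue().method()); // warm
    }
}
```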

CompilerThreads

Dedicated threads for compilation:
// Thread hierarchy from thread.hpp:
JavaThread
  └── CompilerThread  // Runs C1/C2 compilation tasks
Thread count configurable:
  • -XX:CICompilerCount=N - Total compiler threads
  • -XX:CICompilerCountPerCPU - Threads per CPU
By default the VM derives the compiler thread count from the number of CPUs; in tiered mode roughly one third of the threads serve C1 and the remainder serve C2.

Intrinsics

Both C1 and C2 support intrinsic methods - hand-written assembly for critical operations:
// From c1_Compiler.hpp and c2compiler.hpp:
static bool is_intrinsic_supported(vmIntrinsics::ID id);
Common intrinsics:
  • String.equals() - Vectorized string comparison
  • System.arraycopy() - Optimized memory copy
  • Math.sin/cos/sqrt() - Native math routines
  • Unsafe operations - Direct memory access
  • AES encryption - Hardware-accelerated crypto
  • CRC32 - SIMD checksums
Intrinsics can deliver large, often order-of-magnitude speedups for these operations.
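From Java code, intrinsic candidates look like ordinary method calls; the substitution happens transparently at JIT time when the platform supports it:

```java
import java.util.Arrays;

public class IntrinsicDemo {
    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4};
        int[] dst = new int[4];
        System.arraycopy(src, 0, dst, 0, 4);     // intrinsic: optimized block copy
        System.out.println(Arrays.toString(dst)); // [1, 2, 3, 4]

        System.out.println(Math.sqrt(144.0));     // intrinsic: hardware sqrt, prints 12.0

        System.out.println("abc".equals("abc"));  // intrinsic: vectorized comparison
    }
}
```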

Code Cache

Compiled code stored in code cache (src/hotspot/share/code/):

Code Cache Segments

  • Non-nmethods - VM runtime stubs, adapters
  • Profiled nmethods - C1-compiled methods with profiling
  • Non-profiled nmethods - C2-optimized methods
Each segment can be sized independently:
  • -XX:NonNMethodCodeHeapSize
  • -XX:ProfiledCodeHeapSize
  • -XX:NonProfiledCodeHeapSize

Code Cache Management

When code cache fills:
  1. Flush old code - Remove cold/unused methods
  2. Stop compilation - No more JIT until space available
  3. Log warning - “Code cache is full”
Monitor with: -XX:+PrintCodeCache

Performance Tuning

Disable Tiered Compilation

# Use only C2 (for throughput):
java -XX:-TieredCompilation ...

# Use only C1 (for fast startup):
java -XX:TieredStopAtLevel=1 ...

Compilation Logging

# See compilation activity:
java -XX:+PrintCompilation ...

# Detailed compilation logs:
java -XX:+UnlockDiagnosticVMOptions \
     -XX:+LogCompilation \
     -XX:LogFile=compilation.log ...

Inline Tuning

# Increase inline limits:
-XX:MaxInlineLevel=15        # Inline depth
-XX:MaxInlineSize=50         # Bytecode size
-XX:FreqInlineSize=200       # Hot method size

Next Steps

HotSpot VM

VM architecture and runtime

Module System

JPMS implementation details
