runtime/vm/compiler/backend/flow_graph_compiler.cc and architecture-specific files.
Code Generation Architecture
The code generation process consists of several stages:
- IL Finalization: Prepare IL for code generation
- Register Allocation: Assign registers to values
- Instruction Lowering: Convert IL to machine instructions
- Code Emission: Generate native code bytes
- Metadata Generation: Create debugging and deoptimization info
FlowGraphCompiler
The FlowGraphCompiler class (flow_graph_compiler.cc:135) orchestrates code generation.
Key Responsibilities
Block Ordering
Determines the order blocks are emitted in native code. Optimized for:
- Cache locality
- Branch prediction
- Fall-through optimization
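The fall-through goal above can be illustrated with a greedy chaining sketch (illustrative only, not the VM's actual ordering algorithm): starting from the entry block, follow each block's most likely successor so the branch to it becomes a fall-through in the emitted code. The block ids and the successor map here are hypothetical.

```cpp
#include <map>
#include <vector>

// Greedy fall-through chaining: emit a block, then emit its most likely
// successor immediately after it so no jump is needed between them.
std::vector<int> OrderBlocks(int entry, const std::map<int, int>& likely_succ) {
  std::vector<int> order;
  std::map<int, bool> placed;
  int block = entry;
  while (!placed[block]) {
    placed[block] = true;
    order.push_back(block);
    auto it = likely_succ.find(block);
    if (it == likely_succ.end() || placed[it->second]) break;
    block = it->second;
  }
  return order;
}
```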
Exception Handling
Generates exception handler tables. Maps try-catch blocks to code locations.
Deoptimization Support
Creates deoptimization metadata for optimized code. Enables fallback to unoptimized code when assumptions fail.
Static Call Tracking
Maintains a table of static call targets. Used for patching and reoptimization.
Register Allocation
Register allocation (linearscan.cc) assigns registers to values using a linear scan algorithm.
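A minimal linear-scan sketch (in the style of Poletto and Sarkar's algorithm, not the VM's implementation in linearscan.cc): live intervals sorted by start position are scanned once; intervals that have ended release their register, and each new interval takes a free one. Spilling is omitted for brevity.

```cpp
#include <algorithm>
#include <vector>

struct Interval { int start, end, reg; };

// Assigns a register to each interval, assuming enough registers exist
// (a real allocator spills the furthest-ending interval when none are free).
void LinearScan(std::vector<Interval>& intervals, int num_regs) {
  std::sort(intervals.begin(), intervals.end(),
            [](const Interval& a, const Interval& b) { return a.start < b.start; });
  std::vector<int> free_regs;
  for (int r = num_regs - 1; r >= 0; --r) free_regs.push_back(r);
  std::vector<Interval*> active;
  for (auto& interval : intervals) {
    // Expire intervals that ended before this one starts, freeing registers.
    for (auto a = active.begin(); a != active.end();) {
      if ((*a)->end <= interval.start) {
        free_regs.push_back((*a)->reg);
        a = active.erase(a);
      } else {
        ++a;
      }
    }
    interval.reg = free_regs.back();
    free_regs.pop_back();
    active.push_back(&interval);
  }
}
```

Non-overlapping intervals end up sharing a register, while overlapping ones get distinct registers.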
Location Summary
Each instruction defines its location requirements (runtime/vm/compiler/backend/locations.h):
- Location::RequiresRegister(): Needs a CPU register
- Location::RequiresFpuRegister(): Needs FPU register
- Location::RegisterLocation(reg): Specific register
- Location::StackSlot(index): Stack location
- Location::Constant(value): Constant value
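A simplified model of the Location idea (the real class in locations.h packs kind and payload into a single word and distinguishes allocation policies from concrete assignments; this struct is purely illustrative):

```cpp
// A location is either a constraint ("requires a register") that the
// allocator must satisfy, or a concrete place after allocation.
struct Location {
  enum Kind { kRequiresRegister, kRegister, kFpuRegister, kStackSlot, kConstant };
  Kind kind;
  int payload;  // register number, stack slot index, or constant id

  static Location RequiresRegister() { return {kRequiresRegister, 0}; }
  static Location RegisterLocation(int reg) { return {kRegister, reg}; }
  static Location StackSlot(int index) { return {kStackSlot, index}; }
  bool IsRegister() const { return kind == kRegister; }
};
```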
Instruction Emission
Each IL instruction implements EmitNativeCode.
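The pattern can be sketched as follows (simplified stand-ins for the real Instruction and Assembler classes): each instruction appends its own machine bytes through the assembler.

```cpp
#include <cstdint>
#include <vector>

// Toy assembler: just a growable byte buffer.
struct Assembler {
  std::vector<uint8_t> buffer;
  void Emit(uint8_t byte) { buffer.push_back(byte); }
};

// Every IL instruction knows how to emit its own native code.
struct Instruction {
  virtual ~Instruction() = default;
  virtual void EmitNativeCode(Assembler* assembler) = 0;
};

struct ReturnInstr : public Instruction {
  void EmitNativeCode(Assembler* assembler) override {
    assembler->Emit(0xC3);  // x64 'ret' opcode
  }
};
```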
Architecture-Specific Code Generation
Code generation is split across architecture files:
- il_x64.cc: x64 (Intel/AMD 64-bit)
- il_arm64.cc: ARM64 (Apple Silicon, etc.)
- il_arm.cc: ARM32
- il_ia32.cc: x86 (32-bit Intel)
- il_riscv.cc: RISC-V
Example: Binary Smi Operation on x64
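A C++ emulation of the fast path the x64 backend generates for a Smi add (this models the emitted instruction sequence, not the emitter itself). In the Dart VM a Smi carries tag bit 0, so a tagged Smi is the integer value shifted left by one; two tagged Smis can therefore be added directly, with a branch to a slow path on a non-Smi operand or on overflow. The function name is illustrative; `__builtin_add_overflow` (GCC/Clang) stands in for the `jo` overflow check.

```cpp
#include <cstdint>
#include <optional>

constexpr intptr_t kSmiTagMask = 1;  // heap objects have the low bit set

std::optional<intptr_t> SmiAddFastPath(intptr_t left, intptr_t right) {
  // testq left|right, kSmiTagMask ; jnz slow_path
  if (((left | right) & kSmiTagMask) != 0) return std::nullopt;
  intptr_t result;
  // addq + jo: the add itself, with jump-on-overflow to the slow path.
  if (__builtin_add_overflow(left, right, &result)) return std::nullopt;
  return result;  // result is still a correctly tagged Smi
}
```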
Lowering Stages
IL instructions are lowered in multiple stages:
Stage 1: High-Level Lowering
Before register allocation:
SelectRepresentations
Choose value representations (boxed vs unboxed). Decisions:
- Unbox doubles for arithmetic (avoid heap allocation)
- Keep Smis unboxed when possible
- Box only when necessary for calls/stores
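The first decision can be illustrated with a toy allocation counter (the BoxedDouble type and the counter are illustrative, not VM types): with boxed values every intermediate result is a heap allocation, while unboxed values live in registers, here modeled as plain C++ doubles.

```cpp
int allocations = 0;

struct BoxedDouble {
  double value;
  explicit BoxedDouble(double v) : value(v) { ++allocations; }
};

// Boxed form of a*b + c: the intermediate product and the result each
// allocate. (Deliberately leaky; this is a cost illustration only.)
BoxedDouble* BoxedMulAdd(BoxedDouble* a, BoxedDouble* b, BoxedDouble* c) {
  BoxedDouble* product = new BoxedDouble(a->value * b->value);
  return new BoxedDouble(product->value + c->value);
}

// Unboxed form: the same expression touches no heap at all.
double UnboxedMulAdd(double a, double b, double c) { return a * b + c; }
```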
InsertMoveArguments
Insert move instructions for call arguments. Explicitly represents argument passing in IL.
Stage 2: Post-Optimization Lowering
After all optimizations:
ExtractNonInternalTypedDataPayloads
Lower typed data access patterns. Separates base pointer from offset calculations.
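A sketch of the payload-extraction idea: rather than recomputing "object, then data pointer, then element" at every access, the lowered form loads the payload base pointer once and addresses each element as base + index * element_size. The types and names here are illustrative.

```cpp
#include <cstdint>

struct TypedDataPayload {
  uint8_t* base;  // extracted payload base pointer, loaded once
};

// Lowered access: a single scaled-index load off the extracted base.
int32_t LoadInt32Element(const TypedDataPayload& payload, intptr_t index) {
  return *reinterpret_cast<const int32_t*>(payload.base +
                                           index * sizeof(int32_t));
}
```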
Sanitizer Instrumentation
Add runtime checks if sanitizers are enabled. Helps catch bugs during development.
Memory Model and Calling Conventions
Stack Frame Layout
Typical stack frame structure:

Calling Convention
Dart uses platform-specific calling conventions defined in dart_calling_conventions.cc:
x64:
- Arguments: RDI, RSI, RDX, RCX, R8, R9, [stack]
- Return: RAX (integers), XMM0 (doubles)
- Preserved: RBX, R12-R15

ARM64:
- Arguments: R0-R7, [stack]
- Return: R0 (integers), V0 (doubles)
- Preserved: R19-R28
Code Emission Examples
Example 1: Loading a Field
Example 2: Array Element Access
Example 3: Static Call
Deoptimization Metadata
Optimized code includes deoptimization points.

Deoptimization Environment
Captures program state for deoptimization:
- Collect values from registers/stack per environment
- Reconstruct unoptimized frame
- Continue execution in unoptimized code
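The steps above can be sketched with a toy environment: each slot of the unoptimized frame records where its value lives in optimized code (a register or a stack slot), so the deoptimizer can rebuild the frame value by value. The layout and names are illustrative, not the VM's.

```cpp
#include <map>
#include <vector>

struct ValueLocation {
  bool in_register;
  int index;  // register number or stack slot index
};

// Rebuilds the unoptimized frame from the captured register/stack state.
std::vector<int> ReconstructFrame(
    const std::vector<ValueLocation>& environment,
    const std::map<int, int>& registers,
    const std::map<int, int>& stack) {
  std::vector<int> frame;
  for (const auto& loc : environment) {
    frame.push_back(loc.in_register ? registers.at(loc.index)
                                    : stack.at(loc.index));
  }
  return frame;
}
```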
PC Descriptors
Map machine code addresses to source positions:
- kDeopt: Deoptimization point
- kIcCall: Instance call site
- kUnoptStaticCall: Unoptimized static call
- kReturn: Return instruction
- kOther: Other significant points
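A toy PC-descriptor table: pairs of machine-code offset and descriptor kind, queried by exact offset. The real table also records deopt ids and token positions; this layout is illustrative.

```cpp
#include <string>
#include <vector>

struct PcDescriptor {
  int pc_offset;       // offset into the generated code
  std::string kind;    // e.g. "kDeopt", "kIcCall"
};

// Returns the descriptor kind at an offset, or "" if none is recorded.
std::string KindAt(const std::vector<PcDescriptor>& table, int pc_offset) {
  for (const auto& descriptor : table) {
    if (descriptor.pc_offset == pc_offset) return descriptor.kind;
  }
  return "";
}
```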
Optimization Examples
Example 1: Smi Fast Path
Example 2: Bounds Check Elimination
Example 3: Inlined Field Access
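A sketch of a guarded inlined field access: the optimized code checks the receiver's class id and, on a match, loads the field at a fixed offset instead of calling the getter; a mismatch falls back to the slow path (or deoptimizes). The class id, field layout, and function name are made up for illustration.

```cpp
#include <cstdint>
#include <optional>

constexpr intptr_t kPointClassId = 57;  // hypothetical class id

struct Instance {
  intptr_t class_id;
  intptr_t fields[2];  // field storage at fixed offsets
};

std::optional<intptr_t> LoadXFieldFastPath(const Instance* receiver) {
  // cmpl [receiver + cid_offset], kPointClassId ; jne slow_path
  if (receiver->class_id != kPointClassId) return std::nullopt;
  // movq result, [receiver + x_offset] -- a direct load, no call
  return receiver->fields[0];
}
```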
Architecture-Specific Optimizations
SIMD Support
Vector operations for performance.
Branch Prediction Hints
Optimize for common paths.
Loop Alignment
Align hot loops for better performance.
Code Statistics
Track generated code metrics:
- Instruction counts per type
- Code size breakdown
- Optimization effectiveness
Debugging Generated Code
IL Printing
Print IL at various stages (e.g. with the --print-flow-graph flag).
Disassembly
View generated machine code (e.g. with --disassemble).
Tracing
Trace compilation (e.g. with --trace-compiler).
Performance Considerations
Instruction Selection
- Use platform-specific instructions when available
- Prefer register operations over memory
- Minimize moves between register classes
Memory Access Patterns
- Keep hot data in cache lines
- Align frequently accessed data
- Minimize pointer chasing
Call Overhead
- Inline small functions aggressively
- Use direct calls over indirect when possible
- Specialize polymorphic calls
Further Reading
- Register allocation: runtime/vm/compiler/backend/linearscan.cc
- Architecture-specific IL: runtime/vm/compiler/backend/il_<arch>.cc
- Assembler: runtime/vm/compiler/assembler/assembler_<arch>.cc