CPU execution cycle
The CPU processes every instruction through a fundamental fetch-decode-execute cycle:
Fetch
The CPU retrieves the next instruction from memory using the program counter (PC). The instruction is loaded from the memory address pointed to by the PC into the instruction register.
Decode
The control unit interprets the instruction, determining what operation needs to be performed and which registers or memory locations are involved.
Execute
The arithmetic logic unit (ALU) or other functional units perform the actual operation specified by the instruction, such as arithmetic, logic, or memory operations.
Modern CPUs can execute multiple instructions simultaneously through pipelining, where different stages of different instructions overlap in time.
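The cycle above can be sketched with a toy simulator. The instruction set, opcode names, and register names here are invented purely for illustration:

```python
# Minimal sketch of the fetch-decode-execute cycle for a toy,
# illustrative instruction set (not any real ISA).
def run(program, registers):
    pc = 0                      # program counter
    while pc < len(program):
        instr = program[pc]     # fetch: read the instruction at the PC
        pc += 1                 # advance the PC to the next instruction
        op, *args = instr       # decode: split opcode from operands
        if op == "LOAD":        # execute: perform the operation
            reg, value = args
            registers[reg] = value
        elif op == "ADD":
            dst, a, b = args
            registers[dst] = registers[a] + registers[b]
        elif op == "HALT":
            break
    return registers

regs = run([("LOAD", "r1", 2),
            ("LOAD", "r2", 3),
            ("ADD", "r0", "r1", "r2"),
            ("HALT",)], {})
print(regs["r0"])  # → 5
```

A real CPU does the same three steps in hardware, with the decode step driven by the control unit rather than an if-chain.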
Registers and instructions
Registers are small, high-speed storage locations built directly into the CPU. They form the fastest level of the memory hierarchy.
Register usage
Registers serve different purposes in instruction execution:
- General-purpose registers - Store operands and results of arithmetic/logic operations
- Program counter (PC) - Holds the address of the next instruction to execute
- Stack pointer (SP) - Points to the top of the current stack frame
- Instruction register (IR) - Holds the current instruction being executed
- Status/flags register - Contains condition codes (zero, carry, overflow, etc.)
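The status/flags register can be illustrated with a small sketch that computes condition codes after a subtraction. The register width and flag names are illustrative (real ISAs differ in naming and carry-vs-borrow conventions):

```python
# Sketch of how a status/flags register reflects an ALU result,
# assuming an 8-bit datapath (width and flag names are illustrative).
def sub_with_flags(a, b, bits=8):
    mask = (1 << bits) - 1
    result = (a - b) & mask       # subtraction wraps at the register width
    flags = {
        "zero": result == 0,                      # result was zero
        "carry": a < b,                           # a borrow occurred
        "negative": bool(result >> (bits - 1)),   # sign bit of the result
    }
    return result, flags

print(sub_with_flags(5, 5))  # equal operands: zero flag set
print(sub_with_flags(3, 5))  # borrow: carry set, result wraps to 254
```

Conditional branch instructions then test these flags, which is how comparisons drive control flow.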
Instruction scheduling
Simple instruction scheduling improves CPU utilization by:
- Reordering instructions to avoid stalls caused by data dependencies
- Interleaving independent operations to maximize throughput
- Minimizing pipeline stalls by scheduling instructions strategically
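As a sketch, consider a hypothetical four-instruction sequence (the assembly syntax and register names are illustrative, and stall behavior depends on the actual pipeline):

```
; Unoptimized: each add immediately consumes the preceding load,
; so the pipeline may stall waiting for the loaded value
load  r1, [a]
add   r2, r1, r3    ; waits on r1
load  r4, [b]
add   r5, r4, r6    ; waits on r4

; Optimized: the independent load is hoisted to fill the gap
load  r1, [a]
load  r4, [b]       ; independent work hides the load latency
add   r2, r1, r3
add   r5, r4, r6
```

The instruction count is identical; only the order changes, letting loads overlap with unrelated work.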
Branching and pipelines
Branching introduces complexity in pipelined CPU architectures because the next instruction to fetch depends on the branch outcome.
Branch prediction
Modern CPUs use sophisticated prediction algorithms to guess which way a branch will go:
- Static prediction - Always predict taken or not taken
- Dynamic prediction - Use branch history to make predictions
- Two-level adaptive prediction - Track patterns of branch behavior
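Dynamic prediction can be sketched with the classic 2-bit saturating counter; real predictors keep a table of such counters indexed by branch address, which is omitted here for brevity:

```python
# Sketch of a 2-bit saturating-counter dynamic branch predictor.
# Real hardware keeps one counter per table entry; this models a
# single branch for illustration.
class TwoBitPredictor:
    def __init__(self):
        self.counter = 1  # states 0-3; 0/1 predict not-taken, 2/3 taken

    def predict(self):
        return self.counter >= 2   # True means "predict taken"

    def update(self, taken):
        # Move toward the observed outcome, saturating at 0 and 3,
        # so a single anomalous branch cannot flip a stable prediction.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

p = TwoBitPredictor()
correct = 0
for taken in [True, True, False, True, True]:  # mostly-taken branch
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct)  # → 3 of 5 predicted correctly
```

Note how the single not-taken outcome only nudges the counter rather than flipping the prediction, which is exactly the hysteresis the 2-bit scheme provides.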
Avoiding pipeline stalls
Strategies to minimize pipeline stalls:
Branch delay slots
Place independent instructions immediately after branches to fill execution gaps (an architectural feature of some RISC ISAs, such as MIPS)
Predication
Convert branches to conditional execution, allowing both paths to be computed
Loop unrolling
Reduce branch frequency by performing the work of several iterations per loop branch
Speculative execution
Execute instructions from the predicted path while waiting for the branch to resolve
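Loop unrolling, for instance, can be sketched as follows. Python is used only to show the transformation; in practice compilers apply it at the machine-code level, where it reduces how often the backward branch executes:

```python
# Sketch of loop unrolling: four elements are processed per loop
# branch, so the backward branch runs a quarter as often.
def sum_unrolled(xs):
    total = 0
    i = 0
    n = len(xs)
    # main unrolled body: four elements per branch check
    while i + 4 <= n:
        total += xs[i] + xs[i+1] + xs[i+2] + xs[i+3]
        i += 4
    # epilogue: handle any leftover elements one at a time
    while i < n:
        total += xs[i]
        i += 1
    return total

print(sum_unrolled(list(range(10))))  # → 45
```

The epilogue loop is the usual cost of unrolling: lengths that are not a multiple of the unroll factor need a cleanup pass.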
Performance implications
Understanding CPU architecture enables low-level optimization:
- Minimize data dependencies between consecutive instructions
- Write branch-predictable code with consistent patterns
- Leverage instruction-level parallelism through independent operations
- Avoid unnecessary memory operations by maximizing register usage
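As an illustration of exploiting instruction-level parallelism, a reduction can be split across independent accumulators so the hardware can overlap the operations. This is a Python sketch of a transformation whose payoff shows up mainly in compiled code:

```python
# Sketch of breaking a serial dependency chain with two independent
# accumulators, so a superscalar CPU can overlap the additions.
def dot(a, b):
    s0 = s1 = 0.0            # independent partial sums
    for i in range(0, len(a) - 1, 2):
        s0 += a[i] * b[i]         # chain 1
        s1 += a[i+1] * b[i+1]     # chain 2, independent of chain 1
    if len(a) % 2:            # leftover element for odd lengths
        s0 += a[-1] * b[-1]
    return s0 + s1

print(dot([1, 2, 3], [4, 5, 6]))  # → 32.0
```

With a single accumulator every addition must wait for the previous one; with two chains, consecutive additions have no dependency between them and can issue in parallel.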