Throughput vs. Latency: Pipelining aims to increase throughput (the number of instructions completed per unit of time) rather than decreasing latency (the time it takes for a single instruction to finish).
Clock Cycle Time: The clock period of a pipelined processor is determined by the duration of the slowest stage plus the overhead of the pipeline registers (latches) that store data between stages.
Ideal Speedup: In an ideal scenario with $k$ stages and $n$ instructions, the speedup over a non-pipelined processor is approximately $k$. The formula for the time taken to execute $n$ instructions in a $k$-stage pipeline is: $T_{\text{pipe}} = (k + n - 1)\,\tau$, where $\tau$ is the clock cycle time.
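
The ideal-case arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every stage takes exactly one cycle, there are no stalls, and both designs use the same cycle time. The function names are made up for this example, with `n` the number of instructions and `k` the number of stages.

```python
def pipelined_cycles(n: int, k: int) -> int:
    """Cycles to run n instructions on an ideal k-stage pipeline (no stalls)."""
    # The first instruction needs k cycles to travel through the pipeline;
    # each of the remaining n - 1 instructions completes one cycle later.
    return k + (n - 1)


def non_pipelined_cycles(n: int, k: int) -> int:
    """Cycles when each instruction finishes all k stages before the next starts."""
    return n * k


def ideal_speedup(n: int, k: int) -> float:
    """Speedup of the ideal pipeline, assuming equal cycle times in both designs."""
    return non_pipelined_cycles(n, k) / pipelined_cycles(n, k)


if __name__ == "__main__":
    # The speedup starts at 1 for a single instruction and approaches k = 5.
    for n in (1, 10, 100, 10_000):
        print(n, round(ideal_speedup(n, k=5), 3))
```

For a single instruction the speedup is 1, and it climbs toward the stage count as the instruction count grows, which is why the large-count approximation is so convenient.
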
Balanced Stages: Maximum efficiency is achieved when all stages take exactly the same amount of time; otherwise, faster stages must wait for the slowest one, creating idle time.
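
To make the effect of the slowest stage concrete, here is a minimal sketch. The stage delays and latch overhead are hypothetical numbers chosen for illustration, and the non-pipelined instruction time is assumed to be the plain sum of the stage delays.

```python
from typing import List, Sequence


def clock_period(stage_delays_ns: Sequence[float], latch_overhead_ns: float) -> float:
    """Clock period = slowest stage delay + pipeline register overhead."""
    return max(stage_delays_ns) + latch_overhead_ns


def stage_utilization(stage_delays_ns: Sequence[float]) -> List[float]:
    """Fraction of the slowest stage's time each stage uses (1.0 = the bottleneck)."""
    slowest = max(stage_delays_ns)
    return [delay / slowest for delay in stage_delays_ns]


def steady_state_speedup(stage_delays_ns: Sequence[float], latch_overhead_ns: float) -> float:
    """Long-run speedup: one instruction per clock period vs. sum of all stage delays."""
    return sum(stage_delays_ns) / clock_period(stage_delays_ns, latch_overhead_ns)


if __name__ == "__main__":
    # Hypothetical delays for a 5-stage pipeline, in nanoseconds.
    delays = [2.0, 1.5, 2.5, 2.0, 1.0]
    print(clock_period(delays, latch_overhead_ns=0.25))
    print(stage_utilization(delays))
    print(steady_state_speedup(delays, latch_overhead_ns=0.25))
```

With these unbalanced delays the long-run speedup comes out well below 5, because every stage is clocked at the pace of the 2.5 ns stage plus the latch overhead.
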
Stalling (Bubbles): The simplest way to resolve hazards is to pause the pipeline for one or more cycles, effectively inserting a 'no-operation' (NOP) instruction until the dependency is resolved.
Forwarding (Bypassing): This technique routes the result of an operation directly from the output of the execution unit to the input of a subsequent instruction, bypassing the need to wait for a write-back to a register.
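
The difference between stalling and forwarding can be sketched as a stall-count calculation. The model below is an assumption, not something stated above: a classic five-stage pipeline (IF, ID, EX, MEM, WB) in which registers are read in ID, written in WB, and a write and a read of the same register in the same cycle both succeed. The function name and parameters are illustrative.

```python
def stalls_needed(distance: int, forwarding: bool, producer_is_load: bool = False) -> int:
    """Bubbles needed before a consumer that is `distance` instructions after its producer.

    Assumed model: 5 stages (IF, ID, EX, MEM, WB), register read in ID,
    register write in WB, write-then-read allowed within one cycle.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if not forwarding:
        # The consumer's ID stage must not come before the producer's WB stage,
        # which is 3 stages later than ID.
        return max(0, 3 - distance)
    if producer_is_load:
        # Loaded data only exists after MEM, one stage later than an ALU result.
        return max(0, 2 - distance)
    # An ALU result is forwarded from the EX output straight into the next EX.
    return 0


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, stalls_needed(d, forwarding=False), stalls_needed(d, forwarding=True))
```

Under this model, a back-to-back dependency costs two bubbles without forwarding and none with it, except after a load, where one bubble remains.
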
Branch Prediction: Processors use logic to guess whether a branch will be taken or not. If the guess is correct, the pipeline continues without interruption; if incorrect, the wrongly fetched instructions are flushed and fetching restarts from the correct address, which costs several cycles.
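
One common way to implement such guessing logic is a 2-bit saturating counter. The sketch below is an illustrative assumption about the predictor, not a scheme named above; the class name, the starting state, and the loop-branch outcome sequence are all hypothetical.

```python
from typing import Iterable, Optional


class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 1) -> None:
        self.state = state  # start in "weakly not taken" (an arbitrary choice)

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes: Iterable[bool],
                         predictor: Optional[TwoBitPredictor] = None) -> int:
    """Count how many branch outcomes the predictor guesses wrong."""
    predictor = predictor or TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # each miss would trigger a pipeline flush
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop branch taken nine times, then not taken on exit.
    print(count_mispredictions([True] * 9 + [False]))
```
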
Delayed Branching: A compiler-level technique where the instruction immediately following a branch is always executed, regardless of the branch outcome, to keep the pipeline full.
| Feature | Non-Pipelined | Pipelined |
|---|---|---|
| Execution | One instruction at a time | Multiple instructions overlapped |
| Throughput | Lower (1 instruction per $k$ cycles) | Higher (ideally 1 instruction per cycle) |
| Complexity | Simpler hardware design | Complex control logic and hazard detection |
| Resource Usage | Low (most units idle most of the time) | High (most units active every cycle) |
Calculate Speedup: Always check whether the question provides the number of instructions ($n$) and stages ($k$). Use the formula $S = \frac{nk}{k + n - 1}$ for finite instructions, or simply $S \approx k$ for a very large $n$.
Identify Hazards: Look for instructions where the destination register of one is the source register of the next; this is a classic data hazard that requires stalling or forwarding.
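
The check described here can be sketched as a scan over adjacent instruction pairs. The instruction representation (`Instr` with an opcode, one destination, and a tuple of sources) and the sample program are hypothetical, chosen only to show the pattern.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]    # registers read


def adjacent_raw_hazards(program: Sequence[Instr]) -> List[Tuple[int, int, str]]:
    """Return (producer index, consumer index, register) for back-to-back dependencies."""
    hazards = []
    for i in range(len(program) - 1):
        producer, consumer = program[i], program[i + 1]
        # Read-after-write: the next instruction reads what this one writes.
        if producer.dest is not None and producer.dest in consumer.srcs:
            hazards.append((i, i + 1, producer.dest))
    return hazards


if __name__ == "__main__":
    program = [
        Instr("add", "r1", ("r2", "r3")),
        Instr("sub", "r4", ("r1", "r5")),  # reads r1 written by add
        Instr("or", "r6", ("r7", "r8")),
        Instr("sw", None, ("r6", "r9")),   # reads r6 written by or
    ]
    print(adjacent_raw_hazards(program))
```

This sketch only looks at immediately adjacent instructions; depending on the pipeline, a dependency two instructions apart may also need attention.
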
Branch Penalties: If a branch is resolved in stage 3 of a 5-stage pipeline, remember that the 2 younger instructions already in stages 1 and 2 will likely need to be flushed if the branch is taken, for a penalty of 2 cycles.
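
The flush count and its effect on average cycles per instruction (CPI) can be sketched as follows. The sketch assumes the pipeline keeps fetching sequentially until the branch is resolved, so only taken branches pay the penalty; the branch and taken fractions in the example are hypothetical.

```python
def flush_penalty(resolve_stage: int) -> int:
    """Instructions flushed when a taken branch is resolved in the given stage (1-based)."""
    if resolve_stage < 1:
        raise ValueError("resolve_stage must be at least 1")
    # One younger instruction sits in each earlier stage.
    return resolve_stage - 1


def effective_cpi(branch_fraction: float, taken_fraction: float,
                  resolve_stage: int, base_cpi: float = 1.0) -> float:
    """Average CPI when every taken branch costs flush_penalty(resolve_stage) cycles."""
    penalty = flush_penalty(resolve_stage)
    return base_cpi + branch_fraction * taken_fraction * penalty


if __name__ == "__main__":
    print(flush_penalty(3))
    # Hypothetical mix: 20% branches, 60% of them taken, resolved in stage 3.
    print(effective_cpi(branch_fraction=0.2, taken_fraction=0.6, resolve_stage=3))
```

With these example numbers the average CPI rises from 1 to roughly 1.24, so the throughput loss from branches is noticeable even when each individual penalty is only 2 cycles.
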
Sanity Check: In the ideal model, a pipelined processor should never have lower throughput than a non-pipelined one, and its speedup can never exceed the number of stages. A computed speedup below 1 or above the stage count usually signals an arithmetic error.