Pipelining: This technique allows the CPU to work on different stages of multiple instructions simultaneously. While one instruction is being executed, the next is being decoded, and the one after that is being fetched, significantly increasing throughput.
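The throughput gain from overlapping stages can be sketched with a toy cycle count. The 5-stage pipeline and the instruction count below are illustrative assumptions, not figures from the notes:

```python
# Toy model of pipeline throughput for a hypothetical 5-stage pipeline.
# Without pipelining, each instruction occupies every stage in turn;
# with pipelining, one instruction completes per cycle once the pipe is full.

def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles when each instruction runs all stages before the next starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles to fill the pipeline once, then one completion per cycle."""
    return n_stages + (n_instructions - 1)

if __name__ == "__main__":
    n, s = 100, 5
    print(sequential_cycles(n, s))  # 500
    print(pipelined_cycles(n, s))   # 104
```

Note this ideal model ignores stalls from branches and data dependencies, which is exactly the limitation listed in the table below.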
Cache Tiering: Cache is organized into levels (L1, L2, L3). L1 is the smallest and fastest, usually private to each core, while L3 is the largest and slowest, typically shared across all cores: it trades some speed for a much bigger pool of data that is still far quicker to reach than main memory.
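The effect of tiering can be quantified with the standard average memory access time (AMAT) recursion. The latencies and miss rates below are illustrative assumptions, not measured values:

```python
# Sketch of average memory access time (AMAT) across a cache hierarchy.
# AMAT = hit_latency + miss_rate * (next level's AMAT), applied from the
# last cache level back up to L1.

def amat(hit_latencies, miss_rates, mem_latency):
    """hit_latencies and miss_rates are ordered L1 -> L3;
    mem_latency is the main-memory access cost."""
    result = mem_latency
    for latency, miss_rate in zip(reversed(hit_latencies),
                                  reversed(miss_rates)):
        result = latency + miss_rate * result
    return result

if __name__ == "__main__":
    # Assumed: L1=4, L2=12, L3=40 cycles; main memory 200 cycles.
    print(round(amat([4, 12, 40], [0.10, 0.50, 0.80], 200), 1))  # 15.2
```

Even with an 80% L3 miss rate in this sketch, the average access costs ~15 cycles rather than 200, which is the whole point of the hierarchy.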
Parallel Execution: By utilizing multiple cores, a system can execute entirely different threads of code at once. This is most effective for 'embarrassingly parallel' tasks like video rendering or scientific simulations.
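An embarrassingly parallel workload can be sketched with the standard library: each chunk is independent, so cores never need to coordinate mid-task. The `simulate_chunk` work unit here is a hypothetical stand-in for, say, rendering one frame:

```python
# Minimal embarrassingly-parallel sketch using multiprocessing.Pool.
from multiprocessing import Pool

def simulate_chunk(seed: int) -> int:
    """Stand-in for one independent work unit (e.g. one frame, one trial)."""
    total = 0
    for i in range(10_000):
        total += (seed * i) % 7
    return total

if __name__ == "__main__":
    with Pool() as pool:                       # one worker per core by default
        results = pool.map(simulate_chunk, range(8))
    print(len(results))  # 8
```

Because no chunk depends on another's result, adding cores scales this near-linearly; a task with inter-chunk dependencies would not enjoy the same scaling.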
| Feature | Clock Speed | Pipelining | Multi-core |
|---|---|---|---|
| Primary Goal | Reduce cycle time | Increase throughput | Parallel execution |
| Mechanism | Faster oscillations | Overlapping stages | Multiple ALUs/CUs |
| Limitation | Heat and power | Branches and data dependencies | Serial portions of the workload |
Identify the Bottleneck: When asked why a CPU upgrade didn't improve performance, look for factors like software that isn't multi-threaded or a slow system bus that creates a data bottleneck.
Register Specificity: In questions involving the fetch-decode-execute cycle, always specify the values being moved between registers (e.g., the specific memory address in the MAR) rather than just naming the registers.
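The tip above can be illustrated with a toy fetch-decode-execute loop that makes the register traffic explicit: PC into MAR, memory contents into MDR, MDR into CIR, then decode and execute. The two-instruction ISA and the addresses are hypothetical:

```python
# Toy fetch-decode-execute cycle with explicit register transfers.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None),
          10: 5, 11: 7}  # addresses 10 and 11 hold data values

pc, acc = 0, 0
while True:
    mar = pc                  # fetch: PC copied into MAR
    mdr = memory[mar]         # contents of the address in MAR -> MDR
    cir = mdr                 # MDR -> CIR (current instruction register)
    pc += 1                   # PC incremented during the fetch phase
    op, operand = cir         # decode
    if op == "LOAD":
        acc = memory[operand]     # execute: ACC <- memory[operand]
    elif op == "ADD":
        acc += memory[operand]    # execute: ACC <- ACC + memory[operand]
    elif op == "HALT":
        break
print(acc)  # 12
```

In an exam answer, the equivalent specificity would be "the address 0 in the PC is copied to the MAR", not just "the PC is copied to the MAR".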
The 'Double' Trap: Never assume that doubling a resource (like cores or clock speed) results in a 100% performance increase. Always mention overhead, heat throttling, or serial constraints in your explanation.
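The serial-constraint part of this trap is exactly Amdahl's law, which puts a number on how far short of 2x a doubling falls. The 90% parallel fraction below is an assumed example value:

```python
# Amdahl's law sketch: overall speedup is capped by the serial fraction,
# which is why doubling cores does not double performance.

def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_units)

if __name__ == "__main__":
    # If 90% of the work parallelises, 2 cores give ~1.82x, not 2x,
    # and even infinitely many cores cap out at 10x.
    print(amdahl_speedup(0.90, 2))
```

This model still ignores the other factors the tip mentions (coordination overhead, thermal throttling), so real speedups sit below even the Amdahl ceiling.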