Theoretical Speedup: Amdahl's Law is used to find the maximum expected improvement to an overall system when only part of the system is improved. It highlights that the speedup of a program using multiple processors is limited by the time needed for the sequential fraction of the program.
The Formula: The speedup is calculated as where is the proportion of the program that can be made parallel, and is the number of processors. As approaches infinity, the speedup is strictly limited to .
Diminishing Returns: This principle implies that if 10% of a task is inherently sequential, the maximum speedup possible is 10x, regardless of how many cores are added. Therefore, optimizing the sequential portion is often as critical as parallelizing the rest.
The Coherence Problem: In multicore systems, each core typically has its own private cache (L1/L2). If multiple cores cache the same memory location and one core modifies it, the other cores may be left with stale data, leading to incorrect program execution.
Snooping Protocols: One common solution is the Snooping Protocol, where all caches monitor a shared bus to track memory access. When a core writes to a memory location, it broadcasts an invalidation signal to all other caches, ensuring they update or discard their local copies.
MESI Protocol: This is a specific cache coherence protocol that tracks the state of cache lines as Modified, Exclusive, Shared, or Invalid. It ensures that a core has exclusive ownership of a data line before writing to it, maintaining consistency across the entire system.
| Feature | Multicore (SMP) | Many-core (GPU/Accelerator) |
|---|---|---|
| Core Complexity | High (Complex branch prediction, large caches) | Low (Simple pipelines, small caches) |
| Core Count | Low to Medium (2 to 64 cores) | Very High (Hundreds to Thousands) |
| Target Workload | General purpose, task parallelism | Data-heavy, massive data parallelism |
| Memory Model | Uniform Memory Access (UMA) common | Often uses high-bandwidth specialized memory |
Speedup Calculations: When calculating speedup using Amdahl's Law, always convert percentages to decimals first. Double-check if the question asks for the 'maximum theoretical speedup' (where ) or the speedup for a specific number of cores.
Identifying Bottlenecks: In conceptual questions, look for the 'serial bottleneck.' If a program must wait for a single resource (like a disk write or a single-threaded setup phase), that is the factor that will limit your parallel efficiency.
Cache Scenarios: For cache coherence questions, trace the state of a memory block through a sequence of reads and writes by different cores. Remember that a 'Write' usually invalidates other copies in a snooping-based system.
Efficiency Check: Always verify if your calculated speedup is less than or equal to the number of cores. If you calculate a speedup of 10x on an 8-core system, you have likely made a mathematical error, as 'super-linear' speedup is rare in standard exam scenarios.