1 Reply Latest reply on Oct 31, 2016 1:11 PM by black_zion

    Using hardware queues to break the multi-core CPU bottleneck

    kingfish

      With silicon clock scaling largely dead thanks to the laws of physics, computer scientists and chip designers have had to search for performance improvements in other areas, typically by improving architectural efficiency and reducing power consumption. Multi-core scaling may have paid big dividends with dual- and quad-core chips, but it largely plateaued thereafter. There are a number of reasons why adding new processor cores yields diminishing marginal returns. One of the critical causes is the steep overhead associated with maintaining cache coherency between cores that share data sets. Now, a team of researchers working with Intel thinks it may have found a solution. If their work proves useful, it could offer a significant performance boost in certain applications.

      Before we discuss the solution, we need to spend a bit of time talking about the problem. Imagine two separate CPU cores, each of which is working on part of a common computation. Each CPU will have its own L2 cache, where data related to the problem is stored. In a coherent cache, CPU 0 completes part of its calculations, writes a new value to a block of memory, and then communicates that it has done so. CPU 1 now knows that its own data is out of sync with CPU 0 and can update its own L2 cache accordingly. There are several methods of implementing coherence, but at the simplest level, it’s a method for ensuring that all of the CPUs are “on the same page,” as it were. Cache coherence is essential to multi-core scaling, but it also represents a substantial bottleneck as core counts increase. The more CPUs in a system, the more CPU time must be spent enforcing whatever coherence strategy has been chosen, and the less bandwidth is available for actually solving the compute problem in question.
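
      To make the scaling cost concrete, here is a rough Python sketch of a write-invalidate scheme, one of the simpler coherence strategies. It is a toy model and not how any real CPU or the researchers' design works: the class name ToyCoherentSystem, the helper invalidations_for, and the write-through behaviour are all assumptions made for illustration. It only counts how many "your copy is stale" messages get sent when several cores share one value.

```python
class ToyCoherentSystem:
    """Toy write-invalidate model (hypothetical): shared memory plus one private cache per core."""

    def __init__(self, num_cores):
        self.memory = {}                               # address -> value
        self.caches = [{} for _ in range(num_cores)]   # per-core cache: address -> value
        self.invalidations = 0                         # coherence messages sent so far

    def read(self, core, addr):
        cache = self.caches[core]
        if addr not in cache:
            # Miss: the core has no valid copy, so fetch it from memory.
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, core, addr, value):
        # Tell every other core that holds this address that its copy is stale.
        for other, cache in enumerate(self.caches):
            if other != core and addr in cache:
                del cache[addr]
                self.invalidations += 1
        self.caches[core][addr] = value
        self.memory[addr] = value  # write-through, assumed here to keep the model small


def invalidations_for(num_cores, rounds=10):
    """Each round, every core reads one shared value, then one core updates it."""
    system = ToyCoherentSystem(num_cores)
    for r in range(rounds):
        for core in range(num_cores):
            system.read(core, addr=0)
        system.write(r % num_cores, addr=0, value=r)
    return system.invalidations


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "cores:", invalidations_for(n), "invalidation messages")
```

      In this model every write to a shared value costs one message per other core holding a copy, so the coherence traffic for the same amount of useful work grows along with the core count. Real protocols are far more elaborate, but the trend is the one the article describes.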

      Using hardware queues to break the multi-core CPU bottleneck - ExtremeTech