In August, AMD announced new 32-core and 16-core Threadripper CPUs, revamping its top-end product line and widening the performance gap between itself and Intel. Now the company is extending that product family with new 24-core and 12-core products, along with a new memory access mode that should reduce some of the performance penalties high-core-count Threadripper CPUs have faced. These launches were expected, so an October release puts the parts right on schedule.
First, let’s hit the new model numbers. The 2970WX is a new part number without an equivalent in first-gen Threadripper: a 24-core CPU with a 3GHz base clock, 4.2GHz boost, the same 250W TDP as the 2990WX, and a $1299 list price. The 2920X replaces the 1920X (which had dropped to $400 earlier this year in a likely price-clearing gesture) and will once again run $650 for a 12-core CPU with a 3.5GHz base clock, 4.3GHz boost, and 180W TDP. The chart below shows how AMD and Intel’s stacks compare to each other, with red for AMD and blue for Intel.
If you’re building a powerful workstation and care about multi-threaded performance, it’s genuinely difficult to recommend Intel’s HEDT lineup. The core count disparities are high, and while the NUMA (Non-Uniform Memory Access) implementation AMD uses for the 2990WX can hurt its performance in some applications, AMD has come up with a method for at least partly ameliorating that issue: Dynamic Local Mode. When AMD launched Threadripper, it allowed users to switch between two different modes of accessing memory. Users could favor keeping data local to the CPU die running a thread (Local Mode, lowering memory latency at the cost of bandwidth) or favor memory bandwidth (Distributed Mode, at the cost of higher memory latency).
Overall, the two modes tended to wind up in the same place on average. But that doesn’t mean there were no application-level differences between the two. It just means that if you benchmarked a large enough suite of tests, the two impacts more or less canceled each other out. Switching back and forth between them required a reboot, and we suspect most users rarely bothered.
Now, AMD has introduced the ability to shift back and forth between these modes without rebooting. According to AMD, Dynamic Local Mode automatically migrates demanding software threads running on the 2990WX or 2970WX to the cores with the fastest memory access, while threads that can tolerate increased latency are pushed to cores with indirect memory access.
Here’s how AMD describes its implementation:
Dynamic Local Mode is implemented as a Windows 10 background service that measures how much CPU time each thread on the system is consuming. These threads are then ranked from most to least demanding, and the top threads are automatically pushed to the CPU cores that contain direct memory access. Once these cores are consumed by work, additional threads are scheduled and executed on the next available CPU core. This process is continuous while the service is running, ensuring the most demanding threads always get preferential time on cores with local memory. (As a corollary, insignificant threads are pushed to other dies.)
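To make the ranking-and-placement step concrete, here is a minimal Python sketch of the idea AMD describes. It is not AMD's code: the `Thread` class, the `place_threads` function, the thread names, and the four-core layout are all hypothetical, and the real service runs continuously and sets thread affinity through Windows rather than returning a dictionary.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    cpu_time: float  # recent CPU time consumed (arbitrary units, made up here)


def place_threads(threads, local_cores, remote_cores):
    """Map thread names to core IDs, most demanding threads first.

    Cores with direct memory access are handed out before the others.
    If there are more threads than cores, placement wraps around
    (a simplification of this sketch, not something AMD describes).
    """
    ranked = sorted(threads, key=lambda t: t.cpu_time, reverse=True)
    cores = list(local_cores) + list(remote_cores)
    return {t.name: cores[i % len(cores)] for i, t in enumerate(ranked)}


# Hypothetical workload on a toy four-core layout: cores 0-1 have
# direct memory access, cores 2-3 do not.
threads = [
    Thread("render", 90.0),
    Thread("indexer", 5.0),
    Thread("encode", 70.0),
    Thread("updater", 1.0),
]
placement = place_threads(threads, local_cores=[0, 1], remote_cores=[2, 3])
print(placement)
```

In this toy run, the two busiest threads ("render" and "encode") land on cores 0 and 1, while the two light threads are pushed to cores 2 and 3.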
Presumably, DLM will improve performance the most in applications where the number of threads that need prompt low-latency access is small enough to fit effectively on the number of cores with low latency access. In a situation where the memory subsystem is heavily taxed by many threads competing aggressively for memory resources, the NUMA implementation AMD uses for Threadripper could still cause some threads to be isolated from direct memory access. That’s an unavoidable architectural consequence of the CPU’s design — dealing with NUMA is always a headache.
Still, these performance enhancements should boost the 2970WX and 2990WX in more lightly threaded applications, and AMD is claiming performance uplifts of 10-20 percent, with one 47 percent outlier. All in all, the gains here look quite good.