Not long ago, AMD unveiled the AMD Ryzen™ Threadripper WX Series processors with record-setting performance for serious content creation applications that generally scale across many CPU cores. The AMD Ryzen™ Threadripper 2970WX and 2990WX achieved such breakneck performance with 24 or 32 cores spread across four processor dies: two with direct access to local memory, and two with access to memory via the Infinity Fabric. This smart design enabled backwards compatibility with existing AMD X399-based motherboards, too!
The AMD Ryzen Threadripper 2970WX and 2990WX have four dies containing 6 or 8 cores each, respectively. Two of the dies have direct memory access (blue), and two access memory over the Infinity Fabric (red).
What about apps that weren’t designed to be so scalable? There are instances where the entire application (“process”), or specific workloads spawned by that process (“threads”), can achieve the best performance when they’re executed on the two CPU dies with local/direct memory access. We’ve been working hard to extend a helping hand to these applications, and we looked to our past for inspiration.
For the 1st Generation AMD Ryzen™ Threadripper processor, we updated AMD Ryzen™ Master with a toggle for Local Mode or Distributed Mode. These modes tuned the performance of applications that preferred lower memory latency or higher memory bandwidth, respectively. Switching modes required a system reboot but, according to reviewers like TechSpot, there was a clear performance upside when an application was paired with its most favored mode.
Those “favored modes” bring us to today. What if Ryzen™ Threadripper WX Series CPUs could have some sort of “favored mode” to ensure the best performance for both heavily and lightly threaded apps? What if it could be switched on the fly without a reboot? All of this is possible with a new feature we’re calling Dynamic Local Mode.
Dynamic Local Mode is a new piece of software that automatically migrates the system’s most demanding application threads onto the Threadripper™ 2990WX and 2970WX CPU cores with local memory access. In other words: the apps that prefer local DRAM access will automatically receive it, and apps that scale to many cores will be free to do so.
In the applications we have tested to date, AMD has observed performance improvements of up to 47% with Dynamic Local Mode enabled.[1] The diagram below shows a variety of games and applications aided by the new feature, and AMD expects that other applications we have not yet analyzed may also benefit. To be clear, not every application will see a benefit, because not every application exhibits the threading behaviors that Dynamic Local Mode is designed to assist. Even so, some processes really take a liking to Dynamic Local Mode, and it's quite satisfying to see such a speedup from a new and free feature for your platform.
See footnotes at the end of this blog for system configuration and raw data. Please note that your results may vary with system configuration and drivers.
Dynamic Local Mode is implemented as a Windows® 10 background service that measures how much CPU time each thread on the system is consuming. These threads are then ranked from most to least demanding, and the top threads are automatically pushed to the CPU cores with direct memory access. Once those cores are fully occupied, additional threads are scheduled and executed on the next available CPU core. This process is continuous while the service is running, ensuring the most demanding threads always get preferential time on cores with local memory. (As a corollary, insignificant threads are pushed to other dies.)
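To make the ranking step concrete, here is a minimal sketch in Python of the idea described above. It is not AMD's implementation: the core numbering, the assumption that dies 0 and 1 are the ones with local memory, the one-thread-per-core simplification, and the names `Core`, `build_topology`, and `place_threads` are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    core_id: int
    die: int
    local_memory: bool  # True if the die has directly attached DRAM


def build_topology(cores_per_die=8):
    """Model a 4-die CPU. Assumption: dies 0 and 1 have local memory."""
    return [
        Core(core_id=die * cores_per_die + i, die=die, local_memory=die < 2)
        for die in range(4)
        for i in range(cores_per_die)
    ]


def place_threads(cpu_time_by_thread, cores):
    """Rank threads by measured CPU time and hand out cores in order.

    Cores with local memory are handed out first, so the most demanding
    threads land next to DRAM. Simplification: one thread per core, and
    threads beyond the core count are left unplaced.
    """
    ranked = sorted(cpu_time_by_thread, key=cpu_time_by_thread.get, reverse=True)
    ordered = sorted(cores, key=lambda c: (not c.local_memory, c.core_id))
    return dict(zip(ranked, ordered))


if __name__ == "__main__":
    cores = build_topology()
    # Hypothetical measurements: thread "t19" is busiest, "t0" is idle.
    usage = {f"t{i}": float(i) for i in range(20)}
    for thread, core in place_threads(usage, cores).items():
        kind = "local" if core.local_memory else "remote"
        print(f"{thread}: core {core.core_id} (die {core.die}, {kind})")
```

In this toy run the 16 busiest threads fill the two dies with local memory and the remaining four spill onto the other dies. A real service would repeat the measurement and placement continuously, as described above.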
A bit of background is required to answer this question. For AMD Ryzen™ Threadripper™ X Series CPUs, every processor die has directly-connected memory. Local Mode and Distributed Mode change how the operating system sees these CPUs:
But in a system where not every die has direct memory access, the system must be configured with four NUMA nodes: two with CPU cores plus local RAM, and two with CPU cores and no local RAM. Threads will always fill the nodes with local memory first, but this is a first-come, first-served affair in Windows® that sometimes results in threads being executed remotely from their memory footprint.
In such a system, some other mechanism is needed to preferentially execute threads on cores with local memory. Dynamic Local Mode is similar in spirit to Local Mode in that it also endeavors to keep threads and their memory contents together. However, unlike traditional Local Mode, Dynamic Local Mode:
Dynamic Local Mode is configured as a Windows service. You may simply stop and disable the service to prevent Dynamic Local Mode from running, or you can toggle the feature on and off within AMD Ryzen™ Master.
Just to be clear, Dynamic Local Mode is a new feature for the AMD Ryzen™ Threadripper™ 2990WX and 2970WX processors. Only these AMD Ryzen™ Threadripper™ processors have a mixed memory access design wherein some dies have direct memory access, while others access memory across the Infinity Fabric.
Beginning October 29th, Dynamic Local Mode will be a new package included with the latest version of AMD Ryzen™ Master. Downloading AMD Ryzen™ Master on or after the afternoon of 10/29 will automatically configure Dynamic Local Mode on your system if it contains an AMD Ryzen™ Threadripper 2990WX or 2970WX processor (also available starting 10/29). Looking further ahead, AMD also plans to open the feature up to even more users by including Dynamic Local Mode as a default package in the AMD Chipset Drivers.
Let the countdown begin! We’re looking forward to your feedback.
Robert Hallock is a technical marketing guy for AMD's CPU division. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
1. Testing by AMD Performance Labs as of 10/4/2018. Results presented in order of Dynamic Local Mode OFF vs. ON (% difference). All games tested at 1920x1080 with the graphics API and in-game graphics preset noted. Far Cry 5 (DirectX 11/Ultra): 48 FPS vs. 53 FPS (10% faster); PUBG (DirectX 11/Ultra): 99 FPS vs. 111 FPS (12% faster); Battlefield 1 (DirectX 12/Ultra): 136 FPS vs. 200 FPS (47% faster); Alien: Isolation (DirectX® 11/Ultra): 199 FPS vs. 234 FPS (18% faster); Unreal Engine Compile Time: 954 seconds vs. 810 seconds (15% faster); SPECwpc® V2.1 Rodinia euler3d_cpu: 4.25 vs. 3.36 (21% faster). Average of results less Battlefield 1 outlier: 15.2% faster. System configuration: AMD Ryzen Threadripper Reference Motherboard, AMD Ryzen Threadripper 2990WX, 4x8GB DDR4-3200, GeForce GTX 1080 (driver 399.24), Samsung 850 Pro SSD, Windows 10 Pro x64 (RS4). Results may vary with drivers and system configuration. SPECwpc® V2.1 is the latest version of SPECwpc® as of 9 October, 2018. Additional information about the SPEC benchmarks can be found at www.spec.org/gwpg. RP2-36
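The percentages in footnote 1 can be checked with a few lines of arithmetic. The sketch below assumes that the FPS results are higher-is-better, and that the Unreal Engine compile time and the Rodinia euler3d_cpu result are lower-is-better, with the improvement expressed relative to the OFF value. That reading is inferred from the published numbers rather than stated in the footnote. The names `RESULTS` and `percent_faster` are illustrative only.

```python
# (OFF value, ON value, higher_is_better) taken from footnote 1.
RESULTS = {
    "Far Cry 5": (48, 53, True),
    "PUBG": (99, 111, True),
    "Battlefield 1": (136, 200, True),
    "Alien: Isolation": (199, 234, True),
    "Unreal Engine Compile Time": (954, 810, False),
    "SPECwpc V2.1 Rodinia euler3d_cpu": (4.25, 3.36, False),
}


def percent_faster(off, on, higher_is_better):
    """Improvement of ON over OFF, as a percentage of the OFF value."""
    delta = (on - off) if higher_is_better else (off - on)
    return delta / off * 100


gains = {name: percent_faster(*values) for name, values in RESULTS.items()}

# Average of the remaining results once the Battlefield 1 outlier is removed.
without_outlier = [g for name, g in gains.items() if name != "Battlefield 1"]
average = sum(without_outlier) / len(without_outlier)

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"{name}: {gain:.0f}% faster")
    print(f"Average less Battlefield 1: {average:.1f}% faster")
```

Under these assumptions the computed gains round to the published 10%, 12%, 47%, 18%, 15% and 21%, and the average without Battlefield 1 rounds to 15.2%.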