
Previewing Dynamic Local Mode for the AMD Ryzen™ Threadripper WX Series Processors

Staff

Not long ago, AMD unveiled the AMD Ryzen™ Threadripper WX Series processors with record-setting performance for serious content creation applications that generally scale across many CPU cores. The AMD Ryzen™ Threadripper 2970WX and 2990WX achieved such breakneck performance with 24 or 32 cores spread across four processor dies: two with direct access to local memory, and two with access to memory via the Infinity Fabric. This smart design enabled backwards compatibility with existing AMD X399-based motherboards, too!


The AMD Ryzen Threadripper 2970WX and 2990WX have four dies containing 6 or 8 cores each, respectively. Two of the dies have direct memory access (blue), and two access memory over the Infinity Fabric (red).

What about apps that weren’t designed to be so scalable? There are instances where the entire application (“process”), or specific workloads spawned by that process (“threads”), can achieve the best performance when they’re executed on the two CPU dies with local/direct memory access. We’ve been working hard to extend a helping hand to these applications, and we looked to our past for inspiration.

Thinking back to the 1st Generation AMD Ryzen™ Threadripper processor, AMD Ryzen™ Master was updated to add a toggle for Local Mode or Distributed Mode. These modes tuned the performance of applications that preferred lower memory latency or higher memory bandwidth, respectively. This capability required a system reboot but, according to reviewers like TechSpot, there was a clear performance upside when an application was paired with its most favored mode.

With the “favored modes” in mind, that brings us to today. What if Ryzen™ Threadripper WX Series CPUs could have some sort of “favored mode” to ensure the best performance for both heavily and lightly threaded apps? What if it could be switched on the fly without a reboot? All of this is possible with a new feature we’re calling Dynamic Local Mode.

What is Dynamic Local Mode?

Dynamic Local Mode is a new piece of software that automatically migrates the system’s most demanding application threads onto the Threadripper™ 2990WX and 2970WX CPU cores with local memory access. In other words: the apps that prefer local DRAM access will automatically receive it, and apps that scale to many cores will be free to do so.

What is the Benefit of Dynamic Local Mode?

In the applications we have tested to date, AMD has observed performance improvements of up to 47% with Dynamic Local Mode enabled (see footnote 1). The diagram below shows a variety of games and applications aided by the new feature, and AMD expects that other applications we have not yet analyzed may also benefit. To be clear, not every application will see a benefit, because not every application demonstrates the threading behaviors that Dynamic Local Mode is designed to assist. Even so, some processes really take a liking to Dynamic Local Mode, and it's quite satisfying to see such a speedup from a new and free feature for your platform.

[Chart: performance with Dynamic Local Mode off vs. on for the games and applications listed in footnote 1]

See footnotes at the end of this blog for system configuration and raw data. Please note that your results may vary with system configuration and drivers.

How is Dynamic Local Mode implemented?

Dynamic Local Mode is implemented as a Windows® 10 background service that measures how much CPU time each thread on the system is consuming. These threads are then ranked from most to least demanding, and the top threads are automatically pushed to the CPU cores that have direct memory access. Once these cores are consumed by work, additional threads are scheduled and executed on the next available CPU core. This process is continuous while the service is running, ensuring the most demanding threads always get preferential time on cores with local memory. (As a corollary, insignificant threads are pushed to other dies.)
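To make the ranking step concrete, here is a minimal Python sketch of a single pass of the loop described above. It is an illustration under stated assumptions, not AMD's implementation: the names `ThreadSample` and `assign_threads`, the core numbers, and the sample CPU times are all hypothetical, and a real service would apply the result through the operating system's thread-affinity controls rather than return a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadSample:
    """One thread's CPU usage over the last sampling interval (hypothetical type)."""
    thread_id: int
    cpu_time_ms: float


def assign_threads(samples, local_cores, remote_cores):
    """Return {thread_id: core}, giving the busiest threads the local-memory cores.

    Threads are ranked from most to least CPU time. Cores with direct memory
    access are handed out first; once they run out, the next threads take the
    remaining cores. Threads beyond the total core count are left unassigned
    in this sketch.
    """
    ranked = sorted(samples, key=lambda s: s.cpu_time_ms, reverse=True)
    cores = list(local_cores) + list(remote_cores)
    return {sample.thread_id: core for sample, core in zip(ranked, cores)}


# Illustrative data: thread 2 is the most demanding, thread 1 the least.
samples = [
    ThreadSample(thread_id=1, cpu_time_ms=5.0),
    ThreadSample(thread_id=2, cpu_time_ms=90.0),
    ThreadSample(thread_id=3, cpu_time_ms=40.0),
]
placement = assign_threads(samples, local_cores=[0, 1], remote_cores=[16, 17])
print(placement)
```

In this example the two busiest threads (2 and 3) land on the local-memory cores 0 and 1, and the light thread 1 is pushed to core 16. Running the function again on fresh samples would model the continuous re-ranking the service performs.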


How is Dynamic Local Mode different from Local Mode?

A bit of background is required to answer this question. For AMD Ryzen™ Threadripper™ X Series CPUs, every processor die has directly-connected memory. Local Mode and Distributed Mode change how the operating system sees these CPUs:

  • In Local Mode, the OS sees two partitions called “NUMA nodes,” each with one die’s worth of CPU cores and RAM. Local Mode sends hints to the OS that threads and their memory contents should be kept within the same node (if possible) to minimize memory latency.
  • In Distributed Mode, the OS sees a single large pool (“UMA node”) with all available dies and memory grouped together.

But, in a system where not every die has direct memory access, the system must necessarily be configured with four NUMA nodes: two with CPU cores plus local RAM, two with CPU cores and no local RAM. Threads will always fill the nodes with local memory first, but this is a first-come, first-served affair in Windows® that sometimes results in threads being executed remotely from their memory footprint.
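The first-come, first-served problem can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the `NumaNode` type, the core numbering, and the choice of nodes 0 and 2 as the ones with local RAM are hypothetical, and the fill policy is a deliberate simplification of what Windows actually does.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NumaNode:
    """A NUMA node as the OS might see it (hypothetical model)."""
    node_id: int
    cores: Tuple[int, ...]
    has_local_memory: bool


def build_wx_topology(cores_per_die=8, local_node_ids=(0, 2)):
    """Build four nodes, two with local RAM and two without.

    Assumption: which node IDs own the memory is chosen here for illustration.
    """
    nodes = []
    for node_id in range(4):
        first = node_id * cores_per_die
        cores = tuple(range(first, first + cores_per_die))
        nodes.append(NumaNode(node_id, cores, node_id in local_node_ids))
    return nodes


def fill_in_arrival_order(thread_names, nodes):
    """Place threads in arrival order, using local-memory nodes first."""
    local = [c for n in nodes if n.has_local_memory for c in n.cores]
    remote = [c for n in nodes if not n.has_local_memory for c in n.cores]
    return dict(zip(thread_names, local + remote))


# One core per die keeps the example small: local cores are 0 and 2.
small = build_wx_topology(cores_per_die=1)
arrivals = ["background_a", "background_b", "render"]
placement = fill_in_arrival_order(arrivals, small)
print(placement)
```

Because the two background threads arrived first, they occupy both local-memory cores, and the later `render` thread ends up on core 1, which belongs to a node with no local RAM. Arrival order, not demand, decided the placement.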

In such a system, some other mechanism is needed to preferentially execute threads on cores with local memory. Dynamic Local Mode is similar in spirit to Local Mode in that it also endeavors to keep threads and their memory contents together. However, unlike traditional Local Mode, Dynamic Local Mode:

  1. Operates on-the-fly without a reboot to toggle between modes
  2. Ensures that demanding threads are executed on dies with local memory
  3. Does not fundamentally change how the operating system sees the processor’s resources

What if I want to disable Dynamic Local Mode?

Dynamic Local Mode is configured as a Windows service. You may simply stop and disable the service to prevent Dynamic Local Mode from running, or you can toggle the feature on and off within AMD Ryzen™ Master.
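For those who prefer scripting to clicking through the Services console, stopping and disabling a Windows service can be sketched in Python around the standard `sc` command-line tool. This post does not give the service's registered name, so `ExampleDynamicLocalModeService` below is a placeholder, not the real name; look up the actual name in the Services console first. The helper names are hypothetical, and the commands only succeed from an elevated (administrator) prompt.

```python
import subprocess


def service_disable_commands(service_name):
    """Return the two `sc` invocations that stop a service and disable it."""
    return [
        ["sc", "stop", service_name],
        # `sc config` expects "start=" and its value as separate tokens.
        ["sc", "config", service_name, "start=", "disabled"],
    ]


def disable_service(service_name, dry_run=True):
    """Print the commands (dry run) or execute them on Windows."""
    commands = service_disable_commands(service_name)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # `sc stop` exits non-zero if the service is already stopped,
            # so failures are not treated as fatal here.
            subprocess.run(command, check=False)
    return commands


# Placeholder name: replace with the real service name before a live run.
planned = disable_service("ExampleDynamicLocalModeService", dry_run=True)
```

The dry run is the default so that nothing changes on the system until the placeholder has been replaced and `dry_run=False` is passed deliberately.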

What processors is Dynamic Local Mode for?

Just to be clear, Dynamic Local Mode is a new feature for the AMD Ryzen™ Threadripper™ 2990WX and 2970WX processors. Only these AMD Ryzen™ Threadripper™ processors have a mixed memory access design wherein some dies have direct memory access, while others access memory across the Infinity Fabric.

Dynamic Local Mode available starting October 29th

Beginning October 29th, Dynamic Local Mode will be a new package included with the latest version of AMD Ryzen™ Master. Downloading AMD Ryzen™ Master on or after the afternoon of 10/29 will automatically configure Dynamic Local Mode on your system if it contains an AMD Ryzen™ Threadripper 2990WX or 2970WX processor (also available starting 10/29). Looking further ahead, AMD also plans to open the feature up to even more users by including Dynamic Local Mode as a default package in the AMD Chipset Drivers.

Let the countdown begin! We’re looking forward to your feedback.


Robert Hallock is a technical marketing guy for AMD's CPU division. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

Footnotes:

1. Testing by AMD Performance Labs as of 10/4/2018. Results presented in order of Dynamic Local Mode OFF vs. ON (% difference). All games tested at 1920x1080 with the graphics API and in-game graphics preset noted. Far Cry 5 (DirectX 11/Ultra): 48 FPS vs. 53 FPS (10% faster); PUBG (DirectX 11/Ultra): 99 FPS vs. 111 FPS (12% faster); Battlefield 1 (DirectX 12/Ultra): 136 FPS vs. 200 FPS (47% faster); Alien: Isolation (DirectX® 11/Ultra): 199 FPS vs. 234 FPS (18% faster); Unreal Engine Compile Time: 954 seconds vs. 810 seconds (15% faster); SPECwpc® V2.1 Rodinia euler3d_cpu: 4.25 vs. 3.36 (21% faster). Average of results less Battlefield 1 outlier: 15.2% faster. System configuration: AMD Ryzen Threadripper Reference Motherboard, AMD Ryzen Threadripper 2990WX, 4x8GB DDR4-3200, GeForce GTX 1080 (driver 399.24), Samsung 850 Pro SSD, Windows 10 Pro x64 (RS4). Results may vary with drivers and system configuration. SPECwpc® V2.1 is the latest version of SPECwpc® as of 9 October, 2018. Additional information about the SPEC benchmarks can be found at www.spec.org/gwpg. RP2-36
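The percentages in footnote 1 can be reproduced from the raw numbers with a short Python sketch. The function and variable names are hypothetical. One assumption is needed: the footnote does not state the unit for the euler3d result, so it is treated here as lower-is-better, which is the reading consistent with the quoted "21% faster".

```python
# Raw OFF vs. ON values copied from footnote 1 (higher FPS is better).
FPS_RESULTS = {
    "Far Cry 5": (48, 53),
    "PUBG": (99, 111),
    "Battlefield 1": (136, 200),
    "Alien: Isolation": (199, 234),
}

# Lower is better. The compile time is stated in seconds; treating the
# euler3d figure the same way is an assumption.
LOWER_IS_BETTER_RESULTS = {
    "Unreal Engine compile time": (954, 810),
    "SPECwpc V2.1 Rodinia euler3d_cpu": (4.25, 3.36),
}


def percent_gain(off, on):
    """Improvement for a higher-is-better metric, relative to OFF."""
    return (on - off) / off * 100


def percent_reduction(off, on):
    """Improvement for a lower-is-better metric, relative to OFF."""
    return (off - on) / off * 100


def rounded_improvements():
    """Return {test name: improvement rounded to a whole percent}."""
    improvements = {}
    for name, (off, on) in FPS_RESULTS.items():
        improvements[name] = round(percent_gain(off, on))
    for name, (off, on) in LOWER_IS_BETTER_RESULTS.items():
        improvements[name] = round(percent_reduction(off, on))
    return improvements


def average_excluding(improvements, excluded_name):
    """Mean of the rounded improvements, leaving one test out."""
    kept = [v for name, v in improvements.items() if name != excluded_name]
    return sum(kept) / len(kept)


improvements = rounded_improvements()
average = average_excluding(improvements, "Battlefield 1")
print(improvements, average)
```

Under that assumption the rounded results match the footnote (10, 12, 47, 18, 15 and 21 percent), and averaging the five rounded values other than Battlefield 1 gives 76 / 5 = 15.2 percent.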

12 Comments
Adept I

That's great!!

I like how you never settle and further push the positives of this special product and minimize its downsides.

Please keep up the pace.

Journeyman III

Are Linux software/drivers planned? I assume these are Windows-only?

Just so you all know, I own a 2990WX and it runs at 100% load on all cores 24/7/365 doing Rosetta@home.

Journeyman III

That's amazing.

However, would this work on a dual-channel system, or only on a quad-channel system?

The reason I ask is that many would love to buy a TR4 CPU (especially given the 19xx series pricing), but they are put off by the cost of quad-channel RAM these days.

If this solution also resolves that problem, then I believe AMD needs to advertise it, as it opens up a bigger market that could upgrade to TR4 CPUs.

Adept I

Excellent news! At my engineering firm we are building a workstation based on the 2990WX that will be used to run highly multi-threaded calculations as well as VR. The suitability of the 2990WX was a concern, but this should solve any issues. Paired with the new Radeon Pro WX 8200, CAD and VR will be amazing. Very exciting times: for $5k we will have the performance of a supercomputer that would have cost hundreds of thousands of dollars a mere decade ago!

Adept I

rhallock

How does Windows handle locality in Local Mode on a sub-24-core Threadripper?

Is it bad in the same way as you described in the original post?

From the original post:

"Threads will always fill the nodes with local memory first, but this is a first-come, first-served affair in Windows® that sometimes results in threads being executed remotely from their memory footprint."

thanks for reading/replying

Journeyman III

Are we there yet? Are we there yet? Are we there yet?

Eagerly awaiting, 9 minutes after 12:00pm EST

Adept I

Has this been delayed?

I just downloaded the latest version of Ryzen Master and reinstalled, but no sign of Dynamic Local Mode.

Adept I

Seems the problem is the download drivers section...

If you come to this page with the same problem I had, don't use the following page for the Ryzen Master download:

https://www.amd.com/en/support/cpu/amd-ryzen-processors/amd-ryzen-threadripper-processors/amd-ryzen-...

That one links to AMD-Ryzen-Master-UI.exe

Use this one instead...

https://download.amd.com/Desktop/AMD-Ryzen-Master.exe

Adept II

As of right now this software does not work if you use any form of virtualization software in Windows. I constantly need to run 12 systems in Hyper-V, and after installation this software reports that it cannot run with Virtualization Based Security (VBS) enabled. Hyper-V relies on VBS.

It seems AMD needs to get with Microsoft and come up with a solution, but for the time being this 2990WX Dynamic Local Mode software release is a total bust if you are any kind of power/workstation user.

Come on AMD get this fixed!

Journeyman III

Since AMD's Dynamic Local Mode is implemented as a stand-alone service, you can disable VBS, install Ryzen Master, then re-enable VBS. You won't have the UX of Ryzen Master, but the Dynamic Local Mode service will continue to function. Once AMD adds it to the chipset drivers, this won't be an issue anymore.

I have also developed my own implementation of Dynamic Local Mode. See Coreprio – Bitsum. This works as well as AMD's implementation, and provides more options. AMD's settings are limited to a simple ON/OFF toggle.

Journeyman III

Linux's better handling of the esoteric NUMA configuration of the 2990WX means that this solution is not necessary there. It will prefer the cores with direct access to memory. Windows chokes when confronted with NUMA nodes that have *zero* local memory, hence the need for this.

Note that when *all* CPUs are being utilized, it doesn't matter as much. When fewer than all CPUs are being utilized, it makes sense to use the more efficient ones with direct access to memory channels first. When all CPUs are used, there is not much to be done.

In your case, with a sustained idle load from Rosetta@home, I am not sure Dynamic Local Mode would help at all, since it doesn't have any per-process tuning that would let you indicate that Rosetta's threads should not be prioritized onto the more efficient CPUs. This is something (additional customization) that I will likely add to my own proprietary implementation of AMD's Dynamic Local Mode.

Adept I

Does anyone know what happened to Dynamic Local Mode in version 2.0.1.1223 of Ryzen Master? I updated Windows 10 to 1903 and updated Ryzen Master, which now says Dynamic Local Mode is not applicable. Has it been built into the new Windows update and is no longer needed?