

steckdenis
Journeyman III

Threadripper 2990WX: hot vs cold 350W

Hello everyone,

I have a Threadripper 2990WX installed on an MSI X399 SLI PLUS, cooled by a Noctua NH-U14S TR4, with 32GB of G.Skill 3600MT/s CL19. The motherboard's VRMs are actively cooled by a 3300rpm 60mm fan (that happens not to be too noisy). I use the latest (A60) BIOS for the motherboard.

In the BIOS, I have set "Precision Boost Overdrive" to "Manual", with a maximum socket power of 350 W, a TDC of 250 A and an EDC of 350 A. I have also applied a negative voltage offset of 0.08 V, because PBO was applying too much voltage (my CPU is rock-stable at a manual 3.5 GHz and 1.06 V, while PBO was applying 1.16-1.18 V). The problem I describe below also happens when no voltage offset is in use.

When I run benchmarks, I observe two seemingly inconsistent behaviors:

  1. MPrime: the total package power jumps to 350 W and all the cores boost to 3.45 GHz (1.11 V). After 30 seconds, Tdie reaches 68°C and the processor throttles to 550 MHz for half a second, then continues with the benchmark. Throttling happens every 4 to 5 seconds. The VRM temperature is 81°C.
  2. My own Python and NumPy code, which mixes AVX and regular integer instructions, on all 64 threads: the total package power jumps to the same 350 W and all the cores boost to 3.6 GHz (1.18 V), but the temperature stabilizes at 62°C. The VRM temperature is 75°C.

Where is that energy going? Why does MPrime, drawing 350 W of total package power, somehow produce more heat than my code, which also draws 350 W? Because I'm running Linux, I don't have access to Ryzen Master. I use rapl-read-ryzen, which reads the power-consumption MSRs of Zen. The readings seem to be what PBO itself uses (even if they may be inaccurate), as they are properly capped at 350 W, as configured in the BIOS.
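For anyone who wants to cross-check such readings, here is a minimal Python sketch of how a package power figure can be derived from two samples of an energy counter. It is not the code of rapl-read-ryzen. The register addresses, the position of the energy-unit field (bits 12:8) and the 32-bit counter width are my assumptions about how Zen exposes RAPL, and the function names are made up for illustration. Reading `/dev/cpu/N/msr` needs root and the `msr` kernel module.

```python
import os
import struct
import time

# Assumed MSR addresses for AMD Zen RAPL (verify against your CPU's documentation).
POWER_UNIT_MSR = 0xC0010299   # assumed: holds the energy unit in bits 12:8
PKG_ENERGY_MSR = 0xC001029B   # assumed: package energy counter, low 32 bits


def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR via /dev/cpu/<cpu>/msr (root and the msr module required)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def energy_unit_joules(unit_msr_value: int) -> float:
    """Decode the energy unit: 1 / 2**ESU joules, with ESU taken from bits 12:8."""
    esu = (unit_msr_value >> 8) & 0x1F
    return 1.0 / (1 << esu)


def average_power_watts(before: int, after: int, unit_joules: float, seconds: float) -> float:
    """Average power between two counter samples, tolerating one 32-bit wraparound."""
    delta = (after - before) & 0xFFFFFFFF
    return delta * unit_joules / seconds


def sample_package_power(cpu: int = 0, interval: float = 1.0) -> float:
    """Take two samples `interval` seconds apart and return the average package power."""
    unit = energy_unit_joules(read_msr(cpu, POWER_UNIT_MSR))
    start = read_msr(cpu, PKG_ENERGY_MSR) & 0xFFFFFFFF
    time.sleep(interval)
    end = read_msr(cpu, PKG_ENERGY_MSR) & 0xFFFFFFFF
    return average_power_watts(start, end, unit, interval)
```

Calling `sample_package_power()` as root while a benchmark runs should give a figure comparable to the one discussed here, but only if the assumptions above hold for your system.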

Another question: why does PBO allow the CPU to reach throttling temperatures? Isn't it supposed to slowly decrease the frequency as the temperature approaches 68°C?

Thank you for your advice

steckdenis
Journeyman III

Hello, I investigated the issue a bit:

  • Throttling: I increased TDC to 300 A (instead of 250 A) and the thermal throttling issue disappeared. Now, PBO progressively lowers the CPU clock speed as the temperature approaches 67.8°C, so that the CPU never goes above 68°C.
  • Power consumption: Power consumption seems to be measured correctly, but the temperature behaves strangely (note: I checked my thermal paste application). If I run 32 AVX threads on CPU cores 1-16 (the two IO dies), I get a power consumption of 250 W and a temperature of 67°C. If I move those 32 threads to cores 17-32 (the compute dies), the power consumption and temperature do not change. When I spread the 32 threads across cores 1-32 (using only even threads), I get a power consumption of 350 W and a temperature of 67°C too! The voltage and frequencies are lower, though.
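For those who want to repeat this kind of placement experiment on Linux, the following is a minimal Python sketch of pinning a process to a chosen set of logical CPUs. The helper names are hypothetical, and the logical CPU numbers are assumptions: Linux numbers CPUs from 0, and which numbers belong to which die or SMT sibling differs between systems, so check the layout with `lscpu` before trusting any range.

```python
import os


def even_cpus(first: int, last: int) -> set:
    """Logical CPU ids from first to last (inclusive), keeping every second one."""
    return set(range(first, last + 1, 2))


def pin_to(cpus) -> set:
    """Restrict the calling process to the requested CPUs that are actually available.

    Linux only: relies on os.sched_getaffinity / os.sched_setaffinity.
    Returns the set of CPUs the process ended up pinned to.
    """
    available = os.sched_getaffinity(0)   # 0 means "the calling process"
    target = set(cpus) & available
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)
    return target


# Example (assumed numbering): keep the workload on every second logical CPU.
# pin_to(even_cpus(0, 63))
```

Threads started after the call inherit the affinity mask, so pinning once before launching the NumPy workload is enough for a quick comparison.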

So, it seems that PBO makes sure that a "temperature" (Tdie) never exceeds 68°C, but I don't know how Tdie is computed from the multiple internal temperature sensors of the dies. It seems that the reported temperature is (artificially or not) higher when power is concentrated in a small number of cores rather than spread across all 32 cores.


Someone else opened a similar thread here on the AMD forum concerning temperatures on the 2950X: 2950x  NH-U14S TR4-SP3  Fractal Design Define R6, 37° idle temperature ? .

Try configuring Precision Boost and the CPU fan controller in the BIOS. You also need to install Ryzen Master to get an accurate temperature for your Ryzen CPU, from here: AMD Ryzen™ Threadripper™ 2990WX Drivers & Support | AMD

Here are the AMD 2990WX specs: Ryzen™ 2nd Gen Threadripper™ 2990WX Processor | AMD

AMD Ryzen™ Threadripper™ 2990WX Specifications

  • # of CPU Cores: 32
  • # of Threads: 64
  • Base Clock: 3 GHz
  • Max Boost Clock: 4.2 GHz
  • Total L1 Cache: 3 MB
  • Total L2 Cache: 16 MB
  • Total L3 Cache: 64 MB
  • Unlocked: Yes
  • CMOS: 12 nm
  • Package: sTR4
  • PCI Express Version: PCIe 3.0
  • Thermal Solution: Not included
  • Default TDP / TDP: 250 W
  • Max Temps: 68°C


steckdenis, I had the same questions as you and opened a ticket with AMD.  Here is their response:

"

From my research I discovered that some of the terms are from 3rd party software and some are AMD terms. These terms are described in more detail below:

3rd Party:

  • CPU Temperature: the temperature measured around the CPU socket. This term is from AIDA64.
  • CPU Diode Temperature: the temperature of the CPU core. This term is from AIDA64.
  • Tdie: the true temperature of the CPU without any offset applied. This term is from HWiNFO64.

AMD:

  • Tj: the true junction temperature of the CPU, which is the interface point between the die and heat spreader
  • tCTL: the main temperature sensor for Ryzen CPUs; the tCTL value is derived from the junction (Tj) temperature

Currently, the Ryzen processors that have a temperature offset applied to the tCTL sensor are:

  • Ryzen 1700X/1800X: 20°C
  • Ryzen 2700X: 10°C
  • All Threadrippers: 27°C

The temperature offset approach ensures that all AMD Ryzen and Threadripper processors have a consistent fan policy.

Furthermore, the maximum operating temperatures for these processors are (this and other specifications can be found on products.amd.com):

  • Ryzen 1700X/1800X: 95°C
  • Ryzen 2700X: 85°C
  • All Threadrippers: 68°C

Please note that the maximum operating temperature listed for each processor is the value reported by the Tj temperature sensor.  For example, your 1950X processor has a maximum operating temperature of 68°C Tj.

I suggest using the Ryzen Master tool to monitor your Threadripper CPU.  This tool displays the true temperature of the CPU (Tj) and makes monitoring easy, as no offset calculation is required. It can also be a useful way to check which temperature sensor you need to monitor when using third party applications. "
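Since Ryzen Master is not available on Linux, the offset arithmetic described in the response can be sketched by hand. This is a minimal Python illustration, assuming that the tCTL reading equals the true temperature plus the listed offset. The offsets are the ones quoted above; the dictionary keys and function name are made up for the example.

```python
# tCTL offsets in degrees Celsius, as listed in the AMD response above.
TCTL_OFFSET_C = {
    "Ryzen 1700X": 20,
    "Ryzen 1800X": 20,
    "Ryzen 2700X": 10,
    "Threadripper": 27,
}


def tdie_from_tctl(tctl_c: float, family: str) -> float:
    """Remove the fan-policy offset from a tCTL reading (assumes tCTL = Tdie + offset)."""
    return tctl_c - TCTL_OFFSET_C[family]


# A Threadripper tCTL reading of 95 corresponds to the 68 degree limit: 95 - 27 = 68.
print(tdie_from_tctl(95.0, "Threadripper"))
```

If your monitoring tool already reports Tdie, no subtraction is needed; applying the offset twice would understate the temperature.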

steckdenis, I am going to ask you to install the latest Windows 10, run Ryzen Master and post a screenshot.  I cannot answer your deeper questions about PBO and suggest you open an AMD ticket and ask them.  This is a user forum; I do not work for AMD and suspect few who post here do.  If you read the thread pointed to by elstaci, you will see my comments about his cooler.  It looks like you have a similar one, and I recommend that you, too, move to a water cooler, as I did several builds ago.  Enjoy, John.