3 Replies Latest reply on Sep 12, 2018 10:10 AM by misterj

    Threadripper 2990WX: hot vs cold 350W

    steckdenis

      Hello everyone,

       

      I have a Threadripper 2990WX installed on an MSI X399 SLI PLUS, cooled by a Noctua NH-U14S TR4, with 32GB of G.Skill 3600MT/s CL19. The motherboard's VRMs are actively cooled by a 3300rpm 60mm fan (that happens not to be too noisy). I use the latest (A60) BIOS for the motherboard.

       

      In the BIOS, I have set "Precision Boost Overdrive" to "Manual", and set a maximum socket power of 350W, TDC of 250A and EDC of 350A. I have also applied a negative voltage offset of 0.08V, because PBO was applying too much voltage (my CPU is rock-stable at a manual 3.5 GHz and 1.06V, whereas PBO was applying 1.16-1.18V). The problem I describe below also happens when no voltage offset is in use.

       

      When I run benchmarks, I observe two seemingly inconsistent behaviors:

      1. MPrime: the total package consumption jumps to 350W and all the cores boost to 3.45 GHz (1.11V). After 30 seconds, Tdie reaches 68°C and the processor throttles to 550 MHz for half a second, then continues with the benchmark. Throttling happens every 4 to 5 seconds. The VRM temperature is 81°C.
      2. My own Python and NumPy code, which mixes AVX and regular integer instructions, on all 64 threads: the total package consumption jumps to the same 350W and all the cores boost to 3.6 GHz (1.18V), but the temperature stabilizes at 62°C. The VRM temperature is 75°C.

       

      Where is that energy going? Why is MPrime, consuming 350W of total package power, somehow producing more heat than my code, which also consumes 350W? Because I'm running Linux, I don't have access to Ryzen Master. I use rapl-read-ryzen, which reads the power consumption MSRs of Zen. The readings seem to be what PBO uses (even if they may be incorrect), as they are properly capped at the 350W set in the BIOS.
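For readers who want to check the power figures without rapl-read-ryzen, here is a minimal Python sketch of how an average package power can be derived from the Zen energy counter. The MSR addresses and the bit layout of the unit register are assumptions based on AMD's public processor documentation, not something stated in this thread, and the function names are hypothetical. Reading MSRs requires root and the `msr` kernel module.

```python
import os
import struct

# Assumed Zen MSR addresses (not stated in the thread).
MSR_RAPL_POWER_UNIT = 0xC0010299
MSR_PKG_ENERGY_STAT = 0xC001029B


def read_msr(cpu, address):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (root and msr module needed)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


def energy_unit_joules(unit_msr):
    """Assumed layout: bits 12:8 hold ESU, and one count is 1 / 2**ESU joules."""
    esu = (unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)


def average_power_watts(count_start, count_end, unit_joules, seconds):
    """Average power between two samples of the 32-bit energy counter."""
    delta = (count_end - count_start) & 0xFFFFFFFF  # tolerate one wraparound
    return delta * unit_joules / seconds
```

In use, one would read `MSR_RAPL_POWER_UNIT` once, sample the low 32 bits of `MSR_PKG_ENERGY_STAT` twice a known interval apart, and pass both samples to `average_power_watts`. This is an averaged figure, so short spikes between samples are not visible.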

       

      Another question is why PBO allows the CPU to reach throttling temperatures. Isn't it supposed to slowly decrease the frequency as the temperature approaches 68°C?

       

      Thank you for your advice

        • Re: Threadripper 2990WX: hot vs cold 350W
          steckdenis

          Hello, I investigated the issue a bit:

           

          • Throttling: I increased TDC to 300A (instead of 250A) and the thermal throttling issue disappeared. Now, PBO progressively lowers the CPU clock speed as the temperature approaches 67.8°C, so that the CPU never goes above 68°C.
          • Power consumption: Power consumption seems to be correctly measured, but the temperature behaves strangely (note: I checked my thermal paste application). If I run 32 AVX threads on CPU cores 1-16 (the two IO dies), I get a power consumption of 250W and a temperature of 67°C. If I move those 32 threads to cores 17-32 (the compute dies), the power consumption and temperature do not change. When I spread the 32 threads over cores 1-32 (using only even threads), I get a power consumption of 350W and a temperature of 67°C too! The voltage and frequencies are lower, though.
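A placement test like the one described above can be set up on Linux by restricting a process to a chosen set of logical CPUs. The following Python sketch is one possible way to do it, not necessarily the method used for the measurements above; the function names are hypothetical. How logical CPU numbers map to dies and SMT siblings depends on the kernel's enumeration, so any CPU list should be checked against `lscpu` or `/sys/devices/system/cpu/cpuN/topology/` first.

```python
import os


def parse_cpu_list(spec):
    """Turn a Linux-style CPU list such as "0-15,32-47" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec):
    """Restrict the calling process (pid 0 means self) to the given CPUs; Linux only."""
    os.sched_setaffinity(0, parse_cpu_list(spec))
    return os.sched_getaffinity(0)
```

Calling `pin_current_process("0-15")` before the worker threads are started limits them to those logical CPUs, since threads created afterwards inherit the affinity mask. Launching the benchmark under `taskset -c` gives the same restriction from the shell.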

           

          So, it seems that PBO makes sure that a "temperature" (Tdie) never exceeds 68°C, but I don't know how Tdie is computed from the multiple internal temperature sensors of the dies. It seems that the temperature is (artificially or not) higher when power is concentrated in a small number of cores instead of being spread across all 32 cores.

            • Re: Threadripper 2990WX: hot vs cold 350W
              elstaci

              Someone else opened a similar thread here on the AMD forum concerning temperatures on the 2950X: "2950x   NH-U14S TR4-SP3   Fractal Design Define R6, 37° idle temperature ?".

               

              Try configuring Precision Boost and the CPU fan controller in the BIOS. You also need to install Ryzen Master to get an accurate temperature for your Ryzen CPU; it is available here: AMD Ryzen™ Threadripper™ 2990WX Drivers & Support | AMD

               

              Here are the AMD 2990WX specs: Ryzen™ 2nd Gen Threadripper™ 2990WX Processor | AMD

              AMD Ryzen™ Threadripper™ 2990WX Specifications

              • # of CPU Cores: 32
              • # of Threads: 64
              • Base Clock: 3GHz
              • Max Boost Clock: 4.2GHz
              • Total L1 Cache: 3MB
              • Total L2 Cache: 16MB
              • Total L3 Cache: 64MB
              • Unlocked: Yes
              • CMOS: 12nm
              • Package: sTR4
              • PCI Express Version: PCIe 3.0
              • Thermal Solution: Not included
              • Default TDP / TDP: 250W
              • Max Temps: 68°C

              • Re: Threadripper 2990WX: hot vs cold 350W
                misterj

                steckdenis, I had the same questions as you and opened a ticket with AMD.  Here is their response:

                "

                From my research I discovered that some of the terms are from 3rd party software and some are AMD terms. These terms are described in more detail below:

                 

                 

                 

                3rd Party:

                 

                 

                 

                • CPU Temperature – The CPU temperature means the temperature measured around the CPU socket – This term is from Aida64 and further information can be found, here
                • CPU Diode Temperature – The "CPU Diode" temperature means the temperature of the CPU core – This term is from Aida 64 and further information can be found, here
                • Tdie – Tdie is the true temperature of the CPU without any offset applied – This term is from HWINFO64 and further information can be found, here

                 

                 

                AMD:

                 

                 

                 

                • Tj – Is the true Junction temperature of the CPU, which is the interface point between the die and heat spreader
                • tCtl - Is the main temperature sensor for Ryzen CPUs, tCTL value is derived from the Junction (Tj) temperature

                 

                 

                Currently, the Ryzen Processors that have a temperature offset applied to the tCTL sensor are:

                 

                 

                 

                • Ryzen 1700X/1800X - 20°C
                • Ryzen 2700X - 10°C
                • All Threadrippers - 27°C

                 

                 

                The temperature offset approach ensures that all AMD Ryzen and Threadripper processors have a consistent fan policy.

                 

                 

                 

                Furthermore, the maximum operating temperature for these processors are (this and other specifications can be found on products.amd.com):

                 

                 

                 

                • Ryzen 1700X/1800X - 95°C
                • Ryzen 2700X - 85°C
                • All Threadrippers - 68°C

                 

                 

                 

                Please note that the maximum operating temperature value displayed for each processor shown is the value shown from the Tj temperature sensor.  For example, your 1950X processor has a maximum operating temperature of 68c Tj.

                 

                 

                 

                I suggest using Ryzen Master Tool to monitor your Threadripper CPU.  This tool displays the true temperature of the CPU (Tj) and enables easy monitoring of temperatures as there is no offset calculation required and can be a useful way to check which temperature sensor you need to monitor when using third party applications. "
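The offset arithmetic in the response above can be sketched in a few lines of Python. The offsets are the ones listed in the response; the function name and the family labels are hypothetical, and treating an unlisted processor as having no offset is an assumption.

```python
# Offsets in degrees Celsius, copied from the quoted AMD response.
TCTL_OFFSET_C = {
    "Ryzen 1700X/1800X": 20.0,
    "Ryzen 2700X": 10.0,
    "Threadripper": 27.0,
}


def tdie_from_tctl(tctl_c, family):
    """Subtract the fan-policy offset from a Tctl reading to recover the die temperature."""
    # Assumption: families not listed in the response carry no offset.
    return tctl_c - TCTL_OFFSET_C.get(family, 0.0)
```

For example, a Threadripper whose Tctl sensor reads 95°C is at 68°C once the 27°C offset is removed, which is the maximum operating temperature given in the response.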

                 

                steckdenis, I am going to ask you to install the latest Windows 10, run Ryzen Master and post a screenshot.  I cannot answer your deeper questions about PBO and suggest you open an AMD ticket and ask them.  This is a user forum; I do not work for AMD and suspect few who post here do.  If you read the thread pointed to by elstaci, you will see my comments about his cooler.  It looks like you have a similar one, and I recommend that you, too, move to a water cooler, as I did several builds ago.  Enjoy, John.
