

foolnotion
Adept I

Ryzen 5950X downclocking to 1.5 GHz under load (thermals are OK)

Hi everyone,

 

I am a software engineer and researcher who works with Linux. I run my own compute-intensive, heavily threaded C++ code, which uses AVX2 instructions.

 

I've recently noticed some weird behavior when running my software, namely unexplained frequency drops under load. The problem seems to appear when hyperthreading (SMT) comes into play; empirically, I would say when the number of threads exceeds 24. The top chart in the graph below shows the problem, with the program running on 32 threads. The red line shows temperature, so we can see that the frequency goes down even though the temperature stays under 60C. The other two charts show other systems where everything is fine. For these reasons I suspect that I have a hardware problem and not a software one.
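For readers who want to record a similar frequency/temperature trace on Linux, here is a minimal sketch. It is not the tool used for the chart in this post. It assumes the standard cpufreq sysfs files (`scaling_cur_freq`, reported in kHz) and the `k10temp` hwmon driver exposing `temp1_input` in millidegrees; the function names and the `sysfs_root`/`hwmon_root` parameters are made up for illustration.

```python
import glob
import os
import re


def read_core_mhz(sysfs_root="/sys/devices/system/cpu"):
    """Return {logical_cpu: MHz} read from each cpufreq scaling_cur_freq file."""
    freqs = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_cur_freq")
    for path in glob.glob(pattern):
        match = re.search(r"cpu(\d+)[/\\]cpufreq", path)
        if match is None:
            continue
        with open(path) as fh:
            # The kernel reports kHz; convert to MHz.
            freqs[int(match.group(1))] = int(fh.read().strip()) / 1000.0
    return freqs


def read_cpu_temp_c(hwmon_root="/sys/class/hwmon", driver="k10temp"):
    """Return temp1_input of the named hwmon driver in degrees C, or None if not found."""
    for name_path in glob.glob(os.path.join(hwmon_root, "hwmon*", "name")):
        with open(name_path) as fh:
            if fh.read().strip() != driver:
                continue
        temp_path = os.path.join(os.path.dirname(name_path), "temp1_input")
        if os.path.exists(temp_path):
            with open(temp_path) as fh:
                # The kernel reports millidegrees; convert to degrees.
                return int(fh.read().strip()) / 1000.0
    return None


def summarize(freqs):
    """Return (min, mean, max) MHz across logical CPUs, or None when there are no readings."""
    if not freqs:
        return None
    values = list(freqs.values())
    return min(values), sum(values) / len(values), max(values)
```

Calling `summarize(read_core_mhz())` and `read_cpu_temp_c()` once per second while the workload runs in another terminal gives a log that can be compared across machines and thread counts.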

 

foolnotion_0-1716322310331.png

 

 

I should mention that other software runs fine:

  • Cinebench R23 gets 27-28K (all-core frequency ~4.25 GHz, temps ~72C)
  • Cinebench 2024 (running in a Win11 VM) gets ~1350 (all-core frequency ~4.4 GHz, temps ~80C)

 

System specs:

  • GPU: Red Devil AMD Radeon RX 6900 XT
  • CPU: Ryzen 9 5950X, 16 cores, 32 threads
  • Motherboard: MEG X570 UNIFY (MS-7C35)
  • BIOS Version: 7C35vAJ
  • RAM: G.Skill Trident Z Neo 64 GB (2 kits of 32 GB DDR4-3600, CL16-16-16-36, 4 x 16 GB sticks total)
  • PSU: be quiet! Dark Power Pro 12 1500W
  • Case: Lian Li O11D
  • Operating System & Version: NixOS 24.05.20240520.4cc0234 (Uakari) x86_64
  • GPU Drivers: Mesa RADV 3602.0
  • Chipset Drivers: N/A (provided by the Linux kernel)
  • Cooling: The system is water cooled, with 2 x 360 mm radiators with Noctua fans, a D5 pump (EK Quantum something distro plate), and a TechN cold plate for the CPU

 

What I have tried, without any success:

  • multiple Linux kernels, schedulers, performance governors, live distros -> the problem remains
  • multiple BIOS settings, PBO on/off, frequency boosting off, eco mode, default settings (with CMOS reset) -> the problem persists
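When comparing kernels, governors, and boost settings like this, it can help to record what the kernel actually applied on each run. The following is a rough sketch, assuming the usual cpufreq sysfs layout; the global `cpufreq/boost` file is present with some drivers (for example acpi-cpufreq) and may be missing with others, in which case the sketch reports `None`. The helper names are hypothetical.

```python
import os


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if it is absent or unreadable."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


def cpufreq_settings(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Collect the cpufreq policy settings the kernel reports for one logical CPU."""
    policy = os.path.join(sysfs_root, "cpu%d" % cpu, "cpufreq")
    return {
        "driver": read_text(os.path.join(policy, "scaling_driver")),
        "governor": read_text(os.path.join(policy, "scaling_governor")),
        "min_khz": read_text(os.path.join(policy, "scaling_min_freq")),
        "max_khz": read_text(os.path.join(policy, "scaling_max_freq")),
        # Global boost switch; may not exist with every cpufreq driver.
        "boost": read_text(os.path.join(sysfs_root, "cpufreq", "boost")),
    }
```

Printing `cpufreq_settings()` next to each benchmark result makes it easier to confirm that a governor or boost change really took effect before concluding that it made no difference.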

 

I am excluding a software implementation problem since I've tested the same software (built using the same compiler, same flags) on other systems where this issue doesn't occur.

I would really appreciate any advice and, in case there is a hardware problem, is it the CPU or the motherboard?

 

Thanks!

17 Replies
misterj
Big Boss

foolnotion, I might have a comment if I could see Ryzen Master (RM) during the speed dips, but I am fairly sure it does not run under Linux. Can you at least run RM and Cinebench under W11 and post a screenshot of RM? Is there any way you can borrow another 5950X or MB? The graphs alone would lead me to believe it was a temperature problem. I do not have a clue how Linux handles BIOS parameters like Max Operating Temperature (Tjmax), but if it is not set correctly in the BIOS, this is the symptom I would expect to observe. Tjmax for the 5950X is specified as 90C. Thanks and enjoy, John.

0 Likes
SerchTech
Adept II

Hello foolnotion
 
The only advice I can offer you is to open two support tickets at the same time, one with AMD and one with MSI. In my experience here, since this is a user-to-user help forum, it is very unlikely that someone with the same CPU and MB will be able to help by reproducing your workload; it would be almost a miracle, and AMD employees do not usually respond here.
 
Hope I am wrong and it will be a pleasure to read your discoveries.
 
Good luck!
0 Likes
foolnotion
Adept I

I was able to reproduce the behavior on a similar system. My best guess so far is that the observed behavior is a consequence of contention due to hyper-threading and memory/cache access patterns under heavily vectorized/SIMD workloads. Here are the two systems:

foolnotion_0-1716411714275.png

I still find it extremely weird that the frequency goes down so far, but at least I know that it's not a hardware fault. It has to be a combination of the specific workload and architecture-specific Ryzen 5950X limitations, since the 3950X CPU I tested is not affected.

I will keep posting updates if I discover anything new.
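One way to probe the SMT-contention guess is to rerun the workload pinned to a single hardware thread per physical core and see whether the drop disappears. The sketch below is only an illustration under the assumption that the kernel exposes `topology/thread_siblings_list` for each logical CPU; the function names are hypothetical and nothing here comes from the software discussed in this thread.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0,16' or '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_thread_per_core(sysfs_root="/sys/devices/system/cpu"):
    """Return the lowest-numbered logical CPU of every physical core."""
    first_siblings = set()
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        with open(path) as fh:
            siblings = parse_cpu_list(fh.read())
        if siblings:
            # Every sibling of a core reports the same list, so the set deduplicates.
            first_siblings.add(siblings[0])
    return sorted(first_siblings)
```

On Linux, `os.sched_setaffinity(0, one_thread_per_core())` called before spawning worker threads restricts the process to those CPUs, so a 16-thread run lands on 16 distinct cores instead of sharing SMT siblings.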

foolnotion, I run my 3970X with NUMA mode enabled with a minor increase in performance. You might try this. Enjoy, John.

0 Likes

Thanks for the advice, but I tried every possible NUMA mode in the BIOS and the problem reoccurs every time.

0 Likes

foolnotion, if it is possible to reproduce in Windows, then please do and post a screenshot of Ryzen Master (RM). Thanks, John.

0 Likes

Hi,

 

I was able to run in Windows/WSL and take screenshots of Ryzen Master. The frequency still goes down (my app is running in the background on 32 threads): https://imgur.com/a/pVNJWg8

 

Thanks,

Bogdan

0 Likes

Thanks, Bogdan. I will spend some time understanding the images. At first glance your Ryzen is throttling due to several limits imposed by the BIOS (red and yellow meters). I will return shortly. Thanks, John.

0 Likes

This is a friend's computer. The original issue occurs on mine, where the limits are much higher (right now I use PPT/EDC/TDC 250/140/170). I also tried the motherboard limits; that didn't change anything, except that the motherboard limits cause more heat.

0 Likes

Bogdan, please explain the multiple RM images. Thanks, John.

0 Likes

Sorry, I should have explained. The images were taken in sequence with the program running in the background (first image at the top, last image at the bottom). We can see that we start at 3.7 GHz and then downclock to ~2.5 GHz. I took multiple screenshots a few seconds apart (see the clock in the bottom right for a timestamp) in order to capture whatever pattern occurs there, if there is any discernible pattern at all.
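If the same readings are logged as numbers instead of screenshots, finding a pattern becomes a matter of marking the stretches where the clock sits below some threshold. Here is a small sketch of that idea; the `(timestamp, MHz)` sample format, the function name, and the threshold are all assumptions chosen for illustration.

```python
def find_downclock_episodes(samples, threshold_mhz):
    """Return (start, end) timestamps for each run of consecutive samples below the threshold.

    samples is an iterable of (timestamp_seconds, mhz) pairs in time order.
    """
    episodes = []
    start = None
    last = None
    for timestamp, mhz in samples:
        if mhz < threshold_mhz:
            if start is None:
                start = timestamp  # a new episode begins
            last = timestamp
        elif start is not None:
            episodes.append((start, last))  # the episode ended at the previous sample
            start = None
    if start is not None:
        episodes.append((start, last))  # still below threshold at the end of the log
    return episodes
```

Comparing episode lengths and spacing between runs with different thread counts would show whether the drops are periodic or tied to particular phases of the workload.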

0 Likes

Thanks, Bogdan. I have not been able to explain what is happening. Please open a support request with AMD here. Here is the site to request to RMA your processor. I did notice you are not using a Profile on your memory. I also use G.Skill memory and use an XMP profile to run it faster. Thanks and enjoy, John.

EDIT: Since you are running on Windows, please try increasing the PPT/EDC/TDC limits using RM. Maybe it has a secret we do not know.

0 Likes


@misterj wrote:

Thanks, Bogdan. I have not been able to explain what is happening. Please open a support request with AMD here. Here is the site to request to RMA your processor. I did notice you are not using a Profile on your memory. I also use G.Skill memory and use an XMP profile to run it faster. Thanks and enjoy, John.

EDIT: Since you are running on Windows, please try increasing the PPT/EDC/TDC limits using RM. Maybe it has a secret we do not know.


@Ray_AMD 

 

Why is @misterj allowed, knowing in advance that he cannot help OP, to request an OS change due to an absurd obsession about Ryzen Master screenshots? IMO there should be limits; newcomers' lack of experience with support forums should not be taken advantage of for personal purposes or simple entertainment.

 

@foolnotion and future Linux users looking for help here:

 

Installing Windows to compare how your hardware behaves versus Linux is always interesting, but real help will NEVER ask you to abandon your host OS. If this happens, the other user most likely cannot offer you a solution, so it is your decision whether to waste your time or not. Good luck!

0 Likes

Rest assured I have not abandoned my OS. I have not installed Windows on my machine; instead I asked a friend to help me test the issue on his machine, which has a similar configuration and the same CPU. It was not so unreasonable to check the output of Ryzen Master, since in any case it offers more info than the Linux sensors output.

 

Based on this test, I can conclude that the problem occurs on both machines: mine (Linux), my friend's under Linux (booted from a live ISO), and my friend's under Windows (running the software in WSL).

 

Given that I can reproduce this on another machine, I don't think it's a hardware problem with my CPU (and even if it were, I bought my 5950X at the end of 2020, so it's out of warranty and there is no point even trying an RMA).

 

I guess the next step is to make an official request for technical assistance to AMD. Thanks everyone for your help.

0 Likes
foolnotion
Adept I

@Volanath this is not the right place for your question. Please make another top-level forum post with your issue.

Okay, I understand.
But I don't know how I can use a label when I post something.
Please tell me, please help me.

0 Likes

Volanath, LABEL?? Please open a new issue by clicking "Start a new Discussion". Post a screenshot of Ryzen Master (RM). Thanks, John.

0 Likes