aemondis
Journeyman III

5950X Win11 maxed out PPT/EDC/TDC @ 610MHz?

First of all, apologies for the length of this. I'm hoping that a) this gives enough information to help anyone determine what is going on, and b) anyone who comes across this knows what to look at to see if their issue is the same...

As I write this, I'm working with a sluggish system that takes approximately 15 seconds just to switch tabs in my browser, and I can type a full sentence before the screen updates (yes, I have a GTX 960, and yes, it's old and can't run any modern games, but I refuse on principle to spend $3k on a GPU). I can drag a window around my screen in circles, let go of the mouse, and watch the window keep moving for another 5 seconds. I had NEVER seen such strange behaviour until a few weeks ago, starting with some of the supposed "fixes" for Windows 11.

To see that I'm not crazy, see a screenshot from Ryzen Master of a system supposedly running at 212W and sitting at 30C, whilst drawing 225A (EDC), yet peaking at 610MHz:

[Screenshot: Ryzen Master]

This occurs randomly. The system has been tuned extensively with PBO and per-core CO over MANY months of crashes, BSODs and random idle reboots. It was a very frustrating experience: CO isn't nearly as "automatic" as AMD would have you believe, as vdroop instability is a massive problem, particularly at idle or under lightly-loaded single-thread loads. But it WAS achieving solid scores using my Lian Li Galahad 360 AIO and passing CoreCycler, OCCT, Prime95, Cinebench, etc. without issue. These aren't my best scores, but this is my daily-driver setup:

[Screenshots: Cinebench R23 Multicore; Cinebench R23 Single core]

So... what happened? No clue. Nada. I can't pinpoint exactly when it started, but it seemed to be several days after one of many W11 patch cycles. Only later, after seeing random BSODs and insane behaviour in Ryzen Master (it was reporting my SOC running at 999W), did I discover there was a new AMD chipset driver. So I couldn't narrow it down to the outdated driver, W11 or some other issue, but I'm leaning towards a combination of the lot.

I suspect it's a compatibility issue between the SMU, BIOS (perhaps even an AGESA bug), W11 scheduler and the chipset driver. All I know is that the system is fine for days, or sometimes just hours, then it does this. I've had it happen under load, whilst idle, and whilst gaming in old games (GTX 960, remember those?). When it hits, the ONLY fix is a full shutdown followed by a power-on. A restart will not fix the issue, which to me suggests the bug is in the BIOS or AGESA... or even the CPU itself.

It seems somehow related to the boosting logic, as I haven't seen the issue when PBO is disabled, but with PBO off the system is just sluggish, as I do a lot of single-core work too. I was using Dynamic OC (unique to this mainboard), but currently have it disabled in case it was related to the issue. It made no difference; the issue still appeared.

How can I narrow it down? No idea. I know of no practical way to query the "source of truth", being the SMU. I have an Asus ROG Crosshair VIII Dark Hero, running the latest BIOS. It also has its own Nuvoton Embedded Controller, which confirms that the data reported by the CPU is simply wrong (note the Avg. 7A compared to Ryzen Master's 225A):

[Screenshot: HWInfo Sensor: Asus EC]
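To make the mismatch concrete, here is a minimal sketch of the kind of sanity check I mean. The function name, the 2x tolerance and the 1 A floor are all my own assumptions for illustration, not anything Ryzen Master or HWInfo provides:

```python
def readings_disagree(cpu_amps, board_amps, max_ratio=2.0, floor_amps=1.0):
    """Return True when two current readings for the same rail differ by
    more than max_ratio.

    max_ratio and floor_amps are arbitrary illustrative thresholds; the
    floor avoids dividing by zero and ignores noise on near-zero readings.
    """
    lo, hi = sorted((abs(cpu_amps), abs(board_amps)))
    return hi / max(lo, floor_amps) > max_ratio


# The two figures quoted above: 225 A (Ryzen Master) vs 7 A (Asus EC average).
print(readings_disagree(225.0, 7.0))  # True: the sources clearly disagree
# Two readings that are roughly in line with each other.
print(readings_disagree(8.0, 7.0))    # False
```

With the numbers from my screenshots the CPU-reported current is roughly 32 times the EC figure, far beyond anything sensor tolerance could explain.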

The voltages on the system are normal, and the C-states show healthy behaviour meaning the CPU is shifting cores between various ACPI levels - so it's not like the system is "stuck" in any abnormal state here due to a scheduler bug:

[Screenshot: HWInfo Sensor: Cstates]

[Screenshot: HWInfo Sensor: Asus Nuvoton]

The only abnormality seems to be in ANY data reported by the CPU itself. Note especially that the CPU cores all report nominal power usage, but the SVI2 TFN values for "Core Current" do not match the mainboard sensors, especially given the current clocks and VID/Vcore/temps. Note also that PROCHOT etc. are not flagged, and I have confirmed there is no mounting issue with the AIO on the CPU:

[Screenshots: HWInfo Sensor: 5950 CPU; HWInfo Sensor: 5950 CPU Enhanced]

This system is mainly a development system, which frequently hosts middleware and front-end websites in virtual environments (WSL2+Docker mostly), and it had never skipped a beat. W11 introduced major improvements to WSL2, otherwise I would not even have bothered upgrading. I play occasional old games, but gave up on anything modern, as playing at 1280x720 and getting 18 FPS is just not a fun experience (and neither is spending $3k on a midrange GPU!).

I'm going to do my usual power cycle, and this will be working normally again.

Has anyone got any idea at all on how to narrow down what is causing this? Or any other tests I can run the next time it occurs?
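One thing I could try myself is leaving HWInfo's sensor logging to CSV running, then scanning the log afterwards for the first sample where the CPU-reported current and the EC-reported current part ways. Below is a rough sketch of that idea. The two column names are hypothetical placeholders (I'd have to copy the real headers from my own log), and the 2x threshold and 1 A floor are arbitrary assumptions:

```python
import csv
import io

# Hypothetical column headers; replace with the exact names from the real log.
CPU_COL = "CPU Core Current (SVI2 TFN) [A]"
EC_COL = "CPU Current (EC) [A]"


def first_divergence(lines, cpu_col=CPU_COL, ec_col=EC_COL,
                     max_ratio=2.0, floor_amps=1.0):
    """Return (row_number, row) for the first CSV row where the two current
    readings differ by more than max_ratio, or None if they never do.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    """
    reader = csv.DictReader(lines)
    for row_number, row in enumerate(reader, start=1):
        try:
            cpu = float(row[cpu_col])
            ec = float(row[ec_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        lo, hi = sorted((abs(cpu), abs(ec)))
        # The floor avoids dividing by zero on near-idle readings.
        if hi / max(lo, floor_amps) > max_ratio:
            return row_number, row
    return None


# Tiny made-up sample standing in for a real log file.
sample = io.StringIO(
    "Time,CPU Core Current (SVI2 TFN) [A],CPU Current (EC) [A]\n"
    "10:00:00,8.0,7.0\n"
    "10:00:02,225.0,7.0\n"
)
hit = first_divergence(sample)
print(hit[0], hit[1]["Time"])
```

The timestamp of the first divergent row could then be lined up against the Windows event log to see whether a driver load, update or power-state change happened just before it.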

Thanks all!
