Back to the point I am trying to bring to the surface that no one seems to be discussing:
In MY Opinion & from Research I have gathered the following to be true -
1.) WHEA ID 18 is secondary to the KERNEL-Power ID 41 (63)
2.) The KERNEL-Power error comes 10-20 seconds before the WHEA. This is a longtime issue not specific to AMD or the 5000 Series, but for some reason Windows/BIOS/MoBos are extremely sensitive to it, as if the PC were a laptop on battery trying to do too much and unable to draw the power it needs to stay alive.
3.) I have received KERNEL-Power errors that do not always log a matching WHEA; over the last 30 days the ratio is 70%
4.) Issues with this go back to the beginning days of the BIOS PCI/Bus circa 2006 - as found from 10+ year old Microsoft forum discussions. (see my other posts for links)
5.) The issue has plagued the AMD & Windows OS for 10 plus years - same Microsoft forum discussions, which have interesting details around 2011 regarding old Java scripts from early years (see my other posts for links)
5.) Power saving "features" in Windows seem to be a culprit, but disabling them is not the final solution if the AMD chip is highly active
6.) High core-count CPUs (IMO 8 cores & over, per reading forums) are the most susceptible: 5800X, 5900X, 5950X, even though I could re-create the same errors on a different ASUS board & my 3900X
7.) On top of #6, the latest Ryzen 5000 series are triggering this scenario once tweaking happens; I saw this going from my 3900X to my current 5900X
8.) Running the 5900X closer to 5.1GHz on my ASUS B550-E seems to be my cut off, not because of thermals or power consumption, but because of the quick millisecond boosts to that speed and over when at idle, even though it would run for several hours at a time with PBO2 Boost spikes to 5150MHz on my preferred cores. Could this be the VRMs? I don't think so; the 14+2 power stages have been solid
9.) Setting custom voltages and testing custom PBO2 Boost Curves will yield the best results in ASUS BIOS
10.) Quality power supply, name brand 80+ Gold and up, depending on MoBo/Ryzen 5000 series CPU Boost Goals (Everyday vs. Tweaker)/GPU power requirements. I am using an EVGA G3 1000W to achieve my current goals. Once I turn up the 6900 XT I think I will be short or less stable at full-tilt (My Red Devil RX 6900 XT has pulled into the 400 Watt range (Wattman+MPT) with its 3 power plugs: PCIe slot=75W, Plugs x3=450W, Total Hardware=525W. Manufacturer states 480W draw capable (hopefully coming with Core voltage increase) and a 900W PSU preferred! Last AMD Driver update let some guys go over 2700MHz stable on 6800/6900XTs). With no power left over for Boost spikes, errors will come for sure.
11.) Temperature - Room Ambient Temp/MoBo Temps/CPU Tdie Temp/GPU Hot Spot Temp all must stay way below their thermal limits for performance; for a lot of the error reports I read, once I see the temps people post, well, that'll do it...
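The power arithmetic in point 10 can be written out as a rough sketch. The 75W slot and 150W-per-plug figures are the ones implied by the numbers in that point; the 142W CPU figure is the stock PPT mentioned elsewhere in this thread, and `rest_w` is a placeholder I made up for fans, drives and so on. These are sustained numbers only and say nothing about millisecond spikes, which can go well above them.

```python
# Rough power-budget sketch; all names here are illustrative, not from any tool.
PCIE_SLOT_W = 75    # power available from the PCIe slot
EIGHT_PIN_W = 150   # power available per 8-pin plug (450W / 3 plugs)

def gpu_connector_budget(num_8pin: int) -> int:
    """Maximum in-spec GPU power delivery: slot plus all 8-pin plugs."""
    return PCIE_SLOT_W + num_8pin * EIGHT_PIN_W

def psu_headroom(psu_w: int, gpu_w: int, cpu_w: int, rest_w: int = 0) -> int:
    """Sustained headroom left on the PSU; ignores short transient spikes."""
    return psu_w - (gpu_w + cpu_w + rest_w)

budget = gpu_connector_budget(3)             # three plugs plus the slot
headroom = psu_headroom(1000, budget, 142)   # 1000W PSU, GPU at full budget, CPU at 142W PPT
print(f"GPU connector budget: {budget}W, sustained PSU headroom: {headroom}W")
```

If the headroom looks comfortable on paper but the errors still show up, that would point toward the short boost spikes rather than the average draw.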
While I can sense everyone's frustration, I think this specific scenario is not what it appears regarding WHEA errors. IMO, from testing combinations of hardware & settings, it is related to CPU clocks in relation to Single or Dual Channel RAM &/or Processors plus Boost Setting (be it auto settings when on (an AMD issue, or something that activates an issue described above) or custom settings being outside the capabilities of the board/CPU/WinOS combination). The MoBo manufacturers have given us all the tools we need to show the capabilities of these impressive processors, but the other side of that coin is that we can quickly create a scenario of "Red Team Sucks", when in fact it is because of the scenario you have created in your system.
Again, my opinion, my experience, I hope it helps you to create tests & log your data to come up with what works best, and ultimately find your ryZEN, in your AMD system.
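For anyone who wants to log their own data on points 2 and 3, here is a minimal sketch of how the matching could be counted. It assumes you have already exported the timestamps of your Kernel-Power 41 and WHEA 18 events from Event Viewer into two lists; the function name, the 20 second window and the sample timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta

def whea_match_ratio(kernel_power_times, whea_times, window_s=20):
    """Fraction of Kernel-Power 41 events followed by a WHEA 18 within window_s seconds."""
    if not kernel_power_times:
        return 0.0
    window = timedelta(seconds=window_s)
    matched = sum(
        1 for kp in kernel_power_times
        if any(kp <= w <= kp + window for w in whea_times)
    )
    return matched / len(kernel_power_times)

# Made-up timestamps, purely for illustration
kp = [
    datetime(2021, 3, 1, 20, 0, 0),
    datetime(2021, 3, 5, 9, 30, 0),
    datetime(2021, 3, 9, 22, 15, 0),
]
whea = [
    datetime(2021, 3, 1, 20, 0, 15),
    datetime(2021, 3, 9, 22, 15, 12),
]

ratio = whea_match_ratio(kp, whea)
print(f"{ratio:.0%} of Kernel-Power 41 events had a matching WHEA 18")
```

Tracking this number before and after a settings change gives you something more concrete to compare than "it feels more stable".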
I do understand that you specifically were using precision boost overdrive, I merely sought to clarify the statement you made.
"These new processors will pull over their 105W TDP & 95A TDC base without hesitation due to water cooling."
" therefore I never insinuated what you are claiming I did."
Just based on the simple meaning of the words, let's break that down.
The statement implies that all Ryzen 5000 processors (These processors) will pull over their designated TDP as long as water cooling is applied (due to water cooling), which is categorically false. So no, you didn't insinuate it, you flat out said it.
No Ryzen processor will boost past stock settings, water or not, without PBO enabled. Furthermore, the statement also implies that Ryzen will not boost past stock operation if water cooling is not applied. Again, that is false. They will in fact pull over the TDP with air cooling as well, as long as PBO is turned on, since most higher end air coolers will keep a chip cool past 142W PPT.
So as a general statement "These new processors will pull over their 105W TDP & 95A TDC base without hesitation with precision boost overdrive and adequate cooling" would be better.
Since you were actually referring to your own situation in particular, I would have gone with
"My new processor will pull over its 105W TDP & 95A TDC base without hesitation due to water cooling."
That makes it very clear that you are referring to your situation and not making a generalized statement about all Ryzen processors.
Back to the matter at hand. Just turning on Precision Boost Overdrive by itself and not manually setting the PPT/TDC/EDC could be what is causing the instability. And it may be worthwhile to slowly raise those, as I did, until the stability issues reappear.
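The "raise slowly until the issues reappear" approach looks roughly like this as a sketch. The starting PPT/TDC values are the stock figures quoted above; the 140A EDC start, the step sizes and the ceilings are my own assumptions, and `is_stable` is a hypothetical stand-in for whatever stress test you run by hand after each BIOS change.

```python
def raise_until_unstable(start, step, ceiling, is_stable):
    """Step the limits up together; return the last set that passed, or None."""
    last_good = None
    current = dict(start)
    # Stop as soon as any limit would exceed its ceiling
    while all(current[k] <= ceiling[k] for k in current):
        if not is_stable(current):
            break  # instability reappeared, keep the previous set
        last_good = dict(current)
        current = {k: current[k] + step[k] for k in current}
    return last_good

start = {"PPT": 142, "TDC": 95, "EDC": 140}     # stock limits (EDC value assumed)
step = {"PPT": 10, "TDC": 5, "EDC": 10}         # assumed step sizes
ceiling = {"PPT": 200, "TDC": 140, "EDC": 200}  # assumed upper bounds

# Stand-in for a real stress test: pretend this chip is stable up to 165W PPT
pretend_stable = lambda limits: limits["PPT"] <= 165

best = raise_until_unstable(start, step, ceiling, pretend_stable)
print(best)
```

In practice each "iteration" is a reboot, a BIOS change and a stress run, so the point is mainly to change one set of limits at a time and write down the last set that passed.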
Can you post your BIOS settings? I have an MSI mobo with a 5900X.
I was having WHEA 18 crashes almost every day, but one day I was searching for possible solutions and found that someone on the HWiNFO forum had traced the issue to an AMD RX 6000 series GPU paired with a Ryzen 3000 or 5000 series CPU, as both use the same IO die. This was not a hardware problem but a software problem related to the sleep state of the GPU. There are beta versions of HWiNFO that are supposed to fix the problem, and the same applies to AIDA64. There is no info on the AIDA64 forums like there was for HWiNFO, but since they share very similar code, I tested the latest beta version of AIDA64 Extreme and got no more WHEA crashes. I haven't had a crash in a little more than 2 weeks; before updating to the beta, I had crashes almost every day.
My BIOS settings: ALL default except for D.O.C.P. 3600MHz, CPU multiplier - 40x and core voltage at 1v, this is mainly to keep power consumption and temperatures down, while still having more than enough power for gaming and other tasks.
Edit: I should have said that my solution might not fix the crashing for everyone; it's just a common issue for people using monitoring software with the latest AMD GPUs and CPUs. I also forgot to mention that I'm on AGESA 184.108.40.206 with an ASUS ROG STRIX X570-E GAMING, Ryzen 9 5900X and RX 6900XT.
Yes, both HWiNFO and AIDA64, and both the installed and portable versions, as the files are the same. If you want to test, just disable all GPU related sensors or update to the latest beta. For me AIDA64 was causing crashes with the latest stable, but with the latest beta the crashes have been completely gone for more than 2 weeks.
Just an update on this. So far, beta BIOS 3603 with AGESA 220.127.116.11 Patch A is proving stable. Not had any more WHEA/BSODs during normal use (work and gaming).
I have left DOCP off with the timings/speed/voltage manually configured as stated in my previous message. I have disabled the "DRAM Power Down" option as was suggested as well but left everything else on automatic.
I also quickly dabbled with PBO: enabling it and stress-testing proved stable as well, with quite a significant uplift in CB23 scores, but I have decided to leave it off for now until any remaining bugs with BIOS/AGESA are ironed out in the coming weeks/months.
I also just remembered that prior to upgrading to 2 x 16 GB sticks of RAM, I used to run 4 x 8 GB sticks (similar speed but looser timings). Switching DOCP on with those led to a failure to boot until I removed 2 sticks, whereas manually configuring the speed/timings/voltage (with all 4 sticks) was fine.
I can confirm that: my system with a 5900X and PBO disabled randomly restarts. There is no error page; I immediately find myself at the BIOS boot-up sequence.
My system is:
32x2 Spectrix D50 3200MHz RAM
ASRock B550M Steel Legend
Zotac GTX 1070
Latest Windows (tried re-install)
I can provide any additional information, but this is so random that I cannot reproduce the problem easily
Is there any valid solution to this yet?