Something worth trying is to enable PBO but set the limits to the defaults for a 105W TDP processor (PPT = 142W, TDC = 95A, EDC = 140A). Test the system and see if you have any errors.
If you do not, slowly raise PPT, TDC and EDC until the errors return. You can also turn on PBO, set those limits to auto, and look in Ryzen Master. The levels shown there will be the absolute limits for your motherboard. Take note of those, as those are the limits you won't want to exceed. Those levels are usually something ridiculous, and it seems that by just blanket turning on PBO, the system is actually exposing a weak link somewhere.
You can just turn PBO off, but it is possible you can squeeze out a little bit of extra performance without crashing the system.
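To make the stepping approach above concrete, here is a rough Python sketch. It is only an illustration: `is_stable` is a hypothetical stand-in for a manual stress-test run (you still have to enter each candidate in the BIOS and test it yourself), `board_max` is whatever Ryzen Master reports for your board, and the 5% step size is an arbitrary assumption rather than any official recommendation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Limits:
    ppt_w: int  # package power limit, watts
    tdc_a: int  # sustained current limit, amps
    edc_a: int  # peak current limit, amps


# Stock limits for a 105W TDP part, as quoted in the post above.
STOCK_105W = Limits(ppt_w=142, tdc_a=95, edc_a=140)


def raise_limits(current: Limits, board_max: Limits, step_pct: int = 5) -> Limits:
    """Bump each limit by step_pct of its stock value, capped at the board maximum."""
    def bump(value: int, stock: int, cap: int) -> int:
        return min(cap, value + max(1, stock * step_pct // 100))

    return Limits(
        ppt_w=bump(current.ppt_w, STOCK_105W.ppt_w, board_max.ppt_w),
        tdc_a=bump(current.tdc_a, STOCK_105W.tdc_a, board_max.tdc_a),
        edc_a=bump(current.edc_a, STOCK_105W.edc_a, board_max.edc_a),
    )


def tune(is_stable: Callable[[Limits], bool],
         board_max: Limits,
         step_pct: int = 5) -> Optional[Limits]:
    """Walk upward from stock and return the last limits that passed.

    is_stable is a hypothetical callback: in practice it means "set these
    values, run a stress test, and report whether it passed without errors
    at temps and voltages you are comfortable with".
    Returns None if even the stock limits fail.
    """
    last_good = None
    candidate = STOCK_105W
    while True:
        if not is_stable(candidate):
            return last_good
        last_good = candidate
        nxt = raise_limits(candidate, board_max, step_pct)
        if nxt == candidate:  # every limit is already at the board maximum
            return last_good
        candidate = nxt
```

The idea is simply "keep the last known good set": as soon as a candidate fails, you go back one step instead of leaving everything at the motherboard's auto limits.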
For some background, I have a Ryzen 9 5950X on an ASUS Crosshair VII X470 motherboard with an EKWB custom loop. I have four single rank DDR4 DIMMs (8 GB each) at 3600 MHz, matched 1:1 with Infinity fabric at 1.35V.
My system was stable with simply turning on PBO, but would crash immediately when running OCCT small data set, extreme, with constant load.
Eventually I settled on the approach above and ended up with PPT = 215W, TDC = 140A and EDC = 160A. I stopped at these settings as that was where my CPU hit around 70C under load and the voltage on an all-core load was at 1.3V. Anything higher and those voltages and temps just got higher for minimal performance gains. Cinebench R23 scored just over 29500 in multicore and 1659 in single core.
Are the errors you are all hitting from overclocking only? I was trying to get a 5950X but had read about some stability concerns and wondered if it would be a headache. I have no plans to overclock but really would like to avoid issues. I was going to go with an Asus board, and if I don't go this route it was going to be the 11900K, but based on the early signs there I know the 5950X or 5900X is the better performing option.
Don't know if this will help but I've found that the newer AGESA releases 1.2.0.x seem to be problematic.
ASUS ROG Strix X570-F Gaming
32 GB TeamGroup DDR-3600 (2 x 16GB) with 16-16-16-38-55 timings (dual-rank DIMMs)
2 TB Sabrent Rocket nVME
Palit Gamerock Premium GTX 1080
Windows 10 (21H1 release)
EVGA 850w 80-plus Gold PSU
I recently had BIOS 3602 installed with AGESA 18.104.22.168 and found I was getting WHEA BSODs when gaming - usually between 20-90 minutes. The error code was along the lines of:
Processor APIC ID: 14 - Bus/Interconnect Error
With an earlier BIOS (3405) with AGESA 22.214.171.124, the WHEA error was similar but the description was "Cache Hierarchy" error.
I have since downgraded back to BIOS 3001 with AGESA 126.96.36.199 - so far no more WHEA errors or BSODs when gaming.
Interestingly, the CPU-Z benchmark score is also higher with AGESA 188.8.131.52 than the newer versions (around 2-3%)
It is frustrating as my last two builds were with a Ryzen 2700x and a 3900x, which had no issues. Before that I ran Intel (i7-4790k), again without these sorts of issues.
MSI doesn't have a BIOS with AGESA 184.108.40.206; it jumps from 220.127.116.11 to 18.104.22.168, 22.214.171.124 or 126.96.36.199 (beta).
At the moment I have PBO set to Advanced with only the curve at +5. It is a little more stable outside of games, but today I had one crash while watching a movie.
The X570 chipset has a lot of issues with the 5000-series CPUs, but B550 is a much more stable chipset.
Adding myself to the list:
Same crashes with a 5600X; the latest BIOS did not fix it.
Just tried the latest beta (3603) for my board (AGESA 188.8.131.52 Patch A), but this time I've not enabled DOCP. I've left things on automatic but manually set the RAM speed (3,600), IF speed (1,800), and timings (16-16-16-38-55). When I used 4 x 8 GB DIMMs, setting DOCP would fail. After moving to 2 x 16 GB DIMMs, DOCP seemed to work, but I started getting WHEA errors.
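For anyone wondering why 3,600 RAM pairs with 1,800 IF: DDR memory transfers data twice per clock, so a DDR4-3600 kit runs a 1,800 MHz memory clock, and that is the number the Infinity Fabric clock has to match for 1:1 mode. A tiny Python sketch of that arithmetic (the function names are made up for illustration):

```python
def memory_clock_mhz(ddr_rate_mts: float) -> float:
    """DDR transfers twice per clock, so the real clock is half the rated speed."""
    return ddr_rate_mts / 2


def is_one_to_one(ddr_rate_mts: float, fclk_mhz: float) -> bool:
    """True when the Infinity Fabric clock matches the memory clock (1:1 mode)."""
    return memory_clock_mhz(ddr_rate_mts) == fclk_mhz


# Settings from the post above: DDR4-3600 with IF at 1,800 MHz.
matched = is_one_to_one(3600, 1800)
```

With the settings quoted above, `matched` is `True`, so the manual values are consistent with each other.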
Played around 1.5 hours of AC Valhalla with no WHEA error although I got a blank screen after a loading transition but I think this is just likely a bug in the game and I was able to kill the process and continue.
I've also tried OCCT and Prime95, using CPU Affinity to cycle through the cores without any error. Will monitor over the next couple of days/weeks and update if anything else happens.
Is the WHEA error after a Kernel-Power error? 10 to 20s after?
Try setting DRAM Power Options to "Disable" under AiTweaker > DRAM Timing Control > Power Down Enable.
It is also located in Advanced > AMD OC'ing > AMD OC'ing > DDR & Infinity Fabric Frequency/Timings > DDR Frequency & Timings > DRAM Controller Configuration > DRAM Power Options: "Disable"
Have you set the DRAM voltage to 1.35V? Higher for B-die (Samsung Single Channel). I have 64GB of Corsair RGB running at 1.45V @ 1800/3600. They are CL18/3600MHz, no timing adjustments.
I still think this is a power detection issue from 10+ years ago buried in the American Megatrends BIOS base. These new processors will pull over their 105W TDP & 95A TDC base without hesitation due to water cooling. It's a server processor on a PC platform, made affordable for multi-tasking & gaming thanks to ASUS & others. When they quick boost vs. long boost, the transient voltages & current draw are erratic. Logging shows .1ms spikes.
Then you can look up and research all the issues with AMD chipsets, Microsoft Windows, WHEA & Kernel errors going back 10+ years. Issues with WHEA are predominant in laptops with high-end processors.
I am running the new ASUS beta BIOS as well. Same Kernel-Power critical errors every 12 hours or so. My PC is on 18-20 hours a day. However, I am still running an aggressive PBO 2 curve with 125MHz. So, hopefully my latest curve adjustments will rectify that or I will turn the PBO2 down. I have gone to an EVGA G3 1000W 80+ Gold PSU. I will end up trying a 1200+ Watt Platinum or Titanium next. My new Red Devil RX 6900 XT has 3 power plugs...
"These new processors will pull over their 105W TDP & 95A TDC base without hesitation due to water cooling."
They will not, unless you turn on "Precision Boost Overdrive". With that on, your PPT, TDC and EDC will be set at the motherboard limits and not the package limits of 142W, 95A and 140A. The motherboard limits are usually something ridiculous that a CPU will never be able to hit regardless of cooling. It is best to turn on PBO and manually set the PPT/TDC/EDC limits to the stock limits. This should mirror stock settings. You can then slowly raise those limits until you hit the temp/voltage you are comfortable with or you start to see errors. Once that is done, you can play around with curve optimizer and the clock speed limits to get some additional performance.
On my 5950X I got to PPT 215W, TDC 140A, and EDC 160A. At those settings I am TDC bound on an all-core load: TDC 100%, PPT 94%. My voltage on all-core is right around 1.3V and temps are right at 70C on an EKWB custom loop. Going any higher just raises the voltage and temps with little performance gain. I have +100 MHz added to the clock speed limit and also played with the curve optimizer a bit after the fact.
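To put those percentages into absolute numbers, here is a quick Python sketch. The helper names are just for illustration, and it only uses the figures quoted above (no EDC utilization was given, so it is left out):

```python
def absolute_draw(limit: float, utilization_pct: float) -> float:
    """Convert a utilization percentage into the same unit as the limit."""
    return limit * utilization_pct / 100


def binding_limit(utilization_pct_by_name: dict) -> str:
    """Return the name of the limit with the highest utilization."""
    return max(utilization_pct_by_name, key=utilization_pct_by_name.get)


ppt_watts = absolute_draw(215, 94)    # package power actually drawn, in watts
tdc_amps = absolute_draw(140, 100)    # sustained current actually drawn, in amps
bound_by = binding_limit({"PPT": 94, "TDC": 100})
```

That works out to roughly 202W of package power with the full 140A of TDC in use, which is why TDC is the limit holding the chip back on an all-core load, not PPT.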
My DRAM is 4 single rank DIMMs (32 GB total) at 1.35V, set at 3600 MHz (CL14) with IF at 1800MHz. Timings are custom; I did not use the DOCP profile.
No errors or crashes with those settings. I can pass OCCT large data set, extreme, variable without error as well as the small data set, extreme, continuous.
["These new processors will pull over their 105W TDP & 95A TDC base without hesitation due to water cooling."
"They will not, unless you turn on "Precision Boost Overdrive", with that on, your PPT, TDC and EDC will be set at motherboard limits and not the package limits of 142W, 95A and 140A. The motherboard limits are usually something ridiculous that a CPU will never be able to hit regardless of cooling. It is best to turn on PBO, and manually set the PPT/TDC/EDC limits to the stock limits. This should mirror stock settings. You can then slowly raise those limits until you hit the temp/voltage you are comfortable with or you start to see errors. Once that is done, you can play around with curve optimizer and the clock speed limits to get some additional performance."]
Excuse me? In context to the messages (replies) with that individual, that statement does not remove the fact of how many times I have mentioned PBO2 and my Boost Curve.
I did not use the words "stock BIOS settings" or "out of the box settings", therefore I never insinuated what you are claiming I did. Why would I mention water cooling...obviously not in an OEM scenario.