CrispyCrunch
Adept II

Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-Power

Mainboard: MSI x570 Unify
Mainboard-BIOS: 7C35vA82 (Beta version)
CPU: Ryzen 5900x
RAM: Crucial Ballistix BL2K32G36C16U4B 3600 MHz, 64GB (32GB x2)
Drive: M.2 Samsung 970 Evo+ 1TB SSD
Graphics: SAPPHIRE Nitro+ Radeon RX 5700 XT
PSU: be quiet straight power 11 750w Platinum
OS: Win 10 Pro (64bit) - all updates installed
Chipset driver: 2.9.28.509 (released 2020-11-09)

I first assembled the PC with a Ryzen 3800x a week ago because it was unclear if and when I would get the Ryzen 5900x I ordered. It worked with the included AMD Wraith Prism CPU cooler for one week without any problems.

- Today I installed a Ryzen 5900x and a Scythe Fuma 2 CPU cooler.
- After 20 min the first crash/restart with the following entries in the Event Viewer: WHEA-Logger ID 18 and critical error Kernel-Power ID 41.
- Happens irregularly again and again, sometimes after minutes, sometimes longer: Windows freezes for a few seconds and then the PC reboots. It doesn't matter whether the system is under load or not.
- CPU temperature between 30 and 40 °C
- Updated to BIOS and chipset driver mentioned above: Problem still exists
- XMP Profile disabled (RAM on 2600 MHz): problem still exists
- CMOS Reset: Problem still exists

Either there is a compatibility problem of something with the CPU, or the CPU is defective?
What to do? Really frustrating.
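
For anyone who wants to track how often the two events show up, here is a minimal Python sketch that tallies WHEA-Logger ID 18 and Kernel-Power ID 41 entries from an Event Viewer log exported as CSV. The sample rows and the column names (`Source`, `Event ID`) are assumptions for illustration, not data from this system, so adjust them to match the actual export.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of an Event Viewer CSV export.
# Column names and rows are assumptions, not taken from the thread.
SAMPLE_CSV = """Level,Date and Time,Source,Event ID
Error,2020-11-20 18:02:11,WHEA-Logger,18
Critical,2020-11-20 18:02:40,Kernel-Power,41
Information,2020-11-20 18:03:00,Kernel-General,12
Error,2020-11-21 09:15:27,WHEA-Logger,18
Critical,2020-11-21 09:15:55,Kernel-Power,41
"""

# The two (source, event ID) pairs named in the original post.
WATCHED = {("WHEA-Logger", 18), ("Kernel-Power", 41)}


def count_crash_events(csv_text):
    """Count rows whose (Source, Event ID) pair is one of the watched events."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Source"].strip(), int(row["Event ID"]))
        if key in WATCHED:
            counts[key] += 1
    return counts


counts = count_crash_events(SAMPLE_CSV)
for (source, event_id), n in sorted(counts.items()):
    print(f"{source} ID {event_id}: {n}")
```

Counting the events per day before and after each BIOS or hardware change makes it easier to tell whether a change actually helped.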

2 Solutions

I'm having a similar issue with an X570 Aorus and a 5600X. I get the same errors in Windows.

Disable CPB and PBO and run it at default settings (3.7 GHz and XMP on). That works for me.

I've got a new angle on this. Deactivating PBO and CPB definitely works; the PC has been running stable for a week now. But you'll lose performance.

So I wrote to the MSI support and the AMD support.

MSI suggested increasing the DRAM voltage by 0.05 V, which I did. The system seems to be stable, with no crashes so far, neither at idle nor while gaming.

947 Replies

@koguma After testing and troubleshooting for 6 months and still getting the issue at least once a week, I replaced my CPU. After that I ran my PC solidly for a month with no issues. I've not had a single issue since I RMA'd my CPU.

I'm pretty sure it's fixed, it's been months now.

It could be the GFX card for some, I'd say it's mostly the CPU, but people will need to narrow it down for themselves.

I managed to resolve the issue temporarily by adjusting all cores by +1 in the Curve Optimizer settings but shortly after upgrading to an RTX 3090 I was getting a different issue in certain games (mostly Control) where the screen went blank and the PC would not reboot (there was a constant white LED on the motherboard), requiring a full power-off.

It turns out the PSU was not able to handle the system properly as I swapped it out for an old 2008 (!) Corsair HX1000 I had in a spare PC and the system has been rock-solid since.  I've disabled PBO/Curve Optimizer for now and am running Cinebench's stability test as that used to cause WHEA errors/BSODs before.

I've had that black screen before with the Vega 56.  I thought it might be the PSU, but my PSU is 1kw.  It can handle pretty much anything AMD can throw at it.

Agreed, a 1 kW supply should deal with pretty much anything.  A few articles I've read suggest that the previous PSU might have developed a fault, or that some of its protection measures might have become over-sensitive to sudden fluctuations in power draw.

I've run the Cinebench stability test several times, and have been gaming for several hours and no signs of any instability nor any WHEA BSODs since changing the PSU.  I'll post again if this changes.

I'd be surprised if it was anyone's GPU, as people with both AMD and Nvidia GPUs had this issue.  Also, someone fixed it with a new PSU?   I am at a loss with these reboots and it's pretty frustrating.  I'm going to try the curve thing, then I'll try to replace the 3080 Ti (good luck, they probably won't find one), then I'll try to replace the CPU.  I honestly thought it was the Gigabyte X570 Aorus Master motherboard, but people in other threads say they had it on multiple boards, including MSI and Asus.  Others claimed they RMA'd their 5950Xs with AMD 3x and all of them still had the issue.   Not sure which to believe anymore!  Meanwhile my PC is liable to reboot at any time and these system crashes are NOT graceful.

My issue has returned.

No issues for a couple of months after replacing my 5800X with a new one.

However out of the blue it crashed again on the 8th and again on the 10th.

I'm a bit tired of this now, I've never had this much issue with a platform ever before.

PSU was fine in my old system that drew more power, however maybe the new system needs something more sensitive. However there's no recommendations or specifications saying so.

The issue is most likely something to do with the MOBO, CPU, Gen4 NVMe or 3600 MHz RAM, which I got at the same time.

The RAM is no longer on the QVL list so maybe that's it, but I would not have thought that would cause power cuts. The PC power light and RGB remain on but everything else loses power, which is strange.

Strange how this issue seems to follow Infinity Fabric. It's just so hard to narrow down.

Maybe the issue is just super random and rare but affecting all infinity fabric systems to some degree.


@kneel420 wrote:

I'd be surprised if it was anyone's GPU, as people with both AMD and Nvidia GPUs had this issue.  Also, someone fixed it with a new PSU?


It's any number of hardware faults that can cause this issue.  There's not a magic fix-all.  In my case, even though all of my windows dump files indicated an AMD issue, it was 100% the video card.  Even a bad PSU makes more sense to me than the bad GPU, but here we are.  I confirmed my card was bad in another rig after the crashing escalated, got the replacement, and I've been good ever since (knock on wood).

PS - And in my case, that was after getting baited into a 5900x warranty replacement by the event log / memory dump references to AMD.

My issue started when I still had a GTX 780 installed; I only started noticing it more after I installed the 6900 XT. The first couple of times it happened, while I still had the 780 in the system, I just passed it off as random. Back then the system would reboot, and when I came back I didn't notice the issue because Windows was back on the desktop. I've since turned off automatic rebooting.

Once I checked the event logs when I had the 6900XT I saw multiple events back to the day I installed the CPU, MOBO etc. There was nothing before then.

So in my case I don't think it's the GPU. I also don't get any dump logs; it's a hardware failure and the system is unable to create a dump. At least none that I'm aware of.

I've been OK for nearly two weeks now by setting my RAM speed down to 3333 via the "Try It" MSI BIOS option.   Most likely it's an issue of really tight timings and/or specific timings.   My RAM's XMP was specifically for Intel (I bought Intel-certified RAM, not AMD) so XMP always fails.  That means I can't run my RAM at the specified 3600 speed.  I should probably reach out to G.Skill and ask them about this...

I'd love to hear what they say. I checked the QVL before I bought but maybe I messed up and bought the wrong stick or maybe they removed them from the list.

Still you'd think it wouldn't be that difficult, a quality stick of ram should be on the QVL list.

Every timing has to be right in RAM for the system to work, and you can get WHEA errors from incorrect RAM parameters.  A lot of people probably haven't tried to set the timings to go along with their FCLK - specifically TwtrL might go from 8 to 9, and TWrwrScL & TrdrdScL might need to go from 3 to 4 or 5.  Might also increase Twr and TCWL by one. 

 

On top of this, if a core's neighbor is boosting too much while a primary core is handling certain tasks, you can get a reboot during power state transitions to idle. Make sure in PBO2 you didn't put in a scalar over 2 or an undervolt too high on ANY core, particularly a neighbor of a primary core.   Also check your VCCD and VIOD to make sure they are high enough to support the speed you run. But don't just increase voltages for VSOC, VIOD, VCCD and VDDP without consideration for your system, and the voltages and temps on the CPU.  I think it's safe to run SOC 1.19, VDDP 1.05, VCCD 1.14, VIOD 1.14.

  You might try to increase your cadbus ClkDrvStr from default (20 ohm) to about 30 ohms. 30 ohm seems to work better for voltages between 1.4 and 1.45, in my experience. Even 40 is fine if you run higher voltages on RAM, like, over 1.45.  You'll get better timings and performance with a correctly set ClkDrvStr and such. This, plus dual-rank RAM or more modules, means you might also need a slightly higher ProcODT.  I can't say the DRAM Ryzen calc is perfect, I wish AMD would provide their own DRAM calc and core cycler benchmark/test suite... sigh

You want your cores running from 1.25v under thermal scenarios up to 1.425v under pbo max boost condition.  Generally anything that's a heavy load shouldn't push more than 1.375v sustained.  This info will help you use like, hwinfo or whatever to double check your procODT and stuff.

 

 

@rumple I'm used to things running fine at stock settings and not having to tweak things unless I want to. Things should always run stock.

 

All the ram timings are stock, same as on the packet with DOCP on.

PB is on but PBO is off.

It also crashed a 2nd time yesterday, which makes 3 in total. I have no idea why it went from stable to unstable. The only thing that would have changed is automatic updates, and I doubt that's it.

I'm tempted to buy another stick of ram or a PSU next however maybe it's the MOBO or Gen 4 NVME that's the problem. At first the new CPU was far more stable, It's weird. Maybe nothing can fix it.

I was getting the same errors and had always thought it was GPU problems until I came across a similar story to mine from black screen restart while watching Youtube....

 

I don't know if you have tried these settings as this is a now 85 page forum post but could be low load restart issue.

As quoted.....
1. Disable C-state power
2. Set PCIe to 3.0
3. Power supply set to Typical Idle.
SOC and Vcore all auto, XMP enabled at default for your RAM, CPB and PBO enabled.

I don't have PBO on though myself, the power ramp for the minor speed rewarded isn't worth the hassle imo

My solution was replacing the cpu, after that, no problems whatsoever...

@Soulsa It's all been covered many times. It works for some, not everyone. However it's a work around not a fix.

Ryzen with Infinity Fabric can just be unstable with various combinations of components, and narrowing them down is difficult because most of the time it's stable.

I heard a story of 2 friends who built identical systems; one had the issue, the other didn't. They swapped CPUs with each other to see if the issue followed the CPU. However, after they swapped CPUs, neither of them had the issue again.

I also heard Ryzen 3800X's had the issue and it was eventually fixed for most by a BIOS update.

For others it could be fixed by a CPU, GPU, PSU, MOBO or RAM swap, I've even heard of a sound card causing it.

I've thought I've had it fixed multiple times only to have it reappear months later. I still want to get to the bottom of it, however in all the systems I've ever owned, both AMD and Intel, I've never had an issue like this.

It sucks, I'm over the moon AMD are doing fairly well BUT whenever I have had Intel they always just worked....NVIDIA also.... My 5700xt has had DRIVER problems for over 6 months, AMD KNOW also and we bombard them about it all.

 

I can't help but think it's always the driver team letting down the products...

@Soulsa they have a vastly smaller team so it's hard to compete but their software is maturing.

Infinity Fabric is new tech that has never really been done before on a home PC. Intel is getting into similar tech, so maybe things will just be less stable these days.

Maybe they're just pushing it too far. Maybe they made an assumption about the combos of rigs people would make.

Without knowing the cause it would be hard to say. I do suspect it's a combination of issues though.

So....

After all the hoop jumping.....  It looks like once I upgraded my BIOS for my Gigabyte X570 Aorus motherboard from F30 to F34, no more reboots and WHEA-Logger 18 errors.   Mostly this issue I think is a combination of hardware and firmware version incompatibility.  

@kneel420 After replacing my CPU I couldn't reproduce the issue as much as I tried. A couple of months later it returned. Everything was fully updated when I replaced the CPU a couple of months ago, and because it was working I didn't change anything except automatic updates in Windows.

My current BIOS is 2401; the latest is 2420 (AGESA V2 PI 1.2.0.3 Patch C), which is a few revisions later. I'm going to try it next and see if it helps too. It probably is a combination of a few things for most people. Maybe the chipset driver changed and needs a BIOS update or something.

Maybe it's my RAM, G.Skill F4-3600C16D-32GTZNC. I could have sworn it was on the ASUS MOBO QVL before I bought it, but after noticing the MOBO wasn't on G.Skill's QVL page, I noticed that on the ASUS MOBO page it was now only showing for the 3000 series. The MOBO is now back on the G.Skill QVL page; I was hoping this is because of the latest BIOS. It's still not on ASUS' page yet.

Anyway it's weird that RAM is going off and on the QVL lists.

 

Yeah, it's definitely combinations of hardware and firmware version incompatibilities.   It really comes down to ensuring all your drivers are up to date.

My system is rock solid now, and I was about to replace the processor or worse.

I would have found out sooner, but Gigabyte's site was broken due to an alleged ransomware attack, so many of their files (BIOS, drivers) are still 404ing.

Be careful with the advice online; just go through ALL your hardware and make sure the drivers are updated, which should solve any issues.  I thought I had done that, but Gigabyte's BIOS tool kept telling me it was updating the BIOS firmware when it wasn't, probably due to that issue.  I had to do it manually.

@kneel420 I've always kept my system fully up to date, except for the time after I swapped my CPU. While testing my new CPU that was the only change I made; I ran the PC constantly for 17 days straight trying to make it fail and couldn't.

The issue returned a couple of months later however.

If this was just software related, it would have been fixed during the 6 months I was trying to fix it.

The instabilities in Ryzen are just not acceptable.

I'm going to make two posts, apologies in advance. This post was actually from the 10th, the next will be me catching up with y'all.

----------

This was a long post but it's not letting me post, so if AMD ever do look at these forums, that's the reason for the multiple posts.

Anyone reading ... you can skip to your reply if you don't really care that much (I reply email style, always have, always will) ... but mostly everything I'm writing is general musings.

Most importantly, though.

To Everyone In The Thread - CAN YOU REPRODUCE THE FAILURE?

It's going to be really helpful if y'all can isolate a specific thing (or things) that causes the crashes here. Because if these start to overlap, then anyone that does eventually try to do something *real* about this will be able to take that and run with it, diving into all the code and components that are involved.

For me, this is quite reproducible:

  • Time-Spy - Resolution 1440p, Rendering at 4K, 60FPS, DX12 (and 11) - It happens as the avatar/thief turns away from the giant back into the museum. The only time anything reaches high utilisation/temps during this is at the very start, then it's buttery, otherwise. No spikes on anything at time of crash.
  • Total War: Warhammer II - Highest everything, GPU memory unlimited/not-unlimited, various other settings - It happens directly after a successful siege battle with Grom in one of the immediate territories. When the post battle screen is on, and you click to continue after selecting your bounties, boom, power down. Yes, I'm aware this game engine is one of the worst optimised pieces of crap out there. I'm also aware that it's one of the best RTS ever made (I've even said so!), plus the fact that it's badly optimised makes it a good test.

I'll happily try any other tests that others have confirmed cause the issue, although ZiN's recent relapse has me doing permanent wearyface.:-(




ZiN (and everyone, tbf), I'm aware that your issue has come back, and I'll be coming back to that eventually, but thanks for the reply, none-the-less.

My PC is back now from SCAN (they've been great, if a bit uncommunicative) ... and what I'll be doing is:

  1. Putting all the drives except the spinner in the PC
  2. Not connecting any extraneous devices apart from the monitor. What I can remember that's:
    • The HDMI to AVR (4K res) that I mostly use for the surround outs to the Razer headset.
    • The USB 3 switch (unpowered).
    • External drives.
    • Second USB 3 switch (powered).
  3. Booting up with the new BIOS that (yes, I could've done it) SCAN put on and reported zero issues apparently testing at the specs I run on.
  4. Sighing that something.
  5. Testing immediately.
  6. I expect it to fail, which would indicate (for me) driver/software conflict. If it doesn't, then great ... if it does ...
  7. I'll have to reluctantly upgrade this (my main) install of Windows and try again.

I'll then install linux, and test there.


@Cmdr-ZiN wrote:

 for some reason I don't get notified of thread updates anymore, but I did see the mention thanks.

... Good luck I hope you solve it.




 

Cheers for this, Anzu.


@Anzu34 wrote:

 
- CPU : AMD Ryzen 5900x 
- MB : MSI B550 Tomahawk
- GPU : MSI Geforce RTX 3080 Ti Trio Gaming
- AIO : NZXT Kraken x73 360mm RGB
- RAM : Crucial Ballistix 2x16 GB 3600 MHz CAS 16
- PSU : Seasonic Prime PX-850, 850W Plus Platinum 




 

ace50k, this is, of course, the exact same PSU that I'm using, and the one that SCAN profess to have worked absolutely fine. I just don't understand why you should need to disable anything if you're running at stock.


@ace50k wrote:

It turns out the PSU was not able to handle the system properly as I swapped it out for an old 2008 (!) Corsair HX1000 I had in a spare PC and the system has been rock-solid since.  I've disabled PBO/Curve Optimizer for now and am running Cinebench's stability test as that used to cause WHEA errors/BSODs before.




 

kneel420's comment had me wondering if house electricals has anything to do with this?

Because one thing that I know is that (I'm aware how bad this is) the grounding in the office I'm plugged into isn't amazing. I do feel for you, though. I'm X570, too. A lot of responses in the thread have listed full (as poss) specs; in the long run it could really help, but it won't mean much to most of us.


@kneel420 wrote:

I'd be surprised if it was anyone's GPU, as people with both AMD and Nvidia GPUs had this issue.  Also, someone fixed it with a new PSU?   I am at a loss with these reboots and it's pretty frustrating.  I'm going to try the curve thing, then I'll try to replace the 3080 Ti (good luck, they probably won't find one), then I'll try to replace the CPU.  I honestly thought it was the Gigabyte X570 Aorus Master motherboard, but people in other threads say they had it on multiple boards, including MSI and Asus.  Others claimed they RMA'd their 5950Xs with AMD 3x and all of them still had the issue.   Not sure which to believe anymore!  Meanwhile my PC is liable to reboot at any time and these system crashes are NOT graceful.




 

ZiN, I would pontificate around whether it's to do with how the motherboards handle the power, here. All the BIOS updates in the world won't mean a thing if the actual problem is specific to the manufacturer's processes (and that is happening cross company). However it is unlikely, because we have seen this reported across various AMD platforms, on multiple manufacturers, and the only thing that has been gradually improving things are BIOS updates.

Might be time for us to start a couple of threads at ASUS/MSI/etc (and maybe memory), just to ensure that there's parity there, to ensure that they even *know* this is a thing.

I don't believe for a second that we're just a few isolated cases on the internet, I believe most people will just either ignore it or never trigger whatever is doing it.


@Cmdr-ZiN wrote:

My issue has returned.

No issues for a couple of months after replacing my 5800X with a new one.

However out of the blue it crashed again on the 8th and again on the 10th.

I'm a bit tired of this now, I've never had this much issue with a platform ever before.

PSU was fine in my old system that drew more power, however maybe the new system needs something more sensitive. However there's no recommendations or specifications saying so.

The issue is most likely something to do with the MOBO, CPU, Gen4 NVMe or 3600 MHz RAM, which I got at the same time.

The RAM is no longer on the QVL list so maybe that's it, but I would not have thought that would cause power cuts. The PC power light and RGB remain on but everything else loses power, which is strange.

Strange how this issue seems to follow Infinity Fabric. It's just so hard to narrow down.

Maybe the issue is just super random and rare but affecting all infinity fabric systems to some degree.



koguma/ZiN: I think raising it with the mem folks is a good (as above) idea, however, ensure that in your core, first, post, you reference this (long as heck) thread, so the context is clear. Also, as previously mentioned a lot ... including specs, triggers, and references to others with the same incident seen can increase the likelihood of it being taken more seriously.

Also, I spoke to SCAN about QVL specifically, and they stated that the RAM chosen for my setup (in a previous post) was QVL, however I wasn't totally sure.

One thing that I'm definitely thinking could be a wild shot in the dark is alerting one of the tech players on YouTube about this. They likely won't respond, but in the rare chance that they do, perhaps LTT, HWU, Gamers Nexus, and the grey haired bloke, might be able to 'raise awareness' ... or at least get us a specially coloured ribbon to wear.


@koguma wrote:

I've been OK for nearly two weeks now by setting my RAM speed down to 3333 via the "Try It" MSI BIOS option.   Most likely it's an issue of really tight timings and/or specific timings.   My RAM's XMP was specifically for Intel (I bought Intel-certified RAM, not AMD) so XMP always fails.  That means I can't run my RAM at the specified 3600 speed.  I should probably reach out to G.Skill and ask them about this...



@Cmdr-ZiN wrote:

I'd love to hear what they say. I checked the QVL before I bought but maybe I messed up and bought the wrong stick or maybe they removed them from the list.

Still you'd think it wouldn't be that difficult, a quality stick of ram should be on the QVL list.


 

I would say, though, that the NVidia user experience is many multiples worse than this Radeon software that I'm now seeing on this card. Added to which there's quality of life stuff that I've always had issues with Nvidia (especially on sound) that I just really needed to switch. I'm happier now (not compensating ), but I don't want this to be a system breaker, y'know?


@Soulsa wrote:

It sucks, I'm over the moon AMD are doing fairly well BUT whenever I have had Intel they always just worked....NVIDIA also.... My 5700xt has had DRIVER problems for over 6 months, AMD KNOW also and we bombard them about it all.

 

I can't help but think it's always the driver team letting down the products...




I wonder if it's then down to differences in the CPU manufacturing. There could be very subtle differences, and these BIOS updates are tuning the system for the ones that they know are getting a bit pernickety. They're just addressing each manufacturing line one by one, hoping to eventually cover everyone?


@Cmdr-ZiN wrote:

@Soulsa It's all been covered many times. It works for some, not everyone. However it's a work around not a fix.

Ryzen with Infinity Fabric can just be unstable with various combinations of components, and narrowing them down is difficult because most of the time it's stable.

I heard a story of 2 friends who built identical systems; one had the issue, the other didn't. They swapped CPUs with each other to see if the issue followed the CPU. However, after they swapped CPUs, neither of them had the issue again.

I also heard Ryzen 3800X's had the issue and it was eventually fixed for most by a BIOS update.

For others it could be fixed by a CPU, GPU, PSU, MOBO or RAM swap, I've even heard of a sound card causing it.

I've thought I've had it fixed multiple times only to have it reappear months later. I still want to get to the bottom of it, however in all the systems I've ever owned, both AMD and Intel, I've never had an issue like this.




Mate, I would recommend highly, if you can, listing your PC specs, and if possible, whether you have found a way to capably replicate the incident. You don't have to, of course, but if this ever *does* become a thing, your input could really help.


@Soulsa wrote:

I was getting the same errors and had always thought it was GPU problems until I came across a similar story to mine from black screen restart while watching Youtube....

 

I don't know if you have tried these settings as this is a now 85 page forum post but could be low load restart issue.

As quoted.....
1. Disable C-state power
2. Set PCIe to 3.0
3. Power supply set to Typical Idle.
SOC and Vcore all auto, XMP enabled at default for your RAM, CPB and PBO enabled.

I don't have PBO on though myself, the power ramp for the minor speed rewarded isn't worth the hassle imo


 

I should have clarified that apart from enabling DOCP to set the memory to 3600, all other settings have been left to their defaults.  Before swapping the PSU the system could occasionally BSOD when at the Desktop though most commonly it occurred during gaming with the "Cache Hierarchy" error in the event log.  This was when I was using the GTX 1080 card and I had to manually set +1 in the Curve Optimizer to stabilise the system.

After swapping to an RTX 3090, I started getting a blank screen when gaming and the system rebooted but would not proceed with booting Windows (there was a white LED constantly lit).  The Curve Optimizer was still set at +1 for all cores.  After swapping the PSU out, I no longer got the black screen/reboot and I decided to set all CPU options back to default so that single-core boost would peak properly.

Since the last time I've posted here, I've played a good 10+ hours of games including AC Valhalla and Control with all settings maxed out and the system has been rock-solid.  I can only think that my EVGA PSU had developed a stability issue in circumstances with rapidly changing loads.  It seems to work fine with my spare system that now has the old GTX 1080 and a Ryzen 3900X so perhaps the 5xxx series of processors are more sensitive to the PSU, at least in my case?

 

5600X here. I have been battling the WHEA uncorrectable error for 3 months. After I RMA it, I will sell the 5600X and the B550 motherboard and go back to Intel.

Not a word from AMD, who were quick to take our money.

I'm not entirely certain my issue is the same as the problems found here, but I am getting WHEA 18 critical Kernel Power errors, so I'm going to share my experience in case it helps find a solution.

My specs:

Ryzen 7 5800x
EK AIO 240
Asus Crosshair VIII Dark Hero
G. Skill Trident Z Neo 16GBx2 3600 CL16-16-16-36
Asus ROG Strix 3070
Samsung 970 Evo Plus 1TB
Seasonic Focus GX-850

This system was put together a couple of weeks ago and everything seemed solid. Cinebench, Superposition, Heaven and games like Doom Eternal and Gears 5 worked great. Skip to a few days ago where I tried to download another game from the Microsoft Store. In the early stages of the download the system just restarts. Event viewer shows the following:

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0
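
If you want to compare these details across several crashes, a small Python sketch like the one below can turn the event text into key/value pairs. It only assumes that each detail line has the `Name: value` shape shown above; the function name is made up for illustration.

```python
# Event text as pasted above.
EVENT_TEXT = """A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0
"""


def parse_whea_details(text):
    """Return a dict of the 'Name: value' lines; lines without a value are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


details = parse_whea_details(EVENT_TEXT)
print(details["Error Type"], "on APIC ID", details["Processor APIC ID"])
```

Collecting the `Error Type` and `Processor APIC ID` from every crash shows whether the errors keep landing on the same core or move around.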

I had played with some PBO and CO settings in attempt to lower temps. So I set those back to auto and tried to download again. Download lasted longer but the system ultimately restarted with the same WHEA error. Only other thing left to change in bios was the DOCP. I lowered it to 3200mhz and FCLK to 1600 (down from 3600 and 1800). I am now able to complete downloads for a couple of games and have not seen the error with these settings... yet.

The download thing is odd. However, when I attempt to download through the MS store the WHEA errors consistently appear and force a restart when DOCP is set to 3800 and FCLK to 1800. Everything else is set to their default/auto settings.

Is this an issue with my specific RAM, or does it relate to the CPU? Isn't FCLK tied to CPU voltage in some way?
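
For reference on how the DOCP speed and FCLK numbers relate: DDR memory transfers data twice per clock, so the real memory clock is half the advertised rate, and a 1:1 setup means FCLK equals that memory clock. A minimal Python sketch of the arithmetic, using the pairs mentioned in this post (the function names are just for illustration):

```python
def memory_clock_mhz(ddr_rate_mts):
    """DDR transfers twice per clock, so the memory clock is half the rated speed."""
    return ddr_rate_mts / 2


def is_one_to_one(ddr_rate_mts, fclk_mhz):
    """True when FCLK matches the memory clock (the 1:1 case)."""
    return memory_clock_mhz(ddr_rate_mts) == fclk_mhz


# (DDR rate, FCLK) pairs quoted in the post.
for rate, fclk in [(3600, 1800), (3200, 1600), (3800, 1800)]:
    status = "1:1" if is_one_to_one(rate, fclk) else "not 1:1"
    print(f"DDR4-{rate} with FCLK {fclk}: memory clock {memory_clock_mhz(rate):.0f} MHz, {status}")
```

Taken literally, the 3800/1800 pair mentioned above would not be 1:1, so it is worth double-checking which values the BIOS actually applied.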

Almost forgot to mention, my Asus bios version is 3401. I believe it has AGESA 1.2.0.1.

EDIT: Scratch that. I am no longer stable at 3200 and 1600. Just experienced another WHEA 18 while attempting another download.

@authorized_to_ill 

Something I also found on my side is that once Windows throws WHEA errors due to PBO/CO, the system keeps crashing with WHEA errors even after changing the values back to Auto, which was not the case before.

The only way to stop the crashes after setting everything back to Auto is to clear CMOS or reinstall the BIOS firmware to be sure. I'm not sure why, but it is as if the PBO/CO settings get stuck on the older values after a crash.  (A simple test: once the settings are back on Auto, switch to Advanced again, and all the older values that raised the WHEA errors reappear in the advanced settings. After a full BIOS reset this doesn't happen, and all the fields show up as if it were the first activation and need to be populated.)

I've tested this several times on my Gigabyte X570 Xtreme, and every time I got WHEA errors, setting it back to Auto and reinstalling the BIOS firmware fixed it. You may want to try it on your side.

In the meantime, as for your WHEA error, your CO value on Core 0 may be too high.

@kodo28 

I appreciate the suggestion, but my bios has been properly resetting the values. I have set to auto, manually set them to 0, hit F5, and I even just flashed the latest Asus bios. The issue persists. In fact, the latest bios update seems to have made it worse. 

Ok, I guess I've got to get my two cents in:

We can get the high temps under control. That is the easy part.

This is what I tend to advise most people,   The parameters pertain to the 105W TDP processors 5900x and 5950x

I would set VSoc from Auto to Normal

Then add a .006v positive differential to VSoc

Secondarily, I would set VCore from Auto to Normal

Then add a .006v positive differential to VCore

The above boosts Voltage by the smallest amounts(which your processor needs)

The amount is less than what the CPU would boost when it changes frequencies.

A very slight boost of voltage for VCore and VSoc is what I recommend for most crashes.   (A little goes a long way)

The above should keep you stable at the lower frequencies but also boosted voltage at the higher frequencies

Try this, this should help, and if it doesn't totally cure crashes at idle then, use .012 for VSoc and Vcore differential.    (still a VERY minor increment)

 

To get those higher frequencies under control I suggest:

Turn Core Performance boost Enabled

Turn PBO from Auto to Advanced

Set PBO Limits to Manual

Turn PPT from 142w down to something like 120w

When the processor sees that it is approaching the new lower power limit, it will be discouraged from boosting to an even higher frequency.

Keep TDC at 95A and keep EDC at 140A.

Set "Platform Thermal Throttle Limit" to Manual. Then, in the next field that pops up, set "Platform Thermal Throttle Limit" down from 90 to 75, or whatever temperature in Celsius you feel comfortable with.
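
To put a number on how much the PPT change above actually takes away, here is a minimal Python sketch. It only does arithmetic on the figures quoted in this post (142w, 120w, 95A, 140A); the `PboLimits` class and `percent_reduction` helper are made-up names for illustration and do not correspond to any BIOS or AMD tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PboLimits:
    """Manual PBO limits as typed into the BIOS (hypothetical container)."""
    ppt_watts: int  # package power limit
    tdc_amps: int   # sustained current limit
    edc_amps: int   # peak current limit


def percent_reduction(stock: int, new: int) -> float:
    """Return how far `new` sits below `stock`, in percent (one decimal)."""
    return round((stock - new) / stock * 100, 1)


# Figures quoted in the post for the 105W TDP parts (5900x / 5950x).
STOCK = PboLimits(ppt_watts=142, tdc_amps=95, edc_amps=140)
# Suggested change: lower PPT only, leave TDC and EDC where they are.
CAPPED = PboLimits(ppt_watts=120, tdc_amps=STOCK.tdc_amps, edc_amps=STOCK.edc_amps)

print(f"PPT cut: {percent_reduction(STOCK.ppt_watts, CAPPED.ppt_watts)}%")
```

So going from 142w to 120w is roughly a 15.5% lower power ceiling, with the current limits untouched.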

*****

Yes, Core Performance Boost can make the system run very hot; PBO, not so much. If you turn them off you lose lots of performance. But if you enable them and let them run with a more limited power budget (you don't really want to hammer that CPU, do you?) and a tighter limit on temperature, you get much of the same boost as before, sometimes even greater, but without the negative stuff. Try it. I think you will like it.

Let me know if anyone wants to try all that and see if it works. I've never in my life spent as much time in a BIOS screen as I have in the past week. I expect this CPU and its components to run stable at stock settings. If that's too much to ask then I am going to take advantage of my return window and switch back to Intel. I'll take the hit to performance and efficiency if it means I don't have to deal with this error anymore.

I just found this discussion: https://community.amd.com/t5/processors/ryzen-5000-crashes-whea-errors-will-get-a-quot-silent-fix-qu... 

If that's the case then I am definitely just going to return and get a refund. This is ridiculous.

@authorized_to_ill that information is nothing new, it's speculation at best. The only statement I've heard is that B2 is to improve manufacturing efficiency.

If this was a widespread issue it would be all over the news.

I have to imagine that while many are affected, it's probably not enough to land on anyone's radar.

@eliotcole If you want my PC specs you can look at all my posts and scroll to the beginning. There's a good list in the first one, about black screens and resizable BAR, but I usually post my specs when I first join a thread, so there are more details in the later ones from when I started noticing the WHEA errors.

If you have any specific questions about my hardware after that let me know.

The only way I've been able to reproduce it is to leave it on and wait. It used to happen within several days, but every now and then it'll be fixed for a couple of months.

I've looked into the PSU, I've replaced the CPU and ruled out the GFX card.

I'm now looking into the RAM situation. AMD 5000 series CPUs are only officially rated for up to 3200 MHz memory. I'm used to Intel and previous AMD systems just supporting any memory on the QVL for the MOBO, but maybe it's not that simple with Infinity Fabric. It fails like it's a CPU issue, not a RAM issue, so I wonder if the CPU acts up with faster RAM.

Many of us have 3600 MHz memory and there are lots of reports of RAM issues with Ryzen. I'm used to just checking the QVL, enabling XMP and the system being stable, but maybe it's more complicated than that.

So I'm going to learn all about what makes RAM different on a Ryzen system and get back to you if I find any interesting results.

 

@Cmdr-ZiN  The thread is fairly new, so I assumed it wasn't common knowledge. To be clear, I wasn't trying to help spread conspiracies or anything. I just wanted to spotlight the PBO and CO as the potential culprits to these issues. I have disabled both and I am now running my "Microsoft Store download test" to see what happens. As I mentioned before, downloading through there has consistently created the WHEA error/restarts. I would love to know if anyone else can recreate the error the same way. Sometimes I crash after 3-5 gigs are downloaded. Other times I can download 100 gigs before a crash. Either way, it's the only time I get the error.

As for the RAM theory. I considered this as well. That's why I brought my RAM speed down to 3200 and FCLK to 1600. I thought that was good enough for a while, but then the same WHEA errors started happening. I haven't tried going below 3000, but at that point I am 100% on the road to a return. Heck, I'm pretty much already out the door. I'm curious though. So I'm giving it one more night of testing.
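
For anyone wondering why 3200 goes with an FCLK of 1600: a rough sketch, assuming the usual convention that DDR memory transfers twice per clock (so the real memory clock is half the rated speed) and that the "matched" setup is FCLK equal to that memory clock. The function names are made up for illustration.

```python
def memory_clock_mhz(ddr_rate: int) -> float:
    """DDR transfers twice per clock, so the real clock is half the rated speed."""
    return ddr_rate / 2


def is_one_to_one(ddr_rate: int, fclk_mhz: int) -> bool:
    """True when FCLK matches the memory clock (the 1:1 ratio)."""
    return memory_clock_mhz(ddr_rate) == fclk_mhz


# Rated RAM speed and FCLK pairs: the one from this post, plus 3600 for comparison.
for rate, fclk in [(3200, 1600), (3600, 1800), (3600, 1600)]:
    status = "matched" if is_one_to_one(rate, fclk) else "not matched"
    print(f"RAM {rate} with FCLK {fclk}: {status}")
```

By that logic the 3600 kits many of us have would want FCLK 1800 to stay matched, which is a step above the 1600 used with 3200.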

EDIT: Alright, well that didn't take long. PBO and CO disabled.... still crashed. Same WHEA 18 error. I am completely out of ideas. Hope you guys can figure it out.... eventually.

@authorized_to_ill I didn't mean to imply it's a conspiracy theory, just not to waste time assuming things that aren't definitely going to help.

Precision Boost Overdrive (PBO) is overclocking. If your system is unstable, then first bring it back to stock by disabling it. A lot of people confuse Precision Boost, the normal CPU boosting, with Precision Boost Overdrive. PBO should be on Auto by default, which from what I hear means off unless you have Ryzen Master installed.

I can't reproduce the issue you have downloading from the Windows Store but my system might be more stable. I did have my first crash since replacing my CPU after copying over 100Gigs to my Gen 4 NVMe and then returning to idle, maybe data transfer can destabilise the system.

My PC then crashed a couple more times the next day. I haven't had it on since, until today, but I'm still trying to see if I can get it to crash again. If I could reproduce it reliably I would have been able to solve it several months ago.
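
Since reproducing it is the hard part, it may help to pull the WHEA-Logger ID 18 entries out of the System log with their timestamps and line them up against what the PC was doing. Below is a minimal Python sketch that builds a query for the `wevtutil` tool that ships with Windows. The helper names are made up, the defaults are arbitrary, and I'm assuming the account running it is allowed to read the System log.

```python
import subprocess
import sys

WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(event_id: int = 18, max_events: int = 20) -> list[str]:
    """Build a wevtutil command that lists matching System log events, newest first."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}'] and (EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",        # filter by provider and event ID
        f"/c:{max_events}",   # cap the number of events returned
        "/rd:true",           # reverse direction: newest first
        "/f:text",            # human-readable output
    ]


def recent_whea_events(event_id: int = 18, max_events: int = 20) -> str:
    """Run the query and return the raw text output (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("wevtutil is only available on Windows")
    result = subprocess.run(
        build_whea_query(event_id, max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `print(recent_whea_events())` after a crash gives a dated list to compare against downloads, file copies or idle periods.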

Nobody who buys a PC should have to deal with this. Maybe it was too long since I last built an AMD PC and I missed something. Either way I'm determined to eventually get to the bottom of it.

For your RAM, make sure you use the RAM calculator for safe values, otherwise it might become less stable when you lower the frequency.

https://youtu.be/KOqhyVNPhaM

@authorized_to_ill 

I was just giving my 2 cents, explaining how I fixed the issue and what was causing the WHEA error on my side even with PBO/CO on Auto. I haven't had a single crash in the 3 months since, with PBO and CO on Auto. Only CPB (default boost) is ON and it is running just fine. Concerning the RAM, it shouldn't be the issue. I am running 3600 MHz CL14 64GB.

The solution for me, to stop the crashes after setting it back to Auto, was to clear the CMOS or reflash the BIOS firmware, making sure that when I go to the PBO tab and click on Advanced again, the values shown are the defaults and not the values from my previous settings that caused the crashes.

I can reproduce this easily on my side. I turn ON PBO and set a higher CO, then let Windows idle or just watch a YouTube video, and it will crash sooner or later. Then I go back to the BIOS and set PBO and CO back to Auto, and it will keep crashing for the reason explained. Only when I reflash the BIOS or force a clear CMOS do the WHEA errors stop.

Maybe in your case there is another root cause.

@Cmdr-ZiN   No worries. I didn't think you were implying anything, I just wanted to be clear about where my post was coming from.

As for PBO, I've left it on Auto and I crash. I disable it and I crash. I leave my BIOS on stock settings and I simply crash. I don't see a solution to my particular issue so I am going the refund route. After a decade away I was looking forward to trying AMD again. Oh well.

@kodo28 I appreciate the suggestions either way. I'm glad you got yours figured out. My issue could be the result of the specific combination of parts. I don't know.

@authorized_to_ill You mentioned above that you were running BIOS 3401. According to ASUS's web site, they're already up to 3801. Try updating your BIOS.

Never mind, saw a later post saying you updated to the latest one.

All I want to say is a big thank you to the people who mentioned the change of the Power Supply Idle Control in the BIOS, from Auto to Typical Current Idle.

I recently built a new system with the 5950x and it kept crashing randomly, I could hardly complete a Windows installation on the new system:

  • AMD Ryzen™ 9 5950X
  • 64GB (4x16GB) Corsair DDR4 Dominator Platinum RGB, PC4-25600 (3200), Non-ECC Unbuffered, CAS 16, XMP 2.0, 1.35V
  • ASUS ROG Crosshair VIII Dark Hero, AMD X570, AM4, DDR4,
  • 850W ASUS ROG STRIX 850G, Full Modular, 80PLUS Gold, SLI/CrossFire, Single Rail, 70A, 135mm Fan, PSU 70A0X

I tried BIOS 3801 and it made no difference… the only thing that made a difference was the PSU idle control option.

Everything is new including the power supply, but it seems it could not deal with the low voltage. Now is there an issue with the motherboard, PSU or CPU around this problem? I have no idea... all I know is that since I made that change my system never crashed for this or any other reason.

Great system btw, although I lost some of my hair during the long installation night.

@authorized_to_ill I can understand you wanting to return it, Intel has gotten better than when I bought AMD in November last year.

If you kept trying to solve it you could be doing so for several months and be no closer to solving it, like me.

I believe there's a solution, otherwise everyone would have this issue and all the news would know about it. I know many that are happy with their Ryzen systems.

I could return mine but curiosity won't let me, I want to get to the bottom of it.

@Cmdr-ZiN Good luck, man. Keep us posted. I may be moving away from AMD for the moment but I'm still going to keep my eye on this situation.