Processors

eliotcole
Adept II

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

I'm going to make two posts, apologies in advance. This post was actually from the 10th, the next will be me catching up with y'all.

----------

This was a long post but it's not letting me post, so if AMD ever do look at these forums, that's the reason for the multiple posts.

Anyone reading ... you can skip to your reply if you don't really care that much (I reply email style, always have, always will) ... but mostly everything I'm writing is general musings.

Most importantly, though.

To Everyone In The Thread - CAN YOU REPRODUCE THE FAILURE?

It's going to be really helpful if y'all can isolate a specific thing (or things) that causes the crashes here. Because if these start to overlap, then anyone that does eventually try to do something *real* about this will be able to take that and run with it, diving into all the code and components that are involved.

For me, this is quite reproducible:

  • Time-Spy - Resolution 1440p, Rendering at 4K, 60FPS, DX12 (and 11) - It happens as the avatar/thief turns away from the giant back into the museum. The only time anything reaches high utilisation/temps during this is at the very start, then it's buttery, otherwise. No spikes on anything at time of crash.
  • Total War: Warhammer II - Highest everything, GPU memory unlimited/not-unlimited, various other settings - It happens directly after a successful siege battle with Grom in one of the immediate territories. When the post battle screen is on, and you click to continue after selecting your bounties, boom, power down. Yes, I'm aware this game engine is one of the worst optimised pieces of crap out there. I'm also aware that it's one of the best RTS ever made (I've even said so!), plus the fact that it's badly optimised makes it a good test.

I'll happily try any other tests that others have confirmed cause the issue, although ZiN's recent relapse has me doing permanent wearyface. :-(




ZiN (and everyone, tbf), I'm aware that your issue has come back, and I'll be coming back to that eventually, but thanks for the reply, none-the-less.

My PC is back now from SCAN (they've been great, if a bit uncommunicative) ... and what I'll be doing is:

  1. Putting all the drives except the spinner in the PC
  2. Not connecting any extraneous devices apart from the monitor. What I can remember that's:
    • The HDMI to AVR (4K res) that I mostly use for the surround outs to the Razer headset.
    • The USB 3 switch (unpowered).
    • External drives.
    • Second USB 3 switch (powered).
  3. Booting up with the new BIOS that (yes, I could've done it) SCAN put on and reported zero issues apparently testing at the specs I run on.
  4. Sighing that something.
  5. Testing immediately.
  6. I expect it to fail, which would indicate (for me) driver/software conflict. If it doesn't, then great ... if it does ...
  7. I'll have to reluctantly upgrade this (my main) install of Windows and try again.

I'll then install linux, and test there.


@Cmdr-ZiN wrote:

 for some reason I don't get notified of thread updates anymore, but I did see the mention thanks.

... Good luck I hope you solve it.




 

Cheers for this, Anzu.


@Anzu34 wrote:

 
- CPU : AMD Ryzen 5900x 
- MB : MSI B550 Tomahawk
- GPU : MSI Geforce RTX 3080 Ti Trio Gaming
- AIO : NZXT Kraken X73 360mm RGB
- RAM : Crucial Ballistix 2x16 GB 3600 MHz CL16
- PSU : Seasonic Prime PX-850, 850W Plus Platinum 




 

ace50k, this is, of course, the exact same PSU that I'm using, and the one that SCAN profess to have worked absolutely fine. I just don't understand why you should need to disable anything if you're running at stock.


@ace50k wrote:

It turns out the PSU was not able to handle the system properly as I swapped it out for an old 2008 (!) Corsair HX1000 I had in a spare PC and the system has been rock-solid since.  I've disabled PBO/Curve Optimizer for now and am running Cinebench's stability test as that used to cause WHEA errors/BSODs before.




 

kneel420's comment had me wondering if house electricals has anything to do with this?

Because one thing that I know is that (I'm aware how bad this is) the grounding in the office I'm plugged into isn't amazing. I do feel for you, though. I'm X570, too. A lot of responses in the thread have listed full (as poss) specs; in the long run it could really help, but it won't mean much to most of us.


@kneel420 wrote:

I'd be surprised if it was anyone's GPU as people with both AMD and Nvidia GPUs had this issue.  Also someone fixed it with a new PSU?   I am at a loss with these reboots and it's pretty frustrating.  Going to try the curve thing, then will try to replace the 3080 Ti (good luck, they prob won't find one) then will try to replace the CPU.  I honestly thought it was the Gigabyte X570 Aorus Master motherboard but people in threads are saying they had it on multiple boards including MSI and Asus.  Others claimed they RMA'd their 5950Xs with AMD 3x and all of them still had the issue.   Not sure which to believe anymore!  Meanwhile my PC is liable to reboot at any time and these sys crashes are NOT graceful.




 

ZiN, I would pontificate around whether it's to do with how the motherboards handle the power, here. All the BIOS updates in the world won't mean a thing if the actual problem is specific to the manufacturer's processes (and that is happening cross company). However it is unlikely, because we have seen this reported across various AMD platforms, on multiple manufacturers, and the only thing that has been gradually improving things are BIOS updates.

Might be time for us to start a couple of threads at ASUS/MSI/etc (and maybe memory), just to ensure that there's parity there, to ensure that they even *know* this is a thing.

I don't believe for a second that we're just a few isolated cases on the internet, I believe most people will just either ignore it or never trigger whatever is doing it.


@Cmdr-ZiN wrote:

My issue has returned.

No issues for a couple of months after replacing my 5800X with a new one.

However out of the blue it crashed again on the 8th and again on the 10th.

I'm a bit tired of this now, I've never had this much issue with a platform ever before.

PSU was fine in my old system that drew more power, however maybe the new system needs something more sensitive. However there's no recommendations or specifications saying so.

The issue is most likely something to do with the MOBO, CPU, Gen4 NVMe or 3600 MHz RAM which I got at the same time.

The RAM is no longer on the QVL so maybe that's it, but I would not have thought that would cause power cuts. The PC power light and RGB remain on but everything else power cuts, which is strange.

Strange how this issue seems to follow Infinity Fabric. It's just so hard to narrow down.

Maybe the issue is just super random and rare but affecting all infinity fabric systems to some degree.



koguma/ZiN: I think raising it with the mem folks is a good (as above) idea, however, ensure that in your core, first, post, you reference this (long as heck) thread, so the context is clear. Also, as previously mentioned a lot ... including specs, triggers, and references to others with the same incident seen can increase the likelihood of it being taken more seriously.

Also, I spoke to SCAN about QVL specifically, and they stated that the RAM chosen for my setup (in a previous post) was QVL, however I wasn't totally sure.

One thing that I'm definitely thinking could be a wild shot in the dark is alerting one of the tech players on YouTube about this. They likely won't respond, but in the rare chance that they do, perhaps LTT, HWU, Gamers Nexus, and the grey haired bloke, might be able to 'raise awareness' ... or at least get us a specially coloured ribbon to wear.


@koguma wrote:

I've been ok for nearing the past two weeks by setting my Ram speed down to 3333 via the "Try It" MSI bios option.   Most likely it's an issue of really tight timings and/or specific timings.   My ram's XMP was specifically for Intel (I bought Intel certified ram, not AMD) so XMP always fails.  That means I can't run my ram at the specified 3600 speed.  I should probably reach out to G.Skill and ask them about this...



@Cmdr-ZiN wrote:

I'd love to hear what they say. I checked the QVL before I bought but maybe I messed up and bought the wrong stick or maybe they removed them from the list.

Still you'd think it wouldn't be that difficult, a quality stick of ram should be on the QVL list.


 

eliotcole
Adept II

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

I would say, though, that the NVidia user experience is many multiples worse than this Radeon software that I'm now seeing on this card. Added to which there's quality of life stuff that I've always had issues with Nvidia (especially on sound) that I just really needed to switch. I'm happier now (not compensating ), but I don't want this to be a system breaker, y'know?


@Soulsa wrote:

It sucks, I'm over the moon AMD are doing fairly well BUT whenever I have had Intel they always just worked....NVIDIA also.... My 5700xt has had DRIVER problems for over 6 months, AMD KNOW also and we bombard them about it all.

 

I can't help but think it's always the driver team letting down the products...




I wonder if it's then down to differences in the CPU manufacturing. There could be very subtle differences, and these BIOS updates are tuning the system for the ones that they know are getting pernickety. They're just addressing each manufacturing line one by one, hoping to eventually cover everyone?


@Cmdr-ZiN wrote:

@Soulsa It's all been covered many times. It works for some, not everyone. However it's a work around not a fix.

Ryzen with Infinity Fabric just can be unstable with various combinations of components, and narrowing them down is difficult because most of the time it's stable.

I heard a story of 2 friends who built identical systems; one had the issue, the other didn't. They swapped CPUs with each other to see if the issue followed the CPU. However, after they swapped CPUs, neither of them had the issue again.

I also heard Ryzen 3800X's had the issue and it was eventually fixed for most by a BIOS update.

For others it could be fixed by a CPU, GPU, PSU, MOBO or RAM swap, I've even heard of a sound card causing it.

I've thought I've had it fixed multiple times only to have it reappear months later. I still want to get to the bottom of it, however in all the systems I've ever owned, both AMD and Intel, I've never had an issue like this.




Mate, I would recommend highly, if you can, listing your PC specs, and if possible, whether you have found a way to capably replicate the incident. You don't have to, of course, but if this ever *does* become a thing, your input could really help.


@Soulsa wrote:

I was getting the same errors and had always thought it was GPU problems until I came across a story similar to mine: a black-screen restart while watching YouTube....

 

I don't know if you have tried these settings as this is a now 85 page forum post but could be low load restart issue.

As quoted.....
1. Disable C-states
2. Set PCIe to 3.0
3. Set Power Supply Idle Control to Typical Current Idle.
SOC and Vcore all auto, XMP enabled at default for your RAM, CPB and PBO enabled.

I don't have PBO on though myself, the power ramp for the minor speed rewarded isn't worth the hassle imo


 

ace50k
Adept II

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

I should have clarified that apart from enabling DOCP to set the memory to 3600, all other settings have been left to their defaults.  Before swapping the PSU the system could occasionally BSOD when at the Desktop though most commonly it occurred during gaming with the "Cache Hierarchy" error in the event log.  This was when I was using the GTX 1080 card and I had to manually set +1 in the Curve Optimizer to stabilise the system.

After swapping to an RTX 3090, I started getting a blank screen when gaming and the system rebooted but would not proceed with booting Windows (there was a white LED constantly lit).  The Curve Optimizer was still set at +1 for all cores.  After swapping the PSU out, I no longer got the black screen/reboot and I decided to set all CPU options back to default so that single-core boost would peak properly.

Since the last time I've posted here, I've played a good 10+ hours of games including AC Valhalla and Control with all settings maxed out and the system has been rock-solid.  I can only think that my EVGA PSU had developed a stability issue in circumstances with rapidly changing loads.  It seems to work fine with my spare system that now has the old GTX 1080 and a Ryzen 3900X so perhaps the 5xxx series of processors are more sensitive to the PSU, at least in my case?

 

Optimalnz
Adept I

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

5600X here. I have been battling with WHEA uncorrectable errors for 3 months. After I RMA, I will sell the 5600X and the B550 motherboard and go back to Intel.

Not a word from amd, quick to take our money 

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

I'm not entirely certain my issue is the same as the problems found here, but I am getting WHEA 18 critical Kernel Power errors, so I'm going to share my experience in case it helps find a solution.

My specs:

Ryzen 7 5800x
EK AIO 240
Asus Crosshair VIII Dark Hero
G. Skill Trident Z Neo 16GBx2 3600 CL16-16-16-36
Asus ROG Strix 3070
Samsung 970 Evo Plus 1TB
Seasonic Focus GX-850

This system was put together a couple of weeks ago and everything seemed solid. Cinebench, Superposition, Heaven and games like Doom Eternal and Gears 5 worked great. Skip to a few days ago where I tried to download another game from the Microsoft Store. In the early stages of the download the system just restarts. Event viewer shows the following:

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0
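
For anyone collecting these events to compare across the thread, here is a minimal Python sketch that turns event text pasted from Event Viewer into a dictionary. The function name `parse_whea_text` is hypothetical, and the sketch assumes nothing beyond the "Key: Value" layout shown above; it does not read the Windows event log itself.

```python
def parse_whea_text(text: str) -> dict[str, str]:
    """Split 'Key: Value' lines from a pasted WHEA-Logger event into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Keep only lines that really have a "Key: Value" shape.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


event = """A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0"""

report = parse_whea_text(event)
print(report["Error Type"])
```

Values stay as strings, so the APIC ID comes back as `"0"`; that keeps the sketch simple and makes reports from different posters easy to compare side by side.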

I had played with some PBO and CO settings in an attempt to lower temps. So I set those back to auto and tried to download again. The download lasted longer but the system ultimately restarted with the same WHEA error. The only other thing left to change in the BIOS was DOCP. I lowered it to 3200 MHz and FCLK to 1600 (down from 3600 and 1800). I am now able to complete downloads for a couple of games and have not seen the error with these settings... yet.
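
As a rough illustration of why those two numbers move together: DDR4 ratings count transfers per second, and DDR transfers twice per clock, so the actual memory clock is half the rated speed. On these platforms FCLK is normally kept equal to that memory clock (a 1:1 ratio). The sketch below only does that arithmetic; the function names are made up for illustration, and the third pair is an invented mismatch for contrast, not a setting from this post.

```python
def memory_clock_mhz(ddr_rate: int) -> float:
    """DDR transfers data twice per clock, so the real clock is half the rated speed."""
    return ddr_rate / 2


def runs_one_to_one(ddr_rate: int, fclk_mhz: int) -> bool:
    """True when the memory clock and FCLK are equal (a 1:1 ratio)."""
    return memory_clock_mhz(ddr_rate) == fclk_mhz


# (DDR4 rating, FCLK in MHz); the first two pairs are the before/after
# settings described above, the third is a hypothetical mismatch.
settings = [(3600, 1800), (3200, 1600), (3800, 1800)]
results = {pair: runs_one_to_one(*pair) for pair in settings}

for (ddr_rate, fclk), matched in results.items():
    print(f"DDR4-{ddr_rate} with FCLK {fclk}: {'1:1' if matched else 'not 1:1'}")
```

Both the 3600/1800 and 3200/1600 combinations are 1:1, so dropping the speed here lowers the memory and fabric clocks together rather than changing the ratio between them.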

The download thing is odd. However, when I attempt to download through the MS Store the WHEA errors consistently appear and force a restart when DOCP is set to 3600 and FCLK to 1800. Everything else is set to its default/auto setting.

Is this an issue with my specific RAM, or does it relate to the CPU? Isn't FCLK tied to CPU voltage in some way?

Almost forgot to mention, my Asus bios version is 3401. I believe it has AGESA 1.2.0.1.

EDIT: Scratch that. I am no longer stable at 3200 and 1600. Just experienced another WHEA 18 while attempting another download.

kodo28
Adept III

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

@authorized_to_ill 

Something I also found on my side: once Windows throws WHEA errors because of PBO/CO settings, the system keeps crashing with WHEA errors even after the values are changed back to Auto, which did not happen before.

The only way to stop the crashes after setting it back to Auto is to clear the CMOS, or reinstall the BIOS firmware to be sure. I'm not sure why, but it is as if the PBO/CO settings get stuck on the older values after a crash. (A simple test: once the settings are back on Auto, switch to Advanced again and all the older values that raised the WHEA errors reappear in the advanced settings. After a full BIOS reset this doesn't happen; all the fields show up as if activated for the first time and you need to populate them.)

I've tested this several times on my Gigabyte X570 Xtreme, and every time I got WHEA errors, setting it back to Auto and reinstalling the BIOS firmware fixed it. You may want to try it on your side.

Meanwhile, as for your WHEA error, your CO value on Core 0 may be too high.

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

@kodo28 

I appreciate the suggestion, but my bios has been properly resetting the values. I have set to auto, manually set them to 0, hit F5, and I even just flashed the latest Asus bios. The issue persists. In fact, the latest bios update seems to have made it worse. 

Gwillakers
Challenger

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

Ok, I guess I've got to get my two cents in:

We can get the high temps under control. That is the easy part.

This is what I tend to advise most people. The parameters pertain to the 105W TDP processors, the 5900X and 5950X.

I would set VSoc from Auto to Normal

Then add a .006v positive differential to VSoc

Secondarily, I would set VCore from Auto to Normal

Then add a .006v positive differential to VCore

The above boosts voltage by the smallest amounts (which your processor needs).

The amount is less than what the CPU would boost when it changes frequencies.

I recommend a very slight boost of voltage for VCore and VSoc for most crashes. (A little goes a long way.)

The above should keep you stable at the lower frequencies but also give boosted voltage at the higher frequencies.

Try this; it should help, and if it doesn't totally cure crashes at idle, then use .012 for the VSoc and VCore differential (still a VERY minor increment).

 

To get those higher frequencies under control I suggest:

Turn Core Performance boost Enabled

Turn PBO from Auto to Advanced

Set PBO Limits to Manual

Turn PPT from 142w down to something like 120w

When the processor sees that it is approaching the new lower power limit, it would be discouraged from boosting to an even higher frequency. Keep TDC at 95A, and keep EDC at 140A.

Set "Platform Thermal Throttle Limit" to Manual. Then, in the next field that pops up, set "Platform Thermal Throttle Limit" down from 90 to 75, or whatever temperature in Celsius that you feel comfortable with.
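
To put numbers on those limits, here is a small Python sketch that lists the stock and suggested values from this post and works out the relative change. The dictionary keys and the `relative_change` helper are made up for illustration; this is just arithmetic, not a BIOS or AMD interface.

```python
# Values as given in this post: PPT drops from 142 W to 120 W, the thermal
# limit drops from 90 C to 75 C, and TDC/EDC are kept where they are.
stock = {"PPT_W": 142, "TDC_A": 95, "EDC_A": 140, "THERMAL_LIMIT_C": 90}
suggested = {"PPT_W": 120, "TDC_A": 95, "EDC_A": 140, "THERMAL_LIMIT_C": 75}


def relative_change(before: dict, after: dict) -> dict:
    """Percentage change per setting, rounded to one decimal place."""
    return {
        name: round((after[name] - before[name]) / before[name] * 100, 1)
        for name in before
    }


changes = relative_change(stock, suggested)
for name, percent in changes.items():
    print(f"{name}: {stock[name]} -> {suggested[name]} ({percent:+.1f}%)")
```

The power limit comes down by roughly 15% and the thermal limit by roughly 17%, while the current limits are untouched.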

*****

Yes, Core Performance Boost can make the system run very hot. Not so much PBO. If you turn them off you lose lots of performance. But if you enable them and let them run with a more limited power budget (you don't really want to hammer that CPU, do you?) and a tighter limit on temperature, you get much of the same boost as before, sometimes even greater, but without the negative stuff. Try it. I think you will like it.

 

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

Let me know if anyone wants to try all that and see if it works. I've never in my life spent as much time in a BIOS screen as I have in the past week. I expect this CPU and its components to run stable at stock settings. If that's too much to ask then I am going to take advantage of my return window and switch back to Intel. I'll take the hit to performance and efficiency if it means I don't have to deal with this error anymore.

Re: Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-P

I just found this discussion: https://community.amd.com/t5/processors/ryzen-5000-crashes-whea-errors-will-get-a-quot-silent-fix-qu... 

If that's the case then I am definitely just going to return and get a refund. This is ridiculous.
