I've been struggling with this issue for 2 weeks straight after building my PC. I literally tried about 10 different solutions and none worked. Disabling CBS/PBO was the only thing that avoided crashes. Thing is, who wants to spend that much money on a CPU only to have it run at the minimum?
The crashes usually happened when the CPU was coming back to idle after a heavy load.
Below is the fix that worked great for me; hope you guys can replicate it:
You need a "Curve Optimizer" option in your BIOS to do this. It's inside the "Precision Boost Overdrive" section; you have to set it to Manual to reveal the settings and change them.
EDC limit = 200A.
Curve optimizer = +4.
Looks like it works for me. Of course, your CPU might need more or less curve. You'd better start at around +4 to +6 and gradually raise it until the problem disappears (it took +8 for some).
If this works for many people, I can even offer a conspiracy theory to explain it.
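For anyone who prefers to see the step-up procedure written down, here's a rough sketch in Python. It can't touch the BIOS, of course: `is_stable` is a hypothetical stand-in for "set this offset by hand, use the PC for a few days, and see whether it crashes", and the upper `limit` of +12 is my own assumption, not something from this thread.

```python
from typing import Callable, Optional


def find_lowest_stable_offset(
    is_stable: Callable[[int], bool],
    start: int = 4,
    limit: int = 12,  # assumed cut-off; pick your own
    step: int = 1,
) -> Optional[int]:
    """Return the first offset from start up to limit that passes, else None."""
    offset = start
    while offset <= limit:
        if is_stable(offset):
            return offset
        offset += step
    return None


# Stand-in for manual testing: pretend this chip becomes stable at +8.
def pretend_chip(offset: int) -> bool:
    return offset >= 8


print(find_lowest_stable_offset(pretend_chip))  # prints 8
```

The point is just to stop at the lowest offset that holds instead of jumping straight to a large one, since every extra step costs some boost performance.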
Looks like the AMD casino took the silicon lottery to a new level.
The usual gamble used to be how well you could overclock your CPU, but the base specified performance was guaranteed. Not anymore. Now, to make the Ryzen great again, the performance AMD specifies is the performance of an AVERAGE CPU. But of course that doesn't mean AMD is going to throw the half of the CPU yield that is below that average in the trash and lose profits. That means half of the buyers undervolt their CPUs to overclock them (the "awesome" new feature much advertised by AMD), and the other half OVERvolt their CPUs to UNDERclock them just to make them work somehow. This thread is the home of that second half, the losers. And, miraculously, these attempts to make this crap work void the warranty, so AMD doesn't even have to take their crap back. Casinos never lose!
Of course this can be corrected by BIOS updates (and will be, when AMD is tired of RMAs) by just raising the default voltages and/or cutting the boost (together with the performance).
It's also easy to explain why these systems mostly BSOD or reboot at idle or under some plain low-load task, yet remain stable under burn-in. The problem is not overheating; the problem is that a given crappy CPU can't run stably at a given frequency with a given voltage (just the same as if you undervolt it too much). The higher the frequency, the greater the chance of a BSOD. A CPU with all cores fully loaded runs at LOWER frequencies to stay within the TDP. But when you stop your burn-in and start to watch a video, just one or two cores (pre-heated by the previous burn-in) are working, and they run at the MAXIMUM frequencies. And then: say hi to a BSOD or reboot.
Thanks for the detailed post, @Anzu34!
It'd be great to get your specs in the thread, in case anyone is actually collating that stuff.
Mine has been at the shop for maybe a month now. They've updated the BIOS to one of the more recent ones (hopefully the latest) and they've seen zero issues whilst stressing it (apparently, and I have no reason to distrust them, running my spec of many drives, and Time Spy at 1440p 60fps but rendering at 4K).
So, hopefully AMD have been silently 'fixing' this issue without ever acknowledging that it was ever a thing.
It's frustrating, to be sure, and I *will not* accept a system that doesn't at least operate at stock specs ... which is, frankly, all I want and need from it. Without meaning to sound too assumptive, of course.
What @Cmdr-ZiN says keeps ringing in my head, though ... it doesn't matter how good things can look ... this issue *KEEPS* coming back. So ... hopefully my issues aren't WHEA related, but I dunno.
What I *will* be doing, though, is testing it on an older version of Win 10 than you're all likely running, to see if the Windows version is a factor in the issue. The main Windows install that I use for gaming I purposefully keep at a previous version because of very borked functions in recent updates. I still keep security and driver patches updated, of course, where required ... but in addition to the borking there are a number of other things I can't stand. For example, Windows keeps jettisoning the security changes I make regarding the various ways it phones home.
Anyway, I dye cress ... because it's nicer in rainbow colours.
I'll report back when the box is back in my hands, mateys.
@eliotcole for some reason I don't get notified of thread updates anymore, but I did see the mention thanks.
My issue has been solved by a CPU replacement for who knows how long now. I'm not even on the latest BIOS, as I kept everything the same for a month to see if the new CPU worked with no changes. With everything working I didn't feel the need to upgrade the BIOS.
For others it will be a GFX card, or PSU, could also be ram or mobo, these are the likely candidates. You might find the faulty component works fine in another system. We're talking about slight instability. However I can tell you a different CPU can run rock solid at stock. If your CPU doesn't run rock solid at stock, don't try to tweak BIOS settings as you should RMA that thing.
The WHEA error is for the GFX card, but when everything is failing at once it's hard to say what caused what until you test things one at a time. You'll find there are multiple errors.
I doubt it's a Windows issue. I'm on the latest Windows 10 version without issues, and I had no issues on the previous one or two either. Also, I believe you did a fresh install.
I really feel like most of these issues are hardware issues, probably different hardware but there is a known issue with Ryzen CPUs. BIOS updates will improve it but if the latest update doesn't fix it, I wouldn't play with BIOS settings I'd just replace it.
I'd rule out GFX card and PSU then just swap the CPU.
Good luck I hope you solve it.
Just an update on my WHEA issues: the warranty replacement 3080 GPU has so far completely fixed them, even though they originally appeared to be CPU related in the event log. I'll be back here if it dies again, but just an FYI on where I'm at.
- CPU : AMD Ryzen 9 5900X
- MB : MSI B550 Tomahawk
- GPU : MSI GeForce RTX 3080 Ti Trio Gaming
- AIO : NZXT Kraken X73 360mm RGB
- RAM : Crucial Ballistix 2x16 GB 3600 MHz CL16
- PSU : Seasonic Prime PX-850, 850 W, 80 Plus Platinum
Are you 100% sure the WHEA_UNCORRECTABLE_ERROR comes from the GPU? I've seen a lot of different answers.
What are RMA processing times like?
Playing with the BIOS meant doing a little tweak with the curve optimizer for me, nothing too fancy and so far it resolved everything. See my previous post above.
My OC and my GPU are stable: I've run Cinebench R23 5 times in a row. I have also overclocked my GPU, and after 5 Kombustor stress tests there are still no blue screens or crashes. My CPU temps at 100% load are around 81-82 max, and the GPU at 100% is around 65-66. I don't see a reason to RMA.
@C64T Thanks for the update, glad you sorted it
@Anzu34 Yeah, WHEA-Logger error ID 18 is graphics, but in my case the graphics system was failing because the CPU was failing. When one thing goes, everything goes.
Your error is different; yours is a more general hardware failure, which puts you in the same boat as me and most of us. You just need to identify the offending part and replace it. See this link, the advice is reasonable: https://www.makeuseof.com/tag/fix-whea-uncorrectable-error-windows-10/amp/
If you're happy with it then that's fine but keep in mind, if it doesn't run stable at stock then something is wrong. A BIOS update might improve it in the future or it may just degrade further when out of warranty, however that's your call.
Seeing as you actually got a bluescreen, you might have a slightly different cause. I'd try a different GFX card first, then a driver reinstall, uninstalling MSI Afterburner and resetting all OC settings back to factory. It can still be any of the same things it was for all of us. Good luck.
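If you want to see which WHEA-Logger event IDs your own machine is actually logging (18 versus anything else), here's a minimal sketch in Python. It assumes you've dumped the System log to XML first, e.g. with `wevtutil qe System /f:xml > events.xml`, and that the dump is a plain run of `<Event>` elements with no XML declaration or root element. The `SAMPLE` events and the `make_event` helper are made up purely to show the shape of the input; they are not from anyone's log in this thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def count_whea_events(xml_text: str) -> Counter:
    """Count WHEA-Logger events per event ID in a dump of <Event> elements."""
    # The dump is assumed to have no root element, so wrap it in one.
    root = ET.fromstring("<Events>" + xml_text + "</Events>")
    counts = Counter()
    for event in root.findall("e:Event", NS):
        provider = event.find("e:System/e:Provider", NS)
        event_id = event.find("e:System/e:EventID", NS)
        if provider is None or event_id is None:
            continue
        if provider.get("Name") == WHEA_PROVIDER:
            counts[int(event_id.text)] += 1
    return counts


def make_event(provider: str, event_id: int) -> str:
    """Build one illustrative event (hypothetical helper, sample data only)."""
    return (
        f"<Event xmlns='{NS['e']}'><System>"
        f"<Provider Name='{provider}'/><EventID>{event_id}</EventID>"
        "</System></Event>"
    )


SAMPLE = "".join([
    make_event(WHEA_PROVIDER, 18),
    make_event(WHEA_PROVIDER, 19),
    make_event("Microsoft-Windows-Kernel-Power", 41),  # ignored: not WHEA
    make_event(WHEA_PROVIDER, 18),
])

print(dict(count_whea_events(SAMPLE)))  # prints {18: 2, 19: 1}
```

To run it on a real dump, read the file and pass its text in, e.g. `count_whea_events(open("events.xml", encoding="utf-8").read())`; you may need a different encoding depending on how the file was written. A tally like this won't tell you which part is faulty, but it makes it easier to compare before and after swapping a component.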
@Cmdr-ZiN I'm not sure your issue is solved. I had no problems running a 3700X + a Vega 56. I upgraded to a 5950X and suddenly got WHEA 18 errors every few hours. The graphics card was affected to the point where Windows would disable it.
I turned off XMP (I also have 3600 RAM) and ran it at 1200. No errors. I tried manual RAM speed tuning and the best I could get was the errors going down to one every 2 days. I sold off my Vega 56 (going to upgrade in a few months anyway), threw in my old R9 390P, and turned XMP back on. I now get the WHEA 18 errors once a week.
I feel like @eliotcole most likely is on the right track. The graphics card is just a red herring.
@koguma After testing and troubleshooting for 6 months and still getting the issue at least once a week, I replaced my CPU. After that I ran my PC solid for a month with no issues. I've not had a single issue since I RMA'd my CPU.
I'm pretty sure it's fixed, it's been months now.
It could be the GFX card for some, I'd say it's mostly the CPU, but people will need to narrow it down for themselves.
I managed to resolve the issue temporarily by adjusting all cores by +1 in the Curve Optimizer settings but shortly after upgrading to an RTX 3090 I was getting a different issue in certain games (mostly Control) where the screen went blank and the PC would not reboot (there was a constant white LED on the motherboard), requiring a full power-off.
It turns out the PSU was not able to handle the system properly as I swapped it out for an old 2008 (!) Corsair HX1000 I had in a spare PC and the system has been rock-solid since. I've disabled PBO/Curve Optimizer for now and am running Cinebench's stability test as that used to cause WHEA errors/BSODs before.
I've had that black screen before with the Vega 56. I thought it might be the PSU, but my PSU is 1 kW. It can handle pretty much anything AMD can throw at it.