Hello. Two days ago I started having problems with my system. My PC sometimes restarts at random if I leave it on overnight, or Windows Event Viewer shows a WHEA 18 or 19 error. The WHEA errors can occur during stress tests or while I'm playing games. Games sometimes crash too, and this might be related to the same issue. Before these WHEA errors appeared, my system passed the OCCT stress test, the AIDA64 stress test and Memtest86 without any errors. The CPU is at default settings (memory is 3600MHz CL16). Everything else is also at default. I suspect the CPU might be faulty.
The CPU, motherboard and memory were purchased 3 months ago, but the retailer and the online store where I bought them went bankrupt during the COVID-19 lockdown, and RMA or fault testing through a local retailer isn't available to me. Any ideas how else I could solve this?
EDIT: I forgot to mention that the first time I ran into a problem, the system started behaving strangely at random while idle: USB devices (mouse, keyboard, DAC) kept disconnecting and reconnecting every few seconds. The same behaviour happened in the BIOS as well as in Windows, and it persisted even after a restart. It only went away after I switched off the power supply with its power button. It has never happened again, but since then I've had WHEA errors and stability issues.
System specs:
AMD Ryzen 7 3700X
ASUS ROG STRIX B550-E GAMING
G.Skill F-4000C19-8GTZKW (2x8GB)
Gigabyte GTX 1660 Super Gaming OC
EVGA SuperNOVA 1000 P2
Windows 10 Home x64 20H2
Examples of the WHEA errors:
A corrected hardware error has occurred.
Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: No Error
Processor APIC ID: 0
or
A corrected hardware error has occurred.
Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: Bus/Interconnect Error
Processor APIC ID: 0
Make sure you have the latest BIOS; there's a new one for that board. Use Thaiphoon Burner to find out which ICs your RAM uses (e.g. Samsung B-die, Hynix AFR, Micron Rev. E). Then use that info in DRAM Calculator to find the "safe" preset for your RAM at 3600MHz. Enter those values manually in the BIOS; setting FCLK to 1800 is also ideal when running the RAM at 3600. Manually set SOC voltage to 1.10V for stability. The XMP/DOCP settings are too high for your CPU, as I'm sure you've figured out by now.
I'd start by fixing the RAM timings completely; the defaults could be way off. Run the MEMbench test in DRAM Calculator after applying the "safe" preset and check for errors (hopefully there are none). Alternatively, Thaiphoon lists all the sub-timings stored in your RAM's EEPROM, and you should find the CL16 settings there too. Anyway, here are links to both programs, along with example screenshots of what you want to see when you use them. Either set of RAM timings should work fine; DRAM Calculator gives you slightly tighter settings for better performance.
Download DRAM Calculator for Ryzen v1.7.3 (guru3d.com)
Thaiphoon Burner - Official Support Website (softnology.biz) (choose the free version)
It'll take some effort on your part, but it should make that error go away and improve performance. Definitely update the BIOS; it's very important if you haven't installed the latest one released over the weekend.
My RAM timings were set using Ryzen DRAM Calculator, and as I said, the system worked fine for 2 months with those settings: no Kernel-Power restarts or WHEA errors. The system passed OCCT, AIDA64 and Memtest86 with no issues before this mess.
The motherboard was flashed with the latest BIOS a month ago. I've also tried rolling back to the previous BIOS, but the results are the same.
You did not say you had used DRAM Calculator in your initial post. I don't work for AMD, and no one here does, apart from a few people who are clearly marked "staff". The rest of us do this for free in our spare time.
If you want someone to tell you that your CPU is defective, this is the wrong place. We can't issue an RMA, and even staff can't. You need to go through regular AMD support for that.
That said, there's a new BIOS out for your board as of this past weekend. Try it. Recheck your RAM and FCLK settings, because everything you describe points to an issue there. Run TM5 or HCI MemTest and see whether you get errors. It wouldn't be the first time a stick has gone bad.
Typically, heat, RAM settings or too high an FCLK cause this issue. If FCLK is set to "auto", it may creep above the 1:1 ratio, which for RAM at 3600 is 1800. Your RAM is rated for 4000, meaning FCLK could go to 2000, way beyond what a 3000-series CPU can do. Some can't even do 1866 or 1900, so setting that value manually might help. SOC voltage that is too low or not set to a fixed value such as 1.10-1.15V can also cause instability.
There's no "book" of answers here. You are the builder and user; you're in full control of how the diagnostics go or don't go. If you're unwilling to work through a flowchart of tests, even when it sounds like a waste of time, you'll find out nothing.
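The 1:1 arithmetic is simple, but it's easy to slip when comparing several RAM speeds. Here's a minimal Python sketch, assuming DDR4 ratings in MT/s and treating the stable FCLK ceiling as a per-chip guess you supply yourself (1800 by default here, taken from the 3600/1800 examples in this thread, not from any AMD specification). The function names are hypothetical. The code halves the rating to get the 1:1 FCLK and flags speeds that would push FCLK past that ceiling.

```python
def one_to_one_fclk(ram_mts: int) -> int:
    # DDR transfers twice per clock, so the memory clock (MCLK, MHz) is half
    # the MT/s rating; a 1:1 setup runs FCLK at the same frequency as MCLK.
    return ram_mts // 2

def fclk_ok(ram_mts: int, fclk_limit_mhz: int = 1800) -> bool:
    # fclk_limit_mhz is an assumption: the stable ceiling varies per chip and BIOS.
    return one_to_one_fclk(ram_mts) <= fclk_limit_mhz

# RAM speeds mentioned in this thread
for speed in (3600, 3733, 3933, 4000):
    print(speed, one_to_one_fclk(speed), fclk_ok(speed))
```

With the default 1800 ceiling, only 3600 passes. Raising `fclk_limit_mhz` to 1900 also clears 3733, but 3933 and 4000 are still flagged.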
I had a 3600X running for months at 3733/1866, and one day I got a flood of WHEA 18s, crashes, reboots and game crashes. It turned out that with the BIOS I had at the time, I was suddenly no longer stable. I backed down to 3600/1800 and it all went away. Three months later a new BIOS came out and I tried again; this time I was stable at 3733/1866 because the update allowed 3000-series CPUs to go as high as 1900 FCLK. Maybe you can see now how things can run perfectly and then go to crap in a day?
FYI, ignore the Kernel-Power 41s; those are caused by an improper shutdown. The WHEA 18 is the real cause. A 19 usually won't cause a BSOD, but too high an FCLK might. It's generally a PITA to have RAM rated higher than you need and then downclock it, because higher-speed RAM sometimes isn't binned as well as lower-speed kits, say 3200-3600. Thaiphoon is a good tool for finding out what binning you have and what other speeds/settings your RAM supports via JEDEC. That's how I found out that my Team 3733 kit was actually a 4000 kit from 2017. Those settings work, but my FCLK throws WHEA 19s if I go there, so I backed off to 3933/1966.
It's a dance, and a seemingly never-ending game, to get anything AMD stable. I'm stable now, but in 10 minutes maybe not. If that happens, I'm not going to get "screaming mad" and throw the PC through a wall. I'd take a deep breath and rerun the same tests that got me stable in the first place, to find out what went sideways and whether I need to try a lower setting, replace the RAM, or whatever. It's electronics; stuff fails.
It's something with the CPU, I guess. I set looser RAM timings, set the CPU multiplier to x40 and ran OCCT overnight: almost 7 hours with no errors. Since the system passed OCCT this time, I thought it might be a RAM or timings issue, but no. I changed the CPU multiplier back to auto with auto voltage and instantly got a WHEA error in OCCT with the same "safe" RAM settings.
WHEA 18 is a BSOD every time. If you're getting 19s, FCLK or RAM isn't set right. FCLK needs to be set manually to half your RAM speed; "auto" has caused issues when the RAM is set manually. Also try HCI MemTest or TM5 for memory testing. OCCT has caused errors because it's in beta half the time, and it's a compilation of many tests you can run separately. It uses HWiNFO, and that alone has caused issues with AMD hardware. DRAM Calculator's MEMbench is basically the same as HCI except it runs in Windows, so set it to 360%, max out the amount of RAM to test, and let it rip.
If you ran a fixed clock and CPU voltage with no errors, but then got errors after switching to "auto", that doesn't mean the CPU is bad. You should be able to run a roughly CL16 setup at 1.4V or so at 3600/1800 without problems, with the CPU on default boost and no PBO. It's hard to say, because things like CPU temperature come into play, and what you have the CCD, VDDP, IOD and SOC (1.10-1.15V) voltages set to also plays a part. There's also a new BIOS as of 2 or 3 days ago.
Windows Update has messed things up a few times for me, causing a WHEA 18 at one point, so it could be OS-related as well. Is PBO enabled with an auto OC? If so, turn the auto OC down or off. Try it with PBO disabled. Check your temps with HWiNFO; the newest version, 7.00-4400 Beta, has no issues with AMD at this time. Also look at the voltages, since errors appear when they're on auto; your voltage could be too high or too low. Use LLC to stop fluctuations; level 3-4 should work. In the BIOS you can disable DF C-states and leave C-states on. Certain power plans can cause this too, so set up the Ryzen High Performance plan, which is part of the new chipset driver for 3000-series CPUs.
There are a lot of factors to look at in your setup, especially since you're downclocking the RAM. That's the only "abnormal" part of this setup so far, though it's not unusual. I've had "safe" timings go bad after months, and a BIOS update fixed it, so if you haven't already, update to the newest BIOS.
Neither I nor any program can tell you exactly what went wrong in your case, because I'm not there doing the testing. I gave a few suggestions before, like setting FCLK manually and SOC to at least 1.10V. You're going to need to experiment with some settings and read the Event Viewer to find the solution.
If you're dead set on the CPU being bad, request an RMA through AMD support. I can assure you that won't fix this. Determination will fix it, along with a lot of thinking, testing, trying things, rebooting, etc. It's a process with any custom build. I wish I had a blanket answer, but I don't, just ideas to try.
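Reading the Event Viewer is easier if you tally the WHEA entries rather than eyeballing them one by one. The Python sketch below is a hypothetical helper, not part of any tool mentioned in this thread. It assumes you've copied the text of each event (in the same form as the examples in the first post) into one string. It then counts the events by "Error Type" and by "Processor APIC ID", so you can see whether the same error or the same core keeps coming up.

```python
from collections import Counter

# Assumption: each copied event starts with this header line.
EVENT_HEADER = "A corrected hardware error has occurred."

def parse_whea_events(text: str) -> list[dict[str, str]]:
    """Split copied Event Viewer text into one {field: value} dict per event."""
    events: list[dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == EVENT_HEADER:
            current = {}
            events.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, e.g. "Processor APIC ID: 0"
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return events

def tally(events: list[dict[str, str]], field: str) -> Counter:
    """Count how often each value of a field appears across events."""
    return Counter(e.get(field, "?") for e in events)

sample = """
A corrected hardware error has occurred.
Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: No Error
Processor APIC ID: 0
A corrected hardware error has occurred.
Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: Bus/Interconnect Error
Processor APIC ID: 0
"""

events = parse_whea_events(sample)
print(tally(events, "Error Type"))
print(tally(events, "Processor APIC ID"))
```

Lines without a colon (such as the "or" between the examples) are skipped, so you can paste events as they appear.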
I don't know if this will help you, but I've been having the same random restart issues (slightly different from yours). I had tried everything: I bought a new PSU and motherboard and ran all kinds of tests. Today I found out my CPU was fried. It started with random restarts after gaming: when I was done and sitting at the desktop (idle), it restarted. I didn't think much of it at first, but it kept doing this for 5 days. I then took my PC to a friend's and ran all the tests I could. I was at my wits' end when we finally tested the CPU, and it was the culprit, so I hope this helps (pray for your CPU, though, as a replacement can be expensive). If you end up needing one, Amazon had a deal on the Ryzen 7 5800X yesterday: $375 with tax if you're in the United States. Good luck!!