For over a year now my PC build has been very unstable: it randomly shuts down (no BSOD) and boots again. Around 7 months ago I thought I had fixed the issue by adding some vcore offset and changing the load-line calibration setting (I don't remember the exact values). Anyway, now I bit the bullet and did a fresh Windows install, installed the latest BIOS and GPU drivers, and the PC would still randomly reboot, and would consistently reboot when I ran the power test in OCCT. 7 months ago I suspected, like many others, that it was a power supply issue, so I bought a decent power supply and installed it, but the PC would still reboot when stress testing power (note: 3DMark and stress testing individual components work with no issues). If it only rebooted during a power stress test in OCCT, it wouldn't be a big deal, but I noticed the PC also reboots when I use a renderer called Iray, which I use in Substance Painter. I don't know much about OC or BIOS settings in general, but this morning I tinkered and did the following:
precision boost overdrive disabled
disabled global cstates
vcore offset 0.0175
Loadline calibration to high
Now both the stress test and the render software work, but the general temps are a bit higher (as of writing, with only Firefox open: 48 °C).
I really don't know what else to do except wait for it to become unstable again. I really don't know what is wrong; I have already tried a lot of the 'fixes' mentioned above. I am even unsure whether I did the right things, especially regarding the vcore offset. But from what I gathered, it almost exclusively seems to be an AMD CPU issue... Again, the stress test works now and so does the renderer, but I also thought I had fixed it before and then it broke again (not sure how). I am really wondering what the culprit is so I can do some troubleshooting. I don't have spare parts like other RAM or another GPU. The power supply is new.
MOBO: Gigabyte X570 AORUS PRO rev 1.0
32 GB RAM
Edit: as I write this, the PC rebooted again with the BIOS settings I hoped would be stable... Like I mentioned, it's extremely inconsistent. Also, temps are definitely not the problem. They never exceed 75 °C under high load.
I have looked and looked and noticed that nobody from AMD is speaking about this and nobody has a clue. Way to go: I gave AMD one chance and they hooked me up with some cheap broken chip that slipped through their quality control.
Have you tried running the PC with no overclocks, just default settings and default BIOS settings?
If the reboot is caused by RAM instability, you will get a BSOD 99% of the time. If you're seeing an instant reboot instead, there are a few possibilities:
1. PSU is tripping. If you can run CPU and GPU stress tests separately but not together, your PSU might have an issue.
2. VRM overheating, check the temps using HWInfo.
It's still quite possible your RAM is not completely stable, or that there is an issue with the BIOS.
You can also check the event viewer to see what the reboots are about.
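If the Event Viewer list is long, one way to narrow it down is to export the System log as CSV and count the entries that usually accompany sudden reboots. The following is only a rough sketch: the column names (`Source`, `Event ID`), the helper names, and the sample rows are assumptions made up for illustration, so adjust them to whatever your export actually contains. It watches for Kernel-Power event 41 (logged after an unclean shutdown) and WHEA-Logger event 18 (a fatal hardware error).

```python
import csv
import io
from collections import Counter

# Assumed column names of the exported CSV; change them to match your file.
SOURCE_COL = "Source"
EVENT_ID_COL = "Event ID"

# (source, event id) pairs that are relevant to sudden reboots.
WATCHED = {("Kernel-Power", "41"), ("WHEA-Logger", "18")}


def short_source(source):
    """Strip the 'Microsoft-Windows-' prefix if the export includes it."""
    prefix = "Microsoft-Windows-"
    source = source.strip()
    return source[len(prefix):] if source.startswith(prefix) else source


def tally_events(csv_text):
    """Count watched events in the text of an exported event log CSV."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = short_source(row.get(SOURCE_COL) or "")
        event_id = (row.get(EVENT_ID_COL) or "").strip()
        if (source, event_id) in WATCHED:
            counts[(source, event_id)] += 1
    return counts


# Made-up sample rows, only to show the expected shape of the input.
SAMPLE = """Level,Date and Time,Source,Event ID,Task Category
Critical,01/02/2021 10:00:00,Microsoft-Windows-Kernel-Power,41,(63),The system has rebooted without cleanly shutting down first.
Error,01/02/2021 10:00:05,Microsoft-Windows-WHEA-Logger,18,None,A fatal hardware error has occurred.
Information,01/02/2021 10:01:00,Service Control Manager,7036,None,Some service entered the running state.
"""

if __name__ == "__main__":
    for (source, event_id), n in tally_events(SAMPLE).items():
        print(f"{source} event {event_id}: {n}")
```

Many Kernel-Power 41 entries with no WHEA entries would point more toward power delivery, while WHEA-Logger entries alongside them suggest the CPU itself reported a hardware error before the reboot.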
Everything was tried on stock settings: BIOS optimized defaults, with PBO also disabled. It still happens.
As for the PSU, I had read about it being a potential problem, so I went ahead and bought a new 750 W one, and it still happens. About RAM instability: how do I tinker to make it stable? I never OC, and I never tweak anything in the BIOS. For me, having to go into the BIOS and try out settings in the first place is a product failure, but I am willing to try. I don't have spare RAM modules though...
I also want to mention that 6 months ago, when this happened, I got a BSOD: Critical Structure Corruption. That doesn't appear anymore; it just reboots.
And correct, it just reboots, no BSOD.
And it happens just when I can't properly RMA it anymore.
You've ruled out the PSU, and you said it's not CPU overheating... so let's try this:
Try to play with the RAM sticks, if you have 4 then just use 2 (or 1 if you have 2) and see what happens
Also, if you haven't done so already, update the BIOS (X570 AORUS PRO (rev. 1.0) Support | Motherboard - GIGABYTE Global) and the chipset drivers (X570 Drivers & Support | AMD).
Furthermore, within Control Panel/System/Advanced System Settings/Startup and Recovery, under System Failure, UNCHECK 'automatically restart'
I have installed the latest chipset driver and BIOS, and the issue still persists. I made this post AFTER doing all those things. Regarding testing individual RAM sticks: yes, that has also been done, and the issue persists. Again, the PSU is a decent one rated at 750 W.
The weird thing is that setting Ryzen Master to auto OC makes the system stable for some reason, meaning it doesn't reboot during the power test or when using rendering software... Like I said, it's so weird, and as someone who gave AMD a shot for the first time, I definitely feel a bit gutted.
When I look at the event viewer, I see a bunch of 'cache hierarchy errors'. Maybe that helps.
What's the full error, for example does it say "WHEA: Cache Hierarchy Error Processor Core"?
Sometimes the crash happens when the motherboard is not applying the required core voltage to your processor
Ryzen Master's auto OC possibly fixes it because it increases the core voltage. This can be done in the BIOS as well.
Maybe you can also manually lower the CPU multiplier in the BIOS
If you are overclocking your processor, you must set a core voltage that is stable for your particular processor and for the VRM quality of your motherboard. If you are not overclocking, you should still set it, because your system is not stable with the factory settings
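To illustrate why a vcore offset and a stronger load-line calibration can both help here, below is a minimal sketch of the usual load-line (Vdroop) model, where the voltage under load is the set voltage plus the offset minus current times an effective load-line resistance. The function name and every number are hypothetical, chosen only to show the arithmetic, and the sketch assumes that a "high" LLC setting corresponds to a smaller effective resistance; it does not describe any specific board or CPU.

```python
def loaded_vcore(v_set, offset, current_a, loadline_mohm):
    """Estimate core voltage under load with a simple load-line model.

    V_load = (V_set + offset) - I * R_LL

    v_set          requested core voltage in volts
    offset         vcore offset in volts
    current_a      CPU current draw in amperes
    loadline_mohm  effective load-line resistance in milliohms
    """
    return (v_set + offset) - current_a * (loadline_mohm / 1000.0)


if __name__ == "__main__":
    # Hypothetical values: 1.30 V requested, 80 A drawn under heavy load.
    stock = loaded_vcore(1.30, 0.0, 80, 1.0)      # no offset, weaker LLC
    tweaked = loaded_vcore(1.30, 0.075, 80, 0.5)  # +0.075 V offset, stronger LLC
    print(f"stock under load:   {stock:.3f} V")
    print(f"tweaked under load: {tweaked:.3f} V")
```

In this toy model the tweaked settings leave noticeably more voltage at the cores during a heavy load, which would fit the theory that the reboots come from voltage sagging too far, and also fits the extra heat that comes with the fix.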
Okay, small update for everyone reading this in the future, or now. I bit the bullet and refuse to play around any more. The system runs stable with AMD overclocking enabled, load-line calibration on high, and a vcore offset of +0.075... This however seems like a temporary solution and obviously adds a lot of heat; if I didn't have a crazy custom cooler, it would be horrible. I got myself a new rig, this time with Intel and other parts, and it runs buttery smooth. I will be selling the old one and am left with a horrible taste from AMD and Gigabyte. In future I will be avoiding those brands for me and my company. The minute I have to tweak the BIOS, it's a failed product. And it seems AMD has sent out tons of faulty chips, as I see lots of people with similar issues.
I encountered the same random BSODs with that WHEA error on a system I built for a buddy of mine (Gigabyte X570 AORUS Master and Ryzen 9 3900XT). I was so embarrassed when I found out that he was letting his wife use it and that she was experiencing a copious amount of random restarts throughout the day; I felt I had let them down with the build. How I ended up solving it (when the issues wouldn't go away even after a BIOS update, Windows update, and video/wifi/bluetooth driver updates) was by swapping the chip for a Ryzen 5000 series (Zen 3). In retrospect, I think it ultimately had to do with me forgetting to install the AMD chipset drivers. I suggest you try installing the latest ones from here: https://www.amd.com/en/support/chipsets/amd-socket-am4/x570 and seeing if you still get the WHEA BSODs at stock speeds/voltages. My personal guess is that Zen 2 based chips may need the Ryzen power plan support from the AMD chipset driver package to run correctly without BSODs on later versions of Windows 10/11.
I had done that already...
You also mentioned the dilemma. You and I probably know where to look and what to tweak, but you can't expect people who are new to PCs, or who just want to do simple tasks, to go into the BIOS and tweak cryptic values. AMD should have communicated with the other manufacturers and should have double- and triple-checked their own chips before selling them.