Greetings. Been having an issue with a new build and am close to pulling my hair out.
With the components listed below, I have been having severe stability issues if I load the CPU and GPU 100% at the same time. It can be reproduced very consistently running Furmark at 1280x720 in a window, and 7-Zip’s benchmark running with all 24 threads loaded (or at least 22.)
The system will abruptly (usually within a few seconds of starting the 7-zip benchmark) shut down completely. The motherboard still has power (the on-board power button LED remains lit—NOT the front-panel LED,) however the system cannot be powered back on or reset, without cycling the power supply switch.
Most interestingly, it seems that the problem manifests itself not based on the actual clock speed or even number of cores or threads, but SPECIFICALLY if the CPU is “100%” loaded or not. In an attempt to determine if it was a power draw issue, I tried turning SMT off (running only 12 cores,) turning a CCD off (running only 6 cores,) setting the clock speed to a static 3.7GHz, and keeping Precision Boost Overdrive off the whole while.
The system will remain completely stable so long as the CPU usage does not touch 100% (while the GPU is also loaded 100%.) CPU temperature never exceeds 80C, and GPU temperature stays around 69C (after letting it soak a bit with a lower total load.) Oftentimes, the very second I change 7-zip to benchmarking 24 threads (up from a previous lower number,) the shutdown will occur immediately.
I also tried turning the target TDP down on the graphics card by about 15%, and a crash would still occur when the CPU was loaded 100%. Finally, I removed half of the DDR4 DIMMs, as well as trying the system with A-XMP turned off. No help.
I have updated the BIOS to MSI’s current release (I think I was getting WHEA errors on the second-most current release, too, but that seems to have subsided for now.) Also tried changing the GPU driver to a 457.xx release, instead of the current 460.xx (the system the GPU was in worked perfectly fine with the 457.xx driver.)
The only items I’m considering at this point would be a BIOS-related issue, a bad motherboard, a bad CPU, or a dying PSU. Has anyone else experienced something similar?
Below are the specs of the system:
Ryzen 9 5900X CPU
MSI X570 Creation motherboard (with current BIOS)
Windows 10 20H2 (fresh install)
Kingston KHX2933C17D4/16G RAM, DDR4-2933, 16GB x 4 (transferred from a stable previous build)
EVGA GeForce RTX 3090 FTW3 graphics card (transferred from a stable previous build)
SeaSonic Prime Ultra 850W PSU (transferred from a stable previous build)
Noctua D15S HSF
sounds to me like an "ungood" BIOS
either use an older BIOS that supports your CPU, or:
1. clear the CMOS
2. load default settings
4. flash current bios again
5. shutdown and clear the CMOS
6. load default settings
7. test again
if everything works = apply your OC ;)
Thank you for the suggestion. Unfortunately, the worse of my two current issues still appears to persist (the complete system shutdown under fully-loaded CPU and GPU.)
Interestingly enough, I was fixating so hard on that scenario that I hadn't really tried playing some video game programs to see if I would receive WHEA errors / BSODs still. Turns out I was (typically fewer than five minutes into a game like BeamNG.Drive, something where there is about 20% CPU usage total at worst, and modest graphics card use at best.)
One suggestion noted by some other folks having issues with X570 boards and Zen 3 was to turn on "Game Boost" mode in the BIOS (while leaving PBO off.)
This appears to do two things: it sets a static "overclock" and voltage on the CPU, but it also apparently may have a hand in preventing any of the cores from ever being allowed to be idle. I haven't gone back yet to see if it is "correctly" adjusting any of the other voltages (SOC, etc.) to a more-stable value compared to what the system was running before, but I am now apparently able to play video game programs for extended lengths of time without encountering any stability issues whatsoever.
The only downsides to this are the extra 30 watts of idle power consumption and the cores being stuck at only 4175 MHz, but at least the machine appears to be stable. It will still shut down under the 100%/100% CPU/GPU load scenario, but at this point I can at least use the system for something more than just web browsing. A single 100% load (prime95 OR furmark, not at the same time) is still completely stable.
Looks like some new AGESA versions are starting to trickle out to MSI boards (one dropped for an MSI B550 model on the 23rd.) Hopefully they roll this out to the X570 models as well, as this is very frustrating.
hm, it is strange that simultaneous Prime and Furmark are ok but gaming isn't
wait for the new AGESA and maybe it is fixed then
Prime and Furmark on their own (one or the other) are fine. When I run both at the same time, that's when I'll get a full system power-off. That's what is vexing me most: if I turn off half the cores, the same thing happens when all six cores are loaded. But if I have all 12 cores enabled, I can be stressing, for instance, 11 out of 12 of them, and it'll still be fine.
Gaming seems to have been addressed with the "Game Boost" mode temporarily, until [hopefully] a new BIOS addresses that problem (or, ideally, both problems.)
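To lay out why this doesn't look like a plain "too many watts" problem, here is a rough Python sketch that encodes the observations from this thread and checks two candidate rules against them. The tuple layout, the function names, and treating Furmark alone as zero loaded cores are my own simplifications, not measurements.

```python
# Observations reported in this thread, simplified.
# Each entry: (enabled_cores, loaded_cores, gpu_loaded, crashed)
OBSERVATIONS = [
    (12, 12, True, True),    # all 12 cores + Furmark: shuts down
    (6, 6, True, True),      # one CCD off, all 6 cores + Furmark: shuts down
    (12, 11, True, False),   # 11 of 12 cores + Furmark: stable
    (12, 12, False, False),  # CPU stress alone: stable
    (12, 0, True, False),    # Furmark alone (CPU load simplified to 0): stable
]


def all_cores_rule(enabled, loaded, gpu_loaded):
    """Hypothesis A: crash only when every enabled core AND the GPU are loaded."""
    return gpu_loaded and loaded == enabled


def threshold_rule(threshold):
    """Hypothesis B: crash once a fixed number of loaded cores is reached."""
    def rule(enabled, loaded, gpu_loaded):
        return gpu_loaded and loaded >= threshold
    return rule


def is_consistent(rule):
    """True if the rule predicts the outcome of every observation."""
    return all(rule(e, l, g) == crashed for e, l, g, crashed in OBSERVATIONS)


if __name__ == "__main__":
    print("all enabled cores + GPU:", is_consistent(all_cores_rule))
    fits = [t for t in range(1, 13) if is_consistent(threshold_rule(t))]
    print("core-count thresholds that fit:", fits)
```

No fixed core-count threshold fits, because 6 loaded cores crash while 11 do not. Only the "every enabled core busy" rule matches what has been observed so far.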
hm, actually i thought it's not your PSU - as the SeaSonic isn't bad...
did you enable XMP?
My thought exactly (or at least, my hope.) My UPS shows the PC (including the monitor, which is maybe 20-30 watts) draws just north of 700 watts under as high a synthetic load as I can possibly generate without it crashing.
As for XMP, I tried that both ways. It would still shut down under full 100% load on the GPU and CPU. At this point I have XMP on and it seems to be fine for standard video game programs (Observer Redux, Metro Exodus, BeamNG.Drive,) during which time the system draws about 600 watts total.
I'd like to rule out the PSU SOMEHOW not delivering enough wattage, but I don't want to bother buying the 1000-watt version if, at this point, the system is at least usable for typical workloads.
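For a rough sanity check of those numbers, here is a small Python sketch that turns the UPS wall reading into an estimated load on the PSU's output side. The 0.92 efficiency figure is purely an assumption on my part (not measured, not from a spec sheet), and the 25 W monitor figure is just the midpoint of my 20-30 W guess.

```python
def dc_load_watts(wall_watts, monitor_watts, efficiency):
    """Estimate the PSU's DC-side load from a wall (UPS) reading.

    The monitor is subtracted first because it hangs off the same UPS,
    then the remainder is scaled by the assumed PSU efficiency.
    """
    return (wall_watts - monitor_watts) * efficiency


def headroom_watts(psu_rating_watts, dc_load):
    """Watts left between the estimated load and the PSU's rating."""
    return psu_rating_watts - dc_load


if __name__ == "__main__":
    WALL_WATTS = 700     # UPS reading under the heaviest load that doesn't crash
    MONITOR_WATTS = 25   # midpoint of the 20-30 W guess
    EFFICIENCY = 0.92    # ASSUMPTION: illustrative value only
    PSU_RATING = 850     # SeaSonic Prime Ultra 850W

    load = dc_load_watts(WALL_WATTS, MONITOR_WATTS, EFFICIENCY)
    print(f"estimated DC load: {load:.0f} W")
    print(f"estimated headroom: {headroom_watts(PSU_RATING, load):.0f} W")
```

With those guesses the estimate lands somewhere around 620 W, comfortably under the 850 W rating. It only covers the averaged draw the UPS reports, so it cannot rule the PSU in or out on its own.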
I noticed that your RAM is not listed as compatible in either the MSI QVL list for Vermeer or Kingston's RAM finder (2933 MHz) for your motherboard: https://www.kingston.com/unitedstates/us/memory/search?model=100028&devicetype=7&mfr=msi&line=mother...
In fact, there are no Kingston-branded RAM modules listed, only Kingston's HyperX.
First I would reset the BIOS back to its factory defaults.
Then I would try using just one RAM Stick to see if it continues to crash under 100% CPU load.
It is possible that your motherboard doesn't support, or isn't compatible with, all 4 DIMM slots being populated with that specific RAM. So I would start with one stick, then a maximum of 2 sticks, to see if it crashes while the CPU is at 100% load.
I recommend you use OCCT CPU Test first with Large and Medium Packet test and then with Small Packet test which is the best to check stability in the CPU. Also run the PSU Test which runs both the CPU and GPU Tests at the same time putting the maximum demand on the PSU.
At the top left corner where OCCT Settings is, put the Global Temperature at 96C. This will be one C over the Maximum Operating temperature of the processor 95C. That way OCCT will stop the test once it reaches 96C.
Keep a close eye on Temperatures and PSU Outputs and Fan speeds during the tests.
If the RAM changes don't fix the issue and you are sure the system crashes when the CPU reaches 100% load, you can always lower the Maximum processor state in the Windows power plan settings from 100% to 99% or lower and see whether it still crashes.
If you have done this already then I am sorry and I must have missed it in this thread.
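If you would rather script the power plan change than click through the dialogs, here is a minimal Python sketch that only builds and prints the powercfg commands; it does not run them. As far as I know, SUB_PROCESSOR and PROCTHROTTLEMAX are the powercfg aliases for the processor subgroup and the maximum CPU setting, but please confirm with `powercfg /aliases` on your own machine. The function name and the 1-100 range check are my own.

```python
def max_cpu_state_commands(percent):
    """Build powercfg commands that cap the maximum CPU state (plugged in).

    Returns a list of argument lists; nothing is executed here.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return [
        # Set the value on the currently active power scheme (AC power).
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMAX", str(percent)],
        # Re-apply the scheme so the new value takes effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


if __name__ == "__main__":
    for cmd in max_cpu_state_commands(99):
        print(" ".join(cmd))
```

Paste the printed lines into a command prompt to apply them, and use 100 again to undo the cap.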
Haven't tried OCCT before. The large dataset CPU test seemed to be fine. Small shot the power consumption up to ludicrous amounts and an extreme temperature nearly immediately (240+ watts and over 100C.) Killed that right quick. I don't think I can really test with the small dataset while running the system in its sort-of-overclocked state, in which it otherwise currently appears to be quite stable, with the exception of the 100% CPU 100% GPU synthetic load still causing a shutdown (although I guess I shouldn't be calling it "100%," as 7-zip's CPU test, when run on its own, never pushes the CPU beyond maybe 150 watts when running all 12 cores.)
As for the RAM--and as an aside, I guess I didn't realize Kingston had separated HyperX out as its own brand, or something--it seems to be operating normally at present, again likely related to whatever parameters the motherboard's "Game Boost" mode and overclock are imposing. For what it's worth, I never took those qualified/compatibility lists as an implication of "If your memory isn't on this list, it will NOT be compatible," but rather just for what they are: guaranteed compatibility, or at least "we tested this, it works with 1, 2, or 4 sticks, as indicated."
Finally, it would be quite unfortunate if Seasonic was truly at fault here. I'm still not entirely convinced, although I do have a thirstier system being powered by it now. The previous build, with an Intel i7 6850K, but with the same exact GPU, never gave me a problem like this.