I set up a new PC in February this year. I use it primarily for coding and casual internet browsing. I bought a Ryzen 9 7900 so I can run my Python script with multiprocessing.
Problem: After using the PC for 4 months with no issues, my PC suddenly started crashing frequently - sudden black screen crash followed by the CPU fan running at 100%.
It crashes after the boot logo and gets into a boot loop. I can only get into Windows after Windows Recovery has loaded.
It crashes during light browsing once I am in Windows.
It crashes immediately at the start of any CPU test (CPU-Z, OCCT, etc.)
Ironically, when I run my Python script at ~70% CPU load, the PC will NOT crash for the whole day. But it will crash soon after I stop the script.
I'm using default BIOS settings with the following hardware setup -
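For anyone who wants to reproduce the "stable under sustained load, crashes at idle or on load changes" pattern, here is a minimal sketch of a duty-cycle load generator. It is not the script from this post. The names `split_period` and `burn`, the 100 ms period, and the 2-second run time are all assumptions chosen for illustration.

```python
import multiprocessing as mp
import os
import time


def split_period(duty, period):
    """Split one period (seconds) into (busy, idle) seconds for a duty cycle in [0, 1]."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    busy = duty * period
    return busy, period - busy


def burn(duty, period, duration):
    """Alternate busy-spinning and sleeping for `duration` seconds; return cycles completed."""
    busy, idle = split_period(duty, period)
    cycles = 0
    end = time.perf_counter() + duration
    while time.perf_counter() < end:
        spin_until = time.perf_counter() + busy
        while time.perf_counter() < spin_until:
            pass  # busy loop keeps one logical core occupied
        time.sleep(idle)
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Assumed values: roughly 70% load on every logical core.
    # DURATION is kept short here; raise it for a longer soak.
    DURATION = 2.0
    workers = os.cpu_count() or 1
    with mp.Pool(workers) as pool:
        pool.starmap(burn, [(0.7, 0.1, DURATION)] * workers)
```

Running it and then stopping it lets you check whether the crash follows the transition from load back to idle, rather than the load itself.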
AMD Ryzen 9 7900
Asus ROG STRIX B650E-I motherboard
G.Skill Flare X5 Expo 2x16GB DDR5 6000 CL36 (F5-6000J3636F16GX2-FX5)
Palit GTX 1070 JetStream
Corsair SF750
Arctic Liquid Freezer III
XTIA Xproto-N open casing
Temporary Solution
After testing various underclocking configurations, I managed to stabilize my PC and stop the frequent crashes with the following BIOS configuration -
Precision Boost Overdrive: Manual
EDC limit: 110
Curve optimizer: negative 30
Reducing EDC from 150 (Ryzen 9 7900 stock) to 110 stopped the frequent crashes. I tested different EDC limits: 200, 150 (stock), 140, 130, and finally 110. 110 is the sweet spot that stops the PC from crashing.
After setting EDC to 110 in the BIOS, my PC can now complete most CPU benchmarks such as Cinebench, OCCT and CPU-Z. However, the OCCT CPU + RAM test will still crash the PC.
In search for a better solution
IMO this is a compromise, not an ideal solution, as reducing EDC reduces the CPU's maximum clock speed; my Python app is running ~10% slower.
I'm new to this. If reducing EDC fixes the frequent crashing, does it mean that the underlying cause of those crashes was an excessively high peak ("spike") current or a voltage fluctuation triggered by the motherboard or CPU?
To AMD experts and users who fixed similar problems - Am I missing anything? Is there a better solution to fix frequent black screen crashes?
------------------------
More details on my troubleshooting journey
Hardware troubleshooting
After trying various fixes and swapping my hardware, the problem still persists. Here's what I did -
Reformatted and did a clean install of Windows 11
Updated my BIOS, and the motherboard broke with a persistent red light (not sure if this problem is related)
RMA-ed and got a new motherboard with the latest BIOS
RMA-ed and got a new CPU
Bought a new PSU, upgrading from a Silverstone 500W to a Corsair SF750
Reseated my RAM; tested with one stick and with two sticks in all combinations
Tested with both onboard graphics and my discrete GTX 1070 GPU
Tested both the stock Wraith Prism fan and the Noctua NH-L12S
I suspect RAM incompatibility, but G.Skill Flare X5 is listed as compatible RAM on the Asus B650E-I website. I prefer not to buy another pair of RAM sticks to test unless I'm certain this is the problem.
Software troubleshooting
BIOS - Tried all AMD EXPO profiles. Didn't help.
BIOS - Enabled/disabled the Memory Context Restore and Power Down Enable settings. Didn't help.
BIOS - Disabled Power Supply Idle Control. Reduced crash frequency, but the PC still crashes within 1-2 hours of usage.
BIOS - Disabled Global C-state Control. Reduced crash frequency, but the PC still crashes within 1-2 hours of usage.
BIOS - Disabled Precision Boost Overdrive
Windows - Installed the latest hardware drivers
Windows - Disabled all sleep options
Windows - Disabled onboard GPU in device driver
Windows - Did not install Asus Armoury Crate in Windows (I read that Armoury Crate may cause crashes).
All of the above software and hardware troubleshooting failed. My PC still crashes frequently.
Benchmark and tests
All benchmark tests crash at EDC = 150. After reducing EDC to 120, here are the test results -
Cinebench CPU multi-core - 1379 (vs 1632 by cpu-monkey)
Cinebench CPU single-core - 109 (vs 116 by cpu-monkey)
OCCT CPU stability test
OCCT CPU benchmark test - single sse: 196, multi sse: 1155, single avx: 207, multi avx: 2056.
OCCT memory benchmark test
CPU-Z bench CPU - single 759, multi 11127 (vs single 780, multi 12106 by CPU-Z 7900 benchmark)
The following test will still crash the PC
OCCT CPU + Ram test
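To put the EDC 120 scores above in perspective, the short sketch below expresses each one as a percentage of its reference score. The numbers are copied from the list above; the helper name `percent_of_reference` is made up for illustration.

```python
def percent_of_reference(measured, reference):
    """Return the measured score as a percentage of the reference score."""
    return 100.0 * measured / reference


# (measured at EDC 120, reference score) taken from the results listed above
results = {
    "Cinebench multi-core": (1379, 1632),
    "Cinebench single-core": (109, 116),
    "CPU-Z single": (759, 780),
    "CPU-Z multi": (11127, 12106),
}

for name, (measured, reference) in results.items():
    pct = percent_of_reference(measured, reference)
    print(f"{name}: {pct:.1f}% of reference ({100.0 - pct:.1f}% lower)")
```

By this measure the multi-core results lag the reference by roughly 8 to 16%, and the single-core results by roughly 3 to 6%, which fits a current limit that mostly bites under all-core load.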
Noctua NH-L12S: your cooling solution is inadequate. If I'm not mistaken, liquid cooling is even recommended if you want to achieve maximum performance.
Another option is to manually reduce consumption with the options available in the BIOS. In my case I have a 7950X that is configured for PBO/ECO MODE 105 watts and a temperature of 89°C and TjMax of 74°C. I chose these parameters so that my Deepcool Assassin III cooler can handle the load (it supports 160 continuous watts, not the 280 watts advertised on the box), but when I had 360mm liquid cooling there was no need for adjustment.
I don't think it has anything to do with cooling, as my PC crashes during light internet browsing or at the start of a benchmark test, when the CPU temp is below 60C.
I have to reduce EDC from the stock 150 to 110-120 to stop the PC from crashing. Changing other PBO settings or TjMax doesn't help; it still crashes at stock EDC.
Anyway, I just upgraded my cooling to an Arctic Liquid Freezer III today. The AIO reduced my CPU temp at high load from 95C to <80C, a very impressive improvement, but the PC still crashes when I reset my BIOS to default settings (EDC 150).
1 interesting observation
Using the Noctua NH-L12S, the PC is stable at EDC 120.
Using the Arctic Liquid Freezer III, the PC will crash at EDC 120; I have to reduce it further to 110.
https://ncc.noctua.at/cpus/model/Ryzen-9-7900-1648 says that Noctua NH-L12S is "compatible without turbo/overclocking headroom".
You can try e.g. NH-D15S ("best turbo/overclocking headroom").
@chunkiat You haven't mentioned temperatures, and @mirao pointed out very well that the cooler you have is not quite up to par for that CPU, according to Noctua itself.
When you reduce power to the CPU, it does less work and produces less heat, hence no crashes.
I would monitor temperatures and report back.
If you want to fiddle a bit more, try reducing power by just a tad and use something like -30 CO. It may compensate and give better performance than stock; however, it will still produce lots of heat and... may crash too.
Just to be clear, we are talking about a 7900 non-X 65 W TDP part, right?
I'm also surprised that the ROG STRIX B650E-I motherboard only has a humble 10+2 VRM design. That's not much, but it should be more than capable for that CPU at least.
Do the troubleshooting quickly, because you may have a faulty unit on your hands and may have to RMA. At this point we need to stay open to many possibilities.
Good Luck
Yes, we are talking about the 7900 non-X CPU. I used my PC for 4 months with no problems, and it suddenly started crashing frequently. I RMA-ed and received a new CPU, but the problem persists on the new CPU too.
Quick update: I just upgraded my cooler to an AIO. Here are the temps.
Environment temp ~ 30°C
Noctua NH-L12S - EDC 120, curve -15
Idle: 47°C
CPUz bench CPU: 47.6°C
CPUz stress CPU: 95°C
Cinebench multicore: 95°C
Python multi-processing: 95°C
Arctic Liquid Freezer III - EDC 110, curve -30
Idle: 44°C
CPUz bench CPU: 44.5°C
CPUz stress CPU: 79°C
Cinebench multicore: 75°C
Python multi-processing: 81°C
Even with the Arctic Liquid Freezer III, my PC will crash at default BIOS settings.
When using the Noctua NH-L12S, my PC is stable at EDC 120.
When using the Arctic Liquid Freezer III, my PC is stable at EDC 110.
In both cases, my PC will crash at EDC >= 130.
Temps got much better with the AIO, but it is still crashing.
Do you have DOCP/EXPO enabled?
Remove Curve Optimizer entirely, keep EDC at 120 or even 130, and disable EXPO/DOCP.
I understand that the memory kit is on the QVL, but we need to take memory errors out of the equation. Since you already have a new CPU and it behaves the same, there could be another culprit, and the board could be it.
Good Luck
I replaced the motherboard too; that did not fix the problem.
EXPO off, Curve Optimizer off: the PC still crashes when EDC is at default. Setting EDC to 110 (with the new AIO) keeps the PC stable.
I ran tons of memory tests (Windows, MemTest86, OCCT); the RAM is working well with 0 errors.
You mentioned the B650E-I VRM. Since it crashes at default EDC, it may have something to do with VRM/voltage-related settings. I'll read up more about that, thanks.
Use Eco Mode instead of trying to configure things manually. Very aggressive Curve Optimizer values can cause instability, and some cores may need positive values to reach the maximum clock that you configured. This is because the more cores (physical or logical) are active, the more energy is consumed and the more heat is generated.
You can test by disabling SMT to reduce consumption and check the machine's behavior. Obviously performance will drop, but the clock can go up further. Note the temperatures and check whether the system is very close to the permitted temperature and consumption limits. If everything is at the limit, it means you have no margin with SMT enabled and need to adjust, for example by reducing the clock or limiting the maximum temperature.
Sometimes an installation error with the liquid (or air) cooler can affect the system, allowing the temperature to rise very quickly, so it is a good idea to check the behavior under lighter loads, along with the clock rate and the voltages. Monitoring app suggestion: HWiNFO64
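If the monitoring tool can log sensors to a CSV file, the last few samples before a crash are usually the interesting ones. The sketch below pulls the tail of such a log. The column names (`CPU Temp [C]`, `CPU EDC [A]`), the sample values, and the helper `last_samples` are hypothetical; real HWiNFO64 column names depend on the board and sensors, so treat this as a starting point only.

```python
import csv
import io
from collections import deque


def last_samples(log_file, columns, n=5):
    """Return the last `n` rows of a CSV sensor log, keeping only `columns`."""
    reader = csv.DictReader(log_file)
    tail = deque(maxlen=n)  # keeps only the most recent n rows
    for row in reader:
        tail.append({name: row.get(name, "") for name in columns})
    return list(tail)


# Hypothetical log excerpt, not real measurements from this thread.
sample_log = io.StringIO(
    "Time,CPU Temp [C],CPU EDC [A]\n"
    "12:00:01,44.0,35.2\n"
    "12:00:02,45.5,61.0\n"
    "12:00:03,47.1,118.4\n"
)

for row in last_samples(sample_log, ["Time", "CPU Temp [C]", "CPU EDC [A]"], n=2):
    print(row)
```

With a real log you would pass an open file object instead of the `io.StringIO` sample; the file encoding may need adjusting.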