chunkiat
Journeyman III

Ryzen 7900: frequent black screen crashes followed by 100% fan speed

I set up a new PC in February this year. I primarily use it for coding and casual internet browsing. I bought a Ryzen 9 7900 so I can run my Python script with multiprocessing.

Problem: After I had used the PC for 4 months with no issues, it suddenly started crashing frequently: a sudden black screen followed by the CPU fan running at 100%.

  1. It crashes after the boot logo and gets into a boot loop. I can only get into Windows after Windows Recovery has loaded.

  2. It crashes during light browsing once I am in Windows.

  3. It crashes immediately at the start of any CPU test (CPU-Z, OCCT, etc.)

Ironically, when I run my Python script at a high CPU load of ~70%, the PC does NOT crash for the whole day. But it crashes soon after I stop the script.
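
For anyone who wants to reproduce this load-then-idle pattern without my actual script, here is a minimal sketch. It is not the script I run; it assumes that "~70% load" can be approximated by keeping about 70% of the logical CPUs busy, and the names `worker_count`, `burn` and `run_partial_load` are made up for illustration.

```python
import multiprocessing as mp
import os
import sys
import time


def worker_count(logical_cpus: int, fraction: float) -> int:
    """Busy workers needed to occupy roughly `fraction` of the logical CPUs."""
    return max(1, min(logical_cpus, round(logical_cpus * fraction)))


def burn(seconds: float) -> None:
    """Spin on trivial integer arithmetic until `seconds` have elapsed."""
    deadline = time.monotonic() + seconds
    x = 0
    while time.monotonic() < deadline:
        x = (x * 31 + 7) % 1_000_003


def run_partial_load(fraction: float, seconds: float) -> int:
    """Keep about `fraction` of the logical CPUs busy, then return to idle."""
    workers = worker_count(os.cpu_count() or 1, fraction)
    procs = [mp.Process(target=burn, args=(seconds,)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return workers


if __name__ == "__main__":
    # Duration in seconds from the command line; short default for a quick trial.
    duration = float(sys.argv[1]) if len(sys.argv) > 1 else 5.0
    n = run_partial_load(0.7, duration)
    print(f"ran {n} workers for {duration:.0f} s; the system is now back at idle")
```

On a 12-core/24-thread CPU this starts 17 workers. The point of interest is what happens in the minutes after the workers exit, since that is when my crashes occur.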

I'm using default BIOS settings with the following hardware setup:

  • AMD Ryzen 9 7900

  • Asus ROG STRIX B650E-I motherboard

  • G.Skill Flare X5 Expo 2x16GB DDR5 6000 CL36 (F5-6000J3636F16GX2-FX5)

  • Palit GTX 1070 JetStream

  • Corsair SF750

  • Arctic Liquid Freezer III

  • XTIA Xproto-N open casing

Temporary Solution

After testing various underclocking configurations, I managed to stabilize my PC and stop the frequent crashes with the following BIOS configuration:

  • Precision Boost Overdrive: Manual

  • EDC limit: 110

  • Curve optimizer: negative 30

Reducing EDC from 150 (Ryzen 9 7900 stock) to 110 stopped the frequent crashes. I tested EDC limits of 200, 150 (stock), 140, 130, and finally 110. 110 is the sweet spot that stops the PC from crashing.

After setting EDC to 110 in the BIOS, my PC can now complete most CPU benchmark tests such as Cinebench, OCCT and CPU-Z. However, the OCCT CPU + RAM test still crashes the PC.

 

In search of a better solution

IMO this is a compromise, not an ideal solution, as reducing EDC reduces the CPU's max clock speed; my Python app is running ~10% slower.

I'm new to this. If reducing EDC fixes the frequent crashing, does it mean that the underlying cause of the crashes was an excessively high peak ("spike") current or a voltage fluctuation triggered by the motherboard or CPU?

To AMD experts and users who fixed similar problems: am I missing anything? Is there a better solution to fix frequent black screen crashes?

 

------------------------

More details on my troubleshooting journey

Hardware troubleshooting

After trying various fixes and swapping my hardware, the problem still persists. Here's what I did:

  1. Reformatted and did a clean install of Windows 11

  2. Updated my BIOS, and the motherboard broke with a persistent red light (not sure if this problem is related)

  3. RMA'd the motherboard and got a new one with the latest BIOS

  4. RMA'd the CPU and got a new one

  5. Bought a new PSU, upgrading from a Silverstone 500W to a Corsair SF750

  6. Reseated my RAM, and tested with a single stick and both sticks in all combinations

  7. Tested with both the onboard graphics and my discrete GTX 1070 GPU

  8. Tested both the stock Wraith Prism fan and a Noctua NH-L12S

I suspect RAM incompatibility, but the G.Skill Flare X5 is listed as compatible RAM on the Asus B650E-I website. I prefer not to buy another pair of RAM sticks to test unless I'm certain this is the problem.

Software troubleshooting

  1. BIOS - Tried all AMD EXPO profiles. Doesn't work

  2. BIOS - Enabled/disabled the Memory Context Restore and Power Down Enable settings. Doesn't work

  3. BIOS - Disabled Power Supply Idle Control. This reduced the crash frequency, but the PC still crashed within 1-2 hours of use.

  4. BIOS - Disabled Global C-state Control. This reduced the crash frequency, but the PC still crashed within 1-2 hours of use.

  5. BIOS - Disabled Precision Boost Overdrive

  6. Windows - Installed the latest hardware drivers

  7. Windows - Disabled all sleep options

  8. Windows - Disabled the onboard GPU in Device Manager

  9. Windows - Did not install Asus Armoury Crate in Windows (I read that Armoury Crate may cause crashes).

All of the above software and hardware troubleshooting failed. My PC still crashes frequently.

Benchmark and tests

All benchmark tests crash at EDC = 150. After reducing EDC to 120, here are the test results:

  • Cinebench CPU multi-core - 1379 (vs 1632 by cpu-monkey)

  • Cinebench CPU single-core - 109 (vs 116 by cpu-monkey)

  • OCCT CPU stability test

  • OCCT CPU benchmark test - single sse: 196, multi sse: 1155, single avx: 207, multi avx: 2056.

  • OCCT memory benchmark test

  • CPU-Z bench CPU - single 759, multi 11127 (vs single 780, multi 12106 by CPU-Z 7900 benchmark)

The following test still crashes the PC:

  • OCCT CPU + Ram test
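
To put the scores above in perspective, here is a small sketch that works out how far each result at EDC 120 falls below its reference score. The helper name `shortfall_percent` is just for illustration, and only the pairs that have a reference score in this post are included.

```python
def shortfall_percent(measured: float, reference: float) -> float:
    """How far `measured` falls below `reference`, as a percentage of the reference."""
    return (reference - measured) / reference * 100.0


# (score measured at EDC 120, reference score) pairs quoted in this post
scores = {
    "Cinebench multi-core": (1379, 1632),
    "Cinebench single-core": (109, 116),
    "CPU-Z single": (759, 780),
    "CPU-Z multi": (11127, 12106),
}

for name, (measured, reference) in scores.items():
    print(f"{name}: {shortfall_percent(measured, reference):.1f}% below reference")
```

This gives 15.5% and 8.1% for the two multi-core results, and 6.0% and 2.7% for the two single-core results, so the reduced EDC mostly costs multi-core performance.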

roupa_de_trapo
Adept III

Noctua NH-L12S: your cooling solution is inadequate. If I'm not mistaken, liquid cooling is even recommended if you want to achieve maximum performance.

Another option is to manually reduce power consumption with the options available in the BIOS. In my case I have a 7950X configured for PBO/Eco Mode at 105 watts, with a temperature of 89°C and TjMax of 74°C. I chose these parameters so that my Deepcool Assassin III cooler can handle the load (it supports 160 watts continuous, not the 280 watts advertised on the box), but when I had a 360mm liquid cooler there was no need for adjustment.

chunkiat
Journeyman III

I don't think it has anything to do with cooling, as my PC crashes during light internet browsing or at the start of a benchmark test, when the CPU temp is below 60°C.

I have to reduce EDC from the stock 150 to 110-120 to stop the PC from crashing. Changing other PBO settings or TjMax doesn't help; it still crashes at stock EDC.

Anyway, I just upgraded my cooling to an Arctic Liquid Freezer III today. The AIO reduced my CPU temp from 95°C at high load to <80°C, a very impressive improvement, but the PC still crashes when I reset my BIOS to default settings (EDC 150).

One interesting observation:

Using the Noctua NH-L12S, the PC is stable at EDC 120.
Using the Arctic Liquid Freezer III, the PC crashes at EDC 120; I have to reduce it further to 110.

mirao
Journeyman III

https://ncc.noctua.at/cpus/model/Ryzen-9-7900-1648 says that Noctua NH-L12S is "compatible without turbo/overclocking headroom".

You can try e.g. NH-D15S ("best turbo/overclocking headroom").

johnnyenglish
Big Boss

@chunkiat You haven't mentioned temperatures, and @mirao pointed out very well that the cooler you have is not completely up to par with that CPU, according to Noctua itself.

When you reduce power to the CPU, it works less and heats less, and hence no crashes.

I would monitor temperatures and report back.


If you want to fiddle a bit more, try reducing power just a tad and use something like -30 CO. It may compensate and give better performance than stock; however, it will still produce lots of heat and... may crash too.


Just to be clear, we are talking about a 7900 non-X 65 W TDP part, right?

I'm also surprised that the ROG STRIX B650E-I motherboard only has a humble 10+2 VRM design. That's not much, but it should be more than capable for that CPU at least.

Do the troubleshooting fast, because you may even have a faulty unit on your hands and may have to RMA. At this point we need to be open to lots of possibilities.


Good Luck

The Englishman

chunkiat
Journeyman III

Yes, we are talking about the 7900 non-X CPU. I used my PC for 4 months with no problems, and it suddenly started crashing frequently. I RMA'd the CPU and received a new one, but the problem persists on the new CPU too.

Quick update: I just upgraded my cooler to an AIO. Here are the temps.

 

Environment temp ~ 30°C

Noctua NH-L12S - EDC 120, curve -15
Idle: 47°C
CPU-Z bench CPU: 47.6°C
CPU-Z stress CPU: 95°C
Cinebench multicore: 95°C
Python multi-processing: 95°C

 

Arctic Liquid Freezer III - EDC 110, curve -30
Idle: 44°C
CPU-Z bench CPU: 44.5°C
CPU-Z stress CPU: 79°C
Cinebench multicore: 75°C
Python multi-processing: 81°C

 

Even with the Arctic Liquid Freezer III, my PC crashes at default BIOS settings.

When using the Noctua NH-L12S, my PC is stable at EDC 120.
When using the Arctic Liquid Freezer III, my PC is stable at EDC 110.
In both cases, my PC crashes at EDC >= 130.
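
The EDC results for the two coolers can be written down as a small table and summarised. This is only a sketch of how I keep track of the sweep; the helper names are made up, and the stable/crash entries are the ones reported in this thread.

```python
from typing import Dict, Optional

STOCK_EDC_A = 150  # stock EDC limit of the Ryzen 9 7900, as stated in this thread


def highest_stable_edc(results: Dict[int, bool]) -> Optional[int]:
    """Highest EDC limit marked stable (True), or None if every setting crashed."""
    stable = [edc for edc, ok in results.items() if ok]
    return max(stable) if stable else None


def reduction_percent(edc: int, stock: int = STOCK_EDC_A) -> float:
    """How far an EDC limit sits below stock, as a percentage of stock."""
    return (stock - edc) / stock * 100.0


# EDC limit -> True if stable, False if the PC crashed
sweeps = {
    "Noctua NH-L12S": {150: False, 130: False, 120: True},
    "Arctic Liquid Freezer III": {150: False, 130: False, 120: False, 110: True},
}

for cooler, results in sweeps.items():
    best = highest_stable_edc(results)
    if best is None:
        print(f"{cooler}: no stable EDC limit found")
    else:
        print(f"{cooler}: stable up to EDC {best} ({reduction_percent(best):.0f}% below stock)")
```

The stable limit is 20% below stock with the Noctua and about 27% below stock with the AIO, so the better cooler needs the lower limit.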

 

johnnyenglish
Big Boss

Temps got a lot better with the AIO, but it is still crashing.

Do you have DOCP/EXPO enabled?

Remove Curve Optimizer completely, keep EDC at 120 or even 130, and disable EXPO/DOCP.

I understand that the memory kit is on the QVL, but we need to take memory errors out of the equation. Since you already have a new CPU and it does the same, we could have another culprit at hand, and the board could be one.


Good Luck




The Englishman
chunkiat
Journeyman III

I replaced the motherboard too; it did not fix the problem.

EXPO off, Curve Optimizer off, the PC still crashes when EDC is at default. Setting EDC to 110 (with the new AIO) keeps the PC stable.

I did tons of memory tests (Windows, MemTest86, OCCT); the RAM is working well with 0 errors.

You mentioned the B650E-I VRM. Since it crashes at default EDC, it may have something to do with VRM/voltage-related settings. I'll read up more about that, thanks.

0 Likes

Use Eco Mode instead of trying to configure the limits manually. Very aggressive Curve Optimizer values can cause instability, and it may be that some cores need positive values to reach the maximum clock you configured. This is because the more cores are active (physical or logical), the more energy is consumed and the more heat is generated.

You can test by disabling SMT to reduce power consumption and check the machine's behavior. Obviously performance will drop, but the clock can go higher. Note the temperatures and check whether the system is very close to the permitted temperature and power limits. If everything is at the limit, it means you have no margin with SMT enabled and need to adjust, for example by reducing the clock or limiting the maximum temperature.

Sometimes an installation error with the liquid (or air) cooler can affect the system, allowing the temperature to rise very quickly, so it is a good idea to check the behavior under lighter loads, along with the clock rate and the voltages. Monitoring app suggestion: HWiNFO64
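
As a rough illustration of the "how close to the limit" check, here is a minimal sketch that reads temperature readings from a CSV log and reports the peak and the remaining headroom. It assumes the monitoring tool can export its readings to CSV. The column name `CPU Temp [C]` and the log layout are hypothetical, so adjust them to whatever your tool writes; the sample readings reuse the AIO temperatures quoted earlier in this thread.

```python
import csv
import io

# 95 C is AMD's published maximum operating temperature for the Ryzen 9 7900;
# change it for other CPUs.
TJMAX_C = 95.0


def temperature_headroom(csv_text: str, column: str, limit_c: float = TJMAX_C):
    """Return (peak temperature, headroom below `limit_c`) for one CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    temps = [float(row[column]) for row in reader if row.get(column)]
    if not temps:
        raise ValueError(f"no readings found in column {column!r}")
    peak = max(temps)
    return peak, limit_c - peak


# Hypothetical log layout; values are the idle, CPU-Z stress and Python readings
# reported with the Arctic Liquid Freezer III.
sample = """Time,CPU Temp [C]
10:00:00,44
10:05:00,79
10:10:00,81
"""

peak, headroom = temperature_headroom(sample, "CPU Temp [C]")
print(f"peak {peak:.0f} C, headroom {headroom:.0f} C")
```

With those readings the peak is 81°C, which leaves 14°C of headroom below the 95°C limit.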
