alex_london
Adept I

Ryzen 9 5950X - System randomly unresponsive/crashing (hard reset needed) - Faulty CPU?

Specs:

CPU: AMD Ryzen 9 5950X Retail
Motherboard (current): ASRock X570 Pro4 (latest BIOS L5.01)
Motherboard (prior): Asus ROG STRIX X570-F Gaming (latest BIOS)

RAM:
2x Corsair Vengeance LPX Black DDR4-RAM 3600 MHz 16 GB - CMK32GX4M2D3600C18
2x Corsair Vengeance RGB PRO 16GB DDR4 3600 MHz C18 - CMW32GX4M2Z3600C18

GPU: PNY NVIDIA Quadro P400

SSD:
1x Samsung 980 Pro NVMe M.2
2x Samsung 980 1TB Gen3 NVMe M.2 (1 using SupaGeek M.2 NVMe SSD to PCIe x4 Adapter Card)
PSU: Corsair TX-M Series 650W
Cooler: Noctua NH-D15 SE-AM4

OS: Windows 10 Pro

I built the above PC in June 2021 with the Asus ROG STRIX X570-F motherboard, and the system had been stable until recently. It acts as a media server (Plex etc.) and runs 24/7. A few weeks ago I had the first occurrence of a system hang: no BSOD, totally unresponsive, nothing in the Windows 10 event logs beforehand, and the only way to bring it back was a hard reset or power cycle. Since then it has happened more frequently, to the point where there are now usually only a few hours between hangs. That said, it's quite random... it may be fine for 2-3 days or more, then the problem might happen 2-3 times in as many hours!

So far I have tried the following (the crashes still happened in every case):

  • Running with 32 GB of RAM (out of 64 GB), then trying the other pair
  • Running without GPU (headless - could still remote into it and ping while it was up)
  • Removing all SSDs except system drive
  • Replaced system drive SSD with new one (980 Pro)
  • Removed all SSDs, booted off Western Digital 6TB SATA HDD
  • Replaced ASUS motherboard with ASRock X570 Pro4
  • Running other OSes (tried Ubuntu Linux 22.04 LTS, Proxmox, clean Windows 10 install)

No overclocking; all BIOS settings are mostly out-of-the-box (I enabled virtualization to support Hyper-V, tweaked WoL and set fan speeds to max, that's about it). The problem is unlikely to be cooling: Prime95 or other CPU-intensive tasks like transcoding video can run for multiple hours with the CPU package not going much above 70 °C, and idle temps are around 48 °C.

Unfortunately, the unpredictability of what triggers the crashes, and when, makes it hard to verify that my changes worked (unless I wait long enough, and even that isn't a guarantee, as I recently had a "stable window" of 5-6 days without crashing).
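To put a rough number on "waiting long enough", here is a minimal sketch that treats the crashes as a memoryless (Poisson) process. That is an assumption, and the bursty pattern described above suggests it only holds approximately. The 48-hour mean time between crashes is a hypothetical figure for illustration, not something measured on this system.

```python
import math


def prob_no_crash(window_hours: float, mtbf_hours: float) -> float:
    """Chance of seeing no crash in a window if the fault is still present.

    Assumes exponentially distributed time between crashes (memoryless model).
    """
    return math.exp(-window_hours / mtbf_hours)


def hours_needed(mtbf_hours: float, confidence: float = 0.95) -> float:
    """Crash-free hours to wait before a quiet spell is unlikely to be luck."""
    return -mtbf_hours * math.log(1.0 - confidence)


if __name__ == "__main__":
    mtbf = 48.0  # hypothetical mean time between crashes, in hours
    five_and_a_half_days = 5.5 * 24
    print(f"P(quiet for 5.5 days by luck): {prob_no_crash(five_and_a_half_days, mtbf):.3f}")
    print(f"Hours to wait for 95% confidence: {hours_needed(mtbf):.0f}")
```

Under this model the wait needed for 95% confidence is about three times the mean time between crashes, so a rarely crashing system needs a correspondingly long observation period after each change.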

A handful of times I did see a BSOD: twice with a DPC_WATCHDOG_VIOLATION error and once with a CLOCK_WATCHDOG_TIMEOUT error. But in these cases, as before, there was nothing in the event log to indicate this had even happened, and no memory dump file was created either (the BSOD was stuck at 0% for multiple hours in all cases, after which I gave up and power cycled it).
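Since nothing gets logged when the hang happens, one way to at least pin down when each hang occurred is a small heartbeat logger. The sketch below is only an illustration: the file name `heartbeat.log`, the 10-second interval and the function names are all made up for this example. It appends a UTC timestamp to a file, forces it to disk, and later reports any gaps between consecutive timestamps.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, now=None):
    """Append one UTC timestamp and force it to disk so it survives a hard reset."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(stamp + "\n")
        f.flush()
        os.fsync(f.fileno())


def find_gaps(path, max_gap_seconds):
    """Return (last_seen, next_seen) pairs where the heartbeat went silent."""
    stamps = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                stamps.append(datetime.fromisoformat(line.strip()))
            except ValueError:
                continue  # skip blank or half-written lines left by a hang
    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        if (later - earlier).total_seconds() > max_gap_seconds:
            gaps.append((earlier, later))
    return gaps


def run(path="heartbeat.log", interval_seconds=10):
    """Write a heartbeat forever; start this at boot (hypothetical defaults)."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_seconds)
```

Calling `run()` at startup and then `find_gaps("heartbeat.log", 30)` after a reset gives the last moment the system was alive before each hang, which can be lined up against temperatures, load or scheduled tasks.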

I'm now down to either the PSU (unlikely) or the CPU being at fault (or the case!). I think the CPU is the most likely problem here.

Is there anything else you guys can suggest I try or look into before replacing the CPU?

Any responses would be appreciated!

0 Likes
1 Solution
alex_london
Adept I

Just to close this out: I replaced the CPU with a newly purchased 5950X last weekend (and swapped back to the Asus motherboard), and the system has been stable again (knock on wood!).

The faulty one is still under warranty, so I should hopefully get that replaced and will have (almost) enough parts to build a 2nd system now!

0 Likes
9 Replies

Mixed RAM kits could be the problem; ASRock only lists a few compatible (tested) Corsair 4-stick kits, I believe.

Ryzen 5 5600x, B550 aorus pro ac, Hyper 212 black, 2 x 16gb F4-3600c16dgtzn kit, NM790 2TB, Nitro+RX6900XT, RM850, Win.10 Pro., LC27G55T..
0 Likes

The same issue was happening with the ASUS motherboard first; I only changed to ASRock to rule out the board being the issue. Prior to the motherboard swap I had tried running the system on only 2 of the 4 RAM modules, with the same result. And the system was stable for 2 years before this started happening (and had the mixed RAM kit since the beginning).

BTW, it's not really "mixed". They're the same modules, one pair has RGB LEDs the other doesn't.

0 Likes
bestchriss
Adept I

The Ryzen 9 5950X processor officially supports memory up to 3200 MHz, while your modules are rated at 3600 MHz, so it is obvious that you have a processor and memory clock synchronization problem. To fix this, manually set the memory to 3200 MHz in the BIOS to make your system stable.

https://www.cpu-world.com/CPUs/Zen/AMD-Ryzen%209%205950X.html

 

0 Likes

@bestchriss - thanks, though I'm not sure I understand this correctly, and it's not that obvious (to me)!

AFAIK, a RAM module's rating is the maximum it can handle, not what it will actually run at, right? The motherboard has already chosen to clock them at the lowest compatible rate, which shows up as DDR4-2133 in the BIOS for all 4 modules (I think it was higher on the Asus mobo; this is the ASRock, which I'm currently using).

Also, please note... this system was stable with no hardware changes for 2+ years! Issues only started happening recently. If it was a configuration issue (whether memory clock synchronization, incompatible RAM / SSD / etc), I expect I would have had problems from day 1.

EDIT: Typo in DDR4-2133

0 Likes

Logically, since they run at 2133 MHz, you have SPD latency 15; if they were running at 3600 MHz you would have SPD latency 18. The correct mode for getting the maximum out of this processor is 3200 MHz. You should have installed 2x32 GB rather than 4x16 GB modules, at 3200 MHz, which your processor properly supports. With the memory running at 2133 MHz, you clearly have lower performance. I have the 5600G and I manually changed the memory speed to 3200 MHz in the BIOS on an Asus and a Gigabyte motherboard; I never buy any other brand of motherboard. If you want details, install AIDA64. You can also try a single pair of 2x16 GB modules to see whether the motherboard automatically changes the memory frequency, and try placing the modules in different slots to see if the frequency changes. However, from what you tell me, the memory is not working correctly.
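For anyone comparing the latency figures quoted here, the CAS number on its own says little; what matters is the time it represents at a given speed. A small sketch of that arithmetic follows, using the standard conversion (DDR transfers twice per clock, so one cycle lasts 2000 divided by the data rate in MT/s, in nanoseconds). The function name is just for illustration, and the CL15 at 2133 figure is the one quoted in this thread rather than something read from the modules.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the clock runs at half
    the data rate and one cycle lasts 2000 / data_rate_mts nanoseconds.
    """
    return cas_cycles * 2000.0 / data_rate_mts


if __name__ == "__main__":
    # (data rate in MT/s, CAS cycles) pairs mentioned in the thread
    for rate, cl in ((2133, 15), (3600, 18)):
        print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

By this measure the higher CAS number at 3600 still works out to a shorter delay than the lower CAS number at 2133, so the two settings cannot be ranked by CAS value alone.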

3200 MHz is the normal maximum; anything above that is overclocking.

Look at the memory specification for the motherboard:

https://www.asrock.com/mb/AMD/X570%20Pro4/index.asp#Specification

And here are the memory modules supported by the motherboard:

https://www.asrock.com/mb/AMD/X570%20Pro4/index.asp#MemoryCEZ

 

0 Likes

@bestchriss - thanks, I will look into this later.

Performance is not my problem/priority here. To be honest I don't care if it's 2x slower, it's well over-provisioned for the task at hand, this is not supposed to be some high-end gaming rig. But this is all a bit off-topic...

This does not explain how the 4x16 GB modules were working perfectly fine for 2+ years and only recently decided to cause issues?

Also, as I noted, I did reproduce the crashing with just 2x16 GB (matching) modules, first using one set, then again using the other (this was on the Asus ROG STRIX X570-F motherboard; I didn't repeat the tests after replacing it with the ASRock).

So crashing cannot be blamed on the RAM configuration, despite it (maybe) causing the system to run less optimally than it could.

0 Likes
Anonymous
Not applicable

Your CPU may be newer; as it has the 50 on the end, it may have been released later and require a BIOS update.

You should also not enable XMP; instead, look at the RAM's voltage requirements and set those manually, leaving the RAM speed on auto, lowering the latency and not overclocking. For example, if the kit is CAS 18 and you halve that, it's 9, right? So at 1866 MHz or 2133 MHz you should be aiming for CAS 9, though you will probably be stuck with 10 or 11 if it's not Samsung B-die. In my view the Intel and Nvidia memory profiles are **bleep** that cripple AMD, and you're going to have a hard time. Even with a more expensive kit such as G.Skill Trident Z or Flare X, or some of the dirt-cheap low-latency first-generation DDR4, you can do a bit better but not great. You should try to set the primary and secondary RAM timings manually, maybe referring to the database in Thaiphoon Burner and enabling fast write mode in there. Then set all memory-related BIOS options, including those in the NBIO and DFU (or whatever) subsections for interleaving, ECC, parity, error injection or memory poisoning, to auto, or to enabled where auto is not possible. Once the timings are 'more correct', try setting them back to auto after a boot and some test gaming. Sometimes you must set all of the RAM BIOS options to auto again after each reboot.

You should leave your Infinity Fabric clock divider at precisely 1/2 or 1/4 and then leave it on auto.

Maybe get rid of the Nvidia trash and swap it for the cheapest AMD card from the last few years, which in my view would be far better. They could have hundreds of Intel/Nvidia machines rendering a Houdini graphics simulation for a movie like Spider-Man or Venom, taking hours, days or weeks for a minute or two of footage, about the length of a music video, whereas I can pull out my phone with an AMD chip, configure it for true video, enhance the video with my own in-game 3D settings in the display driver (lighting, shadows, 4D/3D, texture, bumpiness, graininess, how metallic it looks, render resolution), set it to ultra extreme quality, and play it back in higher quality in real time as I view it or use an app or game inside it.

AMDHARDWARE.rar ~ pixeldrain

0 Likes
jamiec1
Journeyman III

I had exactly the same issue and managed to fix it by setting a fixed CPU voltage rather than leaving it on auto.

I have had the chip for 2 years and it suddenly started freezing when idle or not doing a lot. This got worse and worse. Fixing the voltage at 1.3 or 1.4 V fixed it, and it is now stable with no further problems as far as I can see. It seems like quite a few people have this issue. Hope this helps.

0 Likes