

carlwillis
Journeyman III

BSoDs and graphics resets, Ryzen 9 3900X + Radeon R9 390X, Win10

Hello all,

 

I have a new Ryzen 9 3900X desktop build that has been extremely unstable, exhibiting numerous crashes, BSoDs, and subsequent refusals to boot Windows. Info and minidumps are below.

I'd be very interested if other people using similar hardware have had similar problems, or if any super-sleuths can look at the attachments and point me in the direction of a fix.  The oddest thing about my problem seems to be its correlation with lightly-loaded operation; the system runs seemingly quite well under moderate to heavy CPU / GPU loads or CPU alone, but is really unhappy with unloaded operation.

  • The PC will run fine IF the CPU is heavily loaded.  My usual workload on this machine in its short and tumultuous life has involved parallel radiation transport simulations that run for a week or so and keep the processor running at ~90% load with 24 threads.  It just runs and the CPU temps hover near 73 deg. C.  HOWEVER: if the machine is unloaded, it is highly liable to throw display driver resets and BSoDs.
  • This build really seems to hate AMD graphics cards, throwing regular GPU driver restart errors with both the R9 390X (a fairly old card) and an RX 5700 (new) using the most recent graphics drivers from AMD.  Frequent occurrences are LiveKernelEvent errors referencing LKD_0x141_Tdr:6_IMAGE_amdkmdag.sys and LKD_0x117_Tdr:9_IMAGE_atikmpag.sys.
  • Runs Furmark without problems so far.  Again, the machine is most error-prone when it is just idling!
  • Currently running with the memory D.O.C.P. profile enabled, but it's hard to tell if this makes any difference in the bad behavior (have tried with this on and off).
  • Memtest runs 4+ passes with 0 errors on the currently installed DIMMs
  • I have swapped the OS SSD with another of the same model and this did not fix the issue
  • No history of overclocking (with exception of D.O.C.P.)
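
For anyone collecting these LiveKernelEvent entries from several machines, the bucket strings quoted above can be split into their parts. This is only a minimal sketch in Python: the field layout (bug check code, TDR number, driver image) is inferred from the two strings in this post, the function name is hypothetical, and the lookup table covers just the two codes seen here.

```python
import re

# Names Microsoft documents for the two bug check codes quoted in this post.
BUGCHECK_NAMES = {
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x141: "VIDEO_ENGINE_TIMEOUT_DETECTED",
}

# Layout assumed from the two examples: LKD_<hex code>_Tdr:<n>_IMAGE_<driver file>
BUCKET_RE = re.compile(r"^LKD_(0x[0-9A-Fa-f]+)_Tdr:(\d+)_IMAGE_(.+)$")


def parse_lkd_bucket(bucket):
    """Split a LiveKernelEvent bucket string into code, name, TDR number and image."""
    match = BUCKET_RE.match(bucket.strip())
    if match is None:
        raise ValueError(f"unrecognised bucket string: {bucket!r}")
    code = int(match.group(1), 16)
    return {
        "code": code,
        "name": BUGCHECK_NAMES.get(code, "unknown"),
        "tdr": int(match.group(2)),
        "image": match.group(3),
    }


if __name__ == "__main__":
    for text in ("LKD_0x141_Tdr:6_IMAGE_amdkmdag.sys",
                 "LKD_0x117_Tdr:9_IMAGE_atikmpag.sys"):
        print(parse_lkd_bucket(text))
```

Both codes are display-driver timeout events, and both driver images named here belong to the AMD graphics driver, which is why the GPU looked like the obvious suspect at first.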

System information (more is available in attached screenshot from HWiNFO):

  • Win10 Pro 64bit version 2004 / OEM
  • All-new hardware (May 2020)
  • Win10 has been reinstalled multiple times after file corruption and boot failures
  • CPU: AMD Ryzen 9 3900X; most recent AMD chipset drivers are installed.
  • GPU: I have tried two AMD video cards with similar instabilities; currently installed is a Radeon R9 390X, and I have also used a Radeon RX 5700; drivers are Radeon Adrenalin 20.5.1
  • Motherboard: Asus TUF Gaming X570-Plus
  • BIOS: 1407 (shipped with 1405, I flashed 1407)
  • Memory: G.Skill Trident Z Neo Series 32GB (2 x 16GB) 288-Pin SDRAM PC4-28800 DDR4 3600MHz CL16-19-19-39 1.35V Desktop Memory Model F4-3600C16D-32GTZNC
  • Primary storage (with OS): 1TB SSD, WD Black SN750 NVMe
  • Secondary storage: Seagate IronWolf 8 TB
  • Power Supply: Corsair RM850X
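
As a sanity check on the memory line above, the "PC4-28800" and "DDR4 3600MHz CL16" labels describe the same kit. The short Python sketch below shows the arithmetic, assuming the standard 64-bit (8-byte) DDR4 module width; the function names are made up for illustration.

```python
def peak_bandwidth_mb_s(transfer_rate_mt_s, bus_width_bits=64):
    """Theoretical peak bandwidth of one module: transfers per second times bytes per transfer."""
    return transfer_rate_mt_s * bus_width_bits // 8


def cas_latency_ns(cl_cycles, transfer_rate_mt_s):
    """CAS latency in nanoseconds.

    DDR moves data twice per clock, so the clock in MHz is half the MT/s
    figure and one clock cycle lasts 2000 / (MT/s) nanoseconds.
    """
    return cl_cycles * 2000 / transfer_rate_mt_s


if __name__ == "__main__":
    print(peak_bandwidth_mb_s(3600))           # 28800, matching the PC4-28800 label
    print(round(cas_latency_ns(16, 3600), 2))  # 8.89
```

These are the rated D.O.C.P. figures for the kit, not measurements from this machine.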

Win10 minidumps:

061520-6250-01.dmp - Google Drive 

061620-5875-01.dmp - Google Drive 
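
The minidump file names encode the crash date as MMDDYY, which helps when lining dumps up against other logs. Here is a minimal Python sketch for reading them; the date reading is the usual Windows convention, while the middle number is simply returned unchanged because its meaning is not asserted here, and the function name is hypothetical.

```python
import re
from datetime import date

# Assumed layout: MMDDYY-<number>-<two-digit sequence>.dmp
DUMP_RE = re.compile(r"^(\d{2})(\d{2})(\d{2})-(\d+)-(\d{2})\.dmp$")


def parse_minidump_name(name):
    """Return (crash date, middle number, sequence number) from a minidump file name."""
    match = DUMP_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a minidump file name: {name!r}")
    month, day, year, middle, seq = match.groups()
    # Two-digit years are assumed to fall in the 2000s.
    return date(2000 + int(year), int(month), int(day)), int(middle), int(seq)


if __name__ == "__main__":
    for name in ("061520-6250-01.dmp", "061620-5875-01.dmp"):
        print(parse_minidump_name(name))
```

Read this way, the two dumps linked above come from 15 and 16 June 2020.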

Thanks for any assistance!

-Carl

1 Solution
carlwillis
Journeyman III

Update and resolution, 7/6/2020

I ordered a brand-new Ryzen 9 3900X processor and swapped it for the existing one in the unstable build described above, resulting instantly in stable operation.  So the problem was the original Ryzen 9 3900X processor.  I applied for warranty coverage from AMD.  Now comes the fun part, seeing how long it takes them to respond and what kind of hoops they want me to jump through to replace that chip.

Obviously, as a troubleshooting strategy, CPU replacement is a costly approach that typically only gets tried when everything else has failed (and that's what happened for me).  I'm not used to defective CPUs, and my hardware errors were never specific enough to unambiguously determine that the CPU was at issue.  Consequently, I have enough new hardware (graphics cards, NVMe storage, RAM, etc.) sitting around now to build a second PC!


Thanks to those who offered advice. My hope is that others with similar difficulties see this and don't rule out a defective CPU in their troubleshooting workflow.

-Carl

6 Replies
redikarus
Journeyman III

Hello, if you check my request, I have exactly the same problem, but with a 2600X processor. It works perfectly fine under high load; however, after a cold start it tends to restart within a couple of minutes while I am just browsing the internet. After that it works perfectly, especially under high load.

doomcreeper
Adept III

I also get BSODs involving the same LKD_0x141_Tdr:6_IMAGE_amdkmdag.sys, on an RX Vega 64 video card. W10 \ x64 \ 2004 \ Adrenalin 20.5.1

fyrel
Miniboss

Not an expert with crash dumps.

The second crash dump has the following error.

WHEA_UNCORRECTABLE_ERROR (124)

Process that caused it was chrome.exe.

And the hardware module that shut down was your CPU.

And here is some advice I shamelessly copied off the internet about troubleshooting the issue.

Stop 0x124 is a hardware error
If you are overclocking try resetting your processor to standard settings and see if that helps.
If you continue to get BSODs here are some more things you may want to consider.

This is usually caused by heat, defective hardware, memory, or even the processor, though it is "possible" (if rare) that it is driver related.

Stop 0x124 - what it means and what to try

Synopsis:
A "stop 0x124" is fundamentally different to many other types of bluescreens because it stems from a hardware complaint.
Stop 0x124 minidumps contain very little practical information, and it is therefore necessary to approach the problem as a case of hardware in an unknown state of distress.

 Generic "Stop 0x124" Troubleshooting Strategy:

1) Ensure that none of the hardware components are overclocked. Hardware that is driven beyond its design specifications - by overclocking - can malfunction in unpredictable ways.
2) Ensure that the machine is adequately cooled.
 If there is any doubt, open up the side of the PC case (be mindful of any relevant warranty conditions!) and point a mains fan squarely at the motherboard. That will rule out most (lack of) cooling issues.
3) Update all hardware-related drivers: video, sound, RAID (if any), NIC... anything that interacts with a piece of hardware.
It is good practice to run the latest drivers anyway.
4) Update the motherboard BIOS according to the manufacturer's instructions.
Their website should provide detailed instructions as to the brand and model-specific procedure.
5) Rarely, bugs in the OS may cause "false positive" 0x124 events where the hardware wasn't complaining but Windows thought otherwise (because of the bug).
At the time of writing, Windows 10 is not known to suffer from any such defects, but it is nevertheless important to always keep Windows itself updated.
6) Attempt to (stress) test those hardware components which can be put through their paces artificially.
The most obvious examples are the RAM and HDD(s).
For the RAM, use the in-built memory diagnostics (run MDSCHED) or the 3rd-party memtest86 utility to run many hours worth of testing.
For hard drives, check whether CHKDSK /R finds any problems on the drive(s), notably "bad sectors".
Unreliable RAM, in particular, is deadly as far as software is concerned, and anything other than a 100% clear memory test result is cause for concern. Unfortunately, even a 100% clear result from the diagnostics utilities does not guarantee that the RAM is free from defects - only that none were encountered during the test passes.
7) As the last of the non-invasive troubleshooting steps, perform a "vanilla" reinstallation of Windows: just the OS itself without any additional applications, games, utilities, updates, or new drivers - NOTHING AT ALL that is not sourced from the Windows installation media.
Should that fail to mitigate the 0x124 problem, jump to the next steps.
If you run the "vanilla" installation long enough to convince yourself that not a single 0x124 crash has occurred, start installing updates and applications slowly, always pausing between successive additions long enough to get a feel for whether the machine is still free from 0x124 crashes.
Should the crashing resume, obviously the very last software addition(s) may be somehow linked to the root cause.
If stop 0x124 errors persist despite the steps above, and the hardware is under warranty, consider returning it and requesting a replacement which does not suffer periodic MCE events.
Be aware that attempting the subsequent hardware troubleshooting steps may, in some cases, void your warranty:
8) Clean and carefully remove any dust from the inside of the machine.
Reseat all connectors and memory modules.
Use a can of compressed air to clean out the RAM DIMM sockets as much as possible.
9) If all else fails, start removing items of hardware one-by-one in the hope that the culprit is something non-essential which can be removed.
Obviously, this type of testing is a lot easier if you've got access to equivalent components in order to perform swaps.

Should you find yourself having performed all of the steps above without resolving the symptom, unfortunately the most likely reason is that the error message is literally correct - something is fundamentally wrong with the machine's hardware.

dosrolux
Journeyman III

I have the same issue.

I have a brand-new system: Asus B550 TUF Gaming + Ryzen 3900XT + water cooling + 32 GB Corsair 3600 RAM + Radeon 5700 8 GB + 850 W PSU + Corsair MP600 1 TB + Samsung 970 Evo Plus 1 TB.

I am not overclocking; BIOS settings are at default.

I updated the BIOS to the latest version.

My first installation went well, but not for long.

My temps were between 34 and 42°. I noticed that when the system is under heavy load it hangs after a few seconds.

The system is very unstable.

I tried to reinstall Windows 10 2004, and now I get a BSOD every time:

WHEA_UNCORRECTABLE_ERROR.

I tried an older Windows 10 1909 and had the same problems.

It's really strange; why did it work the first time?

Maybe my cheap PSU is not doing its job well (a new one is on the way, an 850 W Corsair 80+ Gold).

I don't know where to start looking.

Tomorrow I will try other memory and see whether I can get it to run stable.

Does anyone have any ideas where to start?

dosrolux
Journeyman III

Found the problem.

My system had 4 x 8 GB of Corsair DDR4-3600 RAM.

I just removed 2 sticks of RAM, and now it works and is stable.

I have run stress tests, Cinebench, etc., and it passes them all without any freezing or BSODs.

I have also fine-tuned the RAM speed in the BIOS and stress-tested again, and it works perfectly with 2 RAM slots occupied.

Now I have to check whether it is a mainboard issue or a RAM issue.
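
One way to separate a bad stick from a bad slot is to test every pair of sticks in each pair of slots and note which combinations crash. The Python sketch below only generates that test list; the stick labels and the slot names A1/A2/B1/B2 are placeholders (check the board manual for the real slot names), and it assumes two-stick testing so the system stays in dual-channel mode.

```python
from itertools import combinations


def isolation_plan(sticks, slot_pairs):
    """List every (stick pair, slot pair) combination to stress-test."""
    return [(pair, slots)
            for pair in combinations(sticks, 2)
            for slots in slot_pairs]


if __name__ == "__main__":
    # Placeholder labels: four sticks and the two usual dual-channel slot pairs.
    sticks = ["stick1", "stick2", "stick3", "stick4"]
    slot_pairs = [("A2", "B2"), ("A1", "B1")]
    for number, (pair, slots) in enumerate(isolation_plan(sticks, slot_pairs), 1):
        print(f"run {number}: {pair[0]} in {slots[0]}, {pair[1]} in {slots[1]}")
```

If the failures follow one stick across slots, the RAM is the likely culprit; if they follow one slot pair regardless of sticks, the board or the CPU's memory controller is more suspect.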
