Hi
I have the following setup:
CPU: Ryzen 3700x
GPU: Gigabyte 5700XT
MB: Gigabyte Aorus Elite
RAM: Crucial Ballistix 2x16GB (2666MHz base)
PSU: Seasonic Focus GX 80+ Gold 850W
SSD: ADATA XPG SX8200 PRO 512GB
TL;DR
PC randomly restarts (the screen goes black, the sound becomes distorted and the RGB lighting stays on). I have tried almost everything.
More details:
I have tried stress testing every component and couldn't reproduce the issue. The temps were fine.
I bought another PSU (I previously had the same model but 550W), but nothing changed.
I ran memtest86+ and found that one stick had issues (10k+ errors), so I bought new RAM (on the QVL list) to replace the old Corsair kit. The issue is still present with or without XMP on. I also ran memtest86+ on the new RAM and it didn't find any issues.
Reinstalled Windows a couple of times.
Disabled Core Boost from BIOS.
Underclocked the GPU.
Updated BIOS to the latest version.
There used to be only Error 41 in the Windows Event Viewer, but now there are more errors.
I tried to analyze them with WinDbg but couldn't figure out much. I think one component is having hardware issues, but I can't figure out which one. I really hope it's not the GPU, because the other parts are cheaper to replace.
Every component was bought new. The issue was always there, it did not appear later.
The errors I found are:
FAILURE_BUCKET_ID: LKD_0x141_IMAGE_amdkmdag.sys
VIDEO_ENGINE_TIMEOUT_DETECTED (141)
WHEA_UNCORRECTABLE_ERROR (124)
I have uploaded the dump files here: https://drive.google.com/drive/folders/1VGpnwuz1_P6JG2p36baSymgA6Xf0Txfc?usp=sharing
Does anyone have any idea what the issue could be? I would really appreciate the help.
I also have a Ryzen 7 3700X processor with 32GB of RAM installed on an Asus X570 motherboard.
You didn't mention which Stress testing software you used.
If you didn't use OCCT, try it to stress test your CPU, GPU and PSU, and see whether the system fails or shuts down during testing.
Keep an eye on PSU outputs, temperatures and fan speeds while testing.
Besides the CPU and PSU tests, OCCT has tests for the GPU and the GPU VRAM.
Also, if you have another GPU, you can temporarily install it to see whether the problem keeps happening. Sometimes a defective GPU will cause the computer to restart.
Normally the main reasons a computer shuts down are overheating, overclocking, defective hardware or drivers.
I looked at all of your Watchdog and WHEA dumps and found the following errors:
WATCHDOG DUMPS:
Windows Bug Check 0x117 - AMDKMDAG.sys (2 Dumps)
Windows Bug Check 0x141 - AMDKMDAG.sys (2 Dumps)
Bug Check 0x117 is "VIDEO_TDR_TIMEOUT_DETECTED"
Bug Check 0x141 is "VIDEO_ENGINE_TIMEOUT_DETECTED"
WHEA DUMPS:
Windows Bug Check 0x124 - NTOSKRNL.exe (4 Dumps)
Bug Check 0x124 is "WHEA_UNCORRECTABLE_ERROR" (hardware failure).
This is Microsoft's explanation for Bug Check 0x124: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x124---whea-uncorrecta...
This bug check is typically related to physical hardware failures. It can be heat related, defective hardware, memory or even a processor that is beginning to fail or has failed. If over-clocking has been enabled, try disabling it. Confirm that any cooling systems such as fans are functional. Run system diagnostics to confirm that the system memory is not defective. It is less likely, but possible that a driver is causing the hardware to fail with this bug check.
For additional general bug check troubleshooting information, see Blue Screen Data.
Run the OCCT CPU, GPU and PSU tests to see if any of them fail. While running the tests, keep a close eye on temperatures, fan speeds and PSU outputs.
Sorry for the late reply. I was away from my pc.
Thanks for all the information you provided and for your time.
I ran all of the tests from OCCT and couldn't replicate the issue.
I ran the CPU test first and the max temps were in the low 40s.
GPU 3D: max temp was 77C. Hot spot max temperature was 107C.
GPU VRAM: max temp was 77C. Hot spot max temperature was 109C.
POWER: Same temps for the GPU. The CPU went up to the low 60s.
The voltages seemed stable.
Here are some screenshots. Maybe I am missing something; I am not very familiar with all the information shown.
Broken drivers are nothing new for the RX 5700 XT. Use DDU (Display Driver Uninstaller), find a driver version that works, and block Windows Update from replacing it. Try the 2020-21.8.2 drivers.
I built this PC 2 years ago and have had the issue since then, so I am not sure it's the driver, because I have used multiple versions. But I will give that a try as well. Thanks.
Your processor temperatures are extremely good under stress, and your PSU outputs are very good too.
You mentioned the GPU Hot spots were:
GPU 3D: max temp was 77C. Hot spot max temperature was 107C.
GPU VRAM: max temp was 77C. Hot spot max temperature was 109C.
The maximum operating hot spot temperature of the Radeon 5000/6000 series GPU cards is 110C. So your RX 5700 XT was either starting to throttle or was about to during the GPU VRAM and 3D tests.
I would check your GPU hot spot temperatures while playing a game and see whether they get hotter than 110C when the screen goes black. You will need to have monitoring software running the whole time while playing.
It is possible your VRAM is overheating past 110C and causing your black screens. Just my opinion.
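If your monitoring tool can log sensor readings to a CSV file, a small script like the one below could scan the log after a black screen and flag every hot spot reading at or above 110C. This is only a minimal sketch, assuming a CSV log with a header row and a column named "GPU Hot Spot (C)". That column name is hypothetical; check the header row of the log your own monitoring software writes and adjust it.

```python
import csv
import io

# Hypothetical column name; replace it with the one in your own log's header row.
HOTSPOT_COLUMN = "GPU Hot Spot (C)"
# Maximum hot spot temperature for Radeon 5000/6000 series cards, as mentioned above.
HOTSPOT_LIMIT_C = 110.0


def find_hotspot_peaks(csv_text, column=HOTSPOT_COLUMN, limit=HOTSPOT_LIMIT_C):
    """Return (row_number, temperature) for every reading at or above the limit."""
    peaks = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        # Missing column or short row gives None; treat it as an empty reading.
        value = (row.get(column) or "").strip()
        if not value:
            continue  # skip rows where the sensor was not logged
        try:
            temp = float(value)
        except ValueError:
            continue  # skip non-numeric entries
        if temp >= limit:
            peaks.append((row_number, temp))
    return peaks


if __name__ == "__main__":
    # Made-up sample log, only to show the expected input shape.
    sample_log = (
        "Time,GPU Temp (C),GPU Hot Spot (C)\n"
        "12:00:01,70,95\n"
        "12:00:02,77,107\n"
        "12:00:03,78,111\n"
    )
    for row_number, temp in find_hotspot_peaks(sample_log):
        print(f"Row {row_number}: hot spot {temp} C")
```

To use it on a real log, read the file into a string (for example with open(path, newline="").read()) and pass it to find_hotspot_peaks. The rows it reports show how close to the crash time the hot spot went past 110C.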
It could also be a driver issue, as mentioned previously. If so, you can download about 6 months of previous drivers for your GPU card here: https://www.amd.com/en/support/previous-drivers/graphics/amd-radeon-5700-series/amd-radeon-rx-5700-s...
The latest version for your GPU card: https://www.amd.com/en/support/graphics/amd-radeon-5700-series/amd-radeon-rx-5700-series/amd-radeon-...
I have also had a black screen right after booting into Windows and opening Chrome. I find it weird for the card to get that hot that quickly.
I will try to monitor the temps during a gaming session and see what I get.
I will also try different drivers.
Since the CPU can be ruled out, do you think the motherboard could be malfunctioning?
Thank you for your feedback.
Someone else will need to answer your question. I do know there are many threads on the AMD forums concerning black screens and crashing with the 5000 and 6000 series GPU cards and the 5000 series APUs.
If previous AMD drivers don't fix your problem, the best way to check is to install a different GPU card, or to install your AMD GPU card in another computer, and see whether the same thing happens.
You can also open a support ticket with your motherboard manufacturer just to see what they say. Personally I don't believe it is a defective motherboard. Possibly a BIOS issue, but I could be wrong.
Also open an AMD support ticket and ask them about your AMD GPU card here: https://www.amd.com/en/support/contact-email-form
Thanks a lot!