Losing my mind here: it's been a month of increasingly frequent blue screens. They only occur when I play a graphically intensive game or do GPU training with PyTorch.
Windows 10, Ryzen 5 5600X, ASRock B550M-ITX/ac mobo, RTX 3080. My RAM is rated for DDR4-3600. I have the latest chipset drivers, BIOS firmware, etc. I've reset the BIOS settings multiple times now and there's no overclocking going on. I've tried the XMP profile, non-XMP, running the memory at 3200 MHz, and loosening the timings a bit. Memtest86+ passes totally fine. I've tried Prime95 for ten minutes or so and couldn't get a crash. I've tried single sticks of RAM; the only weird thing I noticed is that with a single stick in the slot closest to the CPU, the system won't boot (won't even POST).
Things that crash my system: running PyTorch training (really inconsistent though; sometimes I can train for hours and it's fine, other times it crashes within 10 minutes), and playing Starfield for a few minutes.
Because of when the crashes happen, I feel like it may be GPU related, but my dump files all talk about AMD issues. (I'll have to post the dump files tomorrow since pastebin links won't work for some reason, but the dump files always mention AuthenticAMD or something.)
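In the meantime, here's a minimal sketch of how the dump files could be listed with timestamps, so each one can be matched to a specific crash (the PyTorch run or the Starfield session). It assumes the dumps are in the default Windows minidump folder, `C:\Windows\Minidump`; the function name `list_dumps` is made up for illustration, and reading that folder may need an elevated prompt.

```python
from datetime import datetime
from pathlib import Path

# Assumption: Windows is writing small memory dumps to its default folder.
DEFAULT_DUMP_DIR = Path(r"C:\Windows\Minidump")


def list_dumps(dump_dir=DEFAULT_DUMP_DIR):
    """Return (file name, modified time, size in bytes) per .dmp file, newest first."""
    dump_dir = Path(dump_dir)
    if not dump_dir.is_dir():
        # Folder missing (or dumps disabled): nothing to report.
        return []
    entries = []
    for path in dump_dir.glob("*.dmp"):
        stat = path.stat()
        entries.append((path.name, datetime.fromtimestamp(stat.st_mtime), stat.st_size))
    # Sort by modification time so the most recent crash comes first.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries


if __name__ == "__main__":
    for name, modified, size in list_dumps():
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {size:>10}  {name}")
```

This only lists the files; it doesn't parse them, so the actual bug check codes would still have to come from a dump viewer.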
Any help would be massively appreciated. I really don't know what to do now aside from starting to replace components, and even then I don't know whether to start with my GPU, CPU, or memory.