Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

Question asked by riveryeti on Feb 4, 2020
ASRock TRX40 Creator | AMD TR 3960x | CORSAIR Vengeance LPX 32GB RAM (CMK64GX4M2D3000C16) | 2x EVGA RTX 2080 Super Hybrid | 2x Intel 660p NVMe | 2x Toshiba SATA HDD | Win10x64


My problem:


TLDR: I can't get memory recognized in slots A1 and A2 of the motherboard, and I don't know how to tell whether I have a bad mobo or a bad IMC on the CPU. Initially all slots reported RAM, but the system wasn't stable. It only became stable after BIOS threw a Memory PMU training error, which happened when I went from the default 2133MHz to 3000MHz (XMP 1) and then back to default again.


I have tried multiple sticks of RAM in these slots. All other slots work (and all RAM works in other slots), but in every configuration from 2 to 8 DIMMs (all the same RAM from the same batch), A1 and A2 give me "Memory PMU Training error at Socket 0 Channel 2 DIMM 0 & DIMM 1" (when both are occupied) or "Memory PMU Training error at Socket 0 Channel 2 DIMM 1" (when only using slots A2 and B2, per the Memory Configuration page of the motherboard manual for 2 sticks of RAM).



Initially I populated all 8 slots with RAM and benchmarked at 2133MHz. Then, when trying to run a SfM benchmark (the intended use of this machine), I got an unexpected reboot partway through. I tested the RAM overnight with Windows Memory Diagnostic (WMD) and came back to a frozen system in Windows. I rebooted, and Event Viewer said all the RAM was fine. I loaded XMP profile 1 (3000MHz) and it benchmarked great with PassMark (99th percentile, 6778 total, 43468 CPU, 2908 Memory). I tried the SfM benchmark again and got another unexpected reboot partway through. I reloaded defaults and BIOS finally threw the Memory PMU training error. The system was never stable at 2133MHz or 3000MHz until I got the error and A1 and A2 were disabled. Since they became disabled, I see the Memory PMU training error any time a stick is in A1 or A2, and I have never seen any stick of RAM work in them again.


Since BIOS threw the PMU error I haven't had any system freezes or reboots. I can populate all six other slots of the motherboard and run at 3000MHz (XMP profile 1) for days without an issue. Any time I put RAM in A1 or A2, XMP won't stick, BIOS cycles several times, and memory drops to 2133MHz with the PMU error (even with only 2 sticks, in A2 and B2). After giving up on this channel (channel 2, apparently?), I gradually added RAM and tested at 2133MHz and 3000MHz for C2/D2, then C1/D1, and finally B1/B2. With all of these configurations I am successfully running at XMP profile 1.


Is it possible to test whether it's a bad mobo or a bad IMC without swapping in a replacement for either one (or both)?