Processors

pokester
MVP

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

Are you running the latest BIOS for your board? Many boards have gotten much better with updates. For instance, I could not run 4 sticks in my B450 board until a BIOS update finally fixed it. Sometimes the latest BIOS can be the issue too, and rolling back one version might help. If you roll back, however, make sure that BIOS supports the CPU you have. Worth trying if you have not. If it helps, it saves you from swapping out CPUs through an RMA.

My apologies if you already did this; I didn't see it mentioned above.

misterj
Exemplar

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

Thanks, pokester.  I can only speak for myself, but my system ran for several weeks with no issues at both SPD and XMP (3200MHz).  Then this hit the fan.  My 2990WX ran for months and even ran for two weeks after an MB RMA before showing this error.  Since 39xxXs are new with a new chipset and socket, there is no earlier BIOS to go back to as far as I know.  Gigabyte did release a new BIOS that truly killed my system, so I went back, but I will go forward to the latest soon.  Thanks and enjoy, John.

pokester
MVP

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

Sorry to hear that. That really sucks. Being an early adopter of new stuff runs those risks. Not that it is a reasonable risk, and certainly you should not have to suffer because of it. If they can't offer help, request that they send you a different board. I hate to pick on Gigabyte, but I have had my fill of their boards not working in recent years. These are issues I never have with Asus and MSI.

Anyway, I hope you get it resolved without too much more aggravation or spending more money. This stuff should work right when released, IMHO.

misterj
Exemplar

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

I mostly agree, pokester.  This is my first experience with Gigabyte.  MSI, ASRock and a few older ones have all angered me.  I have done 3 MB replacements on this system and will probably not do another one.  Almost all my past RMAs have been MBs and I am tired.  I have opened a support ticket with AMD.  I'll see what happens.  Thanks and enjoy, John.

pokester
MVP

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

No doubt you get lemons, and who makes the good boards changes with every generation. That is why I usually don't adopt quickly, and I choose my purchases by reading reviews and picking the parts that are reported to work best with what I am buying. At this point, however, BIOS fixes should have things working, unless they made a bad board to begin with or you just have a lemon. I feel your pain.

misterj
Exemplar

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

riveryeti, I have received a couple of responses from AMD (ticket to Support).  The first simply asked if I had tested my 3970X in another MB (I have not, and it is not really feasible).  I asked for a response to my question about upping SOC voltage.  They responded that this was overclocking and would avoid (void?) my warranty.  I responded by asking for an answer to my last question: "Are you seeing lots of these problems?"  Hope to get an answer.  I certainly will not change my SOC voltage.  Enjoy, John.

hardcoregames_
Big Boss

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

I have stuck with MSI as they at least have some staff on their forum who can get your RMA approved if needed.

I have looked at TR memory for quite a while as I have also had memory issues. 

I am aware that 4 layer motherboards perform poorly compared to more expensive 6 layer boards. Probably the reason my X570 was more expensive than the X470.

ledhed
Adept II

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

This is precisely what I was going to add to his question. I too have seen this exact kind of error and issue, but it was a little ways back on a much less powerful machine (ASRock Z77 Extreme6 + 3770K). I ended up finding out I had a bent pin on the motherboard socket, which was causing an issue with DIMM #3. When I put any RAM in that slot, the motherboard wouldn't boot, but it would OC the other sticks if you only filled three of them!

Two other highly efficient technicians I know and I all tried a few times to fix the 5-6 bent pins. Nobody could ever get the system to work properly with all 4 DIMMs filled. I think it's worth checking the CPU's pins, but aren't the pins hidden in the motherboard's socket now? That is how my X570-E is with my 3950X.

You can find videos of people like LTT trying (and eventually succeeding) to fix bent pins on CPUs. LTT even adds donor pins from a spare CPU, something I actually found impressive (doesn't happen often with him). Never let anyone tell you that it's easy to do, though. It almost comes down to luck with CPU/mobo pins.

Companies don't intend anyone to fix anything by hand (for the most part) when we're talking about surface mount components. I deal with companies like Burson Audio and Orange Amplification, and I know the owner of Sparkos Labs (all make op-amps). Anytime something has gone bad, they tell me not to even try fixing it, haha. That applies to surface mount parts; dealing with micro anything is purely machine work. The pins aren't really part of that, but they still are the physical I/O ports for the entire CPU.

All I can say is good luck, my friend! If you can get one replaced on warranty, do that.

drdocumentum
Adept I

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

Hi, did you finally solve your problem? I have exactly the same one: same mobo / CPU, and memory slots A1 & A2 failing PMU training. I have been twice to the local dealer I purchased from, with no luck so far getting the MB or CPU exchanged. I actually left them at their lab today to see if they will honor the warranty.

My memory is fine. I tested it with memtest86 with zero errors, and the same memory was previously installed in a first-gen TR + Asus board, also working fine.

The dealer said it is due to dirt/thermal paste on the CPU "pins"; however, he cleaned it and it worked with one memory stick. I got it back into the case, installed all the memory again, and it worked for a few days but then started to fail again, always on the same A1 and A2 memory slots. I purchased a second Gigabyte Aorus MB online and am waiting for delivery to rule out a CPU vs. MB defect. After that I think I will have to fight with the local dealer to get a refund on the defective component.

I have been assembling my own PCs since 1995, and I believe this is the first time I have seen something like this happen.

mantisman13
Adept II

Re: Bad memory channel - how to test if mobo or CPU IMC (TR 3960x)?

Have basically the exact same issues.

I've been having major issues that seem to be related to the A1 and A2 slots on my TRX40 Taichi with an AMD 3970X and a 256G kit of CORSAIR Vengeance RGB Pro CMW256GX4M8E3200C16. The OS is Microsoft Windows 10 (10.0) Pro for Workstations 64-bit (Build 18363).

Issues started after running hard for 2 weeks running Folding@home with 2 CPU clients (32 threads and 24 threads) and a GPU client on a slightly overclocked AMD RX 5700 XT. There was no overclocking on the CPU or memory, and thermals were all well handled by case cooling and the AIO. I first noticed the issue when F@H crashed overnight; after that it would BSOD when I tried to start folding again. This was a day after a Microsoft update and also after installing Node.js and Vue.js development packages. I originally suspected software or driver conflicts, so I made sure to update the AMD chipset software to amd_software_2.04.04.111. This didn't help. I then discovered that running Cinebench R20 would cause a BSOD, as would the CPU-Z bench or stress test. The BSODs showed a variety of messages (MEMORY_MANAGEMENT, IRQL_NOT_LESS_OR_EQUAL, PAGE_FAULT_IN_NONPAGED_AREA, etc.), but all pointed to a crash address of ntoskrnl.exe+1c2390 when the minidumps were viewed with BlueScreenView.

I started to suspect memory when I noticed I was running 32G low. Going into the BIOS, I found that the A2 slot was not showing up. I was also having trouble getting in and out of the BIOS, as the USB wireless keyboard was not working most of the time when the system came back up to the POST screen. I had to clear the CMOS and do a full power down and back up a couple of times to get back into the BIOS and set things up again. Sometimes all 8 slots would show fine and I could get it going again and get back into Windows. On one of these occasions I found in the iCUE software that there was a firmware update for the RAM, and I ran it. After that I had an XMP profile to choose from in the BIOS that I don't think had been there before, and setting that initially seemed to help. But it would still crash, and I would get back into the BIOS and see slots A1 and A2 empty.

I then created a USB boot drive for MemTest86 and started running tests with isolated RAM. I tested only B2,A2 without issue. B1,A1: no issue. All memory was testing fine and I booted up into Windows with B2,A2, but I think I had a crash and started to test the memory again. I spent almost 2 days running memory tests and found no memory errors. My last test was back to the full 8 chips loaded, and all seemed fine with the memory test.

I then tried to boot up into Windows, but the BIOS froze up on me a few times, and it also switched from English into kanji and froze. I then flashed it to BIOS 1.6, brought it back up, and reconfigured. RAID options were now showing again (they had been missing in 1.1). I have 2 RAIDed 1T NVMe drives and 8 SATA drives in a 22T RAID 10 using the AMD RAID drivers. It seems to take a couple of cycles of going in and out of the BIOS to get the RAID to hook back up so that Windows Boot Manager can boot. But along with that I started seeing a warning flash, "Memory PMU Training Error at Socket 0 Channel 2 Dimm 0", and when I went into the BIOS, neither A1 nor A2 would show up.

After removing both from the system, so that I have 2 channels of tri-channel memory on B2,C2,D2 and B1,C1,D1, I seem to be error free. All benchmarks and stress tests are running without issue, and I'm folding with the CPU and GPU all at 90% as I write this.
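
For anyone keeping notes across many reboots, a small script can tally which socket/channel/DIMM the PMU training warning keeps naming. This is only a sketch: the message wording is copied from the warning quoted above, the assumption that the firmware always prints it in exactly that form is mine, and the names `PmuError`, `parse_pmu_error` and `tally_pmu_errors` are made up for illustration. It does not try to map a channel number to a slot label such as A1, since the thread does not say how the board does that.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class PmuError(NamedTuple):
    """Location named in one PMU training warning (hypothetical helper type)."""
    socket: int
    channel: int
    dimm: int


# Wording taken from the warning quoted in the post; other firmware
# versions may phrase it differently (assumption).
PMU_PATTERN = re.compile(
    r"Memory PMU Training Error at Socket (\d+) Channel (\d+) Dimm (\d+)",
    re.IGNORECASE,
)


def parse_pmu_error(line: str) -> Optional[PmuError]:
    """Return the location named in the line, or None if it is not a PMU warning."""
    match = PMU_PATTERN.search(line)
    if match is None:
        return None
    socket, channel, dimm = (int(group) for group in match.groups())
    return PmuError(socket, channel, dimm)


def tally_pmu_errors(lines: Iterable[str]) -> Counter:
    """Count how often each socket/channel/DIMM location is reported."""
    parsed = (parse_pmu_error(line) for line in lines)
    return Counter(error for error in parsed if error is not None)


if __name__ == "__main__":
    # Hand-copied notes from successive boots (illustrative, not real logs).
    notes = [
        "boot 1: Memory PMU Training Error at Socket 0 Channel 2 Dimm 0",
        "boot 2: clean POST",
        "boot 3: Memory PMU Training Error at Socket 0 Channel 2 Dimm 0",
    ]
    for location, count in tally_pmu_errors(notes).most_common():
        print(location, count)
```

If the same location comes up on every failing boot regardless of which stick is installed, that at least backs up the idea that the fault follows the channel and not the memory.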

I should also mention I had uninstalled F@H, vuejs and other recent installs to no avail. I have not reinstalled node or vue yet.

I did another test today where I swapped out the memory that has been running fine in the B1 and B2 slots for the memory I removed from the A1 and A2 slots. Those worked just fine and I've been folding on them all day. I then tried putting the memory from the B slots back into the A slots to get back to 256G quad-channel memory, but I could not even get to the POST screen on 2 restarts. Both times I got a 0d error on the board. I did a full power down on the PSU and tried to boot, and this time I got to the ASRock screen, but it would not respond to the keyboard and did not try to load Windows Boot Manager. I powered down, removed the 2 chips in the A slots, and it started up and booted right up without issue. So what I really need to know is: is this a problem with the board, with the CPU, or still some driver/BIOS thing? How to test?

I did yet another test where I tried D1,D2,C1,C2,A1,A2. This booted up, but I got a BSOD after just a few minutes with no real load.

So, as described in this thread by others, all of my memory chips work fine so long as they are not in the A banks.
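
The swap testing above boils down to a simple tally: for every trial, record which stick sat in which slot and whether the system stayed up, then look for slots or sticks that fail every time they are used. Here is a rough sketch of that bookkeeping. The stick labels (`s1` to `s8`) and the exact placements are hypothetical, loosely modeled on the tests described in this post, and `failure_rates` and `always_failing` are names invented for illustration. With intermittent faults a handful of trials proves nothing, so treat the output as a hint only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One trial: a {slot: stick} placement plus whether the system stayed stable.
Trial = Tuple[Dict[str, str], bool]


def failure_rates(trials: List[Trial]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (slot_rates, stick_rates): fraction of trials each one was in that failed."""
    slot_counts = defaultdict(lambda: [0, 0])   # slot  -> [failures, appearances]
    stick_counts = defaultdict(lambda: [0, 0])  # stick -> [failures, appearances]
    for placement, passed in trials:
        for slot, stick in placement.items():
            for counts in (slot_counts[slot], stick_counts[stick]):
                counts[1] += 1
                if not passed:
                    counts[0] += 1

    def to_rates(table):
        return {key: fails / seen for key, (fails, seen) in table.items()}

    return to_rates(slot_counts), to_rates(stick_counts)


def always_failing(rates: Dict[str, float]) -> List[str]:
    """Names that failed in every trial they took part in."""
    return sorted(name for name, rate in rates.items() if rate == 1.0)


# Hypothetical trial log; stick labels and placements are made up.
CD = {"C1": "s5", "C2": "s6", "D1": "s7", "D2": "s8"}
trials: List[Trial] = [
    ({"B1": "s3", "B2": "s4", **CD}, True),                             # six sticks, no A slots
    ({"B1": "s1", "B2": "s2", **CD}, True),                             # former A sticks moved to B
    ({"A1": "s3", "A2": "s4", "B1": "s1", "B2": "s2", **CD}, False),    # all eight slots filled
    ({"A1": "s3", "A2": "s4", **CD}, False),                            # A, C and D only
]

slot_rates, stick_rates = failure_rates(trials)
print("slots failing every time:", always_failing(slot_rates))
print("sticks failing every time:", always_failing(stick_rates))
```

A result that names only slots and no sticks matches what everyone here is seeing, but it cannot say whether the bad channel lives in the board or in the CPU's memory controller. That still takes a second board or CPU.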

One thing the talk about CPU pins makes me consider is that Threadripper chips do not have pins. They have little contact pads, and there are spring pins in the socket. The CPU and the cooler get torqued down. I seem to recall watching an LTT video where he had a problem getting a system with dual Xeon CPUs to POST, and the fix came down to getting the right mounting pressure. I wonder if, after running in the hotter range for a couple of weeks nonstop, my cooler/CPU torque pressure has changed and is giving me this odd issue. I'm going to try backing off the torque and then retorquing with the CPU tool, and test again at my next shutdown opportunity. If that doesn't work, I'll pull it out, clean it and remount it. I wonder if it would be so simple. I know when I do hot laps on the track I need to retorque my wheel lugs. Thermal expansion is a real thing.

That's all I've got at this point aside from a bad mobo or CPU, and like everyone else, I'll have to get another one of each to test things out. It would be great if there were some test kit we could boot off a USB drive that would be able to give a direct answer.

John Glassman
MantisMan LLC