ahartmetz
Adept II

Re: gcc segmentation faults on Ryzen / Linux

Oookay, it seems like the new CPU doesn't like memory at 3200 (which made no difference with the old one). I had run memtest for a few minutes to ensure it's not an obvious memory problem, but now with 2933 MHz memory the faults are indeed gone. I will post again if and when I find out more.

raydude
Elite

Re: gcc segmentation faults on Ryzen / Linux

ahartmetz wrote:

Oookay, it seems like the new CPU doesn't like memory at 3200 (which made no difference with the old one). I had run memtest for a few minutes to ensure it's not an obvious memory problem, but now with 2933 MHz memory the faults are indeed gone. I will post again if and when I find out more.

Testing for stability doesn't count unless it is performed at BIOS defaults. Once you have established that, you can overclock and try running the RAM faster.

oldamdfan
Adept II

Re: gcc segmentation faults on Ryzen / Linux

How many sticks of RAM do you have and are they dual or single rank?

I guess it's time once again to post the reminder of the official RAM speeds, and to note that anything over that is overclocking and dependent on your motherboard/BIOS/CPU/RAM/PSU/cooling:

[Image: ddr4-memory-support.jpg]

Personally, I think an i7-8700K is a great solution to all of these issues. AMD has had 5 months of lead here and they have completely squandered it in my mind.

constantinx
Adept II

Re: gcc segmentation faults on Ryzen / Linux

The i7-8700K is a solution, sure, if you like delidding CPUs. I can't get over the fact that Intel is using thermal paste instead of soldering its CPUs to make a further $5 profit.

And the 8700K is quite a lot more expensive: CPU + motherboard + AIO cooler + delidding kit.

On top of that, the 8700K is a gaming CPU, and most people in this thread are likely not gamers.

How about investing in a Threadripper instead?

ahartmetz
Adept II

Re: gcc segmentation faults on Ryzen / Linux

Some of the compiler errors were weirder this time. "cmass is an unknown keyword, did you mean class?" - then shows "defective" code where it actually says "class". Points towards general memory instability rather than the regular compiler segfault. There were also pretty normal looking segfaults. No other programs crashed, only the whole computer when C-State Control was enabled.

The new CPU has now passed about 50 compilations of mesa (5-6 hours or so), so I guess it can be declared stable with 2933 MHz RAM. I also changed some load line calibration settings and some more mysterious BIOS "optimizations" (optimize for Cinebench, what?) towards lower voltages / looser voltage control / don't do anything weird.

Display of SOC voltage is still screwy. It seems to apply deltas on top of the delta set when entering BIOS, except that there's a hidden delta of 200 mV applied at some point(?). I think I have 1.1V SOC voltage in practice now.

I disabled "C-State Control" again because the computer already locked up hard once. I had hoped that would be fixed, too. Here I can live with the workaround, it is 100% effective.

Something among the changes made the segfaults go away. Next I'll try going back to 3200 MHz RAM at 1.42V which worked before.

The RAM is Corsair Vengeance LPX, two 8 GB sticks at 3600 MHz, i.e. 16 GB total. SKU is CMK16GX4M2B3600C18. According to my board's QVL it is single rank. The RAM model is rated for 3200 MHz at 1.35V, dual stick, in the QVL, but I found it needing 1.42V to be stable according to overnight memtest runs.

The board is an ASUS Prime B350-Plus at latest UEFI version 0902.

Note that I did reset CMOS RAM and test with 2400 MHz / stock voltage before RMA, but set it right back to 3200 / 1.42V when the BIOS nagged me to look at stuff after detecting the new CPU.

raydude

You are right, I know that in theory, but "reasonably" (also see text above) I had the idea that the new CPU would be at least as good as the old one in almost every way, which is not necessarily true. That idea is from times when only top silicon was free of the glitch, but if the segfault bug has been fixed more specifically, then newer replacements may have the usual quality spread. According to support e-mails, the European RMA center was recently restocked, the CPU batch is UA37SUS (week 37 = mid September), the box looked unopened, and the CPU looked spotlessly clean.

It would be nice to hear from bridgman or another AMD representative whether CPUs really are known to be "safe" from a certain date.

AFAIK, so far only guessing from a few data points tells us that there is a manufacturing date when the bug was fixed for good.

oldamdfan

Yeah, I know these (anyway I knew I was overclocking memory wrt the CPU spec), see answers above for why I ignored them.

xbam
Adept I

Re: gcc segmentation faults on Ryzen / Linux

If you're having to push 1.42V for your RAM to reach 3200, it's probably not going to work out in the long run, just sayin'. Been there, done that on older PCs. Perhaps try custom timings and loosen them up a little before hitting the voltage crackpipe.

ahartmetz
Adept II

Re: gcc segmentation faults on Ryzen / Linux

The mprime (Linux equivalent of prime95) "blend" test turns out to be really good at exposing the RAM related instability I'm currently having. Compilation and mprime have no problem at 2933; they both fail (mprime fails faster, just a few minutes) at 3200. So this is a regular RAM stability issue and I'll just run 2933 for the time being.
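
For anyone who wants to repeat the compile-loop side of this test, here is a minimal sketch in Python. It is not the script used above; the function name `run_repeated` and its parameters are made up for illustration. It simply runs a command several times and records every run that exits with a non-zero status, which is how a compiler segfault shows up from the outside.

```python
import subprocess


def run_repeated(cmd, runs, stop_on_failure=True):
    """Run `cmd` up to `runs` times and collect the failing iterations.

    Returns a list of (iteration, returncode) tuples, one per failed run.
    An empty list means every run exited with status 0.
    """
    failures = []
    for iteration in range(1, runs + 1):
        # Output is discarded here; redirect to a log file instead if you
        # want to inspect the compiler errors afterwards.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append((iteration, result.returncode))
            if stop_on_failure:
                break
    return failures
```

A call might look like `run_repeated(["make", "-j16"], runs=50)` from inside a build directory, where the `make` invocation is only a placeholder for whatever build you use. A real loop would also need to clean the build tree between runs so that each iteration actually recompiles.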

bsp2020
Elite

Re: gcc segmentation faults on Ryzen / Linux

I just had another reboot. Looking at Windows Event Viewer, it recorded a BugCheck and an MCE error. Are MCE/BugCheck errors known to be caused by a motherboard issue or by the processor?

The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000124 (0x0000000000000000, 0xffffd80188ace038, 0x0000000000000000, 0x0000000000000000). A dump was saved in: C:\Windows\Minidump\101117-9281-01.dmp. Report Id: 1646ca94-3df6-4ff3-81af-51b7e68844fc.

A fatal hardware error has occurred.

Reported by component: Processor Core

Error Source: Machine Check Exception

Error Type: Cache Hierarchy Error

Processor APIC ID: 3

The details view of this entry contains further information.
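
When comparing several of these events, it can help to pull the stop code and its four parameters out of the message text. The following is a rough Python sketch that assumes the message has the same shape as the one quoted above; `parse_bugcheck` is a made-up helper name, not part of any Windows tool. As far as I understand Microsoft's bug check reference, code 0x124 is WHEA_UNCORRECTABLE_ERROR and a first parameter of 0 stands for a machine check exception, which matches the "Error Source" line in the event.

```python
import re

# Matches e.g. "The bugcheck was: 0x00000124 (0x0, 0x1, 0x2, 0x3)".
BUGCHECK_RE = re.compile(
    r"The bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)"
)


def parse_bugcheck(message):
    """Return (code, [param1..paramN]) as integers, or None if not found."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params
```

Applied to the event text above, this yields code 0x124 with a first parameter of 0. It does not say whether the board or the CPU is at fault; it only makes the events easier to compare.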

psiedler
Adept I

Re: gcc segmentation faults on Ryzen / Linux

A friend of mine who had bought an affected R7 1800X got his RMA return CPU this week, production week 37, "SUS".

The new CPU survived a 24-hour gcc compilation test (while the one before usually segfaulted within a few hours). Interestingly, he claims that the not easily reproducible crashes of other application software (graphics software, browsers, etc.), which he used to have about twice per day, have also not occurred again with the new CPU so far.

ryzlin
Adept I

Re: gcc segmentation faults on Ryzen / Linux

This is my situation:

UA1708SUT, R5 1600, SEGV and MCE

UA1733PGT, R5 1600, SEGV and MCE

Both CPUs are new, no RMA. And I tested them with my RAM at 2133 MHz just to be completely sure that the problem was not unstable RAM.

I have now sent the first CPU to AMD, and I'll report back when I receive the replacement.
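
Since people in this thread keep comparing batch codes, here is a small Python sketch for turning a code like the two above into a production year and week. It rests on an assumption that AMD has not confirmed here: that the code is two letters, a two-digit year, a two-digit week, and a three-letter suffix. The helper name `decode_batch` is made up, and the week is interpreted as an ISO week, which may differ slightly from whatever calendar the factory uses.

```python
import datetime
import re

# Assumed layout: 2 letters + YY + WW + 3-letter suffix, e.g. "UA1708SUT".
BATCH_RE = re.compile(r"^[A-Z]{2}(\d{2})(\d{2})([A-Z]{3})$")


def decode_batch(code):
    """Return (year, week, suffix, monday_of_that_week) for a batch code.

    Raises ValueError if the code does not match the assumed layout.
    """
    match = BATCH_RE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized batch code: {code!r}")
    year = 2000 + int(match.group(1))
    week = int(match.group(2))
    suffix = match.group(3)
    # Treat the week as an ISO week (an assumption); requires Python 3.8+.
    monday = datetime.date.fromisocalendar(year, week, 1)
    return year, week, suffix, monday
```

Under these assumptions, UA1708SUT decodes to week 8 of 2017 and UA1733PGT to week 33 of 2017. Shortened codes without the year, as sometimes quoted in this thread, are rejected rather than guessed at.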