As a software guy, I compile a lot of code, and occasionally gcc crashes with a segmentation fault for no obvious reason. I seem to remember that the problem sometimes also showed up as illegal instruction errors, but I'm no longer sure about that. I have a Ryzen 1800X CPU and an Asus Prime B350-Plus mainboard with UEFI BIOS 0609 (the latest). My RAM is on the QVL and runs at 3200 MHz, but that shouldn't matter.
There is a lot of information in this thread to which I did not contribute: Gentoo Forums :: View topic - Segfaults during compilation on AMD Ryzen.
I'll summarize it: different people, different gcc versions, different optimization levels, different software being compiled, different RAM clocks (including very low ones), and different Ryzen and mainboard models. Some of them tried swapping several pieces of hardware, to no avail.
I have little to add: I can reproduce the segfaults on Ubuntu 17.04, and nothing else crashes for me since the latest UEFI + AGESA update.
Mean time between crashes is about an hour when compiling continuously.
I think you should try hard to reproduce and fix this at AMD. Compiling anything on Linux with gcc while using all CPU threads should suffice.
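For anyone who wants to try this unattended, here is a rough sketch of such a compile loop in Python. It is not a script anyone in the thread used. The function names are made up, and it assumes a plain Makefile project that builds with gcc and normally builds without errors, so that any non-zero exit is worth a look. It uses GNU make's -B option to force a full rebuild on every pass.

```python
import os
import subprocess
import time


def default_build_command():
    """Hypothetical helper: rebuild everything with one job per CPU thread.

    Assumes GNU make; -B rebuilds all targets even if they are up to date.
    """
    jobs = os.cpu_count() or 1
    return ["make", "-B", f"-j{jobs}"]


def run_until_failure(cmd, max_runs, cwd=None):
    """Run cmd repeatedly until it fails.

    Returns (run_number, returncode, elapsed_seconds) for the first run that
    exits with a non-zero status, or None if all max_runs runs succeed.
    """
    start = time.monotonic()
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return run, result.returncode, time.monotonic() - start
    return None
```

Calling run_until_failure(default_build_command(), max_runs=100, cwd="/path/to/source") from a Python prompt would keep rebuilding until the first failed build. After a failure, check dmesg for a segfault line to tell a crash apart from an ordinary build error.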
Thanks in advance.
[Edited: reordered to be more coherent, removed redundancy]
Thanks for the post. I believe you opened a service request on this same issue. I will respond to your service request so we can continue the discussion there and avoid duplication.
It was not me who opened the service request. Somebody in that Gentoo forum thread apparently did, but it's not public and there was no news when I looked. I wanted to ensure that somebody takes care of it because it seems important.
As a workaround, try to disable either SMT or the uOP cache via the CMOS setup of your mainboard. For your workload the latter will probably give you the smaller performance hit, but I don't know whether that specific setup item is exposed by your ASUS board.
In any case, AMD is most probably already aware of potential underlying issues.
In my case (I'm not the original poster), the uOP cache setting could not be changed in the BIOS, and disabling
SMT did not resolve the issue. I opened ticket 8200749112. It is not the Gentoo user's
ticket that ahartmetz mentioned, and it is not public. Unfortunately, it has not been updated recently.
I would like to know whether anyone at AMD is handling this issue and how it is progressing
(reproducing, finding the root cause, fixing, and so on), especially the former.
In addition, I think sharing this kind of information in public (here?) would be better for both AMD
and users, including me, because it prevents many duplicate private tickets.
Here is another victim of this problem, with a Ryzen 1600. On Gentoo, just two parallel emerges (e.g. gcc in one shell and mesa in another) with all cores used (-j13) trigger the problem: the compilation suddenly fails, usually with the following text in dmesg:
segfault at 11 ip 0000000000406215 sp 00007ffcd77a2248 error 6 in bash[400000+ac000]
I tried different memory kits, with no change.
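To make lines like the one above easier to compare between machines, here is a small Python sketch that splits such a dmesg line into its fields. The function and field names are made up for illustration. The error value is read as the standard x86 page-fault error code (bit 0: protection violation rather than a missing page, bit 1: write access, bit 2: user mode, bit 4: instruction fetch), and the numbers are treated as hexadecimal, which is how the kernel prints them.

```python
import re

# Matches the kernel's "segfault at ... ip ... sp ... error ..." message.
# The trailing "in object[base+size]" part is optional.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing a dmesg segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    info = {
        "fault_address": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "sp": int(m.group("sp"), 16),
        "protection_violation": bool(err & 0x1),  # False: page not present
        "write": bool(err & 0x2),                 # False: read access
        "user_mode": bool(err & 0x4),             # False: kernel mode
        "instruction_fetch": bool(err & 0x10),
        "object": m.group("obj"),
    }
    if m.group("base") is not None:
        # Offset of the faulting instruction inside the mapped object.
        info["offset_in_object"] = info["ip"] - int(m.group("base"), 16)
    return info
```

For the line above, this reports a user-mode write to an unmapped page at address 0x11, with the faulting instruction at offset 0x6215 inside bash.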
Hm, it looks like that form only allows input of GPU/driver bugs. Somehow I didn't find a better place to report the issue than this one when I was initially looking.