AMD’s Ryzen 7 has been generally well received by the enthusiast community, but there’s one low-level problem that we’ve been watching and haven’t previously reported on. In early June, Ryzen users running Linux began reporting segmentation faults when running multiple concurrent compilation workloads under several different versions of GCC. LLVM/Clang was not affected, and the issue appears confined to Linux. Moreover, it apparently wasn’t common, even among Linux users: Michael Larabel of Phoronix.com reported that his own test rigs had been absolutely solid, even under heavy workloads.
Like the Pentium FDIV bug of yesteryear, this was a real issue, but one that realistically only impacted a fraction of a fraction of buyers. AMD had previously said it was investigating the problem (which isn’t present on any Epyc or Threadripper CPUs) and it’s now announced a solution: CPU replacement.
Phoronix reports that AMD provided it with a new Ryzen 7 1800X CPU and that this chip has refused to crash, even when running a “kill Ryzen” script that would previously deliberately create a compiler segmentation fault. While some users thought the issue was caused by RAM, the motherboard, or the BIOS, Phoronix’s testing indicates otherwise. Swap the new Ryzen 7 1800X for an older part, and the problem reappears. Switch back to the new chip, and it vanishes. Larabel has tentatively concluded that the issue appears confined to Ryzen CPUs manufactured before Week 25 of this year (the new chip was built in Week 30), but no other details on what caused it are available.
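For readers who want a feel for what this kind of stress test does, here is a minimal sketch in Python. It is not the actual “kill Ryzen” script; the function names `classify` and `stress` and the worker and round counts are our own assumptions. It simply runs the same command many times in parallel and tallies how each run ended.

```python
import signal
import subprocess
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def classify(cmd):
    """Run one job and report how it ended: 'ok', 'segv', or 'failed'."""
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a negative return code means the process died from a signal.
    if proc.returncode == -signal.SIGSEGV:
        return "segv"
    # A compiler driver may report a crashed child as an ordinary non-zero
    # exit, so any failure of a known-good build deserves a closer look.
    return "failed"


def stress(cmd, workers=4, rounds=4):
    """Run `cmd` `rounds` times on each of `workers` threads; tally outcomes."""
    def loop(_):
        return Counter(classify(cmd) for _ in range(rounds))

    totals = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(loop, range(workers)):
            totals.update(partial)
    return totals
```

In practice `cmd` would be a self-contained compile job that is known to succeed on healthy hardware, for example `["gcc", "-c", "test.c", "-o", "/dev/null"]` with a hypothetical `test.c`. On a healthy system every run should land in the `ok` bucket; sporadic `segv` or `failed` counts on an otherwise unchanged build are the symptom users were reporting, though they can also point to other faulty hardware.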
The good news is that AMD is replacing the CPUs of anyone who has this issue. Again, while the issue is real, it appears to trigger only in an extremely small number of cases, when running a Linux workload under specific circumstances.
Your question is a bit misleading. This is NOT caused by a Linux bug; there is no software bug involved in these segfaults. There was a different issue entirely that could be called a bug on Linux and BSD, but those have both been fixed and are unrelated to this issue. The Kill Ryzen script also DOES NOT deliberately create a segfault. It is merely a reasonable way to trigger them on a faulty processor (and also with other faulty hardware). The number of affected chips is unknown; just because there have been few reports doesn't mean few of the chips are faulty (not a lot of people run these kinds of workloads). The faults have also been exhibited running in Windows. Week 30+ processors given in response to RMA requests have mostly (maybe even all) not been affected, but that doesn't tell us anything about a week 30+ processor purchased from a store: we don't know if the chips provided through RMA have been tested to a higher standard/better binned, or if they are all fixed. So far AMD has NOT answered that. I'm not trying to fearmonger, but that's the situation so far. AMD will accept RMA for any affected chips though, so the worst-case scenario when buying is the possibility of needing to take some time to do an RMA. I went through the RMA process; it is pretty quick and smooth so long as you have done the proper troubleshooting to isolate this issue.