4 Replies Latest reply on Jan 1, 2018 8:17 PM by hikaru

    Ryzen 1600 segmentation fault issue

    hikaru

      Hi. I have a Ryzen 1600 (six cores) running at stock clocks (3200 MHz).

       

      It is installed in an ASUS Prime B350-Plus motherboard.

       

      The RAM is two 8 GB sticks of G.Skill Trident Z, F4-3200C14D-16GTZSK.

       

      I am running the tests on Gentoo Linux. Before moving the installation to this machine, it was deliberately recompiled with no -march setting so it would simply run on anything. Windows 10 is also installed on this system with EFI support, so if you suggest configuration changes that have to be applied *from* there, I can do that too.

       

      Absolutely nothing I have tried makes this CPU survive kill-ryzen.sh from suaefar/ryzen-test on GitHub ("Tools to reproduce randomly crashing processes under load on AMD Ryzen processors on Linux").

       

      Failures have happened anywhere from a few seconds to at most 15 hours into a run, and generally take the form of either invalid opcode or segmentation fault errors.
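
For anyone comparing notes, here is a minimal sketch of how these two failure types could be tallied from the kernel log. The function name count_crashes is made up for illustration, and the match strings ("segfault at", "invalid opcode") are an assumption about how the kernel words these messages; adjust them to whatever your log actually shows.

```shell
#!/bin/sh
# Count crash-related lines read from stdin.
# Assumption: the kernel reports these failures with the phrases
# "segfault at" and "invalid opcode"; change the pattern if your
# log uses different wording.
count_crashes() {
    grep -c -E 'segfault at|invalid opcode'
}

# Example usage (may need root to read the kernel log):
#   dmesg | count_crashes
```

Counting per run makes it easier to tell whether a BIOS or kernel setting changed the failure rate or only delayed the first failure.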

       

      Things I have tried:

       

      I updated the BIOS to the latest version before I even began testing.

       

      I tested whether the processor was overheating by running the Prime95 stress test in Windows 10 while monitoring the temperatures in the ASUS software. It ran for hours without a problem.

       

      I tried disabling processor C-states in the BIOS, and reverted because it made no difference.

       

      I tried using the XMP memory profile for the RAM, and reverted because it made no difference.

       

      I tested the RAM overnight using the free EFI-enabled MemTest86, and 9 passes completed with no errors.

       

      I tried disabling ASLR in Linux via echo 0 > /proc/sys/kernel/randomize_va_space

      This actually produced a meaningful difference: the test ran for 15 hours before failing, longer than any other run. It still failed, though, so I reverted.
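
As a rough sketch of checking and restoring that setting, the helper below translates the value in /proc/sys/kernel/randomize_va_space into words. The function name describe_aslr is made up for illustration. The meanings of 0, 1 and 2 follow the kernel's sysctl documentation; 2 is the default on most kernel configurations, but check yours before writing it back.

```shell
#!/bin/sh
# Translate a randomize_va_space value into a short description.
describe_aslr() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "partial (stack, mmap, VDSO)" ;;
        2) echo "full (also heap/brk)" ;;
        *) echo "unknown" ;;
    esac
}

# Example usage:
#   describe_aslr "$(cat /proc/sys/kernel/randomize_va_space)"
# Revert to the usual default after testing (needs root):
#   echo 2 > /proc/sys/kernel/randomize_va_space
```

Writing to /proc only lasts until the next reboot, so a setting changed this way reverts on its own if the machine hard-locks and has to be reset.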

       

      I am currently trying the last thing I can think of: I have fully disabled SMT in the BIOS and am running the test as I write this.

      I do *not* want to disable SMT long term, but if it makes the processor stable in the short term, at least I can live with it until I find a better solution.

      EDIT: Disabling SMT did not help - I ran the test overnight and awoke to the machine hard-locked and unresponsive.

       

      There does not seem to be a way to disable the op cache from within the ASUS BIOS, so I don't think I can try that.

       

      Does anyone have any suggestions I have not already tried?

       

      I do not know what else I can do at this point other than initiate an RMA request, and I would prefer to discover that I have overlooked something obvious.

       

      Message was edited by: Timothy McGrath