I would try using ukuu to install the latest 4.12 series kernel and see if it helps (you can also try 4.13 final, which was released yesterday).
The hardware issue was generally observed on gcc workloads and, from what I understand, only caused random segfaults, not the type of behavior you described. That said, there is a script called kill-ryzen.sh that tests whether you have an affected processor.
Thank you for the help here! I've tried updating to the 4.13 kernel but unfortunately have the same stability issues. I will try lowering the memory speed further to see if that helps at all.
I tried the kill-ryzen script tonight and unfortunately it failed after running for just over an hour, so it looks like I did get a processor with the hardware issue. The output was:
Linux simon-MS-7750 4.13.0-041300-generic #201709031731 SMP Sun Sep 3 21:33:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Using 2 parallel processes
[KERN] -- Logs begin at Mon 2017-09-04 21:16:03 BST. --
<snipped firewall activity>
[KERN] Sep 04 21:20:51 simon-MS-7750 kernel: zram: Added device: zram0
[KERN] Sep 04 21:20:51 simon-MS-7750 kernel: zram0: detected capacity change from 0 to 68719476736
[KERN] Sep 04 21:20:52 simon-MS-7750 kernel: EXT4-fs (zram0): mounted filesystem with ordered data mode. Opts: discard
[KERN] Sep 04 21:20:52 simon-MS-7750 kernel: nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based firewall rule not found. Use the iptables CT target to attach helpers instead.
[loop-0] Mon Sep 4 21:21:32 BST 2017 start 0
[loop-1] Mon Sep 4 21:21:33 BST 2017 start 0
[loop-1] Mon Sep 4 21:55:26 BST 2017 start 1
[loop-0] Mon Sep 4 21:55:27 BST 2017 start 1
[KERN] Sep 04 22:23:22 simon-MS-7750 kernel: show_signal_msg: 31 callbacks suppressed
[KERN] Sep 04 22:23:22 simon-MS-7750 kernel: genautomata: segfault at bd ip 0000000000430210 sp 00007ffc936c3628 error 4 in genautomata[400000+4c000]
[loop-0] Mon Sep 4 22:23:32 BST 2017 build failed
[loop-0] TIME TO FAIL: 3720 s
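For anyone trying to read the segfault line in the log above, here is a rough Python sketch that pulls it apart. It assumes the line format shown in this log and the standard x86 page-fault error-code bits (bit 0 = protection violation, bit 1 = write, bit 2 = user mode, bit 4 = instruction fetch). The function name `decode_segfault` is made up for illustration and is not part of kill-ryzen.sh.

```python
import re

# Matches the x86 "segfault at ..." kernel message, in the format seen in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+): segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Return a dict describing a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    return {
        "process": m.group("comm"),
        "address": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "protection_violation": bool(err & 1),  # False means the page was not present
        "write": bool(err & 2),                 # False means the access was a read
        "user_mode": bool(err & 4),
        "instruction_fetch": bool(err & 16),
    }
```

For the genautomata line in this log, that comes out as a user-mode read of a not-present page at address 0xbd. That describes a single crash and does not by itself say whether the CPU or the RAM is at fault.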
Running a quick check of /proc/cpuinfo gives me the following info:
model : 1
model name : AMD Ryzen 7 1700 Eight-Core Processor
stepping : 1
microcode : 0x8001126
Not sure whether this is a stepping and microcode version known to have the issue but thought I would share the info.
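In case others want to post the same details for comparison, here is a small Python sketch that collects those fields from /proc/cpuinfo. The helper names `parse_cpuinfo` and `summarize` are hypothetical, and it assumes the usual `key : value` layout with a blank line between processor blocks. It only reports the values; it does not know which steppings or microcode versions are affected.

```python
def parse_cpuinfo(text):
    """Return the key/value pairs of the first processor block in /proc/cpuinfo text."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields):
    """Keep only the identification fields discussed in this thread."""
    wanted = ("model name", "cpu family", "model", "stepping", "microcode")
    return {key: fields.get(key) for key in wanted}
```

On a Linux box it can be run as `print(summarize(parse_cpuinfo(open("/proc/cpuinfo").read())))`.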
And my BIOS is
DMI: System manufacturer System Product Name/PRIME X370-PRO, BIOS 0810 08/01/2017
As I do use the system for software development, I'm now a little concerned that I may run into other issues as a result of the bug. I will check with Amazon to see what options I have.
Thanks again for the help here!
This reeks of bad RAM (or RAM not seated all the way). Try memtest overnight for 3-4 passes, and if you're "checking stability", use the stock JEDEC speed of 2133 rather than just downclocking a little.