After fixing the script to work again on Ubuntu 18.04 (by uncommenting the section that installs the build tools on Ubuntu), I was able to run it on my Ryzen 2700 / ASUS Prime PRO + 16GB of Crucial DDR4 unbuffered ECC. After ~36 hours of testing (24 contiguous), all is well, at least when I test from the installed Ubuntu. I had two lockups when testing from a USB stick: one in kill-ryzen.sh and one in memtest86. I am not sure why in either case. Overall, I am very impressed with this CPU so far, as long as the lockups don't recur (this system is meant to be a 24/7 fileserver / PVR device that needs to be 100% stable). My guess is that a microcode update is applied in the installed Ubuntu but not in either of the USB stick environments. I have not updated the UEFI/BIOS yet, and I do see that there have been several updates for it even though my board is very new.
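One way to test the microcode guess on the Linux side would be to compare the revision the kernel reports when booted from the installed system and from the live USB stick. This is only a minimal sketch, assuming an x86 Linux kernel that exposes a "microcode" field in /proc/cpuinfo; the function name microcode_revisions is made up for illustration, and it cannot say anything about memtest86, which boots without an OS.

```shell
# Print the distinct microcode revisions reported for all CPU threads.
# Assumption: the file has "microcode : 0x..." lines, as x86 /proc/cpuinfo does.
# An optional argument overrides the file to read (hypothetical convenience).
microcode_revisions() {
    awk -F': *' '/^microcode/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}
```

Running microcode_revisions in both environments and comparing the output would show whether they really run different microcode.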
I have not verified that yet. The specification page for the ASUS Prime PRO X470 board says it supports ECC depending on the CPU. I assumed that meant APUs don't have ECC (since other boards mention that). However, I did not see any mention of ECC in the manual or the BIOS. memtest86 7.5 did report that I had ECC, but I am not sure whether it was just reading that from the RAM or whether ECC was actually active. I plan to check on that after further testing. I am still running the kill-ryzen script.
I believe the following means ECC is enabled and in Single bit correction / double bit detection mode.
Here is the linux kernel version
jmd1 ~/shell-scripts # uname -a
Linux jmd1.comcast.net 4.16.13-gentoo-20180603-1145-jmd1.comcast.net #3 SMP Sun Jun 3 11:52:55 EDT 2018 x86_64 AMD Ryzen 7 2700 Eight-Core Processor AuthenticAMD GNU/Linux
This tells me ECC is enabled.
jmd1 ~/shell-scripts # dmesg | grep ECC
[ 8.557846] systemd: systemd 238 running in system mode. (+PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL -XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[ 9.132922] EDAC amd64: Node 0: DRAM ECC enabled.
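Note that the systemd line above only matched because "SECCOMP" contains the letters "ECC". A small sketch of a stricter filter, assuming a grep that supports -w (GNU and BSD grep do); the function name ecc_lines is just for illustration:

```shell
# Keep only the lines on stdin where "ECC" appears as a whole word,
# so strings such as "+SECCOMP" no longer match.
ecc_lines() {
    grep -w 'ECC'
}
# Usage: dmesg | ecc_lines
```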
This tells me there have been 0 errors (which I expect; from my server experience, ECC errors should be rare)
jmd1 ~/shell-scripts # edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
mc0: csrow0: mc#0csrow#0channel#1: 0 Corrected Errors
edac-util: No errors to report.
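For a box that has to run 24/7, the same counters can also be read straight from sysfs without edac-util. A rough sketch, assuming the kernel's EDAC driver exposes ce_count and ue_count under each mcN directory; the function name and the optional base-directory argument are my own additions:

```shell
# Sum corrected (ce) and uncorrected (ue) error counts over all memory controllers.
# Returns non-zero if any uncorrected error has been recorded.
edac_error_totals() {
    base="${1:-/sys/devices/system/edac/mc}"
    ce=0
    ue=0
    for mc in "$base"/mc*; do
        if [ -r "$mc/ce_count" ]; then
            ce=$((ce + $(cat "$mc/ce_count")))
        fi
        if [ -r "$mc/ue_count" ]; then
            ue=$((ue + $(cat "$mc/ue_count")))
        fi
    done
    echo "corrected=$ce uncorrected=$ue"
    [ "$ue" -eq 0 ]
}
```

Something like this could be called from cron, with a mail sent whenever the corrected count starts to climb.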
This tells me the mode of error correction for the first rank
jmd1 ~/shell-scripts # cat /sys/devices/system/edac/mc/mc0/rank0/dimm_edac_mode
Same goes for the 2nd rank
jmd1 ~/shell-scripts # cat /sys/devices/system/edac/mc/mc0/rank1/dimm_edac_mode
Here is what SECDED means
Single-bit error correction, double-bit error detection
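Instead of running cat once per rank, the mode of every rank can be listed in one go. A minimal sketch, assuming the same sysfs layout as in the commands above (mcN/rankN/dimm_edac_mode); the function name edac_modes and the optional base-directory argument are hypothetical:

```shell
# Print "<controller>/<rank>: <mode>" for every dimm_edac_mode file found.
# Returns non-zero when no such file exists (e.g. no EDAC driver loaded).
edac_modes() {
    base="${1:-/sys/devices/system/edac/mc}"
    found=1
    for f in "$base"/mc*/*/dimm_edac_mode; do
        [ -r "$f" ] || continue
        dir="${f%/dimm_edac_mode}"   # e.g. .../mc0/rank0
        mc="${dir%/*}"               # e.g. .../mc0
        mc="${mc##*/}"               # e.g. mc0
        printf '%s/%s: %s\n' "$mc" "${dir##*/}" "$(cat "$f")"
        found=0
    done
    return "$found"
}
```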
I have just replaced my Ryzen 1700, bought in August 2017, from batch 1714 PGT.
The new processor is from batch 1738 SUS.
The segfault error seems to be gone, after mild testing. I'll test more, but that batch is likely error free.
I put off starting the RMA for so long because I didn't have a spare processor, and I was really hoping for a microcode fix.
The RMA was very fast and took about 1 week. I live in Europe, and their local warehouse is in the Netherlands.
I made an online service request, warranty category, on Friday, 01 June, in the afternoon.
They replied on Monday, 04 June, in the morning.
A quick exchange of emails followed, and next day, 05 June, they approved the RMA and sent me a DHL account number with free shipping.
I spent the next few days getting a temporary processor.
I sent the processor by DHL on 11 June. They received it on the 12th and marked it "test/inspection passed" the same day. On the 13th they sent the new processor, and today, the 14th of June, I received it.
They were supposed to send a tracking number, but the courier simply phoned me 10 minutes before the delivery. Good thing I was at home.
Overall a great experience. Big thanks to AMD!
For me, the solution was to keep the temperature under 70 Celsius. Below that I got no segfaults; whenever it went over 70 Celsius, I always had segfaults.
It is an ASUS laptop, an ASUS-ROG-Strix-GL702ZC. I could only set the fan to 80% in Windows, and then the problem went away. With the fan on auto, compiling always broke.
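To see whether a compile run crosses that 70 Celsius mark, the hwmon sensors can be polled from a second terminal. This is only a sketch, assuming the sensors show up as temp*_input files (in millidegrees Celsius) under /sys/class/hwmon; which of them is the CPU depends on the driver, and the function name temps_over is made up:

```shell
# Report every hwmon temperature input above the given limit (default 70 C).
# Returns non-zero if at least one sensor is over the limit.
temps_over() {
    limit_c="${1:-70}"
    base="${2:-/sys/class/hwmon}"
    status=0
    for f in "$base"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue
        milli=$(cat "$f")            # value is in millidegrees Celsius
        if [ "$milli" -gt $((limit_c * 1000)) ]; then
            echo "$f: $((milli / 1000)) C exceeds $limit_c C"
            status=1
        fi
    done
    return "$status"
}
```

Calling temps_over 70 in a loop with a short sleep while the build runs would show whether the segfaults line up with the temperature spikes.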
Problems with segfaults in GCC persist with Zen version 1 (znver1) AMD CPUs. I'm not sure if this is directly related to the same problem as this old thread here, but with gcc version 10, compilation failures due to stack smashing on Ryzen CPUs abound. There are some details, along with links to other tracking pages and bugs, at https://bugs.gentoo.org/724314 .
This may be a frustrating and long journey with the first generation of Ryzen.