After fixing the script to work again on Ubuntu 18.04 (by uncommenting the section that installs the build tools on Ubuntu), I was able to run it on my Ryzen 2700 / ASUS Prime PRO + 16GB of Crucial DDR4 Unbuffered ECC. After ~36 hours of testing (24 contiguous), all is well, at least when I test from the installed Ubuntu. I had two lockups when testing from a USB stick: one in kill-ryzen.sh and one in memtest86. I am not sure why in either case. Overall, I am very impressed with this CPU so far, as long as the lockups don't come back (this system is meant to be a 24/7 fileserver / PVR device that needs to be 100% stable). My guess is that a microcode update is delivered in the installed Ubuntu but not in the live USB stick. I have not updated the UEFI/BIOS yet, and I do see that there have been several updates for it even though my board is very new.
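One way to test the microcode guess would be to compare the revision the kernel reports on the installed system with the one reported when booted from the live USB stick. This is only a minimal sketch: the function name `microcode_revisions` is made up for illustration, and it assumes an x86 Linux kernel that exposes a `microcode` field in `/proc/cpuinfo`.

```shell
#!/bin/sh
# Hypothetical helper (not part of kill-ryzen.sh): print the distinct
# microcode revisions reported for the CPUs.
# The optional argument (path to a cpuinfo-style file) exists only to make
# the function easy to try on a saved copy; /proc/cpuinfo is the default.
microcode_revisions() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Lines look like "microcode<TAB>: 0x..."; keep the value and de-duplicate.
    awk -F': *' '/^microcode[ \t]*:/ { print $2 }' "$cpuinfo" | sort -u
}

microcode_revisions
```

Running it once from the installed system and once from the live session and comparing the output would show whether the two environments load different microcode. It says nothing about memtest86, which does not run under Linux.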
drescherjm wrote:
... ASUS Prime PRO + 16GB of Crucial DDR4 Unbuffered ECC. ...
Does ECC work?
I have not verified that yet. The specification page for the ASUS Prime PRO X470 board says it supports ECC depending on the CPU. I assumed that meant APUs don't have ECC (since other boards mention that). However, I did not see any mention of ECC in the manual or the BIOS. memtest86 7.5 did report that I had ECC, but I am not sure whether it was just reading that from the RAM or whether ECC was actually active. I plan to check on that after further testing. I am still running the kill-ryzen script.
Has anyone RMA'd a 1000 series and gotten a 2000 series for replacement?
I believe the following means ECC is enabled and in single-bit correction / double-bit detection mode.
Here is the Linux kernel version:
jmd1 ~/shell-scripts # uname -a
Linux jmd1.comcast.net 4.16.13-gentoo-20180603-1145-jmd1.comcast.net #3 SMP Sun Jun 3 11:52:55 EDT 2018 x86_64 AMD Ryzen 7 2700 Eight-Core Processor AuthenticAMD GNU/Linux
This tells me ECC is enabled.
jmd1 ~/shell-scripts # dmesg | grep ECC
[ 8.557846] systemd[1]: systemd 238 running in system mode. (+PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL -XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[ 9.132922] EDAC amd64: Node 0: DRAM ECC enabled.
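The systemd line above only shows up because "SECCOMP" contains the letters "ECC". A small sketch of a stricter filter, using grep's whole-word option (`ecc_lines` is just an illustrative name):

```shell
#!/bin/sh
# Hypothetical helper: keep only lines where "ECC" appears as a whole word,
# so "+SECCOMP" in the systemd banner no longer matches.
ecc_lines() {
    grep -w 'ECC'
}

# Usage (reading the kernel log may need root on some systems):
#   dmesg | ecc_lines
```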
This tells me there have been 0 errors (from my server experience, I expect ECC errors to be rare):
jmd1 ~/shell-scripts # edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
mc0: csrow0: mc#0csrow#0channel#1: 0 Corrected Errors
edac-util: No errors to report.
This tells me the mode of error correction for the first rank
jmd1 ~/shell-scripts # cat /sys/devices/system/edac/mc/mc0/rank0/dimm_edac_mode
SECDED
Same goes for the 2nd rank
jmd1 ~/shell-scripts # cat /sys/devices/system/edac/mc/mc0/rank1/dimm_edac_mode
SECDED
Here is what SECDED means
EDAC_SECDED
Single bit error correction, Double detection
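Instead of reading each rank by hand, the same sysfs files can be walked in a loop. This is a rough sketch only: `edac_modes` is a made-up name, and checking `dimm*` next to `rank*` is an assumption, since the entry names depend on the memory controller driver.

```shell
#!/bin/sh
# Hypothetical helper: print dimm_edac_mode for every rank/dimm entry that
# the EDAC driver exposes. The optional argument overrides the sysfs base
# directory and is only there so the function can be tried on a copy.
edac_modes() {
    base="${1:-/sys/devices/system/edac/mc}"
    for f in "$base"/mc*/rank*/dimm_edac_mode "$base"/mc*/dimm*/dimm_edac_mode; do
        # An unmatched glob stays literal; skip anything that is not readable.
        [ -r "$f" ] || continue
        printf '%s: %s\n' "${f#"$base"/}" "$(cat "$f")"
    done
}

edac_modes
```

On the system above this should list rank0 and rank1 of mc0, each with SECDED.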
I have just replaced my Ryzen 1700, bought in August 2017, from batch 1714 PGT.
The new processor is from batch 1738 SUS.
The segfault error seems to be gone, after mild testing. I'll test more, but that batch is likely error free.
I put off starting the RMA for so long because I didn't have a spare processor, and I really hoped for a microcode fix.
The RMA was very fast and took about 1 week. I live in Europe, and their local warehouse is in the Netherlands.
----------
I made an online service request, warranty category, on Friday, 01 June, in the afternoon.
They replied on Monday, 04 June, in the morning.
A quick exchange of emails followed, and next day, 05 June, they approved the RMA and sent me a DHL account number with free shipping.
I spent the next few days getting a temporary processor.
I sent the processor by DHL on 11 June. They received it on the 12th and marked it "test/inspection passed" the same day. On the 13th they sent the new processor, and today, 14 June, I received it.
They were supposed to send a tracking number, but the courier simply phoned me 10 minutes before the delivery. Good thing I was at home.
----------
Overall a great experience. Big thanks to AMD!
For me, the solution was to keep the temperature under 70 Celsius. Below that I got no segfaults; whenever it went over 70 Celsius I always had segfaults.
It is an Asus laptop, ASUS-ROG-Strix-GL702ZC. I could only set the fan in Windows, to 80%, and then the problem went away. With the fan on auto, compiling always broke.
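For anyone who wants to watch for the same 70 Celsius threshold under Linux, here is a minimal sketch. It assumes the sensors are exposed through hwmon sysfs, where `temp*_input` values are in millidegrees Celsius. Which entry is the CPU depends on the driver, and `is_too_hot` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a hwmon reading (millidegrees Celsius, $1)
# is above a limit given in whole degrees Celsius ($2).
is_too_hot() {
    [ "$1" -gt $(( $2 * 1000 )) ]
}

# Report every readable hwmon temperature that is above 70 Celsius.
for f in /sys/class/hwmon/hwmon*/temp*_input; do
    [ -r "$f" ] || continue
    t=$(cat "$f" 2>/dev/null) || continue
    if is_too_hot "$t" 70; then
        echo "$f: $(( t / 1000 )) C is above 70 C"
    fi
done
```

Running this in a loop during a compile would show whether the segfaults line up with the temperature crossing the limit.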