Adept III

Re: gcc segmentation faults on Ryzen / Linux

ryzennewbie wrote:


I've created and uploaded the USB image to:

https://ufile.io/h1r14

It's compressed. To get it onto your USB drive, run something like this:

xzcat FreeBSD-11.1-RELEASE-amd64-memstick.ryzen_test.img.xz | dd of=<PathToUsbDrive> bs=1M

Then boot from it; it doesn't matter whether you use UEFI or legacy boot mode. Once it has booted, you should get a login prompt; just enter "root" as the username, and a short set of instructions will be printed.

Thanks for your effort...

Thanks, ryzennewbie, for setting this up. After sorting out the /tmp space problem, I was able to run the stress test for about 14 hours. It never threw any errors, segfault or otherwise. The other program, which executes code at 0x7ffffffff000 through 0x7fffffffffff, always crashes my system when it gets to 0x7fffffffff40. I only tried it 3 times, but each time resulted in an instant reboot.

Adept II

Re: gcc segmentation faults on Ryzen / Linux

Thanks a ton for testing that image. That you're not getting any failures during "ryzen_stress_test.sh" (meaning all error codes are zero and there are no log files in "/tmp/src") makes me somewhat optimistic that it really is possible to get good silicon after all.

That the other program reboots your system at 0x7fffffffff40 is unpleasant, as I had hoped that problem would be gone, too; but it was also expected.

Anyhow, thanks again for spending your time on that. And just for fun, you can boot that image on a non-Ryzen system (AMD or Intel), execute the other program there and check if that system reboots, too...

Adept III

Re: gcc segmentation faults on Ryzen / Linux

ryzennewbie wrote:

Anyhow, thanks again for spending your time on that. And just for fun, you can boot that image on a non-Ryzen system (AMD or Intel), execute the other program there and check if that system reboots, too...

Already tested on a laptop with an A10-5745M CPU. It runs successfully right through to 0x7fffffffffff. I don't currently have an Intel system to test it on, but will try next time I have access to one.

Adept III

Re: gcc segmentation faults on Ryzen / Linux

I'm RMA'ing my R7-1700 and will see how it works when I get it back. I fell into this rabbit hole because I couldn't figure out why the built-in DX12 benchmark of Ashes of the Singularity: Escalation kept crashing on both my systems when SMT was enabled (DX11 was okay). Some people said it might be because of the NVIDIA graphics driver (GTX 1080 Ti on one system, GTX 1060 on the other), but I was running the latest version. I gave up and chalked it up to a buggy game (but scratched my head, as this was a Ryzen showcase game).

If the replacement chip ends up passing the GCC build loop test along with the Ashes of the Singularity DX12 benchmark, then gamers will definitely be affected as well. It may not be related, though.


Re: gcc segmentation faults on Ryzen / Linux

On my system, running the AOTS benchmark needs at least 32 GB of RAM and a very high-end DX12 GPU, but I have it running on my 1800X without failing.

Adept III

Re: gcc segmentation faults on Ryzen / Linux

My main system has 32 GB and the secondary has 16 GB. I don't believe running the DX12 benchmark requires more than 8 GB of RAM under Windows. If any of you has one of the compile-bugged Ryzen processors, a Pascal-based NVIDIA video card, and the game, can you run the built-in benchmark in DX12 mode and let me know if it crashes to the desktop after it runs for a bit?

Adept III

Re: gcc segmentation faults on Ryzen / Linux

I've also submitted a support ticket for an RMA to get a tested, defect-free, Ryzen chip.

A lot of us have waited patiently to find out what the root cause is, but it's pretty clear that no silver bullet has been identified for this problem. Everyone has tried many OS versions, settings, and hardware types. Even after extensive testing by the community, we still can't narrow this down to a deterministic, reproducible test case. The best we have is "Run a script, put the system under load, wait for the issue to occur (if it does)". Even AMD support seems to use vague language to imply that other factors, in addition to a possible CPU defect, might be causing these errors. Because of the random nature of this, the fact that Epyc and Threadripper chips (according to AMD) don't have this problem, and the fact that only early Ryzen chips do, it seems very much like a silicon quality issue.

To be honest, I'm doubtful that a microcode update can work around silicon flakiness. If it were a deterministic bug, I would have expected someone to have written a small program that segfaults on demand already.


Re: gcc segmentation faults on Ryzen / Linux

My Ryzen system has been running kernel 4.12.5 for the last 24 hours. It hasn't run into any crash or error.

The only issues I've run into before were the segfaults. I didn't have MCE errors related to cache or an unstable system. I'm not sure if those segfaults were related to temperature or not.

I'm going to leave it running for a while before I change the settings again to enable frequency boost. I'll change the thermal paste and reseat the CPU in the socket.

I'll go through the RMA once I've figured out if the system is stable and if the segfaults are still an issue.
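For anyone who wants to tell the two failure types above apart in their own logs, here is a minimal sketch that counts kernel log lines mentioning machine checks versus user-space segfaults. The function names are made up for illustration, and the patterns assume the usual Linux kernel wording ("mce: [Hardware Error]: ..." and "segfault at ..."); other kernels or log formats may need different patterns.

```shell
#!/bin/sh
# Count the lines on stdin that match an extended regex (case-insensitive).
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_matching_lines() {
    grep -c -i -E "$1" || true
}

# Lines that look like machine check reports (assumed Linux kernel wording).
count_mce_lines() {
    count_matching_lines 'mce:|machine check'
}

# Lines that look like user-space segfault reports (assumed Linux kernel wording).
count_segfault_lines() {
    count_matching_lines 'segfault at'
}
```

Typical use would be something like `dmesg | count_mce_lines` and `dmesg | count_segfault_lines` after a test run.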

Elite

Re: gcc segmentation faults on Ryzen / Linux

I found a new situation which I don't understand.

I disabled cores in the BIOS down to 1+1 (2 cores / 4 threads). Then I ran 4 builds of libdrm using -j1 in a loop (the 4 builds were running in parallel). After 30 minutes one segfaulted. The same test wouldn't crash (at least not as quickly) if more cores were enabled; with more cores it runs for hours. Also, when I pinned processes to individual cores in the past, they crashed on any core, so the problem is not bound to a specific core. This case can't be called heavy parallel compilation; I would expect it to run just fine on an Atom CPU.

I can reproduce this on two machines with different HW. This issue looks like a race-condition of some sort.
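A test of the kind described above (several single-job builds looping side by side) could be sketched roughly as follows. This is not the exact script that was used; the function name and its arguments are hypothetical, and the build command is whatever you pass in.

```shell
#!/bin/sh
# Sketch: start WORKERS background loops, each running COMMAND up to
# ITERATIONS times, and print how many workers hit a failure.
# Usage: run_parallel_loops WORKERS ITERATIONS COMMAND [ARGS...]
run_parallel_loops() {
    workers=$1
    iterations=$2
    shift 2
    pids=""
    w=1
    while [ "$w" -le "$workers" ]; do
        (
            i=1
            while [ "$i" -le "$iterations" ]; do
                # Stop this worker at the first non-zero exit status
                # (a segfaulting compiler makes the build command fail).
                "$@" >/dev/null 2>&1 || exit 1
                i=$((i + 1))
            done
            exit 0
        ) &
        pids="$pids $!"
        w=$((w + 1))
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that particular worker.
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

For example, `run_parallel_loops 4 1000 ./build_once.sh` would start 4 loops, where `build_once.sh` stands for a hypothetical wrapper that does a clean single-job (-j1) build in its own directory. A result of 0 means no worker saw a failing build.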

Elite

Re: gcc segmentation faults on Ryzen / Linux

Can anyone with MCE freezes / reboots post their typical thermal graph? You can produce one using, for example, Ksysguard in KDE (by adding a sensor for CPU temperature).

Mine looks like this:

vH4Ha3F.png

Note the periodic spikes in CPU temperature to around +40°C, which the fans gradually smooth out. I'm not sure whether this should affect MCE, though; the system looks pretty well cooled to me.
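If you don't have Ksysguard, a rough text log of the temperature can serve the same purpose. The sketch below assumes a Linux sysfs sensor file that reports millidegrees Celsius (for example something like /sys/class/hwmon/hwmon0/temp1_input; which file belongs to the CPU differs between machines, so that path is an assumption). The function names are made up for illustration.

```shell
#!/bin/sh
# Convert a millidegree reading to degrees, truncated to one decimal.
# Assumes a non-negative integer reading.
millideg_to_c() {
    printf '%d.%d\n' "$(($1 / 1000))" "$((($1 % 1000) / 100))"
}

# Print "epoch_seconds degrees_celsius" SAMPLES times, INTERVAL seconds apart.
# Usage: log_temperature SENSOR_FILE INTERVAL SAMPLES
log_temperature() {
    sensor_file=$1
    interval=$2
    samples=$3
    n=0
    while [ "$n" -lt "$samples" ]; do
        printf '%s %s\n' "$(date +%s)" "$(millideg_to_c "$(cat "$sensor_file")")"
        n=$((n + 1))
        # Sleep only between samples, not after the last one.
        if [ "$n" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

Redirecting the output to a file while a stress test runs gives a two-column log that any plotting tool can turn into a graph.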