
Journeyman III

Re: gcc segmentation faults on Ryzen / Linux

This sounds interesting. Could you repeat your tests while underclocking your CPU (if there is such an option in your BIOS)?

Adept III

Re: gcc segmentation faults on Ryzen / Linux

ryzennewbie wrote:

I've created and uploaded the USB image to:

https://ufile.io/h1r14

Thanks for providing this. I have no experience with *BSD, so I downloaded this and I'm running it at the moment.

ryzen_provoke_freeze just reboots the machine when it gets to 0x...40, same as everyone else's.

This might be an interesting data point, though:

I've now been running ryzen_stress_test for over 11 hours without a failure. A looping kernel compile in Linux fails at least once an hour on the same box.

I'll let it run for another day or so and see if it breaks.
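A looping build of the kind described above could be driven by something like the following. This is only a minimal sketch: the function name `loop_build` and the log wording are assumptions for illustration, not the scripts actually used in this thread, and the command to run (a kernel compile, for instance) is left to the caller.

```python
import subprocess
import sys
import time


def loop_build(cmd, iterations):
    """Run cmd repeatedly and log one timestamped result line per run.

    Returns the number of failed runs. On POSIX systems a negative
    return code means the process was killed by a signal.
    """
    failures = 0
    for _ in range(iterations):
        stamp = time.ctime()  # timestamp taken when the build starts
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        ok = result.returncode == 0
        if not ok:
            failures += 1
        print(f"[{stamp}] building... {'success' if ok else 'failed'}")
        sys.stdout.flush()
    return failures
```

Called as `loop_build(["make", "-j16"], 100)` from a source tree, it would report how many of 100 builds failed; the `make` invocation here is just a placeholder.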

Adept II

Re: gcc segmentation faults on Ryzen / Linux

Thank you very much for trying out that image.

That freeze during "ryzen_provoke_freeze.sh" is expected: the script pins the program "ryzen_provoke_freeze" to core 0, which seems to be mainly responsible for interrupt handling and therefore "dies" first. If you run the program "ryzen_provoke_freeze" directly, so that it rotates through all cores, you may get lucky and it will run through. But this behaviour is now circumvented by increasing the "safe zone" towards the top of the memory - see [base] Revision 321899.

Getting failures during "ryzen_stress_test" is hard, I know, for the FreeBSD devs as well; at the moment, I cannot reproduce them myself even after running for 24h. Furthermore, you won't get any segfaults there, only "unable to rename" errors, where some object files suddenly disappear. It's still not clear what causes this.

I had good results with compiling "ghc" from the ports tree: first eight failures, then successes, as it seems to need a warm-up time before it succeeds:
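For anyone who wants to try the pinned-versus-unpinned comparison on Linux, the idea of tying a program to one core can be sketched as below. This is not the content of "ryzen_provoke_freeze.sh" (which runs on FreeBSD, where the `cpuset` utility does this job); it is an assumed Linux-only illustration using Python's `os.sched_setaffinity`, and the helper names `run_pinned` and `run_on_each_core` are made up for this example. Note that `run_on_each_core` is a deliberate per-core sweep, which is not the same as letting the scheduler move an unpinned program around.

```python
import os
import subprocess
import sys


def run_pinned(cmd, core):
    """Run cmd with its CPU affinity restricted to a single core (Linux only)."""
    def pin():
        # Runs in the child after fork and before exec; 0 means "this process".
        os.sched_setaffinity(0, {core})

    return subprocess.run(cmd, preexec_fn=pin, capture_output=True, text=True)


def run_on_each_core(cmd, cores):
    """Run cmd once per core, pinned to each core in turn."""
    return {core: run_pinned(cmd, core) for core in sorted(cores)}
```

Passing `os.sched_getaffinity(0)` as `cores` sweeps every core the current process is allowed to use, which would show whether a problem follows one particular core.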

-------------------------------------------------------------------------------------------------------------------------------

root@capetown2:/root/#cat nohup.out
umount: /tmp/ports.ghc: not a file system root directory
[Wed Aug  9 13:09:09 CEST 2017] building... failed
[Wed Aug  9 13:09:41 CEST 2017] building... failed
[Wed Aug  9 13:10:11 CEST 2017] building... failed
[Wed Aug  9 13:10:41 CEST 2017] building... failed
[Wed Aug  9 13:11:11 CEST 2017] building... failed
[Wed Aug  9 13:11:41 CEST 2017] building... failed
[Wed Aug  9 13:12:29 CEST 2017] building... failed
[Wed Aug  9 13:13:21 CEST 2017] building... failed
[Wed Aug  9 13:14:38 CEST 2017] building... success
[Wed Aug  9 13:43:26 CEST 2017] building... success
[Wed Aug  9 14:12:19 CEST 2017] building... success
[Wed Aug  9 14:41:16 CEST 2017] building... success
[Wed Aug  9 15:10:08 CEST 2017] building...

root@capetown2:/root/work/src/#grep exited /var/log/messages
Aug  9 09:21:00 capetown kernel: pid 59222 (doxygen), uid 0: exited on signal 6 (core dumped)
Aug  9 09:40:32 capetown kernel: pid 60176 (doxygen), uid 0: exited on signal 6 (core dumped)
Aug  9 13:09:41 capetown kernel: pid 6871 (ghc), uid 0: exited on signal 10
Aug  9 13:10:11 capetown kernel: pid 11481 (ghc), uid 0: exited on signal 10
Aug  9 13:10:41 capetown kernel: pid 16079 (ghc), uid 0: exited on signal 10
Aug  9 13:11:11 capetown kernel: pid 20689 (ghc), uid 0: exited on signal 10
Aug  9 13:11:41 capetown kernel: pid 25287 (ghc), uid 0: exited on signal 10
Aug  9 13:12:29 capetown kernel: pid 29885 (ghc), uid 0: exited on signal 10
Aug  9 13:13:22 capetown kernel: pid 34539 (ghc), uid 0: exited on signal 10
Aug  9 13:14:38 capetown kernel: pid 39195 (ghc), uid 0: exited on signal 10

-------------------------------------------------------------------------------------------------------------------------------

but that requires a full-fledged FreeBSD installation, which cannot easily be done on a USB drive - at least, I can't do that easily.

So, I'm now trying to fiddle around with compiling GCC and Mesa directly, without the ports tree...
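To summarise "exited on signal" lines like the ones in the /var/log/messages excerpt, a small tally script can help. The following is a rough sketch, assuming the FreeBSD kernel message format shown in that excerpt; the names `tally_exits` and `FREEBSD_SIGNALS` are made up for illustration. On FreeBSD, signal 6 is SIGABRT and signal 10 is SIGBUS (on Linux, 10 would be SIGUSR1 instead).

```python
import re
from collections import Counter

# FreeBSD signal numbers; only a few common ones are listed here.
FREEBSD_SIGNALS = {6: "SIGABRT", 10: "SIGBUS", 11: "SIGSEGV"}

# Matches e.g. "pid 6871 (ghc), uid 0: exited on signal 10"
EXIT_RE = re.compile(
    r"pid \d+ \((?P<prog>[^)]+)\), uid \d+: exited on signal (?P<sig>\d+)"
)


def tally_exits(lines):
    """Count signal exits per (program, signal name) pair."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            sig = int(match.group("sig"))
            name = FREEBSD_SIGNALS.get(sig, f"signal {sig}")
            counts[(match.group("prog"), name)] += 1
    return counts
```

Feeding it the lines of `/var/log/messages` (for example via `open(...)`) gives a per-program count, which makes it easier to compare runs before and after a change such as an RMA replacement.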

Thanks again for testing...

Elite

Re: gcc segmentation faults on Ryzen / Linux

bradc wrote:

Thanks for providing this. I have no experience with *BSD, so I downloaded this and I'm running it at the moment.

ryzen_provoke_freeze just reboots the machine when it gets to 0x...40, same as everyone else's.

Is that freeze different from the MCE freezes caused by waking up from C-state sleep, or is it the same thing? And if so, can Linux kernel developers work around it similarly?

Adept II

Re: gcc segmentation faults on Ryzen / Linux

On my system it looks something like this: the execution of the program stops, after 3-5 seconds the screen turns black (no signal) and the system freezes or reboots. If frozen, no pings will be answered, no keyboard inputs executed.

Linux already has that FreeBSD "safe page" patch applied, so I fear that won't be a solution for your problem, I'm sorry...

Adept III

Re: gcc segmentation faults on Ryzen / Linux

Looking at the Threadripper hi-res IHS shots...

The one that AnandTech and Paul's Hardware received is marked UA 1727SUT. Maybe chips made after week 25 of 2017 had better QA at the factory? I'm pretty sure all the press review kits have similar build dates.

[CPUs received from RMA without issue]

UA1725SUS (mcl00, fujii)

[Marginal CPUs detected in the following batches]

UA1707SUT (reported by apache14)
UA1707PGT (reported by fujii)
UA1716PGT (reported by fujii)
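To compare batch markings like the ones listed above, the four digits can be read as year and week, which is the reading this thread uses (1725 taken as week 25 of 2017). The sketch below assumes that two-letter, YYWW, three-letter layout; the meaning of the letter parts is not known here, so they are only carried along. The names `parse_date_code` and `is_week_25_or_later` are invented for this example, and the week-25 cutoff is this thread's guess, not a confirmed boundary.

```python
import re
from typing import NamedTuple


class DateCode(NamedTuple):
    prefix: str   # e.g. "UA" (meaning not assumed)
    year: int     # four-digit year, assuming the 2000s
    week: int     # production week as read from the marking
    suffix: str   # e.g. "SUS", "SUT", "PGT" (meaning not assumed)


CODE_RE = re.compile(r"^([A-Z]{2})\s*(\d{2})(\d{2})([A-Z]{3})$")


def parse_date_code(code):
    """Split a marking such as 'UA 1725SUS' into its parts."""
    match = CODE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognised date code: {code!r}")
    prefix, yy, ww, suffix = match.groups()
    return DateCode(prefix, 2000 + int(yy), int(ww), suffix)


def is_week_25_or_later(code):
    """True if the marking reads as week 25 of 2017 or newer."""
    parsed = parse_date_code(code)
    return (parsed.year, parsed.week) >= (2017, 25)
```

With the codes reported so far, `is_week_25_or_later` separates the RMA replacements and the review sample from the three batches listed as marginal.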

Adept II

Re: gcc segmentation faults on Ryzen / Linux

Yep, it seems to be becoming a pattern: mcl00 has a UA 1725SUS; the one you mentioned is UA 1727SUT and supposed to be segfault-free (as I call it). So, every Ryzen from the first batch(es) is not suitable for compiling - period. Maybe a trainee sneezed into the silicon mixture before it became a chip - whatever, leave the first batch(es) to gamers, give us the 1725++ batches, I say, without any big hurdles in RMA. D'accord?

Adept II

Re: gcc segmentation faults on Ryzen / Linux

As a person who is waiting to purchase Ryzen 7 and TR CPUs at home (and perhaps more TRs and possibly EPYC at work), I really hope this ends up being the case. I am going to continue to wait a bit...

Adept II

Re: gcc segmentation faults on Ryzen / Linux

Just out of curiosity: the Threadrippers you plan to use at work, are they for servers? If yes, do you plan to install them in a server chassis? If yes, how? 3U and water cooling instead of two of the chassis fans? I ask because the TR seems to be nice for ESXi hosts (only 128GB RAM, but that still leaves 8GB for every single-core VM)...

Adept II

Re: gcc segmentation faults on Ryzen / Linux

For that, these would have to be purchased from Dell (which is the only approved vendor). We would not be using ESXi on any of these. They would be Linux workstations for medical imaging research. We currently purchase Dell Xeon-based 5810s for these.