992 Replies. Latest reply on Aug 22, 2017 1:51 AM by apache14
      • 690. Re: gcc segmentation faults on Ryzen / Linux
        apache14

        So what's the AMD CPU RMA process like for this issue? Do they argue in any cases that the CPU is fine regardless of segfaults?

         

        I'm looking at sending mine back and rolling the dice to see if I get a fixed one, after getting a Vega GPU (though I don't think I kept the box for mine... Doh).

        • 691. Re: gcc segmentation faults on Ryzen / Linux
          ryzennewbie

          > I presume you'd like me to try the patches and test code created by Don Lewis in FreeBSD bug 219399? That is, test executing a return instruction from addresses above 0x7fffffffff3f?

           

          Absolutely correct, that's my idea; plus a stress test to see if you get any "unable to rename" errors during buildworld/buildkernel.

           

          > I haven't got any experience with FreeBSD, but if you point me towards a bootable USB image and some instructions to apply the patches then I'm happy to try it out.

           

          I've created and uploaded the USB image to:

          https://ufile.io/h1r14

           

          It's compressed; to write it to your USB drive, run something like this:

           

              xzcat FreeBSD-11.1-RELEASE-amd64-memstick.ryzen_test.img.xz | dd of=<PathToUsbDrive> bs=1M
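          A slightly more defensive version of the write step, as a sketch only: it checks the archive with `xz -t` before touching the target and calls `sync` afterwards so buffered data is flushed before the drive is pulled. The function name `write_image` is hypothetical and not part of ryzennewbie's image or scripts; `bs=1M` is kept from the command above. Double-check `<PathToUsbDrive>`, since dd overwrites the target without asking.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): write a .img.xz
# image to a target device or file, checking the archive first.
# WARNING: dd overwrites TARGET without asking; double-check the path.
write_image() {
    img=$1
    target=$2
    # Test the integrity of the compressed file before writing anything.
    if ! xz -t -- "$img"; then
        echo "write_image: $img is not a valid .xz archive" >&2
        return 1
    fi
    # Same pipeline as in the post: decompress to stdout, write in 1 MiB blocks.
    xzcat -- "$img" | dd of="$target" bs=1M || return 1
    # Flush buffered writes before the drive is removed.
    sync
}
```

          Usage would be `write_image FreeBSD-11.1-RELEASE-amd64-memstick.ryzen_test.img.xz <PathToUsbDrive>`, with the placeholder replaced by your actual USB device.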

           

          Then boot from it; it doesn't matter whether you use UEFI or legacy boot mode. Once it has booted, you should get a login prompt; just enter "root" as the username, and a short set of instructions will be printed.

           

          Thanks for your effort...

          • 692. Re: gcc segmentation faults on Ryzen / Linux
            bradc

            apache14 wrote:

             

            So what's the AMD CPU RMA process like for this issue? Do they argue in any cases that the CPU is fine regardless of segfaults?

             

            I'm looking at sending mine back and rolling the dice to see if I get a fixed one, after getting a Vega GPU (though I don't think I kept the box for mine... Doh).

            Have you logged a support ticket with AMD? That's where I started. I'm working through a very slow process (I'm in Australia and my support rep is in Canada, so at best we get one go-around a day), and at the end, if they don't solve my problem, I expect I'll be asked to RMA the processor. You won't know unless you are willing to put in the effort to go through the process, though.

             

            The real reason I'm doing this is I figure AMD can use all the data points they can get, and if they are willing to give me the level of support they have been then the least I can do is reciprocate. Never know, I might get a working system out of it!

            • 693. Re: gcc segmentation faults on Ryzen / Linux
              Deluxe

              I also suggest going through support first. For me it went as follows:

              1) Opened a support ticket and described the issue, my configuration and my testing (+ linked to this thread)

              2) Support staff asked me for photos of system interior and BIOS settings

              3) They asked me to reset the BIOS settings, set Vsoc to 1.1V, and repeat my tests with Vcore from 1.36V to 1.41V in 0.05V steps (which is really just one step)

              - I got segfaults at the lower voltage and uOpCache MCEs (repairable parity errors) at the higher one

              - So even at 1.41V it wasn't properly working

              4) Support suggested an RMA and offered to test the replacement chip with my MB+RAM configuration...

              - I'm in the process of packaging my CPU and sending it off to AMD

              • 694. Re: gcc segmentation faults on Ryzen / Linux
                ryzennewbie

                I apologize, folks; I made a mistake in the "ryzen_stress_test.sh" buildworld/buildkernel script: when running from the read-only USB drive, the default "/tmp" is too small, so every build will fail for lack of free space.

                 

                To fix that, please run:

                    umount /tmp

                    mount -t tmpfs a /tmp

                right after booting from the USB drive and before you run that script.
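                A hypothetical pre-flight check, not part of ryzennewbie's script, that could be run after remounting /tmp and before starting the build: it reads the available space from POSIX `df -P -k` output and fails below a threshold. The function name `has_free_space` and the 4 GiB threshold are assumptions for illustration; the post does not say how much space buildworld/buildkernel actually needs.

```shell
#!/bin/sh
# Hypothetical helper, not part of ryzen_stress_test.sh.
# has_free_space DIR KIB: succeed if the filesystem holding DIR has at
# least KIB kilobytes available, according to POSIX-format df output.
has_free_space() {
    dir=$1
    need_kib=$2
    # -P: one line per filesystem with fixed columns; -k: 1024-byte units.
    # Line 2, field 4 is the "Available" column.
    avail_kib=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    if [ -z "$avail_kib" ]; then
        echo "has_free_space: could not read free space for $dir" >&2
        return 2
    fi
    [ "$avail_kib" -ge "$need_kib" ]
}

# Example check; 4194304 KiB (4 GiB) is an assumed value, not a measured one.
if has_free_space /tmp 4194304; then
    echo "/tmp looks large enough"
else
    echo "/tmp may be too small; remount it as tmpfs first" >&2
fi
```

                Parsing `df -P` rather than plain `df` keeps the column layout stable across Linux and FreeBSD and avoids long device names wrapping onto a second line.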

                 

                Sorry for the hassle; I'll upload an updated image tomorrow...

                • 695. Re: gcc segmentation faults on Ryzen / Linux
                  runningman

                  It's official. AMD has acknowledged the issue.

                  AMD Confirms Linux Performance Marginality Problem Affecting Some, Doesn't Affect Epyc / TR - Phoronix

                   

                  It'd be nice to see them post here as well. It's RMA time for me.

                  • 696. Re: gcc segmentation faults on Ryzen / Linux
                    amdmatt

                    We have been working closely with a small but important subset of Linux users that have experienced segment faults when running heavy or looping compilations on their Ryzen CPU-based systems. The results of our testing and analysis indicate that segment faults can be caused by memory allocation, system configurations, thermal environments, and system marginality when running looping or heavy Linux compile workloads. The marginality is stimulated with very heavy workloads and when the system environment is not ideal. AMD is working with individual users to diagnose the issues.

                     

                    We are confident that we can help each of you identify the source of the marginality and eliminate the segment faults.  We encourage all of our Linux users who are experiencing segment faults under compile workloads to continue working with AMD Customer Care.  We are committed to solving this issue for all of you.

                    • 697. Re: gcc segmentation faults on Ryzen / Linux
                      apache14

                      Hi matt,

                       

                      Thanks for the update. I have just submitted an issue using the online customer service form to try to get my R7 1700 stable (I've had these segfaults and MCE issues since day 0).

                       

                       

                      P.S. You're not the same AMDMatt from the OCUK forums by any chance?

                       

                      Cheers in advance

                      • 698. Re: gcc segmentation faults on Ryzen / Linux
                        udamanfunks

                        There seem to be a lot of marginal systems out there. I guess both my Ryzen systems fall under that category (sarcasm): nothing overclocked, just BIOS defaults. Based on what's been posted, this problem affects systems from different motherboard manufacturers (Gigabyte, Asus, Asrock); I can't believe AMD is unable to reproduce it internally in their labs. If AMD is having a hard time finding these *marginal* systems internally, I'd have no problem driving to Sunnyvale and dropping off one of my systems.

                         

                        Temps are not a problem (at least not on my system: my Taichi system has 6 fans, the Prime95 test pegs the CPU at 55°C max, and my A/C is always on, cooling the house to 72°F). I believe there are a bunch of marginal Ryzen chips out there that shouldn't have passed QA testing, especially if mcl00 got a good one after going through two chips (when nothing else had changed on his system). This stuff should work without any tweaking.


                        amdmatt, how do we go through the RMA process with AMD and get the replacement chip tested, just like mcl00 did, to avoid the RMA silicon lottery?

                         

                         

                        http://funks.ddns.net:8080/tools/ryzen/RYZEN_TAICHI.jpg

                        http://funks.ddns.net:8080/tools/ryzen/GCC_TAICHI.jpg

                         

                        Asrock Taichi X370 (v3.0 BIOS)

                        R7-1700X
                        Corsair H110i AIO Cooler
                        32 Gigabytes of DDR4
                        Corsair RM650i Power Supply

                         

                        http://funks.ddns.net:8080/tools/ryzen/RYZEN_AB350GAMING3.jpg

                        http://funks.ddns.net:8080/tools/ryzen/GCC_AB350-GAMING3.jpg

                         

                        Gigabyte AB350 Gaming 3 (vF7 BIOS)

                        R7-1700
                        Wraith Spire LED Cooler
                        16 Gigabytes of DDR4
                        Corsair CX550M Power Supply

                         

                        Both of them run into problems completing GCC 7.1.0 build loops, while my 6-year-old Intel system has gone through 47 compile loops and hasn't failed yet.

                         

                        http://funks.ddns.net:8080/tools/ryzen/NAS_LOOPS_NOCRASH.PNG

                        • 699. Re: gcc segmentation faults on Ryzen / Linux
                          apache14

                          I agree that there do seem to be a lot of affected units in the wild (and those are only the ones that have been reported; many more that could be affected under Linux will just be running Windows with no issues).

                           

                          My 1700 is affected even with up to 1.1V SoC, up to 1.35V core, every memory configuration under the sun for my setup, every BIOS available, and SMT disabled. And for me it's 100% not a thermal issue, as I am running a 240mm EK-based custom loop that keeps the CPU core very cool under these high-stress compile workloads.

                           

                          But AMD are now acknowledging that there is an issue, and I hope that means anyone affected here will end up with a fully working CPU (or at least a setup that will not cause the current CPU to segfault / MCE all over the show).

                           

                           

                          • 700. Re: gcc segmentation faults on Ryzen / Linux
                            pjssilva

                            Same here, I also have a marginal system with good cooling (compiling never gets above 55C; I could make it lower by forcing the fans up, but temperature is not the issue). Everything at stock, never overclocked. From many threads around the net, whenever people bothered to try, many were affected (I would say more than 50%, easily). The "marginality" comes only from the fact that the number of people who actually do heavy compilation is small.

                             

                            I really hope that AMD is able to solve this problem in all processors with a microcode update. Otherwise it is gambling: betting that the bug will not appear somewhere else and that people are not going to test their systems using the kill-ryzen.sh script, which is very easy to run.

                             

                            Anyhow, time to start my RMA.

                             

                            Good luck to all. It would be nice if everyone posted here what happened after they got the new CPU.

                            • 701. Re: gcc segmentation faults on Ryzen / Linux
                              shmerl

                              amdmatt wrote:

                               

                              The marginality is stimulated with very heavy workloads and when the system environment is not ideal.

                              Can you please provide some examples of common non-ideal environments? That would help some people save time and fix what can be fixed without replacing the CPU.

                              • 702. Re: gcc segmentation faults on Ryzen / Linux
                                shmerl

                                pjssilva wrote:

                                Good luck to all. It would be nice if everyone posted here what happened after they got the new CPU.

                                I'll probably wait for more successful reports before doing the RMA. Meanwhile, I'll try improving the thermal paste application on the cooler.

                                • 703. Re: gcc segmentation faults on Ryzen / Linux
                                  raydude

                                  I'm proceeding with an RMA for my R5-1600. I'll let you guys know how it goes.

                                   

                                  We'll see if they can screen the parts yet.

                                   

                                  By the way: I love this system. It's exactly what I need. I'm using 515% (on average) of my CPU, and I'm not certain an Intel 4-core with Hyper-Threading would have been enough. I still have headroom to add the features I need to add.

                                   

                                  I REALLY want it to be stable.

                                   

                                  I'll keep working with AMD. I hope they find the actual problem soon. I'm glad they finally acknowledged it. I'm pretty certain the flame wars on Reddit were what got their attention.

                                   

                                  Everyone: Thanks for all your hard work, and please be patient with AMD, they are responding pretty well to our little bit of noise.

                                   

                                  Hopefully they'll be able to take our returned parts and do a proper characterization on them and figure out the difference between working and non-working parts.

                                   

                                  Then they'll be able to bin for the RMAs and design in higher yields.

                                   

                                  By the way: Unless they have shown that it's a packaging issue (something they may never admit), I don't believe them when they say the Threadripper and Epyc are unaffected. We'll see what Phoronix gets from AMD and see if his tests run cleanly. If they do work, then we at least know that AMD can ship working Threadripper and Epyc, which is great for AMD.

                                   

                                  Long live AMD. Seriously.

                                   

                                  And I'm glad you are paying attention to us.

                                   

                                  Brilliant public announcement. Just hope the Threadripper and Epyc denial doesn't come back to bite you on the behind.
