1,858 Replies. Latest reply on Dec 10, 2017 4:00 PM by kertp
      • 696. Re: gcc segmentation faults on Ryzen / Linux
        amdmatt

        We have been working closely with a small but important subset of Linux users that have experienced segment faults when running heavy or looping compilations on their Ryzen CPU-based systems. The results of our testing and analysis indicate that segment faults can be caused by memory allocation, system configurations, thermal environments, and system marginality when running looping or heavy Linux compile workloads. The marginality is stimulated with very heavy workloads and when the system environment is not ideal. AMD is working with individual users to diagnose the issues.

         

        We are confident that we can help each of you identify the source of the marginality and eliminate the segment faults.  We encourage all of our Linux users who are experiencing segment faults under compile workloads to continue working with AMD Customer Care.  We are committed to solving this issue for all of you.

        • 697. Re: gcc segmentation faults on Ryzen / Linux
          apache14

          Hi matt,

           

          Thanks for the update. I have just submitted an issue through the online customer service form to try to get my R7 1700 stable (I've had these segfaults and MCE issues from day 0).

           

           

          P.S. You're not the same AMDMatt from the OCUK forums, by any chance?

           

          Cheers In Advance

          • 698. Re: gcc segmentation faults on Ryzen / Linux
            udamanfunks

            Seems to be a lot of marginal systems out there. I guess both of my Ryzen systems fall under that category (sarcasm): nothing overclocked, just BIOS defaults. Based on what's posted, this problem is affecting systems from different motherboard manufacturers (Gigabyte, Asus, Asrock); I can't believe AMD is unable to reproduce this internally in their labs. If AMD is having a hard time finding these *marginal* systems internally, I'd have no problem driving to Sunnyvale and dropping off one of my systems.

             

            Temps are not a problem (at least not on my system: my Taichi system has 6 fans, a Prime95 test pegs the CPU at 55C max, and my A/C is always on, cooling the house to 72 deg F). I believe there's a bunch of marginal Ryzen chips out there that shouldn't have passed QA testing, especially if mcl00 got a good one after going through two chips (when nothing else had changed on his system). This stuff should work without any tweaking.


            amdmatt - how do we go through the RMA process with AMD and basically get a chip tested just like mcl00 to avoid RMA Silicon Lottery?

             

             

            http://funks.ddns.net:8080/tools/ryzen/RYZEN_TAICHI.jpg

            http://funks.ddns.net:8080/tools/ryzen/GCC_TAICHI.jpg

             

            Asrock Taichi X370 (v3.0 BIOS)

            R7-1700X
            Corsair H110i AIO Cooler
            32 Gigabytes of DDR4
            Corsair RM650i Power Supply

             

            http://funks.ddns.net:8080/tools/ryzen/RYZEN_AB350GAMING3.jpg

            http://funks.ddns.net:8080/tools/ryzen/GCC_AB350-GAMING3.jpg

             

            Gigabyte AB350 Gaming 3 (vF7 BIOS)

            R7-1700
            Wraith Spire LED Cooler
            16 Gigabytes of DDR4
            Corsair CX550M Power Supply

             

            Both of them run into problems completing GCC 7.1.0 build loops, while my 6-year-old Intel system has gone through 47 compile loops and hasn't failed yet.

             

            http://funks.ddns.net:8080/tools/ryzen/NAS_LOOPS_NOCRASH.PNG

            • 699. Re: gcc segmentation faults on Ryzen / Linux
              apache14

              I agree that there do seem to be a lot of affected units in the wild (and those are only the ones that have been reported; many more that could be affected under Linux will just be running Windows with no issues).

               

              My 1700 is affected even with up to 1.1 V SoC, up to 1.35 V core, every memory configuration under the sun for my setup, every BIOS available, and SMT disabled. And for me it's 100% not a thermal issue, as I am running a 240mm EK-based custom loop that keeps the CPU core very cool under these high-stress compile workloads.

               

              But AMD are now acknowledging that there is an issue and I hope that means that anyone affected here will end up with a fully working CPU (or at least a setup that will not cause the current CPU to segfault / MCE all over the show)

               

               

              • 700. Re: gcc segmentation faults on Ryzen / Linux
                pjssilva

                Same here. I also have a marginal system with good cooling (compiling never gets above 55C; I could make it lower by forcing the fans up, but temperature is not the issue). Everything is on stock, never overclocked. Judging from many threads around the net, whenever people bothered to try, many of them (I would say more than 50%, easily) hit the problem. The marginality comes only from the fact that the number of people that actually do heavy compilation is small.

                 

                I really hope that AMD is able to solve this problem in all processors with a microcode update. Otherwise it is gambling: betting that the bug will not appear somewhere else and that people are not going to test their systems using the kill-ryzen.sh script, which is very easy to run.

                 

                Anyhow, time to start my RMA.

                 

                Good luck to all. It would be nice if everyone posted here what happens after they get the new CPU.

                • 701. Re: gcc segmentation faults on Ryzen / Linux
                  shmerl

                  amdmatt wrote:

                   

                  The marginality is stimulated with very heavy workloads and when the system environment is not ideal.

                  Can you please provide some examples of common non-ideal environments? That would help some of us save time and remedy whatever can be fixed without replacing the CPU.

                  • 702. Re: gcc segmentation faults on Ryzen / Linux
                    shmerl

                    pjssilva wrote:

                    Good luck to all. It would be nice if everyone posted here what happens after they get the new CPU.

                    I'll probably wait for more successful reports, before doing the RMA. Meanwhile I'll try improving thermal paste application for the cooler.

                    • 703. Re: gcc segmentation faults on Ryzen / Linux
                      raydude

                      I'm proceeding with an RMA for my R5-1600. I'll let you guys know how it goes.

                       

                      We'll see if they can screen the parts yet.

                       

                      By the way: I love this system. It's exactly what I need. I'm using 515% (on average) of my CPU and I'm not certain an Intel 4-core with hyperthreading would have been enough. I still have headroom to add the features I need to add.

                       

                      I REALLY want it to be stable.

                       

                      I'll keep working with AMD. I hope they find the actual problem soon. I'm glad they finally acknowledged it. I'm pretty certain the flame wars on Reddit were what got their attention.

                       

                      Everyone: Thanks for all your hard work, and please be patient with AMD, they are responding pretty well to our little bit of noise.

                       

                      Hopefully they'll be able to take our returned parts and do a proper characterization on them and figure out the difference between working and non-working parts.

                       

                      Then they'll be able to bin for the RMAs and design in higher yields.

                       

                      By the way: Unless they have shown that it's a packaging issue (something they may never admit), I don't believe them when they say the Threadripper and Epyc are unaffected. We'll see what Phoronix gets from AMD and see if his tests run cleanly. If they do work, then we at least know that AMD can ship working Threadripper and Epyc, which is great for AMD.

                       

                      Long live AMD. Seriously.

                       

                      And I'm glad you are paying attention to us.

                       

                      Brilliant public announcement. Just hope the Threadripper and Epyc denial doesn't come back to bite you on the behind.

                      • 704. Re: gcc segmentation faults on Ryzen / Linux
                        runningman

                        amdmatt wrote:

                         

                        We have been working closely with a small but important subset of Linux users that have experienced segment faults when running heavy or looping compilations on their Ryzen CPU-based systems. The results of our testing and analysis indicate that segment faults can be caused by memory allocation, system configurations, thermal environments, and system marginality when running looping or heavy Linux compile workloads. The marginality is stimulated with very heavy workloads and when the system environment is not ideal. AMD is working with individual users to diagnose the issues.

                         

                        We are confident that we can help each of you identify the source of the marginality and eliminate the segment faults. We encourage all of our Linux users who are experiencing segment faults under compile workloads to continue working with AMD Customer Care. We are committed to solving this issue for all of you.

                        I wonder what it takes to be part of the important subset of Linux users. Your support form page (Email Form) doesn't work over HTTPS.

                         

                        Can AMD figure out how to display a form over HTTPS so we can at least ask for help? That'd be great.

                        • 705. Re: gcc segmentation faults on Ryzen / Linux
                          supercom32

                          Following the posts of other users, I've set my Vsoc to 1.1v and Vcore to 1.36v to see if that stabilises the issue. Since support recommended trying it, it's probably a safe bet that AMD engineers did the same thing in the lab and reported success on some chips.

                           

                          Using 'kill-ryzen.sh' with 6 threads, I was able to run for a few hours before I ran out of disk space. I reduced the number of threads to 5 and started my tests again. Hopefully this won't run out of disk space, but it seems like this might be a viable fix/workaround.
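A looped test like this can check free disk space before each iteration, so that a full disk is not mistaken for a crash. The following is only a minimal Python sketch of that idea, not the actual kill-ryzen.sh script: the function name `run_loops`, the `min_free_bytes` threshold, and the placeholder command are all assumptions made for illustration, and it runs a single loop rather than several parallel threads.

```python
import shutil
import subprocess
import sys


def run_loops(cmd, workdir, max_loops, min_free_bytes):
    """Run `cmd` repeatedly in `workdir` (hypothetical helper).

    Stops on the first non-zero exit status, or before an iteration
    when free disk space drops below `min_free_bytes`.
    Returns (completed_loops, reason), where reason is one of
    "failed", "low_disk" or "done".
    """
    for completed in range(max_loops):
        # Check free space first so a full disk is reported as such
        # instead of showing up as a failed build.
        if shutil.disk_usage(workdir).free < min_free_bytes:
            return completed, "low_disk"
        result = subprocess.run(cmd, cwd=workdir)
        if result.returncode != 0:
            return completed, "failed"
    return max_loops, "done"


if __name__ == "__main__":
    # Placeholder workload (an assumption): substitute the real build command.
    placeholder = [sys.executable, "-c", "pass"]
    print(run_loops(placeholder, ".", max_loops=3, min_free_bytes=1 << 20))
```

To approximate a multi-threaded run, several copies could be started in separate working directories, each with its own share of the disk budget.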

                          • 706. Re: gcc segmentation faults on Ryzen / Linux
                            runningman

                            supercom32 wrote:

                             

                            Following the posts of other users, I've set my Vsoc to 1.1v and Vcore to 1.36v to see if that stabilises the issue. Since support recommended trying it, it's probably a safe bet that AMD engineers did the same thing in the lab and reported success on some chips.

                             

                            Using 'kill-ryzen.sh' with 6 threads, I was able to run for a few hours before I ran out of disk space. I reduced the number of threads to 5 and started my tests again. Hopefully this won't run out of disk space, but it seems like this might be a viable fix/workaround.

                            This could mean that the silicon quality isn't great for some chips. I really hope this is sorted out soon.

                            • 707. Re: gcc segmentation faults on Ryzen / Linux
                              supercom32

                              A little off topic, but a question for those who went through the RMA process: do you need to send them your CPU first, or will they send you a replacement CPU with return packaging/instructions? It would suck if you have to send them your CPU first, since you're basically offline for the whole turnaround duration.

                              • 708. Re: gcc segmentation faults on Ryzen / Linux
                                oleyska

                                Setting my Vsoc to 1.1 made the issue worse, going from segfaults to system freezes.

                                 

                                I am waiting for more info. This is an annoyance rather than critical for me, as for many others.

                                Hoping they can solve it through microcode, but if not, RMA it is.

                                I'll wait for a while; I can live with it. This is not my first rodeo with CPU bugs (4x Intel bugs in the past). This is my first AMD, and this one got responded to pretty much 6 months quicker, despite everyone's fury.

                                 

                                They quickly acknowledged that the problem could be reproduced, and that's good, but it took a bit too long for a few, I guess.

                                • 709. Re: gcc segmentation faults on Ryzen / Linux
                                  pjssilva

                                  For me they said that they would send me the new CPU as soon as I sent mine (probably after I send them a tracking number or something). In fact this is really bad. I live in Brazil and our customs suck. I am afraid the processor will take ages to get here if it is shipped from abroad. Anyhow, I do not trust my system to do real work, so the sooner I get the RMA CPU, the better. I hope they test it well before sending it to me.

                                   

                                  I filled out the RMA form today. I will now await further contact.
