1,895 Replies Latest reply on Jun 14, 2018 1:36 PM by constantinx
      • 30. Re: gcc segmentation faults on Ryzen / Linux
        qsmcomp

        Having random reboot issues with Ryzen 7 1700 when XFR is enabled.

        Disabling Turbo, disabling C6, manually setting the frequency to 3.2GHz, enabling LLC and increasing the core voltage to 1.25V seem to work around the issue.

        I have been using the "sensors" command with the https://github.com/groeck/nct6775 driver to watch the CPU core voltage on my MSI B350M Mortar motherboard.

        With default BIOS settings the core voltage sometimes goes up to 1.35V, but mostly it runs at 1.09V. With these settings, no problems occurred.

        With some random changes to the BIOS settings, the core voltage stays below 1.19V and never reaches 1.20V. With these settings, the problem occurs.
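For anyone who wants to log the core voltage instead of watching it by eye, here is a minimal Python sketch that pulls one reading out of `sensors`-style text. The sample text, the chip name and the `Vcore` label are assumptions for illustration; the labels you actually see depend on the driver and the board.

```python
import re

# Made-up sample in the general shape of `sensors` output.
# The chip name and the "Vcore" label are assumptions, not taken from a real board.
SAMPLE = """\
nct6795-isa-0a20
Adapter: ISA adapter
Vcore:         +1.09 V  (min =  +0.00 V, max =  +1.74 V)
in1:           +1.02 V
"""

def read_voltage(sensors_text, label="Vcore"):
    """Return the first reading for `label` in volts, or None if it is absent."""
    pattern = re.compile(
        r"^" + re.escape(label) + r":\s*([+-]?\d+(?:\.\d+)?)\s*V",
        re.MULTILINE,
    )
    match = pattern.search(sensors_text)
    return float(match.group(1)) if match else None

def below_threshold(readings, threshold=1.20):
    """True if every collected reading stayed below `threshold` volts."""
    return all(v < threshold for v in readings)

# To use live data, feed it the text from the real command, for example:
#   subprocess.run(["sensors"], capture_output=True, text=True).stdout
print(read_voltage(SAMPLE))
```

Collecting readings in a list over a build and passing them to `below_threshold` would show whether the voltage ever reached 1.20V during the run.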

        • 31. Re: gcc segmentation faults on Ryzen / Linux
          qsmcomp

          amdmatt

          My cute Chinese friend 'Alan' (lun) from AMD's outsourced support suggested that I ask you for technical support.

          Castrating my Ryzen processor is not a good solution.

          1 of 1 people found this helpful
          • 32. Re: gcc segmentation faults on Ryzen / Linux
            mcl00

            I tried the options you listed (no turbo, no C6, 3.2GHz, LLC and increasing the core voltage) and was still able to generate the segfaults while compiling software, so unfortunately that didn't work for me.

            • 33. Re: gcc segmentation faults on Ryzen / Linux
              mcl00

              Quick update - I think my earlier error with prime95 in Windows may have been unrelated to the current issue. It could have been heat-related, or possibly I was still trying out 3200MHz with my memory. In any case, I have not been able to reproduce errors with prime95 (v29.1). Windows 10 (9hrs), Gentoo (3.5+hrs) and Ubuntu (2+hrs) can all run prime95 to their heart's content without errors. (The numbers in brackets are the time I left it to run; in no case did prime95 fail a test.) These prime95 runs were all with the CPU at stock settings and the RAM at 2933MHz, which is the highest 'stable' speed I can get my memory running at.

               

              No amount of fiddling with BIOS settings has resulted in a stable system for compiling software for me. I can also confirm that the problem is not heat-related: my AIO water cooler arrived yesterday, and after installing it my CPU temps have dropped by almost 20C with no discernible impact on stability while compiling.

               

              There is still the possibility that this is caused by a bug in Linux, as I have not 'crashed' Windows in the same way. I would like to eliminate that possibility, but I don't know how to simulate similar CPU activity under Windows. Windows 10 will run prime95, Passmark's benchmark, the free versions of 3Dmark and others with no problems, but these don't really do the same type of activity as compiling large software packages under Linux. That workload tends to be heavily multithreaded for 3-4 minutes as a bunch of source files get compiled into object files, then mostly single-threaded for a short period (20s-1m) with lots of memory shuffling around as the object files get linked, then multithreaded again as it compiles the next batch of source files.

              • 34. Re: gcc segmentation faults on Ryzen / Linux
                qsmcomp

                I was getting segfaults with bash and glibc, so I recompiled them using the newest versions from Ubuntu zesty (I'm running Ubuntu xenial). No segfaults since then, but sudden reboots still occur unless I raise the voltage.

                You might try:

                Raising the core voltage as well as the NB/SoC voltage.

                Recompiling packages.

                Running memtest86+ to see whether your memory kit has problems.

                 

                My mobo has an option to set a fixed core voltage instead of a dynamic voltage (offset). You should try that option.

                • 35. Re: gcc segmentation faults on Ryzen / Linux
                  qsmcomp

                  [Attached image: QQ图片20170604155313.png]

                  I've got this new MCE on my Ryzen 7 1700. lun amdmatt

                  What should I do?

                  • 36. Re: gcc segmentation faults on Ryzen / Linux
                    alfonsor

                    As suggested on the Phoronix forum, I tried setting /proc/sys/kernel/randomize_va_space to 0 and it seems to do the trick for me. Please try it.

                    (Of course, it should be considered just a starting point for investigating the issue.)
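If it helps anyone checking their own machine, here is a small Python sketch that reads the current setting and says what it means. The helper names are made up for illustration; the meanings of 0, 1 and 2 follow the kernel's sysctl documentation as I understand it.

```python
from pathlib import Path

ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

# Meanings of the sysctl values, per the kernel documentation.
MODES = {
    0: "ASLR disabled",
    1: "conservative randomization (stack, mmap base, VDSO)",
    2: "full randomization (also randomizes the heap/brk)",
}

def describe_aslr(raw):
    """Translate the raw file contents (e.g. '2\\n') into a description."""
    value = int(raw.strip())
    return MODES.get(value, f"unknown value {value}")

def current_aslr(path=ASLR_PATH):
    """Describe the live setting, or return None where the file does not exist."""
    return describe_aslr(path.read_text()) if path.exists() else None

print(current_aslr())
```

Reading the file needs no special rights. Writing 0 to it needs root, and the change does not survive a reboot unless it is also added to the sysctl configuration.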

                    3 of 4 people found this helpful
                    • 37. Re: gcc segmentation faults on Ryzen / Linux
                      sat

                      > As suggested on the Phoronix forum, I tried setting /proc/sys/kernel/randomize_va_space to 0

                      > and it seems to do the trick for me. Please try it. (Of course, it should be considered just a starting point for investigating the issue.)

                       

                      In my case, this workaround seems to work. Before trying it, I could reproduce the problem at least once per ten Linux kernel builds (make -j16).

                      After applying it, however, the build completed 100 times without a SEGV.
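To put a rough number on how unlikely 100 clean builds would be by luck alone, here is a back-of-the-envelope Python sketch. It assumes each build fails independently with a fixed probability of 1 in 10, which is a simplification of "at least once per ten builds" and not something measured.

```python
def prob_all_clean(failure_rate, builds):
    """Chance that `builds` independent builds all succeed."""
    return (1.0 - failure_rate) ** builds

# Assumed inputs: 10% failure chance per build, 100 builds in a row.
p = prob_all_clean(0.10, 100)
print(f"{p:.2e}")
```

Under those assumptions the chance comes out at roughly 3 in 100,000, so a 100-build clean streak is strong evidence that something changed on that particular machine. It says nothing about other machines, where the workaround may not help.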

                      2 of 3 people found this helpful
                      • 38. Re: gcc segmentation faults on Ryzen / Linux
                        atomsymbol

                        Just a note about overclocking voltages:

                         

                        The MSI X370 SLI Plus BIOS contains a button that overclocks the CPU (Ryzen 5 1600) from 3.2GHz to 3.6GHz and changes the fan envelope. An unexpected issue is that turning the button off does not lower the core voltage (Vcore) back to its normal level, that is, from 1.464V back to 1.2V. Clearing the CMOS restores the normal 1.2V.

                         

                        3.2GHz @ 1.464V is unstable (CPU hits 95℃ and gets automatically throttled from 3.2GHz to about 2.7GHz during stress testing in AIDA64), and 3.2GHz @ 1.2V is stable (max CPU temperature during AIDA64 stress test is 76℃). I didn't test 3.6GHz @ 1.464V, but I would expect the system to be unstable at this voltage as well.

                        • 39. Re: gcc segmentation faults on Ryzen / Linux
                          sat

                          FYI about this workaround.

                           

                          a) Its effect

                           

                          > > As suggested on the Phoronix forum, I tried setting /proc/sys/kernel/randomize_va_space to 0

                          > > and it seems to do the trick for me. Please try it. (Of course, it should be considered just a starting point for investigating the issue.)

                           

                          > In my case, this workaround seems to work. Before trying it, I could reproduce the problem at least once per ten Linux kernel builds (make -j16).

                          > After applying it, however, the build completed 100 times without a SEGV.

                           

                          Unfortunately, a person on the Phoronix forum said that echo 0 >/proc/sys/kernel/randomize_va_space (that is, disabling ASLR) did not work around this problem.

                           

                          Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads - Phoronix Forums

                           

                          b) Its logic

                           

                          This workaround, disabling ASLR, is based on Matt's reasoning in the following post.

                           

                          Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads - Phoronix Forums

                           

                          If that reasoning is correct, this problem should also disappear after disabling SMT.

                          However, in my case, it didn't. I have asked him how he arrived at that reasoning.

                           

                          Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads - Phoronix Forums

                          2 of 3 people found this helpful
                          • 40. Re: gcc segmentation faults on Ryzen / Linux
                            meihong

                            This workaround doesn't work for me.  I set `/proc/sys/kernel/randomize_va_space` to 0 but still got a SEGV in one of 2 trials while compiling gcc 6.3 with -j16.

                            1 of 1 people found this helpful
                            • 41. Re: gcc segmentation faults on Ryzen / Linux
                              nhm

                              Just wanted to chime in that I'm hitting this too on my 1700.  Have never overclocked.  Using 2 DDR4-3200 DIMMs running at 2100 with the default profile.  Once segfaults start happening, the system destabilizes pretty quickly.  Haven't had a chance to run memtest overnight yet, but it definitely fits the profile of what folks are seeing here.

                              • 42. Re: gcc segmentation faults on Ryzen / Linux
                                mcl00

                                I have been doing some extensive testing over the weekend and, for me, disabling ASLR does not resolve the issue.  The only workaround in my case is disabling SMT (contrary to my 1st post, but I think I was dealing with two issues at the beginning: unstable RAM settings and whatever this segfault-causing bug is). I don't know enough about ASLR to know whether or not this supports the explanation that Matt described in the Phoronix forum post you linked.

                                1 of 1 people found this helpful
                                • 43. Re: gcc segmentation faults on Ryzen / Linux
                                  whiskey-foxtrot

                                  mcl00 I have a similar experience with this. Two identical installations (Ubuntu 16.04.2), same stock kernel (4.4.0-78-generic), EXCEPT that one has a custom water loop and the other uses a Noctua air cooler; CPUs are both 1700X, with 3200MHz RAM on the latest beta BIOS from Asus (9945 on Crosshair VI Hero) with 1:1 settings.

                                   

                                  Any compile with -j16 on the air-cooled system would kernel panic after a few minutes; no issues on the water-cooled system. I added an AIO to the air-cooled system and ran a parallel kernel-source compile on both systems, et voilà - no issues at all.

                                   

                                  It is very possible that the temperature difference made a difference; I'm tempted to switch the AIO back out for the Noctua as a test now that the kernels have been recompiled for these systems instead of using a binary/stock kernel, but I doubt I'll find any issues.  If I do, I'll post here.

                                  • 44. Re: gcc segmentation faults on Ryzen / Linux
                                    mcl00

                                    Speaking of extensive testing, here's an extensive summary. To begin with, I upgraded my system from a Core i5-2500K to the following a few weeks ago:

                                    (new) MSI x370 Gaming Pro Carbon, BIOS 7A32v15 (AGESA 1.0.0.4a)

                                    (new) Ryzen 7 1700

                                    (new) Corsair Vengeance LPX DDR4-3200MHz CL16 2x8GB (CMK16GX4M2B3200C16R) - on the QVL of the MB

                                    Corsair RM650x power supply

                                    (new) Corsair H110i AIO CPU watercooler

                                    Samsung 950pro 250GB and Sandisk Ultra 480GB SSDs

                                    Fractal Design R5 case, 2x140mm case fans

                                    Geforce GTX670 PCIe video card

                                     

                                    The old system was stable (i.e. no segfaults when compiling anything).

                                    All operating systems and software have been re-installed from scratch with the new system.

                                     

                                    Settings/tests done to troubleshoot the Ryzen 1700 system:

                                     

                                    Recent tests were all done with BIOS defaults, Boot mode changed to UEFI only as opposed to UEFI+Legacy, Virtualization enabled (amd-v), memory at XMP profile 1 (2933MHz). This setting passes an overnight run of memtest86 and is stable running prime95 w/16 threads on Windows 10, Ubuntu, and Gentoo for multiple hours without failing. With the maximum power use tests in prime95, CPU temperatures never exceeded 47C, MB temp never exceeded 36C in Windows 10 (I can't currently monitor temps in Linux).

                                     

                                    My typical setup that regularly generates the segfaults uses Gentoo, gcc 6.3 (-O2 -pipe -march=znver1) and make -j16, emerging (compiling and installing) the mesa-17.0 package in a loop. My cooler is set on quiet mode, and case fans are controlled by the MB. With those options, I will generate a segfault (usually in /bin/sh) approximately once every 1-4 loops - in other words, I will successfully compile and install mesa 0-3 times before it segfaults.
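For anyone who wants to count loops the same way, a loop counter along these lines can be sketched in Python. This is not the script I used; the function name is made up, and it simply treats any non-zero exit status from the build command as a failure, which lumps ordinary build errors together with segfaults.

```python
import subprocess
import sys

def loops_before_failure(cmd, max_loops):
    """Run `cmd` repeatedly and return how many runs succeeded before the first failure.

    Returns `max_loops` if every run succeeded.
    """
    for completed in range(max_loops):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return completed
    return max_loops

# Stand-in command so the sketch runs anywhere. For a real test, replace it
# with the actual build command, e.g. ["make", "-j16"] in a source tree.
stand_in = [sys.executable, "-c", "pass"]
print(loops_before_failure(stand_in, 3))
```

A real run would also want a clean step between loops, so that each iteration actually recompiles everything rather than finding the work already done.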

                                     

                                    The following lists how many loops through compiling and installing mesa were successful before I got a segfault in each of the different scenarios below:

                                    Case and AIO cooler at maximum fan speed: 2 loops

                                    make -j8: 4 loops

                                    RAM at 2133MHz (JEDEC setting, CL15): 1 loop

                                    RAM at 1866MHz, 1.2V, CL18: 2 loops

                                    RAM at 2133MHz, AMD Cool'n'Quiet disabled: 3 loops

                                       as above, plus LLC set to mode 2 for CPU and NB voltage: 2 loops

                                       as above, plus CPU voltage fixed at 1.25V: 3 loops

                                       as above, plus NB voltage fixed at 1.15V: 1 loop

                                       as above, but with LLC set to mode 4 rather than 2 for CPU and NB voltage: 3 loops

                                    Turbo disabled, C6 state disabled, CPU frequency set to 3.2GHz, LLC mode 4, CPU voltage 1.35V, NB voltage 1.15V: 2 loops

                                       as above, but RAM back to XMP profile 1 (2933MHz, 1.35V): 3 loops

                                    Default settings, RAM 2933MHz, ASLR disabled: 5 loops

                                     

                                    SMT disabled (RAM 2933MHz) and make -j9: 172+ loops (ran overnight without segfault, killed it manually in the morning).
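When switching SMT on and off in the BIOS between runs, it is worth confirming from inside Linux that the setting actually took effect. A minimal Python sketch, assuming the usual `siblings` and `cpu cores` fields in /proc/cpuinfo (the sample strings in the comments are illustrative, not copied from my machine):

```python
def smt_enabled(cpuinfo_text):
    """Guess SMT state from /proc/cpuinfo text.

    Returns True if there are more hardware threads ("siblings") than
    physical cores ("cpu cores"), False if they match, None if either
    field is missing.
    """
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "siblings" and siblings is None:
            siblings = int(value)
        elif key == "cpu cores" and cores is None:
            cores = int(value)
    if siblings is None or cores is None:
        return None
    return siblings > cores

# Illustrative input for an 8-core part with SMT on:
#   "siblings\t: 16\ncpu cores\t: 8\n"
# Live use:
#   smt_enabled(open("/proc/cpuinfo").read())
```

It only looks at the first processor entry, which is enough for a single-socket desktop system.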

                                     

                                    With SMT disabled (RAM at 2933MHz, everything else default), I recompiled my entire system with gcc set to -O2 -pipe -mtune=generic to eliminate any optimizations for the Zen architecture. There were no segfaults during that time. I then re-enabled SMT and tried again.

                                     

                                    make -j16: 7 loops (I started to get excited...)

                                    make -j16, ASLR disabled: 5 loops

                                    Disable 2 cores (3+3): 4 loops

                                    Disable 4 cores (4+0): 6 loops

                                     

                                    Other OSes:

                                    Testing in Ubuntu 16.04, there were too many dependencies/libraries to install for me to test compiling mesa from source under Ubuntu. That said, with default settings and compiling gcc 5.4 from source (with whatever gcc is installed with apt-get install gcc... I think it's 4.5.3) I also get segfaults after anywhere from 5-15 minutes of compiling.

                                     

                                    I know nothing of compiling software in Windows, so I have not been able to test that.

                                    2 of 3 people found this helpful