51 Replies Latest reply on Aug 13, 2018 4:07 AM by shinobi

    Ryzen linux kernel bug 196683 - Random Soft Lockup

    imshalla

      I have Ryzen 7 1800X, ASUS Prime X370-PRO, running Fedora 26 and 27.

       

      The damn thing has not worked properly since I bought it.

       

      My first CPU was RMA'd, and the replacement does not appear to suffer from the SEGV fault.

       

      However, both the original and the replacement CPU crash regularly:

       

        * occasionally with streams of "watchdog: BUG: soft lockup" events being logged,

       

        * but mostly the system just stops and I can find no logging that tells me why.
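For anyone wanting to check their own logs for the first symptom, here is a minimal Python sketch that pulls soft-lockup events out of kernel log text. The regular expression follows the kernel's "BUG: soft lockup - CPU#N stuck for Ns!" message; the sample line and the function name are only illustrative.

```python
import re

# Matches the kernel watchdog message, for example:
#   watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s")


def find_soft_lockups(log_text):
    """Return a list of (cpu, seconds) tuples, one per soft-lockup line."""
    return [(int(cpu), int(secs)) for cpu, secs in SOFT_LOCKUP.findall(log_text)]


# Illustrative sample only; real input could come from `journalctl -k -b -1`.
sample = (
    "kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]\n"
    "kernel: some unrelated line\n"
)
for cpu, secs in find_soft_lockups(sample):
    print(f"CPU {cpu} stuck for {secs}s")
```

This will not help with the second symptom, of course: when the machine simply stops, nothing reaches the log.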

       

      At bugzilla.kernel.org I find bug 196683, where a "workaround" is suggested:

       

      1) kernel configured:  CONFIG_RCU_NOCB_CPU=y

       

      2) kernel command-line:  rcu_nocbs=0-15
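To confirm that both halves of the workaround are really in effect, something like the following sketch could be used. It only parses text; on a live system the inputs would be the contents of /proc/cmdline and of the kernel config file (assumed here to be /boot/config-<release>, as on Fedora). The function names and sample strings are hypothetical.

```python
import re


def rcu_nocbs_cpus(cmdline):
    """Return the CPU list passed as rcu_nocbs= (documented spelling), or None."""
    m = re.search(r"(?:^|\s)rcu_nocbs=(\S+)", cmdline)
    return m.group(1) if m else None


def config_enabled(config_text, option="CONFIG_RCU_NOCB_CPU"):
    """True if the kernel config text contains the line OPTION=y."""
    return any(line.strip() == f"{option}=y" for line in config_text.splitlines())


# Hypothetical samples standing in for /proc/cmdline and /boot/config-<release>.
sample_cmdline = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro rcu_nocbs=0-15"
sample_config = "CONFIG_TREE_RCU=y\nCONFIG_RCU_NOCB_CPU=y\n"

print("rcu_nocbs CPUs:", rcu_nocbs_cpus(sample_cmdline))
print("CONFIG_RCU_NOCB_CPU=y:", config_enabled(sample_config))
```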

       

      But with kernel 4.14.18-300.fc27 I find the machine has stopped over night (when it is idle), every two or three days.

       

      I have added kernel command-line "processor.max_cstate=5", which may help with the crashes, but (I assume) not with the electricity bill :-(

       

      Does anybody understand what the real fault is ?  A "workaround" is all very well, but not entirely satisfactory.  It's not as if this is a new device any more.

        • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
          shmerl

          I have been bitten by the same issue on Linux ever since I bought my Ryzen CPU (Ryzen 7 1700X). At first I had both this one and the segfault bug. I RMA'd the chip, and the segfault was gone, but the random freezes / reboots never really went away. I first blamed the motherboard, RAM and even power spikes, but nothing helped, and now I have found that this kernel bug has been opened:

           

          196683 – Random Soft Lockup on new Ryzen build

           

          Some commented there that AMD know about the issue and plan to fix it with a microcode update. However, some reported that it wasn't fixed in microcode 0x08001136. And my motherboard (ASRock X370 Taichi) still ships an even older one: 0x08001129.

           

          Are you still planning to fix it, or is the only way to do it to replace the CPU with Ryzen 2?

          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
            imshalla

            For completeness...

             

            ...one of the theories is that disabling cstate 6 contributes to preventing random freezes.  My experience is of finding the machine frozen in the morning, when it has been idle overnight.  So some problem with the "Deep Power Down" state seems probable.

             

            ...however, the kernel command-line option "processor.max_cstate=5" does not appear to disable (cstate) C6 :-(  I am unable to determine what (if anything) it does do.

             

            ...but at ZenStates-Linux there is 'zenstates.py' which will disable C6.  And (with any luck) that may do the trick.
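On the processor.max_cstate question, one way to see which idle states the kernel actually exposes is the cpuidle directory in sysfs. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpu0/cpuidle/stateN/ layout with "name" and "disable" files; the function name is my own. It does not show whether the hardware enters C6 behind the kernel's back, only what the cpuidle driver offers.

```python
from pathlib import Path


def list_idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return [(state_dir, name, disable_flag)] for one CPU's cpuidle states.

    Returns an empty list when the directory does not exist
    (for example on a non-Linux machine or with cpuidle disabled).
    """
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []
    states = []
    # Sort numerically so that state10 comes after state2.
    for d in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (d / "name").read_text().strip()
        disable_file = d / "disable"
        disabled = disable_file.read_text().strip() if disable_file.exists() else "?"
        states.append((d.name, name, disabled))
    return states


for state, name, disabled in list_idle_states():
    print(f"{state}: {name} (disable={disabled})")
```

Comparing the list with and without the command-line option should at least show whether it changes anything the kernel can see.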

             

            It is, of course, disappointing that my shiny Ryzen CPU is unreliable.  But it is *infuriating* that nobody seems to know what the problem is, or whether some or all of the suggested spells are required to avoid these apparently random freezes.

            • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
              shmerl

              Is there any point in opening a support ticket for this?

                • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                  imshalla

                  I have sent a message via <http://support.amd.com/en-us/contact/email-form>, referring to this thread and the Kernel Bug.  [Service Request: 8200794428]

                   

                  I assume AMD are already aware of the Kernel Bug... but it would be nice to hear whether a BIOS and/or microcode fix is coming, or whether the Linux Kernel folk should be persuaded to incorporate a permanent "do not use C6" for Ryzen CPUs (assuming that's really the answer !).

                    • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                      shmerl

                      I most likely will just buy Ryzen 2 this spring, since I don't think the fix is coming any time soon. But time will tell.

                       

                      I also might end up building the kernel with CONFIG_RCU_NOCB_CPU=y.

                      • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                        shmerl

                        Just built the kernel with CONFIG_RCU_NOCB_CPU=y, and it has been running OK so far (kernel boot parameter: rcu_nocbs=0-15). CPU temperature is slightly higher than before (I assume it does less core parking, or maybe something else is affected).

                         

                        But that's what I'd expect: the freeze happens because of insufficient power delivery in some of the C6 states, so rcu_nocbs=0-15 must somehow increase that power (while C6 states are still enabled), and this should raise the temperature somewhat.

                        • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                          imshalla

                          I have now heard back from TECH.SUPPORT@AMD.COM.  What they said, and what I have said in reply is shown below.

                           

                          I can today say that my machine has not crashed in the last 7 days.  I cannot say how disappointed I am that this is cause for celebration :-(

                                                                                                                                                                   

                          On 28/02/18 10:51, TECH.SUPPORT@AMD.COM wrote:

                          Dear Chris,

                          Your service request : SR #8200794428 has been reviewed and updated.

                          Response and Service Request History:

                          Thank you for the response.

                          I understand you are experiencing an issue on your PC with Ryzen processor when C6 state is enabled on the BIOS.

                           

                          Yes, my PC freezes at random when it is idle.  Typically it will freeze when left overnight, roughly every 2 or 3 days.

                           

                          Having disabled C6 -- using the 'zenstates.py' script, from https://github.com/r4m0n/ZenStates-Linux -- my machine has not frozen for 7 days.

                           

                          This issue has been fixed with the latest BIOS updates, but the option to fix it may not be available in all BIOS.

                           

                          What is the root cause of the issue ?

                           

                          In what way has it been fixed ?

                           

                          I request you to update to the latest BIOS and see if you have the Power Supply Control option in the MB BIOS. Try toggling this option between the different settings to see if it fixes it. If the specific option is not available I would suggest you keep C6 off for now.

                           

                          I have the latest available "PRIME X370-PRO BIOS 3803" from ASUS.  That apparently includes:

                           

                            2.Update to AGESA 1000a for new upcoming processors

                           

                          I understand that means AGESA 1.0.0.0a (?) -- I have no idea what that means, since AMD seem to keep the release notes for AGESA as a deep, dark secret.  Previous BIOSes had (according to ASUS) "AGESA 1071", and before that were "AGESA 1.0.0.6B" and "AGESA 1.0.0.6a"... so I admit to being baffled.

                           

                          What does this new BIOS "Power Supply Option" do ?

                           

                          Are you telling me that this is a problem with my power supply ?

                           

                          If so, does this mean I need a better power supply ?

                           

                          Disabling C6 is not really a long term solution... since that disables both (a) the maximum single core performance, and (b) the minimum power consumption state.  While these are arguably marginal, I have wasted a lot of time and energy trying to get my machine to work reliably.

                           

                          I am seriously disappointed that the only information available is buried in kernel bug report(s) and in various support forums.

                           

                          Having (eventually) found Linux Kernel Bug 196683, I have been hoping that AMD would leap into action to: (a) inject some clarity into the discussion, and (b) provide a proper solution.

                           

                          <sigh>

                           

                          For completeness, let me repeat my questions:

                           

                            1) What is the root cause of the issue ?

                           

                            2) In what way has it been fixed ?

                           

                            3) What does this new BIOS "Power Supply Option" do ?

                           

                            4) Are you telling me that this is a problem with my power supply ?

                           

                            5) If so, does this mean I need a better power supply ?

                           

                          Thanks,

                           

                          Chris 

                          Best regards,

                           

                          HK

                           

                          AMD Global Customer Care

                          _____________________________________________________________________________________________

                          The contents of this message are provided for informational purposes only.  AMD makes no representation or warranties with respect to the accuracy of the contents of the information provided, and reserves the right to change such information at any time, with or without notice.

                           

                            • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                              shmerl

                              > Update to AGESA 1000a for new upcoming processors

                               

                              That's what I see also for my ASRock X370 Taichi (except no version is specified): Update AGESA for future coming processors.

                               

                              It didn't even update the microcode it seems. And where is that option for power supplies exactly? Let's hope some future update will actually fix it, but for now building the custom kernel and not disabling C6 looks like the best possible workaround.

                              • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                shmerl

                                Can you please also ask AMD, what exact AGESA version is providing the fix?

                                • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                  imshalla

                                  I received this from "AMD Support" on 01-Mar-2018 (at 11:50):

                                  _______________________________________________________________________

                                    This is an automatically generated email please do not reply

                                   

                                    Dear Hall,

                                   

                                    Your service request SR# 8200794428 has been escalated to one of our

                                    experts who can better address your questions. We appreciate your

                                    patience, and thank you for your interest in AMD.

                                   

                                    Best Regards,

                                   

                                    AMD Global Customer Care.

                                  ____________________________________________________________________

                                   

                                  But have not heard anything more, yet.

                                   

                                  On a brighter note, I have now ~12 days without a freeze.  I am now going to turn C6 state back on again.

                                   

                                  Chris

                                • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                  imshalla

                                  "AMD Support" said they had escalated the issue on 01-Mar-2018, it is now 12-Mar-2018... <sigh>.

                                   

                                  Having run for ~12 days with C6 disabled, I re-enabled it, and my machine crashed twice in 3 days.

                                   

                                  I have now replaced the power supply with something newer and more efficient, which claims:

                                   

                                     "Zero-Load design that supports Intel’s Deep Power Down C6 and C7 modes".

                                   

                                  With C6 enabled, I rather expect the machine to crash again any day now.

                              • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                shmerl

                                Browsing around my ASRock X370 Taichi firmware settings, I found this one:

                                 

                                Advanced > AMD CBS > Zen Common Options > Power Supply Idle Control.

                                 

                                I changed it from auto to low; let's see if it helps with the stock kernel.

                                • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                  shmerl

                                  I'm now also seeing a lot of these in dmesg:

                                   

                                  [11225.078807] x86: Booting SMP configuration:

                                  [11225.078808] smpboot: Booting Node 0 Processor 1 APIC 0x1

                                  [11225.081035] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)

                                  [11225.081063]  cache: parent cpu1 should not be sleeping

                                  [11225.081127] microcode: CPU1: patch_level=0x08001129

                                  [11225.081213] CPU1 is up

                                  [11225.081233] smpboot: Booting Node 0 Processor 2 APIC 0x2

                                   

                                  And so on for all 16 virtual cores.
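Since the microcode revision keeps coming up, here is a small Python sketch that extracts the patch level per CPU from dmesg text like the above, so it can be compared against the revisions mentioned in this thread (0x08001129, 0x08001136). The function name is my own invention; the sample line is copied from the output above.

```python
import re

# Matches lines such as: "microcode: CPU1: patch_level=0x08001129"
PATCH_LEVEL = re.compile(r"microcode: CPU(\d+): patch_level=(0x[0-9a-fA-F]+)")


def patch_levels(dmesg_text):
    """Return {cpu_number: patch_level_as_int} for every microcode line found."""
    return {int(cpu): int(level, 16) for cpu, level in PATCH_LEVEL.findall(dmesg_text)}


sample = "[11225.081127] microcode: CPU1: patch_level=0x08001129\n"
for cpu, level in sorted(patch_levels(sample).items()):
    older = level < 0x08001136
    print(f"CPU{cpu}: {level:#010x} (older than 0x08001136: {older})")
```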

                                  • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                    shmerl

                                    That "C-state 0x0 not supported by HW" message now always appears, so it's not related to my test above.

                                     

                                    With Advanced > AMD CBS > Zen Common Options > Power Supply Idle Control set to "Common current idle" (instead of auto) in my ASRock X370 Taichi firmware, I didn't get any freezes in a while, so I assume it's a valid workaround.

                                     

                                    Using zenstates.py, I checked what changes once it's set in the firmware:

                                     

                                    When set to auto:

                                     

                                    C6 State - Package - Enabled

                                    C6 State - Core - Enabled

                                     

                                    when set to Common current idle:

                                     

                                    C6 State - Package - Disabled

                                    C6 State - Core - Enabled

                                     

                                    So apparently it disables package C6 state (while keeping core C6 state enabled)! Hopefully it can shed some light on what the problem is. I wonder if Ryzen 2 will be free of this issue.
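To make this comparison repeatable across firmware settings, the two status lines can be parsed into a small dictionary. A minimal sketch, assuming the output keeps the "C6 State - <scope> - Enabled/Disabled" form shown above; the function name is hypothetical.

```python
def parse_c6_status(output):
    """Parse 'C6 State - <scope> - Enabled|Disabled' lines into {scope: bool}."""
    status = {}
    for line in output.splitlines():
        parts = [p.strip() for p in line.split(" - ")]
        if len(parts) == 3 and parts[0] == "C6 State":
            status[parts[1].lower()] = parts[2] == "Enabled"
    return status


# The two states reported above.
auto = "C6 State - Package - Enabled\nC6 State - Core - Enabled\n"
common_current_idle = "C6 State - Package - Disabled\nC6 State - Core - Enabled\n"

print(parse_c6_status(auto))
print(parse_c6_status(common_current_idle))
```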

                                     

                                    What exactly is "package" in this context? Is it still part of CPU, or it's something on the motherboard?

                                    • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                      daclown

                                      This thread is consistent with my experiences. I also note that this problem is more frequently triggered by having both Firefox and Steam open at the same time.
                                      Given that syslog loses the last few minutes of logging leading into the soft lockup, I expect we're looking at some kind of runaway condition where something is getting stuck in a loop somewhere. My soft lockups are often, but not always, preceded by CPU/GPU panics.

                                      • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                        john1000

                                        I've been watching this post hoping more information would come to light since I want to build a Ryzen based linux system (I already have a windows one.)  Last Sunday, I decided to start a test.  I built a bare bones Ryzen 1300x system with 4 gigs and Asus X370 Pro.  All the other components are from my old Amd 945 - power supply, case, storage, fans, and nvidia GTX 660 graphics card.

                                         

                                        I updated the BIOS to the latest, installed Ubuntu Mate 16.04.3 and did all the updates.  I then installed the supported nvidia proprietary drivers.  It's been running without problems so far.  I want to let it run for two weeks without issue before I move forward and buy 64gb of RAM.

                                         

                                        I tried this last year and ran into trouble and I describe it here:  Linux Reboots during idle time and MCE errors (Ubuntu 16.04.3.) | johnstechpages.com

                                          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                            imshalla

                                            What you describe is the Kernel Magic where the:

                                             

                                            1) kernel is configured:  CONFIG_RCU_NOCB_CPU=y

                                             

                                            2) kernel command-line is used:  rcu_nocbs=0-15 (depending on the CPU)
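The "depending on the CPU" part just means the range has to cover every logical CPU, so 0-15 fits a 16-thread part. A minimal sketch of deriving the parameter from the CPU count (the function name is hypothetical, and os.cpu_count() is assumed to report all logical CPUs):

```python
import os


def rcu_nocbs_param(n_cpus=None):
    """Build an rcu_nocbs= parameter covering logical CPUs 0..n_cpus-1."""
    n = os.cpu_count() if n_cpus is None else n_cpus
    if not n or n < 1:
        raise ValueError("could not determine the number of logical CPUs")
    cpu_list = "0" if n == 1 else f"0-{n - 1}"
    return f"rcu_nocbs={cpu_list}"


print(rcu_nocbs_param(16))  # 16 threads, e.g. an 8-core Ryzen with SMT
print(rcu_nocbs_param())    # whatever this machine reports
```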

                                             

                                            This appears to help, but is not a guaranteed fix.

                                             

                                            AMD have told me that a PSU which will deliver 12V at 0A is required.

                                             

                                            I (now) have one of those, but my machine froze again overnight.

                                             

                                            There is supposed to be a BIOS fix -- "Power Supply Idle Control" -- but my ASUS Prime X370-PRO does not support it (yet).

                                             

                                            The BIOS fix appears to disable the C6 (deep sleep) state, either (a) for the "package" or (b) for all cores.  I assume that by "package" we mean that C6 is enabled for all but one of the cores.  Some people have reported success with these options.

                                             

                                            FWIW: I am now trying with C6 disabled for the "package".

                                             

                                            I have been hoping for more information from AMD.  I last wrote to AMD "TECH.SUPPORT" on 14-Mar... so far, no reply.

                                              • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                john1000

                                                My test system has been running for a little more than a week now, mostly idle with no crashes.  The main differences from last year when I conducted this test are:

                                                 

                                                1)  Different CPU - a 1300X rather than an 1800X.

                                                2)  I've populated only 1 dimm socket with 4 GB of ram rather than 4 sockets with 64 GB.

                                                3)  I'm using a GTX-660 video card rather than a Nvidia 8400 GS.

                                                4)  I'm using the nvidia proprietary driver this time around rather than the open source driver.

                                                5)  Updated BIOS and Ubuntu 16.04.

                                                6)  I'm using a Cooler Master RS 850 Power Supply I purchased in December of 2009 with 8 years of continuous use rather than a new Seasonic Prime 750.

                                                 

                                                I have not done any of the kernel configurations that are supposed to address this problem, I'm running plain vanilla. We both have the same motherboard and I assume that we both have #5 covered.  What video card do you have and what drivers are you using?  If it's a Nvidia card and you're using the opensource drivers, my only suggestion right now is to switch to the proprietary drivers and start the clock.

                                                 

                                                It could be that the crash at idle could be more prevalent with more cores.  That's not something I can test until the new 2700X's come out.  I did order the memory yesterday and will likely install sometime during the week or next weekend.  I'll keep you posted.

                                                • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                  john1000

                                                  Update:  I let the machine run 9 days without a restart.  I ran into no issues.  I installed 64GB of ram and ran a memtest for 27 hours with 4 complete passes - no memory errors.  I've been using it ever since.  If I run into any trouble, I'll post here.  I'm likely many weeks away from purchasing a 2700X - I'd like to wait for the official reviews and any BIOS updates first.

                                                    • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                      imshalla

                                                      Happy to hear your (effectively) new machine is running OK -- without any of the "Kernel Magic".

                                                       

                                                      I have AMD Firepro W2100.  There are rumours of issues with amdgpu, but mostly discounted in favour of C6 issues.

                                                       

                                                      AMD advice notwithstanding, changing to a PSU that delivers 12V at 0A did not fix the issue for me.

                                                       

                                                      Using zenstates.py, I have been running with C6 disabled for the "package" for about 14 days now, without a freeze.
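For those curious what "disabled for the package" amounts to at the register level, here is a sketch of the bit manipulation as I understand it from reading the zenstates.py script. The MSR addresses and bit positions below are assumptions taken from that script, not from any AMD documentation I have seen, so please verify them against the script before relying on them. The read_msr helper uses the Linux msr driver (/dev/cpu/N/msr), needs root and the msr module, and is not called here.

```python
import os
import struct

# ASSUMPTION (from my reading of zenstates.py, unverified against AMD docs):
PKG_C6_MSR = 0xC0010292    # bit 32 controls package C6
CORE_C6_MSR = 0xC0010296   # bits 22, 14 and 6 control core C6
PKG_C6_BIT = 1 << 32
CORE_C6_BITS = (1 << 22) | (1 << 14) | (1 << 6)


def package_c6_enabled(pkg_msr_value):
    """True if the (assumed) package C6 enable bit is set."""
    return bool(pkg_msr_value & PKG_C6_BIT)


def core_c6_enabled(core_msr_value):
    """True if all of the (assumed) core C6 enable bits are set."""
    return (core_msr_value & CORE_C6_BITS) == CORE_C6_BITS


def disable_package_c6(pkg_msr_value):
    """Return the value with the package C6 bit cleared; other bits untouched."""
    return pkg_msr_value & ~PKG_C6_BIT


def read_msr(msr, cpu=0):
    """Read a 64-bit MSR via the Linux msr driver (root + `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


# Pure bit arithmetic on a made-up value; no hardware access.
example = PKG_C6_BIT | 0x1234
print(package_c6_enabled(example), package_c6_enabled(disable_package_c6(example)))
```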

                                                       

                                                      So far, ASUS have not shipped the "Power Supply Idle Control" (at least for the Prime X370-PRO).  I asked on 26-Mar, and on 4-Apr "servicecenter_emea@asus.com" were able to tell me:

                                                       

                                                           No anoucement in regards to this has been made, as of yet

                                                            So we are not able to advise if/when this wil lbe coming

                                                       

                                                      <sigh>

                                                       

                                                      It is rumoured that the "Power Supply Idle Control" options have the effect of disabling C6 either entirely or for the "package", which is what the zenstates.py allows one to do.  However, it is also suggested that tweaking some Overclocking options can also eliminate the freezes.  It may be that somebody clever at AMD has figured out a better way of fixing the problem.

                                                       

                                                      Amongst other things, I would dearly like to know:

                                                       

                                                      • what, exactly, do the "Power Supply Idle Control" options do ?
                                                      • is it true that disabling C6 altogether will disable the "Max Boost Clock" ?
                                                      • does disabling C6 for the "package" allow "Max Boost Clock" to be used ?
                                                      • what difference do these options make to power consumption ?

                                                       

                                                      I have been hoping for more information from AMD.  I last wrote to AMD "TECH.SUPPORT" on 14-Mar... so far, no reply.

                                                       

                                                      Before sending my original Ryzen away to go through the (two week) RMA process, I bought myself an i7 8700K.  Sadly, only 6 cores... but it does at least work.  Also, it has a built in GPU which does as much as I need (I am neither a gamer nor a Bitcoin miner).  I look forward to the 10nm 8 core version.

                                                        • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                          billy72

                                                          The new BIOS v4008 (2018/04/17) for the ASUS Prime X370-Pro adds a "Power Supply Idle Control" option in Advanced/AMD CBS. It also includes a new Microcode Update Revision 8001137 and an update to AGESA 1002a.

                                                          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                            john1000

                                                            Last week, I had two failures during idle, neither left any hints in the logs.  After the first, I replaced the power supply with a Seasonic Prime Titanium 650 and the very next day, it happened again.  I completely agree with imshalla that AMD's explanation that this is related to the power supply not being "haswell" compatible is false.  I updated the BIOS of my Asus Prime X370 to v4008, installed a 2700x yesterday and set the Power Supply Idle Control to get around this issue.  Tonight, I'm going to set the Power Supply Idle control back to default to see if there is an impact to power consumption and frequency range the processor runs at.  I moved one of my VMs to it last night, (elastic search/logstash/kibana stack) and I wonder if that is enough load to prevent this from happening in the future.  Will update this once I answer the power consumption/frequency range question.
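For the frequency-range half of that question, a rough Python sketch that reads the current frequency of every CPU from sysfs is shown below. It assumes the usual cpufreq layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq, values in kHz); the function names are my own. Sampling it at idle with each "Power Supply Idle Control" setting would give a crude before/after comparison. It says nothing about power draw, which needs a meter at the wall.

```python
from pathlib import Path


def current_freqs_mhz(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_number: current frequency in MHz} from cpufreq sysfs files."""
    freqs = {}
    for f in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        cpu = int(f.parent.parent.name[3:])       # "cpu12" -> 12
        freqs[cpu] = int(f.read_text()) / 1000.0  # kHz -> MHz
    return freqs


def freq_range_mhz(freqs):
    """Return (lowest, highest) MHz over all CPUs, or None if nothing was read."""
    if not freqs:
        return None
    return (min(freqs.values()), max(freqs.values()))


rng = freq_range_mhz(current_freqs_mhz())
print("no cpufreq data found" if rng is None else f"{rng[0]:.0f}-{rng[1]:.0f} MHz")
```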

                                                    • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                      aslon

                                                      Could an AMD representative confirm if the 2000 series Ryzens are affected by the random soft lockup bug?

                                                       

                                                      196683 – Random Soft Lockup on new Ryzen build

                                                      • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                        billy72

                                                        Hi imshalla and Ryzen-Linux users

                                                         

                                                        I know that my question does not have to do with the reason of your thread but your answer could help me:

                                                         

                                                        I have an ASUS Prime X370-Pro motherboard with a Ryzen 5 1600 CPU. I use Fedora 27, and after GRUB I always get the same kernel errors when I start any Linux distro, including in Live mode:

                                                         

                                                        - ACPI Error: Needed [Integer/String/Buffer], found [Region] 00000000b97f0858 (20170831/exresop-424)

                                                         

                                                        -ACPI Exception: AE_AML_OPERAND_TYPE, Could not execute arguments for [IOB2] (Region) (20170831/nsinit-426)

                                                         

                                                        -sp5100_tco: I/O address 0x0cd6 already in use

                                                         

                                                         

                                                        According to comments by Laura Abbott on bugzilla, it is possible that this is a BIOS bug and not a kernel bug:

                                                         

                                                        Does the same thing happen to you when you start Fedora or any Linux distribution?

                                                         

                                                        Thanks for sharing

                                                          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                            imshalla

                                                            Sorry for the long delay... I have almost stopped caring whether my AMD machine will ever work properly...

                                                             

                                                            ...but today I upgraded BIOS to 4011 on my ASUS Prime X370-Pro.  The machine was frozen and had first to be rebooted, which gave:

                                                             

                                                            May  4 12:09:46 fubar kernel: Linux version 4.15.14-300.fc27.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Thu Mar 29 16:13:44 UTC 2018

                                                             

                                                            May  4 12:09:46 fubar kernel: DMI: System manufacturer System Product Name/PRIME X370-PRO, BIOS 3805 03/06/2018

                                                             

                                                            May  4 12:09:46 fubar kernel: ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20170831/tbfadt-658)

                                                             

                                                            May  4 12:09:46 fubar kernel: ACPI Error: Needed [Integer/String/Buffer], found [Region] 00000000b901a525 (20170831/exresop-424)

                                                            May  4 12:09:46 fubar kernel: ACPI Exception: AE_AML_OPERAND_TYPE, Could not execute arguments for [IOB2] (Region) (20170831/nsinit-426)

                                                            May  4 12:09:46 fubar kernel: ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored

                                                             

                                                            And then after updating everything:

                                                             

                                                            May  4 13:33:46 fubar kernel: Linux version 4.16.5-200.fc27.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Fri Apr 27 19:05:44 UTC 2018

                                                             

                                                            May  4 13:33:46 fubar kernel: DMI: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018

                                                             

                                                            May  4 13:33:46 fubar kernel: ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20180105/tbfadt-658)

                                                             

                                                            May  4 13:33:46 fubar kernel: ACPI Error: Needed [Integer/String/Buffer], found [Region]         (ptrval) (20180105/exresop-424)

                                                            May  4 13:33:46 fubar kernel: ACPI Error: AE_AML_OPERAND_TYPE, Could not execute arguments for [IOB2] (Region) (20180105/nsinit-426)

                                                            May  4 13:33:46 fubar kernel: ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored

                                                             

                                                            ...so, yes, I get the same messages.

                                                             

                                                            ...but no, I have no idea what this means.

                                                             

                                                            ...and, frankly, I fear that the best thing is to forget about it -- the probability of persuading AMD or ASUS to document/explain is (IMHO) effectively zero.

                                                          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                            imshalla

                                                            I lost interest in my Ryzen 7 1800X machine... but this morning I got round to upgrading the BIOS on the ASUS Prime X370-Pro (to 4011, hot off the press: "Update AGESA 1.0.0.2a + SMU 43.18" whatever that means).

                                                             

                                                            I was running with rcu_nocbs and zenstates --c6-package-disable. That ran for 11 days and then froze.

                                                             

                                                            I last heard from "TECH.SUPPORT@AMD.COM":

                                                             

                                                              Thank you for the update and confirming that your BeQuite Straight Power 11

                                                              supports 0A minimum load.

                                                             

                                                              Because your system still freezes, it could be due to cross loading problems

                                                              which can result in the power supply turning off when a load changes or

                                                              result in voltages becoming out of specification causing system crashes and

                                                              hangs. entire CCX or Core complex is taken down.

                                                             

                                                              There are many levels of power states that a core can be in from C1 to C6,

                                                              CC6 and finally PC6. The Power Supply Idle Control option is designed to

                                                              keep enough current on the rail so that power supply does not go out of

                                                              regulation.

                                                             

                                                              The Power Supply Idle Control option is part of an AGESA update from AMD

                                                              provided to the motherboard vendors for validation and implementation in

                                                              their BIOS updates. However, it is the motherboard vendor's decision as to

                                                              which BIOS version will contain the Power Supply Idle Control option.

                                                             

                                                            So now I have options: "low current idle", "typical current idle" and "auto". Neither AMD nor ASUS seem to think it necessary to document what those mean.

                                                             

                                                            I have set "typical current idle". I note that zenstates shows that both C6 States Package and Core are Enabled.

                                                             

                                                            I guess I am back to waiting and seeing.

                                                             

                                                            <sigh>

                                                              • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                billy72

                                                                Hi imshalla

                                                                 

                                                                Thank you very much for the tests

                                                                 

                                                                I fully understand your discouragement. Owners of the ASUS Prime X370-Pro have had to keep up with 21 BIOS updates since its launch, and it is not over yet. We are "forced to update" for bug fixes and for security reasons. It's crazy... and all this without information from the manufacturer.

                                                                 

                                                                For Linux users, the first year of Ryzen is simply one to forget. So far the only advantage has been the price, and even that does not seem so cheap to the clueless like me, who bought a newly launched CPU knowing it would be a source of bugs.

                                                                 

                                                                ... so we'll keep waiting to see what happens

                                                                 

                                                                Best regards

                                                                • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                  lf42

                                                                  Hello Everyone,

                                                                   

                                                                  I was also having random lockups with Linux, but I appear to have fixed it.

                                                                  CPU: AMD Ryzen 7 1700

                                                                  Motherboard: ASUS Prime X370-PRO

                                                                   

                                                                  With the 4008 BIOS update I made the following change in BIOS: Advanced Mode -> Advanced -> AMD CBS -> Power Supply Idle Control: Typical Current Idle

                                                                   

                                                                  After that the computer has stayed running for about a month.

                                                                  • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                    gburgwardt

                                                                    Hi imshalla,

                                                                     

                                                                    I'm also suffering from this bug. I've heard on another bug tracker (I don't have a link, sorry) that overclocking some can help prevent the crashing.

                                                                     

                                                                    I've overclocked by 200 MHz and changed to typical current idle, as well as disabling both C6 states (package and core - I've heard that package only isn't enough).

                                                                     

                                                                    I'm optimistic that all of this will fix it. Have you had any more crashes with typical current idle?

                                                                      • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                        imshalla

                                                                        So far, uptime 11 days with "typical current idle".   So far, so good.

                                                                          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                            imshalla

                                                                            Today 30 days uptime with the magic "typical" BIOS setting. This is more than twice the previous record.

                                                                             

                                                                            FWIW: I am so fed up with this machine that I haven't used it since I updated the BIOS and applied the setting. It is running 4.16.5 (Fedora 27), with CONFIG_RCU_NOCB_CPU and rcu_nocbs=0-15. I don't know if the rcu_nocbs=0-15 is still required.

                                                                             

                                                                            Also FWIW: zenstates.py -l tells me that C6 Package is Disabled, but C6 Core is Enabled. Before the BIOS update I used zenstates.py to set C6 the same way, but the machine froze after some 12 days. After the BIOS update I no longer use zenstates.py to set anything. So I guess the BIOS "typical" option disables C6 Package, but also does some other magic.

                                                                             

                                                                            Mr BeQuiet! are adamant that the Straight Power 11 I have is perfectly happy to supply 0A at all voltages.

                                                                             

                                                                            Of course, there's a lot of stuff between the PSU and the CPU... so it could be a motherboard issue. Who can tell ?

                                                                             

                                                                            Possibly, some day, I will go back to using my AMD machine, but I doubt I shall come to be fond of it :-( Certainly I am livid with AMD's abject failure to address the issue promptly, and their continuing inability to discuss or document the issue. Bugs happen. It's how they are dealt with that separates the sheep from the goats. <sigh>

                                                                          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                            tengisu

                                                                            "Typical Current Idle" setting seems to have fixed it for me. Only changed this setting in UEFI, everything else default. Uptime 4 days without an incident, running the latest bios (4012) and kernel 4.18.0-rc7-mainline on asus prime x370-pro with 1700. Had to put up with random freezes ever since I built this system. Upgraded ram to faster ones thinking they could be the culprit. Been updating to the latest bios as soon as they came out and running the latest kernel hoping something would fix it. Finally my system will be stable, crossing my fingers.

                                                                        • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                          spiffy

                                                                          Is there any info about if this affects Ryzen 2000 series?

                                                                           

                                                                          I'm hoping that the Ryzen 5 1600 I bought can be made stable using the ZenStates /  Power Supply Idle Control before I shell out on a bunch of 2600s.

                                                                           

                                                                          It's been locking up about twice a week when idle for the last 5-6 weeks but the user's only just reported it. User can turn to talk to a colleague and when he turns back the PC has locked up.

                                                                          Thanks to the useful info on this thread it's currently on day 2 of test with "zenstates.py --c6-disable" so I'll not know for a few more days yet.

                                                                           

                                                                          As can be seen from the below spec it should be a fairly low power draw on the PSU when idle

                                                                          Ryzen 5 1600 (stock, no overclock)

                                                                          Gigabyte A320MA-M.2 (BIOS: F23d 2018-04-17 (latest))

                                                                          16GB Corsair Vengeance 2400

                                                                          250 GB SSD

                                                                          Nvidia GT 710

                                                                          Corsair CX550 PSU (Supposedly Haswell C6/C7 compliant)

                                                                          Linux Mint 18.3 MATE / Kernel 4.13.0-43-generic (latest)

                                                                           

                                                                          Edit: Typos

                                                                          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                            imbezol

                                                                            I have been having these exact same issues since I got my system. Typically I come to the computer in the morning and it'll be locked up and I have to reset it. On occasion though, and when the system is fairly idle, it will freeze in the middle of using it. Mouse pointer will freeze on the screen.. system doesn't respond to pings any longer.. and I have to reset it. This happens every couple days usually but have had it happen 3 times in a day before. I have been through every BIOS version, custom kernels, and swapped almost every piece of hardware, to no avail.

                                                                             

                                                                            Asus PRIME X370-PRO

                                                                            AMD Ryzen 1700

                                                                            Corsair Vengeance LED DDR4 3200Mhz 2 x 8GB (x2 for a total of 32 GB)

                                                                            MSI Radeon RX 580 Gaming X 8G

                                                                            -> upgraded from Asus R9 280x in attempt to fix lockups

                                                                            Corsair HX1000i PSU

                                                                            -> upgraded from Corsair TX750W in attempt to fix lockups

                                                                            Samsung 960 PRO NVMe M.2 PCIe x4 SSD 1TB

                                                                            -> got rid of all spinning disks in attempt to fix lockups

                                                                             

                                                                            Also swapped keyboard, mouse, and monitors, got rid of USB devices, etc.

                                                                             

                                                                            I started with Ubuntu 16.04 and then moved to Arch and now back to Ubuntu and am on 18.04 currently.

                                                                              • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                gburgwardt

                                                                                Did you try the BIOS setting everyone has been talking about, or disable the C6 power state?

                                                                                • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                  imshalla

                                                                                  Today my Ryzen 7 1800X, on Asus Prime X370-Pro, has 67 days uptime, and has been idle for all that time.

                                                                                   

                                                                                  So, I have not suffered a lockup since upgrading the BIOS to 4011 (with "AGESA 1.0.0.2a + SMU 43.18"), and setting the "Advanced Mode -> Advanced -> AMD CBS -> Power Supply Idle Control" option to "Typical Current Idle".

                                                                                  • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                    imbezol

                                                                                    I did run the BIOS 4011 for a while and still had crashes. I am currently running 4012 and came to a crashed system this morning.

                                                                                     

                                                                                    I just set the "Power Supply Idle Control" option earlier this morning after seeing this thread so I will update in a couple days if it has helped.

                                                                                    • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                      imbezol

                                                                                      Prior to changing that setting....

                                                                                       

                                                                                      > last | grep boot

                                                                                      reboot   system boot  4.15.0-23-generi Mon Jul 16 08:47   still running

                                                                                      reboot   system boot  4.15.0-23-generi Mon Jul 16 05:21 - 08:44  (03:22)

                                                                                      reboot   system boot  4.15.0-23-generi Sat Jul 14 12:51 - 08:44 (1+19:52)

                                                                                      reboot   system boot  4.15.0-23-generi Sat Jul 14 12:08 - 12:32  (00:23)

                                                                                      reboot   system boot  4.15.0-23-generi Fri Jul 13 07:33 - 10:49 (1+03:16)

                                                                                      reboot   system boot  4.15.0-23-generi Wed Jul 11 12:32 - 10:49 (2+22:16)

                                                                                       

                                                                                       

                                                                                      And now...

                                                                                      09:46:13 up 4 days, 59 min,  2 users,  load average: 0.32, 0.50, 0.55

                                                                                       

                                                                                       

                                                                                      It's looking promising.
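For anyone who wants to compare uptimes before and after a settings change, here is a minimal Python sketch that reads `last | grep boot` style lines and reports the longest completed session. It assumes the trailing duration field looks like `(03:22)` or `(1+19:52)` (days+hours:minutes), as in the output above; the names `session_length` and `longest_session` are made up for this example.

```python
import re
from datetime import timedelta

# Trailing duration field printed by `last`, e.g. "(03:22)" or "(1+19:52)".
DURATION_RE = re.compile(r"\((?:(\d+)\+)?(\d{1,2}):(\d{2})\)\s*$")


def session_length(line):
    """Return the session length for one `last` line, or None if it has no
    duration field (for example the "still running" entry)."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    days, hours, minutes = match.groups()
    return timedelta(days=int(days or 0), hours=int(hours), minutes=int(minutes))


def longest_session(text):
    """Return the longest completed session in the given `last` output."""
    lengths = [session_length(line) for line in text.splitlines()]
    lengths = [length for length in lengths if length is not None]
    return max(lengths) if lengths else None


# Sample copied from the post above.
SAMPLE = """\
reboot   system boot  4.15.0-23-generi Mon Jul 16 08:47   still running
reboot   system boot  4.15.0-23-generi Mon Jul 16 05:21 - 08:44  (03:22)
reboot   system boot  4.15.0-23-generi Sat Jul 14 12:51 - 08:44 (1+19:52)
reboot   system boot  4.15.0-23-generi Sat Jul 14 12:08 - 12:32  (00:23)
reboot   system boot  4.15.0-23-generi Fri Jul 13 07:33 - 10:49 (1+03:16)
reboot   system boot  4.15.0-23-generi Wed Jul 11 12:32 - 10:49 (2+22:16)
"""

if __name__ == "__main__":
    print("longest session before the change:", longest_session(SAMPLE))
```

A session that ends in a hard reset may be recorded with a misleading end time, so treat the numbers as a rough guide only.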

                                                                                      • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                        imbezol

                                                                                        07:13:53 up 11 days, 22:27,  2 users,  load average: 0.73, 0.81, 0.77

                                                                                         

                                                                                        Still no crashes! I think the "Power Supply Idle Control" may have solved the issue.

                                                                                          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                            billy72

                                                                                            I recently upgraded to a Ryzen 7 2700 on the ASUS Prime X370-Pro, BIOS v4012, with all default settings, including [Power Supply Idle Control → Auto], and Fedora 28 with kernel 4.17.6 and up: no random soft lockups, no crashes.

                                                                                             

                                                                                            On the Ryzen 5 1600 with BIOS v4012 and [Power Supply Idle Control → Typical Current Idle] no crashes happened either, but it is very possible that the latest versions of kernel 4.17.x have helped to solve the problem.

                                                                                             

                                                                                            … it seems that Linux-Zen evolves in the right direction

                                                                                             

                                                                                             

                                                                                             

                                                                                            Edit Jul 30 2018: my comment above, "but it is very possible that the latest versions of kernel 4.17.x have helped to solve the problem", is a personal opinion. It is based on my experience of how stability improved across the different Linux desktop OSes, kernels and BIOS versions installed on my Ryzen-ASUS system over 14 months. I do not have enough information or knowledge to confirm, or even offer a hypothesis about, how the Linux kernel has helped solve the problems mentioned in this thread. I want to apologize to shinobi and imshalla for any confusion (and other negative emotions) my words may have caused; my enthusiasm sometimes plays tricks on me.

                                                                                              • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                                shinobi

                                                                                                billy72, would you have the patience to do some experiments to narrow it down to the exact kernel version in 4.17.x?

                                                                                                I mean a test environment with the fewest possible variables,

                                                                                                where just bumping to the next .x kernel version solves the issue for you,

                                                                                                and going back to the previous version reproduces the problem?

                                                                                                It would be really interesting.

                                                                                                And, needless to say, with the BIOS kept at all defaults.

                                                                                                  • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                                    billy72

                                                                                                    Hi shinobi

                                                                                                     

                                                                                                    I have the patience to do the experiments that you request

                                                                                                     

                                                                                                    Next week I will receive a new ASRock B350 motherboard for a new build with a Ryzen 5 1600 1713SUS and a Urano NOX SX 500W PSU. I think it could be interesting to test some kernels in this scenario, because it is a 1st-gen Ryzen with a reliable low-end PSU. If AMD and the motherboard manufacturers (via BIOS), or the latest kernel versions, have really solved the problem, freezes should not happen on 1st-gen Ryzen with a reliable low/mid/high-end PSU and any motherboard brand.

                                                                                                     

                                                                                                    Would kernel tests on the ASRock AB350M-HDV be interesting for you (and all those affected), or do they need to be on the ASUS Prime X370-Pro?

                                                                                                  • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                                    imshalla

                                                                                                    As noted elsewhere, I have a Ryzen 7 1800X on an ASUS Prime X370-Pro.  The Ryzen was originally a 1708SUT, RMA'd for a 1737SUS.

                                                                                                     

                                                                                                    Almost as soon as it was available, I upgraded to BIOS v4011 (Update AGESA 1.0.0.2a + SMU 43.18) and:

                                                                                                    1. turned on the "Typical Current Idle" option.
                                                                                                    2. stopped using zenstates.py -- which I had been using to enable "C6 Core" but disable "C6 Package" (to no avail).
                                                                                                    3. did *not* change Linux -- which was 4.16.5 -- Fedora 27.
                                                                                                    4. continued to use CONFIG_RCU_NOCB_CPU and rcu_nocbs=0-15
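To double-check that the `rcu_nocbs=0-15` from item 4 really is on the running kernel's command line, something like the following Python sketch could be used. It is only an illustration: the helper names are invented here, and the CPU-list parser handles just the simple comma and range forms (such as `0-15` or `0,2,4-7`), not every format the kernel accepts.

```python
def rcu_nocbs_value(cmdline):
    """Return the value given to rcu_nocbs= on a kernel command line,
    or None if the parameter is not present."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "rcu_nocbs" and sep:
            return value
    return None


def expand_cpu_list(spec):
    """Expand a simple CPU list such as "0-15" or "0,2,4-7" into sorted ints."""
    cpus = set()
    for part in spec.split(","):
        low, sep, high = part.partition("-")
        cpus.update(range(int(low), int(high if sep else low) + 1))
    return sorted(cpus)


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line, for illustration only.
EXAMPLE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro rcu_nocbs=0-15 quiet"
print(expand_cpu_list(rcu_nocbs_value(EXAMPLE)))
```

On a real machine, `rcu_nocbs_value(read_cmdline())` returning `None` would mean the option never made it into the boot loader configuration.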

                                                                                                    After 67 days uptime (leaving the system completely idle and changing nothing), I am pretty convinced that the "Typical Current Idle" option has dealt with the "freezing when idle" problem.

                                                                                                     

                                                                                                    When I say "freezing when idle", what I mean is: if the machine is left idle (typically over night) it simply stops responding.  Nothing at all is logged -- no application, driver or kernel errors or warnings are logged -- the machine is still powered up, but frozen solid.  The only way to restart the machine is to power down and up again.

                                                                                                     

                                                                                                    The symptoms for the so-called "Random Soft Lockup" include log messages of the form:

                                                                                                     

                                                                                                        NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s!
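
                                                                                                    To see whether (and on which CPUs) such messages were logged, something like the following could be used.  It is only a sketch: soft_lockups_per_cpu is a made-up helper name, and it assumes the messages keep the "BUG: soft lockup - CPU#n" form shown above.

```shell
# Read kernel log lines on stdin and print "<count> CPU#<n>" for every CPU
# that reported a soft lockup. Prints nothing if no such message is present.
soft_lockups_per_cpu() {
    grep 'BUG: soft lockup' \
        | sed 's/.*\(CPU#[0-9]*\).*/\1/' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $1, $2 }'
}
```

                                                                                                    For example, "journalctl -k -b -1 | soft_lockups_per_cpu" looks at the previous boot (this needs a persistent journal), and "dmesg | soft_lockups_per_cpu" looks at the current one.  No output means no soft lockup was logged, which is exactly what a "freezing when idle" event looks like.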

                                                                                                     

                                                                                                    Long ago, I did (once or twice) see "NMI watchdog" messages, followed after a while by "freezing when idle".

                                                                                                     

                                                                                                    The Kernel Bugzilla <https://bugzilla.kernel.org/show_bug.cgi?id=196683> is nominally about "Random Soft Lockup" but also covers what I call "freezing when idle".  The discussion there includes the advice to set CONFIG_RCU_NOCB_CPU and use rcu_nocbs=0-15.  My recollection is that after I did that, the "NMI watchdog" messages went away, but the "freeze when idle" problem remained.

                                                                                                     

                                                                                                    I do not know if the "NMI watchdog" messages and the "freezing when idle" problem are related or completely separate.

                                                                                                     

                                                                                                    I do know that until I set "Typical Current Idle", I was experiencing "freezing when idle" (without "NMI watchdog" messages).

                                                                                                     

                                                                                                    FWIW, if AMD have documented what the "Typical Current Idle" BIOS option actually does, I have missed the memo.  I'm hoping that it doesn't have a significant negative impact on power consumption when idle.

                                                                                                     

                                                                                                    AFAICS, the Kernel guys have no clue what the "Typical Current Idle" option does either, so I have no reason to suppose that anything in the 4.17.x Kernel has anything to do with the "freezing when idle".

                                                                                                     

                                                                                                    Still, mustn't grumble: it is only 16 months and 1 RMA since I bought it, and my Ryzen machine now appears to work.  Hurrah !

                                                                                                      • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                                        billy72


                                                                                                        This could be useful:


                                                                                                        - Have you installed and tested the BIOS v4012?

                                                                                                        - Have you installed and tested different kernel versions with all the default values in OS and BIOS v4012?

                                                                                                        - The freezes that you call "freezing when idle": are they still happening with the default settings in OS and BIOS v4012?

                                                                                                          

                                                                                                        Feel free to not answer any of this and ignore future ticket notices in this thread


                                                                                                         

                                                                                                        thanks & enjoy

                                                                                                          • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                                            imshalla

                                                                                                            I have not yet installed BIOS v4012.  I note that it claims to "Improve system performance".  [I suppose that's true of most new releases ?]

                                                                                                             

                                                                                                            Before the "Typical Current Idle" option became available, I suffered "freezing when idle" on various Kernels -- updating Fedora 27 pretty much daily.  Using zenstates.py to enable "C6 Core" but disable "C6 Package" may have reduced the frequency of freezes, but did not eliminate them.

                                                                                                             

                                                                                                            After the reported nearly 10 weeks of no "freezing when idle" I went back to actually using the machine, and have updated to Fedora 28 and am currently running 4.17.7-200.fc28.x86_64.

                                                                                                             

                                                                                                            I confess I have not tried turning off "Typical Current Idle".  Nor have I changed CONFIG_RCU_NOCB_CPU or rcu_nocbs=0-15 -- I believe CONFIG_RCU_NOCB_CPU is the default for Fedora, but that rcu_nocbs=0-15 makes some difference (though I also confess that this is, essentially, Voodoo).

                                                                                                             

                                                                                                            Before upgrading to BIOS v4012 I will try turning off "Typical Current Idle".  If I get a freeze, that will reinforce the feeling that AMD have addressed the issue.  I could then turn it back on and turn off the rcu_nocbs=0-15 voodoo.  However, the freezes are/were intermittent and unpredictable, so it is hard to say how long such a test will take.  [I gather that for some people the freezes occur quite frequently.  My machine rejects a steady stream of ssh login attempts, which may stop it going into the deepest of sleep states much of the time, and that may have an effect -- who can tell ?]
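
                                                                                                            Whether the machine really reaches its deeper idle states between ssh attempts could, in principle, be checked from the kernel's cpuidle counters in sysfs.  A minimal sketch follows; idle_state_usage is a made-up name, the state names depend on the platform and idle driver, and how they map onto "C6 Core" / "C6 Package" is not something this thread establishes.

```shell
# Print "<state name> <times entered> <total time in microseconds>" for each
# idle state of one CPU, read from the Linux cpuidle sysfs interface.
# $1 is the CPU's cpuidle directory; the default is cpu0.
idle_state_usage() {
    for state in "${1:-/sys/devices/system/cpu/cpu0/cpuidle}"/state*; do
        # Skip the unexpanded glob when the directory has no stateN entries.
        [ -r "$state/name" ] || continue
        echo "$(cat "$state/name") $(cat "$state/usage") $(cat "$state/time")"
    done
}
```

                                                                                                            Running idle_state_usage twice, some minutes apart, and comparing the counters shows which states were entered in between.  It says nothing about why a freeze happens, only how deeply the cores were sleeping.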

                                                                                                             

                                                                                                            I guess the good people at AMD will have done some testing themselves and I assume they have a deeper understanding of the issue than I do.  AMD did recommend the "Typical Current Idle" option to me.  (AMD also said that a PSU which supports 0A at 12V is a requirement.)

                                                                                                        • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                                          imbezol

                                                                                                          I did not change kernels: both with and without crashes I was running 4.15.0-23-generic/x86_64.  The only change I made was the BIOS setting "Power Supply Idle Control", and the issue now appears to be resolved.

                                                                                                            • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                                              imbezol

                                                                                                              24 days and no crash now. Longest my system has ever been up.

                                                                                                                • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                                                  shinobi

                                                                                                                  Let me also add to the body of knowledge present in this thread.

                                                                                                                  For me also, setting "Typical current idle" appears to work (no lockup on idle).

                                                                                                                  Mobo : Prime Pro X370 with BIOS 4012

                                                                                                                  CPU : 1700X

                                                                                                                  OS = Unraid 6.5.3 (running on USB with the Array stopped)

                                                                                                                  # uname -a

                                                                                                                  Linux Tower 4.14.49-unRAID #1 SMP Mon Jun 11 16:21:07 PDT 2018 x86_64 AMD Ryzen 7 1700X Eight-Core Processor AuthenticAMD GNU/Linux

                                                                                                                   

                                                                                                                  Some extra info, if it means anything:

                                                                                                                  # cpufreq-info -w

                                                                                                                  2200000

                                                                                                                  After loading the cores to near 100% with

                                                                                                                  # for i in {1..16}; do while : ; do : ; done & done

                                                                                                                  # cpufreq-info -w

                                                                                                                  3400000

                                                                                                                  # kill $(jobs -p)

                                                                                                                  The above behavior is seen in both cases ("Typical current idle" or "Disable Global C State").
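
                                                                                                                  The same load test can be sketched without hard-coding 16 loops.  The helper names below are made up, and the sketch reads scaling_cur_freq from sysfs, which is the kernel's view of the frequency and may differ from what "cpufreq-info -w" reads from the hardware; it also assumes a cpufreq driver is loaded so that the file exists.

```shell
# Print the highest current frequency (kHz) reported by any CPU.
# $1 is the sysfs CPU directory; the default is the real one.
max_cur_freq_khz() {
    cat "${1:-/sys/devices/system/cpu}"/cpu[0-9]*/cpufreq/scaling_cur_freq 2>/dev/null \
        | sort -n | tail -n 1
}

# Start one busy loop per online CPU, wait $1 seconds (default 5),
# print the peak frequency, then stop the loops again.
load_and_sample() {
    pids=""
    n=$(getconf _NPROCESSORS_ONLN)
    i=0
    while [ "$i" -lt "$n" ]; do
        ( while :; do :; done ) &
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "${1:-5}"
    max_cur_freq_khz
    # Unquoted on purpose: $pids is a space-separated list of PIDs.
    kill $pids
}
```

                                                                                                                  Calling max_cur_freq_khz on an idle system and then load_and_sample gives the same before/after comparison as the commands above.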

                                                                                                                   

                                                                                                                  Power consumption-wise, based on the digital display on the UPS, I find that when the system is idle with BIOS = all defaults + "Typical Current Idle", it consumes 8 to 10 Watts less than with BIOS = all defaults + "Disable Global C State".

                                                                                                                  These measurements may not be very accurate, as other devices are also connected to the UPS, so the observation is only rough for the time being.
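
                                                                                                                  To put 8 to 10 Watts into perspective, here is a small conversion sketch.  It assumes, purely for illustration, that the machine sits idle around the clock all year; watts_to_kwh_per_year is a made-up helper name.

```shell
# Convert a constant power difference in watts into kWh per year,
# ASSUMING the difference applies 24 hours a day, 365 days a year.
watts_to_kwh_per_year() {
    awk -v w="$1" 'BEGIN { printf "%.1f\n", w * 24 * 365 / 1000 }'
}
```

                                                                                                                  Under that assumption, 8 W works out to roughly 70 kWh per year and 10 W to roughly 88 kWh per year.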

                                                                                                        • Re: Ryzen linux kernel bug 196683 - Random Soft Lockup
                                                                                                          shinobi

                                                                                                          @jesse_amd, would you help those of us who have been suffering from the Ryzen idle lockup bug for so, so long?

                                                                                                          We are asking you to comment because you seem to have helped customers with EPYC CPUs solve a similar problem.

                                                                                                          The Bugzilla ticket is 196683 – Random Soft Lockup on new Ryzen build <https://bugzilla.kernel.org/show_bug.cgi?id=196683>

                                                                                                          With everything new and stock, we always have to go into the BIOS and disable the "Global C state control" setting to make the system stable.

                                                                                                          Otherwise the system usually locks up when it is idle.  Other findings include, but are not limited to, setting "Power Supply Idle Control" to "Typical Current Idle".

                                                                                                          We would be glad to have a BIOS firmware fix, so that the system remains stable with the default BIOS settings.

                                                                                                          Even the recent BIOS updates, for instance the one that updates the firmware to AGESA 1.0.0.2a + SMU 43.18, have not helped.

                                                                                                          Kindly help!