john1000
Adept II

Re: Ryzen linux kernel bug 196683 - Random Soft Lockup

Update:  I let the machine run 9 days without a restart and ran into no issues.  I installed 64GB of RAM and ran memtest for 27 hours with 4 complete passes: no memory errors.  I've been using it ever since.  If I run into any trouble, I'll post here.  I'm likely many weeks away from purchasing a 2700X - I'd like to wait for the official reviews and any BIOS updates first.

imshalla
Adept II

Happy to hear your (effectively) new machine is running OK -- without any of the "Kernel Magic".

I have an AMD FirePro W2100.  There are rumours of issues with amdgpu, but these are mostly discounted in favour of C6 issues.

AMD advice notwithstanding, changing to a PSU that supports 0A minimum load on the 12V rail did not fix the issue for me.

Using zenstates.py, I have been running with C6 disabled for the "package" for about 14 days now, without a freeze.

So far, ASUS have not shipped the "Power Supply Idle Control" (at least for the Prime X370-PRO).  I asked on 26-Mar, and on 4-Apr "servicecenter_emea@asus.com" were able to tell me:

     No announcement in regards to this has been made, as of yet.
     So we are not able to advise if/when this will be coming.

<sigh>

It is rumoured that the "Power Supply Idle Control" options have the effect of disabling C6 either entirely or for the "package", which is what the zenstates.py allows one to do.  However, it is also suggested that tweaking some Overclocking options can also eliminate the freezes.  It may be that somebody clever at AMD has figured out a better way of fixing the problem.

Amongst other things, I would dearly like to know:

  • what, exactly, do the "Power Supply Idle Control" options do ?
  • is it true that disabling C6 altogether will disable the "Max Boost Clock" ?
  • does disabling C6 for the "package" allow "Max Boost Clock" to be used ?
  • what difference do these options make to power consumption ?

I have been hoping for more information from AMD.  I last wrote to AMD "TECH.SUPPORT" on 14-Mar... so far, no reply.

Before sending my original Ryzen away to go through the (two week) RMA process, I bought myself an i7 8700K.  Sadly, only 6 cores... but it does at least work.  Also, it has a built in GPU which does as much as I need (I am neither a gamer nor a Bitcoin miner).  I look forward to the 10nm 8 core version.

aslon
Journeyman III

Could an AMD representative confirm if the 2000 series Ryzens are affected by the random soft lockup bug?

196683 – Random Soft Lockup on new Ryzen build

john1000
Adept II

Last week, I had two failures during idle; neither left any hints in the logs.  After the first, I replaced the power supply with a Seasonic Prime Titanium 650, and the very next day it happened again.  I completely agree with imshalla that AMD's explanation, that this is caused by the power supply not being "Haswell" compatible, is false.  I updated the BIOS of my Asus Prime X370 to v4008, installed a 2700X yesterday, and set the Power Supply Idle Control to get around this issue.  Tonight, I'm going to set the Power Supply Idle Control back to default to see whether there is an impact on power consumption and on the frequency range the processor runs at.  I moved one of my VMs (an Elasticsearch/Logstash/Kibana stack) to it last night, and I wonder if that is enough load to prevent this from happening in the future.  I will update this once I have answered the power consumption/frequency range question.
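
For anyone wanting to make the same frequency-range comparison, here is a minimal sketch. It assumes the kernel's cpufreq driver exposes scaling_cur_freq (in kHz) under /sys/devices/system/cpu/ on your system; the function names are made up for illustration, and it says nothing about power consumption, which needs a wall meter or similar.

```python
import glob

# Assumed sysfs location of the per-CPU current frequency (value is in kHz).
CUR_FREQ_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_cur_freqs_khz(pattern=CUR_FREQ_GLOB):
    """Return the current frequency of every CPU that exposes cpufreq, in kHz."""
    freqs = []
    for path in glob.glob(pattern):
        with open(path) as f:
            freqs.append(int(f.read().strip()))
    return freqs


def freq_range_mhz(samples_khz):
    """Return (lowest, highest) frequency in MHz, or None if there are no samples."""
    if not samples_khz:
        return None
    return (min(samples_khz) / 1000.0, max(samples_khz) / 1000.0)


if __name__ == "__main__":
    # One snapshot; run it repeatedly (idle and under load) to see the full range.
    print(freq_range_mhz(read_cur_freqs_khz()))
```

Taking snapshots with each BIOS setting, both idle and under load, should show whether the setting changes the range the cores actually reach.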

imshalla
Adept II

I lost interest in my Ryzen 7 1800X machine... but this morning I got round to upgrading the BIOS on the ASUS Prime X370-Pro (to 4011, hot off the press: "Update AGESA 1.0.0.2a + SMU 43.18" whatever that means).

I was running with rcu_nocbs and zenstates --c6-package-disable. That ran for 11 days and then froze.

I last heard from "TECH.SUPPORT@AMD.COM":

  Thank you for the update and confirming that your BeQuiet! Straight Power 11
  supports 0A minimum load.

  Because your system still freezes, it could be due to cross loading problems
  which can result in the power supply turning off when a load changes or
  result in voltages becoming out of specification causing system crashes and
  hangs. entire CCX or Core complex is taken down.

  There are many levels of power states that a core can be in from C1 to C6,
  CC6 and finally PC6. The Power Supply Idle Control option is designed to
  keep enough current on the rail so that power supply does not go out of
  regulation.

  The Power Supply Idle Control option is part of an AGESA update from AMD
  provided to the motherboard vendors for validation and implementation in
  their BIOS updates. However, it is the motherboard vendor's decision as to
  which BIOS version will contain the Power Supply Idle Control option.

So now I have options: "low current idle", "typical current idle" and "auto". Neither AMD nor ASUS seem to think it necessary to document what those mean.

I have set "typical current idle". I note that zenstates shows that both C6 States Package and Core are Enabled.

I guess I am back to waiting and seeing.

<sigh>

lf42
Journeyman III

Hello Everyone,

I was also having random lockups with Linux, but I appear to have fixed it.

CPU: AMD Ryzen 7 1700

Motherboard: ASUS Prime X370-PRO

With the 4008 BIOS update I made the following change in BIOS: Advanced Mode -> Advanced -> AMD CBS -> Power Supply Idle Control: Typical Current Idle

After that the computer has stayed running for about a month.

gburgwardt
Journeyman III

Hi imshalla,

I'm also suffering from this bug. I've heard on another bug tracker (I don't have a link, sorry) that a modest overclock can help prevent the crashing.

I've overclocked by 200 MHz and changed to typical current idle, as well as disabling both C6 states (package and core; I've heard that package only isn't enough).

I'm optimistic that all of this will fix it. Have you had any more crashes with typical current idle?

imshalla
Adept II

So far, uptime 11 days with "typical current idle".   So far, so good.

spiffy
Journeyman III

Is there any info on whether this affects the Ryzen 2000 series?

I'm hoping that the Ryzen 5 1600 I bought can be made stable using the ZenStates /  Power Supply Idle Control before I shell out on a bunch of 2600s.

It's been locking up about twice a week when idle for the last 5-6 weeks but the user's only just reported it. User can turn to talk to a colleague and when he turns back the PC has locked up.

Thanks to the useful info on this thread it's currently on day 2 of test with "zenstates.py --c6-disable" so I'll not know for a few more days yet.

As the spec below shows, the idle power draw on the PSU should be fairly low:

  • Ryzen 5 1600 (stock, no overclock)
  • Gigabyte A320MA-M.2 (BIOS: F23d 2018-04-17 (latest))
  • 16GB Corsair Vengeance 2400
  • 250 GB SSD
  • Nvidia GT 710
  • Corsair CX550 PSU (supposedly Haswell C6/C7 compliant)
  • Linux Mint 18.3 MATE / Kernel 4.13.0-43-generic (latest)

Edit: Typos

imshalla
Adept II

Today 30 days uptime with the magic "typical" BIOS setting. This is more than twice the previous record.

FWIW: I am so fed up with this machine that I haven't used it since I updated the BIOS and applied the setting. It is running 4.16.5 (Fedora 27), with CONFIG_RCU_NOCB_CPU and rcu_nocbs=0-15. I don't know if the rcu_nocbs=0-15 is still required.
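
To double-check that the running kernel really was booted with rcu_nocbs, something like the sketch below can be used. It reads /proc/cmdline and expands simple CPU lists such as "0-15" or "0-3,8"; the kernel's CPU-list syntax has more forms than that, which this sketch deliberately does not handle, and the function names are made up for illustration.

```python
def parse_cpu_list(spec):
    """Expand a simple CPU list such as "0-15" or "0-3,8" into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def rcu_nocbs_cpus(cmdline):
    """Return the CPUs named by rcu_nocbs= on a kernel command line, or None if absent."""
    for token in cmdline.split():
        if token.startswith("rcu_nocbs="):
            return parse_cpu_list(token.split("=", 1)[1])
    return None


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(rcu_nocbs_cpus(f.read()))
```

On a 16-thread part booted with rcu_nocbs=0-15 this should list CPUs 0 through 15; None means the parameter is not on the command line at all.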

Also FWIW: zenstates.py -l tells me that C6 Package is Disabled, but C6 Core is Enabled. Before the BIOS update I used zenstates.py to set C6 the same way, but the machine froze after some 12 days. After the BIOS update I no longer use zenstates.py to set anything. So I guess the BIOS "typical" option disables C6 Package, but also does some other magic.
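
For the curious, this is roughly the check behind that C6 report, as a minimal sketch. The MSR addresses and bit positions are assumptions taken from my reading of the zenstates.py source (package C6 in bit 32 of MSR 0xC0010292, core C6 in bits 22, 14 and 6 of MSR 0xC0010296), not from any AMD documentation I can point to. Reading the registers needs root and the msr kernel module loaded.

```python
import os

# Assumed from the zenstates.py source; not confirmed against AMD documentation.
MSR_PKG_C6 = 0xC0010292
MSR_CORE_C6 = 0xC0010296
PKG_C6_BIT = 1 << 32
CORE_C6_BITS = (1 << 22) | (1 << 14) | (1 << 6)


def read_msr(msr, cpu=0):
    """Read one 64-bit MSR through /dev/cpu/N/msr (root and the msr module required)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return int.from_bytes(os.pread(fd, 8, msr), "little")
    finally:
        os.close(fd)


def decode_c6(pkg_msr_value, core_msr_value):
    """Report whether package C6 and core C6 look enabled in the raw MSR values."""
    return {
        "package": bool(pkg_msr_value & PKG_C6_BIT),
        "core": (core_msr_value & CORE_C6_BITS) == CORE_C6_BITS,
    }


if __name__ == "__main__":
    print(decode_c6(read_msr(MSR_PKG_C6), read_msr(MSR_CORE_C6)))
```

If those assumptions hold, the state described above (C6 Package Disabled, C6 Core Enabled) would decode as package False and core True.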

Mr BeQuiet! are adamant that the Straight Power 11 I have is perfectly happy to supply 0A at all voltages.

Of course, there's a lot of stuff between the PSU and the CPU... so it could be a motherboard issue. Who can tell ?

Possibly, some day, I will go back to using my AMD machine, but I doubt I shall come to be fond of it :( Certainly I am livid with AMD's abject failure to address the issue promptly, and their continuing inability to discuss or document the issue. Bugs happen. It's how they are dealt with that separates the sheep from the goats. <sigh>
