3 Replies Latest reply on Oct 1, 2018 1:01 PM by evanrinehart

    Ryzen 2700x Hardware errors Linux

    xblack

      Hi,

       

      I have a Ryzen 2700X on a Gigabyte AORUS X470 Gaming Ultra with CMK32GX4M2B3000C15 RAM in banks 1 and 2.

       

      I get the error below every now and then, along with sporadic full system freezes that require a manual reset.

      Jul 27 09:27:09 marcopc kernel: mce: [Hardware Error]: Machine check events logged
      Jul 27 09:27:09 marcopc kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 3: 9820000000000150
      Jul 27 09:27:09 marcopc kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 2a000503 IPID 300b000000000
      Jul 27 09:27:09 marcopc kernel: mce: [Hardware Error]: PROCESSOR 2:800f82 TIME 1532676424 SOCKET 0 APIC 3 microcode 8008206

      They show up especially when cross-compiling or when running the ryzen-test utility (https://github.com/suaefar/ryzen-test):

      [  941.702984] mce: [Hardware Error]: Machine check events logged
      [  941.702988] [Hardware Error]: Corrected error, no action required.
      [  941.702994] [Hardware Error]: CPU:3 (17:8:2) MC3_STATUS[-|CE|MiscV|-|-|-|-|SyndV|-]: 0x9820000000000150
      [  941.702999] [Hardware Error]: IPID: 0x000300b000000000, Syndrome: 0x000000002a000503
      [  941.703002] [Hardware Error]: Decode Unit Extended Error Code: 0
      [  941.703004] [Hardware Error]: Decode Unit Error: uop cache tag parity error.
      [  941.703006] [Hardware Error]: cache level: RESV, tx: INSN, mem-tx: IRD
      [ 1255.717410] mce: [Hardware Error]: Machine check events logged
      [ 1255.717413] [Hardware Error]: Corrected error, no action required.
      [ 1255.717418] [Hardware Error]: CPU:5 (17:8:2) MC3_STATUS[-|CE|MiscV|-|-|-|-|SyndV|-]: 0x9820000000000150
      [ 1255.717422] [Hardware Error]: IPID: 0x000300b000000000, Syndrome: 0x000000002a000503
      [ 1255.717424] [Hardware Error]: Decode Unit Extended Error Code: 0
      [ 1255.717425] [Hardware Error]: Decode Unit Error: uop cache tag parity error.
      [ 1255.717427] [Hardware Error]: cache level: RESV, tx: INSN, mem-tx: IRD
      [ 1255.717430] mce: [Hardware Error]: Machine check events logged
      [ 1255.717430] [Hardware Error]: Corrected error, no action required.
      [ 1255.717432] [Hardware Error]: CPU:13 (17:8:2) MC3_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000000150
      [ 1255.717434] [Hardware Error]: IPID: 0x000300b000000000, Syndrome: 0x000000002a000503
      [ 1255.717436] [Hardware Error]: Decode Unit Extended Error Code: 0
      [ 1255.717437] [Hardware Error]: Decode Unit Error: uop cache tag parity error.
      [ 1255.717438] [Hardware Error]: cache level: RESV, tx: INSN, mem-tx: IRD
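
To make the raw status values above easier to read, here is a minimal Python sketch that unpacks an MCA status word into the flags the kernel prints. The bit positions (Val=63, Over=62, UC=61, En=60, MiscV=59, AddrV=58, PCC=57, SyndV=53), the 6-bit extended error code at bits 21:16, and the LL/TT/RRRR layout of the low 16 bits are assumptions taken from how the Linux kernel decodes these registers on AMD systems; the function name `decode_status` is made up for illustration.

```python
# Assumed bit positions for the MCA status register (as used by the Linux kernel on AMD).
FLAG_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60,
    "MiscV": 59, "AddrV": 58, "PCC": 57, "SyndV": 53,
}

# Assumed field names for the low 16-bit error code in "memory error" format.
LL_NAMES = ["RESV", "L1", "L2", "L3/GEN"]
TT_NAMES = ["INSN", "DATA", "GEN", "RESV"]
RRRR_NAMES = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]


def decode_status(status):
    """Split a 64-bit MCA status value into flags and error-code fields."""
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    rrrr = (code >> 4) & 0xF
    return {
        "flags": flags,
        "uncorrected": bool((status >> 61) & 1),
        "overflow": bool((status >> 62) & 1),
        "ext_error_code": (status >> 16) & 0x3F,
        "error_code": code,
        "cache_level": LL_NAMES[code & 0x3],
        "tx": TT_NAMES[(code >> 2) & 0x3],
        "mem_tx": RRRR_NAMES[rrrr] if rrrr < len(RRRR_NAMES) else "unknown",
    }


if __name__ == "__main__":
    for value in (0x9820000000000150, 0xD820000000000150):
        print(hex(value), decode_status(value))
```

Under these assumptions the two values in the log differ only in the Over bit, which would mean another error was logged in the same bank before the previous one had been read out.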

      Most of the time this causes no particular problems and the error is corrected, as you can see above, but sometimes one of the processes gets tainted or the PC freezes completely.

      The current system is ArchLinux with kernel 4.17.9-1 and cpu microcode patch_level=0x08008206.

       

      Is anyone else having this issue?

      Did a cpu change solve the issue?

       

      I asked AMD, but after my first email with the information above all I got back was an RMA approval with no other info.

       

      Thanks

       

      Marco

        • Re: Ryzen 2700x Hardware errors Linux
          vkfu

          I see the same errors on my Ryzen 5 2600X in an ASRock X470 Taichi. Every few hours, starting about 10 minutes after it boots up, I see these errors on a mostly idle machine. They first appeared a couple of days ago, but today I had frequent system hangs along with the errors. I swapped in a Ryzen 3 1200 several hours ago and have seen no more errors. I am using Ubuntu 18.04 with a 4.15.0-30 kernel.

          • Re: Ryzen 2700x Hardware errors Linux
            blsqr

            I am also having this issue. (Ubuntu 18.04, kernel 4.15.0-34). Did you manage to find a solution?

             

            In my case, however, the times of these log entries do not coincide with the freezes; often they are many hours apart. Is it the same in your cases?
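
For anyone who wants to compare when the errors were logged against when the machine froze, here is a rough Python sketch that pulls the timestamp, CPU, bank and status value out of decoded `dmesg`-style lines like the ones posted in this thread. The regular expression is written only against the line format shown here and is an assumption about what other kernels print; `mce_events` and `SAMPLE` are names invented for this example.

```python
import re

# Matches decoded status lines such as:
# [  941.702994] [Hardware Error]: CPU:3 (17:8:2) MC3_STATUS[...]: 0x9820000000000150
STATUS_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[Hardware Error\]: CPU:(?P<cpu>\d+) .*?"
    r"MC(?P<bank>\d+)_STATUS\[(?P<flags>[^\]]*)\]: (?P<status>0x[0-9a-fA-F]+)"
)

# Two lines copied from the logs in this thread.
SAMPLE = """\
[  941.702994] [Hardware Error]: CPU:3 (17:8:2) MC3_STATUS[-|CE|MiscV|-|-|-|-|SyndV|-]: 0x9820000000000150
[  941.702999] [Hardware Error]: IPID: 0x000300b000000000, Syndrome: 0x000000002a000503
[ 1255.717432] [Hardware Error]: CPU:13 (17:8:2) MC3_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000000150
"""


def mce_events(text):
    """Return (seconds_since_boot, cpu, bank, status) for each status line."""
    events = []
    for line in text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            events.append((
                float(match["ts"]),
                int(match["cpu"]),
                int(match["bank"]),
                int(match["status"], 16),
            ))
    return events


if __name__ == "__main__":
    for ts, cpu, bank, status in mce_events(SAMPLE):
        print(f"{ts:12.3f}s  CPU {cpu:2d}  bank {bank}  status {status:#018x}")
```

The timestamps are seconds since boot, so they still have to be lined up by hand with the time of a freeze.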

            • Re: Ryzen 2700x Hardware errors Linux
              evanrinehart

              I just purchased a Ryzen 7 2700X and am having similar issues: freeze-ups and spurious hardware errors. Specs:

               

              MSI B450 motherboard

              Linux 4.19-rc5 configured with AMD support

              Latest kernel firmware package (20180913)

               

              Seemingly at random, I get hardware errors such as:

              [ 6558.679190] [Hardware Error]: Corrected error, no action required.
              [ 6558.679192] [Hardware Error]: CPU:13 (17:8:2) MC3_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000000150
              [ 6558.679194] [Hardware Error]: IPID: 0x000300b000000000, Syndrome: 0x000000002a000503
              [ 6558.679196] [Hardware Error]: Decode Unit Extended Error Code: 0
              [ 6558.679197] [Hardware Error]: Decode Unit Error: uop cache tag parity error.
              [ 6558.679198] [Hardware Error]: cache level: RESV, tx: INSN, mem-tx: IRD

               

              And after one night I found it frozen and in need of a reset.

               

              Before I updated the kernel, the errors and freeze-ups occurred much more frequently, and I could not get through an entire kernel build without a system freeze. Now it seems to happen about once a day.

               

              The BIOS is on the latest update, with no overclocking and no special power mode settings changed so far.