5 Replies Latest reply on May 4, 2018 8:31 AM by warner

    PCIe Errors TR4 Linux

    goddard

      I'm running Kubuntu 16.04, I'm getting some PCIe errors, and I need help debugging them. 

       

      Thanks

       

      Motherboard: X399 AORUS Gaming 7 (1.0)
      BIOS version: FK2
      Graphics card brand: Nvidia
      CPU: AMD Threadripper 1950X
      Operating system: Kubuntu 16.04
      Memory brand: G.Skill
      Memory part no.: F4-3200C15D-32GTZSW

       

      system log

      -------------------------

      8/23/17 9:30 PM -x399 kernel [19510.161819] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000

      8/23/17 9:30 PM -x399 kernel [19510.161833] pcieport 0000:00:01.1: AER: Corrected error received: id=0000

      8/23/17 9:30 PM -x399 kernel [19510.161837] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)

      8/23/17 9:30 PM -x399 kernel [19510.161840] pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000040/00006000

      8/23/17 9:30 PM -x399 kernel [19510.161842] pcieport 0000:00:01.1: [ 6] Bad TLP 

      8/23/17 9:31 PM -x399 kernel [19539.323943] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000

      8/23/17 9:31 PM -x399 kernel [19539.323957] pcieport 0000:00:01.1: AER: Corrected error received: id=0000

      8/23/17 9:31 PM -x399 kernel [19539.323961] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)

      8/23/17 9:31 PM -x399 kernel [19539.323964] pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000040/00006000

      8/23/17 9:31 PM -x399 kernel [19539.323967] pcieport 0000:00:01.1: [ 6] Bad TLP 

      8/23/17 9:42 PM -x399 kernel [20194.657679] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000

      8/23/17 9:42 PM -x399 kernel [20194.657692] pcieport 0000:00:01.1: AER: Corrected error received: id=0000

      8/23/17 9:42 PM -x399 kernel [20194.657696] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)

      8/23/17 9:42 PM -x399 kernel [20194.657699] pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000040/00006000

      8/23/17 9:42 PM -x399 kernel [20194.657702] pcieport 0000:00:01.1: [ 6] Bad TLP
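The hex values in these messages are raw register contents. As a rough sketch, assuming the bit layouts of the AER Correctable Error Status register and the DPC Status register as given in the PCIe Base Specification, they can be decoded as follows (`decode_correctable` and `decode_dpc_status` are hypothetical helpers, not part of any tool):

```python
# Bit positions of the AER Correctable Error Status register
# (layout assumed from the PCIe Base Specification; reserved bits omitted).
CORRECTABLE_ERRORS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(status: int, mask: int) -> dict:
    """Split an AER status/mask pair into reported and masked error names."""
    bits = sorted(CORRECTABLE_ERRORS.items())
    # An error is reported only if its status bit is set and it is not masked.
    reported = [name for bit, name in bits
                if status & (1 << bit) and not mask & (1 << bit)]
    masked = [name for bit, name in bits if mask & (1 << bit)]
    return {"reported": reported, "masked": masked}


def decode_dpc_status(status: int) -> dict:
    """Decode the DPC Status register fields (layout assumed from the PCIe spec)."""
    return {
        "trigger_status": bool(status & 0x1),               # bit 0
        "trigger_reason": (status >> 1) & 0x3,              # bits 2:1
        "interrupt_status": bool(status & 0x8),             # bit 3
        "rp_busy": bool(status & 0x10),                     # bit 4
        "trigger_reason_ext": (status >> 5) & 0x3,          # bits 6:5
        "rp_pio_first_error_pointer": (status >> 8) & 0x1F, # bits 12:8
    }


if __name__ == "__main__":
    # Values copied from the log above.
    print(decode_correctable(0x00000040, 0x00006000))
    print(decode_dpc_status(0x1F00))
```

For status/mask=00000040/00006000 this reports only Bad TLP (bit 6), with bits 13 and 14 masked; status 0x1f00 decodes to a clear Trigger Status bit with only the RP PIO First Error Pointer field set.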

       

      lspci output

      -------------------------

      00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1450

      00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1451

      00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453

      00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453

      00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454

      00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454

      00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)

      00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)

      00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1460

      00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1461

      00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1462

      00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1463

      00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1464

      00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1465

      00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1466

      00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1467

      00:19.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1460

      00:19.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1461

      00:19.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1462

      00:19.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1463

      00:19.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1464

      00:19.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1465

      00:19.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1466

      00:19.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1467

      01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ba (rev 02)

      01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43b6 (rev 02)

      01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b1 (rev 02)

      02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)

      02:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)

      02:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)

      02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)

      03:00.0 USB controller: ASMedia Technology Inc. Device 1343

      04:00.0 Network controller: Intel Corporation Device 24fd (rev 78)

      05:00.0 Ethernet controller: Qualcomm Atheros Device e0b1 (rev 10)

      07:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a804

      08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a

      08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1456

      08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 145c

      09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 1455

      09:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)

      09:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Device 1457

      40:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1450

      40:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1451

      40:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      40:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      40:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      40:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453

      40:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      40:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      40:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454

      40:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452

      40:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454

      41:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)

      41:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)

      42:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a

      42:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1456

      42:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 145c

      43:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 1455

      43:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
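The log names the root port 0000:00:01.1 (device 1022:1453), which appears above as a PCI bridge, but this flat lspci listing does not show which endpoints sit behind it (`lspci -t` prints the tree). On Linux, devices behind a bridge also appear as nested subdirectories of the bridge's sysfs directory, so a minimal sketch like the following can list them, assuming the standard /sys/bus/pci/devices layout; `downstream_devices` is a hypothetical helper:

```python
import os
import re

# Matches sysfs PCI device directory names such as "0000:01:00.0".
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def downstream_devices(port: str, sysfs_root: str = "/sys/bus/pci/devices") -> list:
    """Return the PCI addresses nested below `port` in the sysfs device tree."""
    # /sys/bus/pci/devices/<addr> is a symlink into /sys/devices/pci..., so resolve it.
    start = os.path.realpath(os.path.join(sysfs_root, port))
    found = []
    for _dirpath, dirnames, _files in os.walk(start):
        # Only descend into child PCI devices; skip power/, msi_irqs/, etc.
        dirnames[:] = [d for d in dirnames if PCI_ADDR.match(d)]
        found.extend(dirnames)
    return sorted(found)


if __name__ == "__main__":
    # The port reported in the AER/DPC messages; returns [] if it does not exist.
    print(downstream_devices("0000:00:01.1"))
```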

        • Re: PCIe Errors TR4 Linux
          warner

          Same problem here with vanilla Ubuntu 17.10 (kernel 4.13) on a Threadripper 1950X and a Gigabyte AORUS board with BIOS F3.

           

          It seems that many Linux users have this problem on Threadripper.

           

          Logs are being flooded with:

          gen 12 11:41:46 TR-Ubuntu kernel: pcieport 0000:00:01.1: AER: Corrected error received: id=0000

          gen 12 11:41:46 TR-Ubuntu kernel: pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Transmitter ID)

          gen 12 11:41:46 TR-Ubuntu kernel: pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00001000/00006000

          gen 12 11:41:46 TR-Ubuntu kernel: pcieport 0000:00:01.1:    [12] Replay Timer Timeout

          gen 12 11:41:46 TR-Ubuntu kernel: pcieport 0000:00:01.1: AER: Corrected error received: id=0000

          gen 12 11:41:46 TR-Ubuntu kernel: pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Transmitter ID)

          gen 12 11:41:46 TR-Ubuntu kernel: pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00001000/00006000

          gen 12 11:41:46 TR-Ubuntu kernel: pcieport 0000:00:01.1:    [12] Replay Timer Timeout
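When the log is flooded like this, it can help to count how often each error bit fires per device, for example before and after a change such as pcie_aspm=off. A minimal sketch, assuming the per-bit message format shown above (it may differ between kernel versions); `count_aer_errors` and the script name in the comment are hypothetical:

```python
import re
import sys
from collections import Counter

# Matches the kernel's per-bit AER lines, e.g.
# "pcieport 0000:00:01.1:    [12] Replay Timer Timeout"
AER_BIT_LINE = re.compile(
    r"pcieport (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r"\s+\[\s*(?P<bit>\d+)\]\s+(?P<name>.+?)\s*$"
)


def count_aer_errors(lines) -> Counter:
    """Count (device, bit, error name) occurrences in kernel log lines."""
    counts = Counter()
    for line in lines:
        m = AER_BIT_LINE.search(line)
        if m:
            counts[(m["dev"], int(m["bit"]), m["name"])] += 1
    return counts


if __name__ == "__main__":
    # Usage: journalctl -k | python3 count_aer.py
    for (dev, bit, name), n in count_aer_errors(sys.stdin).most_common():
        print(f"{n:6d}  {dev}  [{bit:2d}] {name}")
```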

           

           

          On <link> I found suggestions that adding pcie_aspm=off to the kernel command line in GRUB makes most of these issues go away, at the cost of higher power consumption, which is not great.
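On Ubuntu-family systems the kernel command line is usually set through GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and applied with `sudo update-grub`. A minimal sketch of that edit, assuming the value is double-quoted as in Ubuntu's default file; `add_kernel_param` is a hypothetical helper, and editing the file by hand works just as well:

```python
import re

# Captures the opening, the value and the closing quote (plus newline) of the setting.
CMDLINE_DEFAULT = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(".*)$', re.S)


def add_kernel_param(grub_text: str, param: str = "pcie_aspm=off") -> str:
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there."""
    out = []
    for line in grub_text.splitlines(keepends=True):
        m = CMDLINE_DEFAULT.match(line)
        if m and param not in m.group(2).split():
            value = (m.group(2) + " " + param).strip()
            line = m.group(1) + value + m.group(3)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Prints the modified file instead of overwriting it; review the result,
    # save it over /etc/default/grub as root, then run `sudo update-grub`.
    with open("/etc/default/grub") as f:
        print(add_kernel_param(f.read()), end="")
```

Commented-out lines and GRUB_CMDLINE_LINUX are left untouched, and running the helper twice does not add the parameter a second time.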

           

          However, when I try this, I get a blank screen at boot, although the logs are no longer flooded.

           

          Does anyone have any idea what to try next?

            • Re: PCIe Errors TR4 Linux
              usererror

              Same issue here, but on a Ryzen 5 1600.

               

              I have been looking for an answer for far too long. A key clue is that msinfo32 in Windows also reported a PCIe bus conflict, which makes me think this is not an issue we can solve with a software change.

               

              I also had major problems with software crashing in both Windows and Linux when both x8 PCIe slots were occupied.

               

              I have swapped motherboards and CPUs, tried several GPUs (all Nvidia), and run RAM tests; the RAM was also on the compatibility list for the first motherboard.

            • Re: PCIe Errors TR4 Linux
              usererror

              Any updates? I'm still having this issue, and so far nobody seems to have a fix, short of buying an Intel CPU and a matching motherboard, of course.

                • Re: PCIe Errors TR4 Linux
                  warner

                  OK, so the original issue was solved by adding "pcie_aspm=off" to GRUB, but now I have updated to Ubuntu 18.04 and the problem is back! Somehow it occurs not only with the new kernel but also with the 4.13 kernel from 17.10.
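One thing to rule out after an upgrade is that the parameter is no longer on the command line of the kernel that actually booted; /proc/cmdline shows what the running kernel was started with, so it is worth checking once per kernel tested. A minimal sketch; `kernel_param_active` is a hypothetical helper:

```python
import platform


def kernel_param_active(param: str = "pcie_aspm=off",
                        cmdline_path: str = "/proc/cmdline") -> bool:
    """Return True if `param` is a whole token on the running kernel's command line."""
    with open(cmdline_path) as f:
        return param in f.read().split()


if __name__ == "__main__":
    # platform.release() reports the running kernel version, e.g. to tell 4.13 from 4.15.
    print(platform.release(), "pcie_aspm=off active:", kernel_param_active())
```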

                   

                  (By the way, the blank screen at boot in Ubuntu 17.10 was caused by Wayland not cooperating with my Nvidia card. Switching back to X solved that.)

                   

                  Does anyone have any ideas about what to try?

                    • Re: PCIe Errors TR4 Linux
                      warner

                      Some more information: switching from PCIe 3.0 back to 2.0 in the BIOS resolves the issue, even without "pcie_aspm=off" in GRUB. There's a performance impact, though, and things are not working according to specification, so it would be great if AMD would look at this and fix it with a BIOS or kernel update!
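To confirm what the BIOS setting actually changes, the negotiated link speed can be read from sysfs through the current_link_speed and max_link_speed attributes, on kernels that expose them (`sudo lspci -vv` shows the same in its LnkCap and LnkSta lines). A minimal sketch; `link_speed` and `gen_from_speed` are hypothetical helpers, and the mapping assumes 2.5/5/8/16 GT/s correspond to PCIe generations 1 to 4. Older kernels print speeds like "8 GT/s", newer ones like "8.0 GT/s PCIe", so only the leading number is parsed:

```python
import os
import re

ATTRS = ("current_link_speed", "max_link_speed",
         "current_link_width", "max_link_width")


def link_speed(addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Read current/maximum link speed and width for one PCI device from sysfs."""
    info = {}
    for attr in ATTRS:
        try:
            with open(os.path.join(sysfs_root, addr, attr)) as f:
                info[attr] = f.read().strip()
        except OSError:
            info[attr] = None  # attribute missing or unreadable
    return info


def gen_from_speed(speed):
    """Map a sysfs speed string such as '8 GT/s' or '8.0 GT/s PCIe' to a PCIe generation."""
    m = re.match(r"([\d.]+) GT/s", speed or "")
    if not m:
        return None
    return {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}.get(float(m.group(1)))


if __name__ == "__main__":
    # Root port from the AER log and the NVIDIA GPU from the lspci output in this thread.
    for addr in ("0000:00:01.1", "0000:41:00.0"):
        info = link_speed(addr)
        print(addr, info, "current gen:", gen_from_speed(info["current_link_speed"]))
```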