PCIe port errors

Discussion in 'UDOO BOLT' started by Joerg Jungermann, Aug 2, 2020.

  1. Joerg Jungermann

    Joerg Jungermann UDOOer

    Joined:
    Oct 19, 2019
    Messages:
    9
    Likes Received:
    0
    Hello,

    I get a lot of PCIe port errors. Although they are corrected by the bus protocol's error correction, this makes me nervous. See the logs below.

    I already disassembled the Bolt and tested with the default BIOS/UEFI settings and with no devices connected in the NGFF/M.2 slots, booting only from the network (grml) or eMMC (Focal).

    The device referred to is 0000:00:01.6, which according to lspci is:
    00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]

    I have Ubuntu Focal installed on NVMe and eMMC, and I boot grml when network booting.
    Is anyone else seeing this? What might be the cause? Any hints?
    Might this be a potential support case?

    best regards

    Logs:
    Aug 02 12:46:59 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
    Aug 02 12:46:59 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Aug 02 12:46:59 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
    Aug 02 12:46:59 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
    Aug 02 12:47:02 bolt kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
    Aug 02 12:47:02 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Aug 02 12:47:02 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
    Aug 02 12:47:02 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
    Aug 02 12:47:03 bolt kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
    Aug 02 12:47:03 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Aug 02 12:47:03 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
    Aug 02 12:47:03 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000080/00006000
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: [ 7] BadDLLP
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=000000c0/00006000
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
    Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: [ 7] BadDLLP
    Aug 02 12:47:06 bolt kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
    Aug 02 12:47:06 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Aug 02 12:47:06 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
    Aug 02 12:47:06 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
    Aug 02 12:47:07 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
    Aug 02 12:47:07 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Aug 02 12:47:07 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
    Aug 02 12:47:07 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
    Aug 02 12:47:10 bolt kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
    Aug 02 12:47:10 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Aug 02 12:47:10 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
    Aug 02 12:47:10 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
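
The status/mask pair in the lines above is a bitmask, and the bracketed numbers ([ 6], [ 7]) are the bit positions that are set. As a reading aid, here is a minimal Python sketch that decodes such a line. The names BadTLP and BadDLLP come from the logs themselves; the remaining names in the table are the short labels the Linux AER driver uses as far as I recall, so treat them as an assumption. The function names are made up for this example.

```python
import re

# Bits of the PCIe AER Correctable Error Status register.
# BadTLP/BadDLLP appear in the logs above; the other labels are
# assumed to match the Linux kernel's short names.
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_bits(value):
    """Return the names of all bits set in a 32-bit register value."""
    return [
        CORRECTABLE_BITS.get(bit, f"bit{bit}")
        for bit in range(32)
        if value & (1 << bit)
    ]


def decode_line(line):
    """Decode one 'error status/mask=...' log line.

    Returns (status_names, mask_names), or None if the line has no such field.
    """
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    return decode_bits(status), decode_bits(mask)


if __name__ == "__main__":
    sample = ("Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: "
              "device [1022:15d3] error status/mask=000000c0/00006000")
    print(decode_line(sample))
```

Read this way, status 00000040 is bit 6 alone, 00000080 is bit 7 alone, and 000000c0 is both together. The mask 00006000 has bits 13 and 14 set, which means those two error types are masked and not reported.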


    Connected Devices when fully assembled:
    $ lspci
    00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
    00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
    00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
    00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
    00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
    00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
    00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
    00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
    00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
    00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
    00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
    00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
    00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
    00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
    00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
    00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
    00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
    00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
    00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
    01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
    03:00.0 Network controller: Intel Corporation Dual Band Wireless-AC 3168NGW [Stone Peak] (rev 10)
    04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
    05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev 83)
    05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
    05:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
    05:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
    05:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
    05:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor
    05:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
    05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
    06:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)
     
  2. ccs_hello

    ccs_hello UDOOer

    Joined:
    Apr 15, 2017
    Messages:
    536
    Likes Received:
    194
  3. Joerg Jungermann

    @ccs_hello thanks for pointing me to this link. Which post number do you mean specifically? I do not see a direct relation, as PCIe bus 6 is not mentioned in the logs above. The logs mention bus 0 with its PCIe bridges.

    According to #12 in the thread you mentioned, the PCIe bridges are very likely the connection points for some of the hardware attached to the M.2 connectors.
    I get those messages, too, when no hardware is connected to those M.2 ports. That's why I am confused by your answer.
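
To check which device actually sits behind a given bridge, Linux exposes the PCI tree in sysfs: each child device appears as a subdirectory of its bridge's device directory. A minimal sketch in Python, assuming a Linux system with sysfs mounted at /sys. The function name and the list of ports in the demo are chosen for this example (the ports are the four GPP bridges from the lspci listing above).

```python
import re
from pathlib import Path

# Matches a full PCI address such as 0000:03:00.0 (domain:bus:device.function).
PCI_ADDR_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def devices_behind(bridge, sysfs_root="/sys/bus/pci/devices"):
    """List the PCI devices directly below a bridge, e.g. '0000:00:01.6'.

    Child devices show up as subdirectories named by their PCI address.
    Other entries (attribute files, 'power', port services) are skipped.
    """
    bridge_dir = Path(sysfs_root) / bridge
    if not bridge_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in bridge_dir.iterdir()
        if PCI_ADDR_RE.match(entry.name) and entry.is_dir()
    )


if __name__ == "__main__":
    for port in ("0000:00:01.1", "0000:00:01.2", "0000:00:01.6", "0000:00:01.7"):
        children = devices_behind(port)
        print(port, "->", ", ".join(children) if children else "nothing attached")
```

The tree view of lspci (`lspci -tv`) shows the same bridge-to-device relationship without any scripting.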
     
  4. ccs_hello

    PCIe 1022:15d3 is Raven/Raven2 PCIe GPP Bridge [6:0]
     
  5. Joerg Jungermann

    Oh, yes thanks, I wrote that in my initial post. Are you seeing this, too? Or do you have ideas how to fix it?
     
  6. ccs_hello

    No, I do not have the PCIe bus error issues.
    As you can see in my earlier picture, 6:0 covers the PCIe buses of all three M.2 slots as well as the Realtek Ethernet controller.

    BTW, do you see the same issues if using a different OS?
     
  7. Joerg Jungermann

    I cannot verify this with Windows: I do not see anything similar in the event log. But I see those messages, too, with a network-booted grml (grml.org).
     
  8. ccs_hello

    On Windows, look in the Event Viewer.
    Perhaps you can try an Ubuntu LiveUSB instead (and follow dmesg, as with tail -f).

    My point is that support for the AMD Ryzen CPU/APU SoC may not be current on certain OSes, so this may not be a hardware issue.
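
For comparing one OS or kernel against another, it may help to count the AER reports per port rather than eyeball the scrolling log. A minimal Python sketch, assuming the kernel log has been saved to a text file first (for example with `dmesg > kernel.log`) and that the messages have the same wording as in the first post. The script name and function name are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the first line of each report, e.g.
# "pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0"
AER_RE = re.compile(
    r"pcieport (\S+): AER: (?:Multiple )?Corrected error received"
)


def tally(lines):
    """Count corrected-error reports per reporting PCIe port."""
    counts = Counter()
    for line in lines:
        match = AER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python3 aer_tally.py kernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for port, count in tally(log).most_common():
            print(port, count)
```

Running the same script against logs from two different boots gives a like-for-like number per port.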
     
  9. Joerg Jungermann

    I investigated a bit further: I disconnected all devices in the M.2 slots and tested them separately.
    The slots hold:
    M.2 2280: NVMe Samsung Evo 970 (Focal)
    M.2 2260: Transcend SATA 512MB (Windows 10)
    M.2 2240: the Intel BT+Wifi dongle from the Kickstarter

    Current hypothesis: the BT+Wifi dongle is causing these messages.
    That is why I did not see it when booted from the live CD (grml): I did not use or test Bluetooth and Wifi there. When I unblock the card via rfkill and manually trigger BT or Wifi scans, I get those messages there, too.
    The same happens if I install Focal to the eMMC device or boot a Focal live installer and enable Wifi.
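
One way to test this hypothesis is to note the time of each manual scan and compare it with how many errors were logged in that minute. A minimal Python sketch, assuming journal-style lines with the same timestamp and hostname layout as the logs in the first post. The function name is made up for this example, and the three sample lines are taken from those logs.

```python
import re
from collections import Counter

# Captures "Aug 02 12:47" (month, day, hour:minute) from the start of a
# journal-style line that opens an AER corrected-error report.
EVENT_RE = re.compile(
    r"^\s*(\w{3} \d{2} \d{2}:\d{2}):\d{2} \S+ kernel: "
    r"pcieport \S+ AER: (?:Multiple )?Corrected error received"
)


def errors_per_minute(lines):
    """Count corrected-error reports per minute of wall-clock time."""
    buckets = Counter()
    for line in lines:
        match = EVENT_RE.match(line)
        if match:
            buckets[match.group(1)] += 1
    return buckets


if __name__ == "__main__":
    sample = [
        "Aug 02 12:46:59 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0",
        "Aug 02 12:47:02 bolt kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0",
        "Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0",
    ]
    for minute, count in sorted(errors_per_minute(sample).items()):
        print(minute, count)
```

If the busy minutes line up with the scans and the quiet minutes with an idle or blocked card, that supports the card or its slot as the source.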

    So it is either the M.2 2240 slot or the BT+Wifi M.2 card. Unfortunately I do not have another dongle here to test, but I will organize one. I would also like to test whether the same dongle causes errors on other PCs with an M.2 slot, but that's more difficult.

    I am open to any other hints or ideas.
     