Message ID | 20220905065622.1573811-1-kai.heng.feng@canonical.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | thunderbolt: Resume PCIe bridges after switch is found on AMD USB4 controller | expand |
Hi, On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote: > AMD USB4 can not detect external PCIe devices like external NVMe when > it's hotplugged, because card/link are not up: > > pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 I think the correct solution is then to block them from runtime suspending entirely. > Use `lspci` to resume pciehp bridges can find external devices. > > A long delay before checking card/link presence doesn't help, either. > The only way to make the hotplug work is to enable pciehp interrupt and > check card presence after the TB switch is added. > > Since the topology of USB4 and its PCIe bridges are siblings, hardcode > the bridge ID so TBT driver can wake them up to check presence. Let's not add PCI things into TBT driver unless absolutely necessary. At least on Intel hardware the PCIe hotplug is signaled by SCI when the root port is in D3, I wonder if AMD has something similar.
Hi Mika, On Mon, Sep 5, 2022 at 3:06 PM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > Hi, > > On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote: > > AMD USB4 can not detect external PCIe devices like external NVMe when > > it's hotplugged, because card/link are not up: > > > > pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 > > I think the correct solution is then to block them from runtime > suspending entirely. Do you mean disable runtime suspend completely? Or just block runtime suspend for a period? > > > Use `lspci` to resume pciehp bridges can find external devices. > > > > A long delay before checking card/link presence doesn't help, either. > > The only way to make the hotplug work is to enable pciehp interrupt and > > check card presence after the TB switch is added. > > > > Since the topology of USB4 and its PCIe bridges are siblings, hardcode > > the bridge ID so TBT driver can wake them up to check presence. > > Let's not add PCI things into TBT driver unless absolutely necessary. OK. It's getting harder as different components are intertwined together on new hardwares... > > At least on Intel hardware the PCIe hotplug is signaled by SCI when the > root port is in D3, I wonder if AMD has something similar. Yes those root ports are resumed to D0 when something is plugged. They however fail to detect any externel PCIe devices. Kai-Heng
On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote: > AMD USB4 can not detect external PCIe devices like external NVMe when > it's hotplugged, because card/link are not up: > > pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 That sounds like a hardware bug, how does this work in other operating systems for this hardware? > Use `lspci` to resume pciehp bridges can find external devices. That's not good :( > A long delay before checking card/link presence doesn't help, either. > The only way to make the hotplug work is to enable pciehp interrupt and > check card presence after the TB switch is added. > > Since the topology of USB4 and its PCIe bridges are siblings, hardcode > the bridge ID so TBT driver can wake them up to check presence. As I mention below, this is not an acceptable solution. AMD developers, any ideas on how to get this fixed in the TB controller firware instead? > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216448 > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > --- > drivers/thunderbolt/nhi.c | 29 +++++++++++++++++++++++++++++ > drivers/thunderbolt/switch.c | 6 ++++++ > drivers/thunderbolt/tb.c | 1 + > drivers/thunderbolt/tb.h | 5 +++++ > include/linux/thunderbolt.h | 1 + > 5 files changed, 42 insertions(+) > > diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c > index cb8c9c4ae93a2..75f5ce5e22978 100644 > --- a/drivers/thunderbolt/nhi.c > +++ b/drivers/thunderbolt/nhi.c > @@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id) > { > struct tb_nhi *nhi; > struct tb *tb; > + struct pci_dev *p = NULL; > + struct tb_pci_bridge *pci_bridge, *n; > int res; > > if (!nhi_imr_valid(pdev)) { > @@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id) > nhi_shutdown(nhi); > return res; > } > + > + if (pdev->vendor == PCI_VENDOR_ID_AMD) { > + while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd, p))) { > + pci_bridge = kmalloc(sizeof(struct tb_pci_bridge), GFP_KERNEL); > + if (!pci_bridge) > + goto cleanup; > + > + pci_bridge->bridge = p; > + INIT_LIST_HEAD(&pci_bridge->list); > + list_add(&pci_bridge->list, &tb->bridge_list); > + } > + } You can't walk the device tree and create a "shadow" list of devices like this and expect any lifetime rules to work properly with them at all. Please do not do this. greg k-h
On Mon, Sep 05, 2022 at 03:26:28PM +0800, Kai-Heng Feng wrote: > Hi Mika, > > On Mon, Sep 5, 2022 at 3:06 PM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > Hi, > > > > On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote: > > > AMD USB4 can not detect external PCIe devices like external NVMe when > > > it's hotplugged, because card/link are not up: > > > > > > pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 > > > > I think the correct solution is then to block them from runtime > > suspending entirely. > > Do you mean disable runtime suspend completely? Or just block runtime > suspend for a period? Completely. The port should enter D3 if it cannot wake up and Linux does not even enable runtime PM for such ports unless they declare "HotPlugSupportInD3" in their ACPI description: https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3 So that property should not be there if they cannot wake up. > > > Use `lspci` to resume pciehp bridges can find external devices. > > > > > > A long delay before checking card/link presence doesn't help, either. > > > The only way to make the hotplug work is to enable pciehp interrupt and > > > check card presence after the TB switch is added. > > > > > > Since the topology of USB4 and its PCIe bridges are siblings, hardcode > > > the bridge ID so TBT driver can wake them up to check presence. > > > > Let's not add PCI things into TBT driver unless absolutely necessary. > > OK. It's getting harder as different components are intertwined > together on new hardwares... > > > > > At least on Intel hardware the PCIe hotplug is signaled by SCI when the > > root port is in D3, I wonder if AMD has something similar. > > Yes those root ports are resumed to D0 when something is plugged. They > however fail to detect any externel PCIe devices. Hmm, so you see the actual hotplug but the tunneled PCIe link may not be detected? Does the PCIe "Card Present" (or Data Link Layer Active) status change at all or is it always 0?
Hi, On Mon, Sep 05, 2022 at 10:50:33AM +0300, Mika Westerberg wrote: > > Yes those root ports are resumed to D0 when something is plugged. They > > however fail to detect any externel PCIe devices. > > Hmm, so you see the actual hotplug but the tunneled PCIe link may not be > detected? Does the PCIe "Card Present" (or Data Link Layer Active) > status change at all or is it always 0? I wonder if we are simply missing the required delays here? Looking at the lspci dump in the bugzilla you refer the root port 03.1 supports active link reporting: LnkCap: Port #247, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+ So when the link goes back to L0 (D3 -> D0 transition) the kernel should issue the 100+ ms reset delay in pci_bridge_wait_for_secondary_bus(): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c#n5045 can you check if that is happening? It should show up in the dmesg when CONFIG_PCI_DEBUG=y but I don't see it in yours.
On Mon, Sep 5, 2022 at 3:50 PM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > On Mon, Sep 05, 2022 at 03:26:28PM +0800, Kai-Heng Feng wrote: > > Hi Mika, > > > > On Mon, Sep 5, 2022 at 3:06 PM Mika Westerberg > > <mika.westerberg@linux.intel.com> wrote: > > > > > > Hi, > > > > > > On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote: > > > > AMD USB4 can not detect external PCIe devices like external NVMe when > > > > it's hotplugged, because card/link are not up: > > > > > > > > pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 > > > > > > I think the correct solution is then to block them from runtime > > > suspending entirely. > > > > Do you mean disable runtime suspend completely? Or just block runtime > > suspend for a period? > > Completely. The port should enter D3 if it cannot wake up and Linux does > not even enable runtime PM for such ports unless they declare > "HotPlugSupportInD3" in their ACPI description: > > https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3 > > So that property should not be there if they cannot wake up. The port can wake up but card/link isn't up yet. > > > > > Use `lspci` to resume pciehp bridges can find external devices. > > > > > > > > A long delay before checking card/link presence doesn't help, either. > > > > The only way to make the hotplug work is to enable pciehp interrupt and > > > > check card presence after the TB switch is added. > > > > > > > > Since the topology of USB4 and its PCIe bridges are siblings, hardcode > > > > the bridge ID so TBT driver can wake them up to check presence. > > > > > > Let's not add PCI things into TBT driver unless absolutely necessary. > > > > OK. It's getting harder as different components are intertwined > > together on new hardwares... > > > > > > > > At least on Intel hardware the PCIe hotplug is signaled by SCI when the > > > root port is in D3, I wonder if AMD has something similar. > > > > Yes those root ports are resumed to D0 when something is plugged. They > > however fail to detect any externel PCIe devices. > > Hmm, so you see the actual hotplug but the tunneled PCIe link may not be > detected? Does the PCIe "Card Present" (or Data Link Layer Active) > status change at all or is it always 0? It changes only after tb_switch_add() is called. Kai-Heng
On Mon, Sep 5, 2022 at 9:18 PM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > Hi, > > On Mon, Sep 05, 2022 at 10:50:33AM +0300, Mika Westerberg wrote: > > > Yes those root ports are resumed to D0 when something is plugged. They > > > however fail to detect any externel PCIe devices. > > > > Hmm, so you see the actual hotplug but the tunneled PCIe link may not be > > detected? Does the PCIe "Card Present" (or Data Link Layer Active) > > status change at all or is it always 0? > > I wonder if we are simply missing the required delays here? Looking at > the lspci dump in the bugzilla you refer the root port 03.1 supports > active link reporting: > > LnkCap: Port #247, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us > ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+ > > So when the link goes back to L0 (D3 -> D0 transition) the kernel should > issue the 100+ ms reset delay in pci_bridge_wait_for_secondary_bus(): This was actually the first thing I tried but it doesn't work. Even a 5 seconds delay doesn't work either. > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci.c#n5045 > > can you check if that is happening? It should show up in the dmesg when > CONFIG_PCI_DEBUG=y but I don't see it in yours. That's because there isn't any child device yet, so the function bails early: if (!dev->subordinate || list_empty(&dev->subordinate->devices)) Kai-Heng
On Mon, Sep 05, 2022 at 11:21:36PM +0800, Kai-Heng Feng wrote: > > Hmm, so you see the actual hotplug but the tunneled PCIe link may not be > > detected? Does the PCIe "Card Present" (or Data Link Layer Active) > > status change at all or is it always 0? > > It changes only after tb_switch_add() is called. I doubt tb_switch_add() does anything but instead it is the established PCIe tunnel that then shows up as it toggles the Card Present bit or so. But that should also trigger PME if the root port is in D3 so you should see this wake if everything works accordingly (unless I'm missing something). So if you do this: 1. Boot the system up, nothing connected 2. Plug in the TBT/USB4 device but do not authorize the PCIe tunnel 3. Wait for the TBT/USB4 domain to enter sleep (runtime suspend) 4. Authorize the PCIe tunnel # echo 1 > .../authorized The established PCIe tunnel should trigger PME and the root port then should be able to detect the PCIe link. Can you add full dmesg with "thunderbolt.dyndbg=+p" in the command line to the bug?
On Mon, Sep 05, 2022 at 11:24:26PM +0800, Kai-Heng Feng wrote: > > can you check if that is happening? It should show up in the dmesg when > > CONFIG_PCI_DEBUG=y but I don't see it in yours. > > That's because there isn't any child device yet, so the function bails early: > if (!dev->subordinate || list_empty(&dev->subordinate->devices)) Ah, of course I forgot that this is hotplug case so there are no need for the delays :/
On Mon, Sep 5, 2022 at 11:34 PM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > On Mon, Sep 05, 2022 at 11:21:36PM +0800, Kai-Heng Feng wrote: > > > Hmm, so you see the actual hotplug but the tunneled PCIe link may not be > > > detected? Does the PCIe "Card Present" (or Data Link Layer Active) > > > status change at all or is it always 0? > > > > It changes only after tb_switch_add() is called. > > I doubt tb_switch_add() does anything but instead it is the established > PCIe tunnel that then shows up as it toggles the Card Present bit or so. > But that should also trigger PME if the root port is in D3 so you should > see this wake if everything works accordingly (unless I'm missing > something). You are right. Sometimes it may still fail to detect hotplugged device right after tb_switch_add(). At which point PCIe tunnels are established? Is it after tb_scan_port()? I found that it's cleaner to wakeup hotplug ports via iterating device link consumers at the end of tb_scan_port(). According to your commit b2be2b05cf3b1c7b499d3b05decdcc524879fea7 ("thunderbolt: Create device links from ACPI description"), it states "The _DSD can be added to tunneled USB3 and PCIe ports, and is needed to make sure the USB4 NHI is resumed before any of the tunneled ports so the protocol tunnels get established properly before the actual port itself is resumed. Othwerwise the USB/PCI core find the link may not be established and starts tearing down the device stack." So isn't waking them up a logical thing to do here? > > So if you do this: > > 1. Boot the system up, nothing connected > 2. Plug in the TBT/USB4 device but do not authorize the PCIe tunnel > 3. Wait for the TBT/USB4 domain to enter sleep (runtime suspend) > 4. Authorize the PCIe tunnel > > # echo 1 > .../authorized > > The established PCIe tunnel should trigger PME and the root port then > should be able to detect the PCIe link. Can you add full dmesg with > "thunderbolt.dyndbg=+p" in the command line to the bug? dmesg attached. Unfortunately there's no PME. Kai-Heng
On Tue, Sep 06, 2022 at 08:57:20PM +0800, Kai-Heng Feng wrote: > On Mon, Sep 5, 2022 at 11:34 PM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > On Mon, Sep 05, 2022 at 11:21:36PM +0800, Kai-Heng Feng wrote: > > > > Hmm, so you see the actual hotplug but the tunneled PCIe link may not be > > > > detected? Does the PCIe "Card Present" (or Data Link Layer Active) > > > > status change at all or is it always 0? > > > > > > It changes only after tb_switch_add() is called. > > > > I doubt tb_switch_add() does anything but instead it is the established > > PCIe tunnel that then shows up as it toggles the Card Present bit or so. > > But that should also trigger PME if the root port is in D3 so you should > > see this wake if everything works accordingly (unless I'm missing > > something). > > You are right. Sometimes it may still fail to detect hotplugged device > right after tb_switch_add(). > At which point PCIe tunnels are established? Is it after tb_scan_port()? They are established when userspace writes "1" to ../authorized of the device (not automatically). On Ubuntu that's boltd that handles this so you may need to disable it before you do the experiment. > I found that it's cleaner to wakeup hotplug ports via iterating device > link consumers at the end of tb_scan_port(). > > According to your commit b2be2b05cf3b1c7b499d3b05decdcc524879fea7 > ("thunderbolt: Create device links from ACPI description"), it states > "The _DSD can be added to tunneled USB3 and PCIe ports, and is needed to > make sure the USB4 NHI is resumed before any of the tunneled ports so > the protocol tunnels get established properly before the actual port > itself is resumed. Othwerwise the USB/PCI core find the link may not be > established and starts tearing down the device stack." > > So isn't waking them up a logical thing to do here? No they should wake up themselves. > > So if you do this: > > > > 1. Boot the system up, nothing connected > > 2. Plug in the TBT/USB4 device but do not authorize the PCIe tunnel > > 3. Wait for the TBT/USB4 domain to enter sleep (runtime suspend) > > 4. Authorize the PCIe tunnel > > > > # echo 1 > .../authorized > > > > The established PCIe tunnel should trigger PME and the root port then > > should be able to detect the PCIe link. Can you add full dmesg with > > "thunderbolt.dyndbg=+p" in the command line to the bug? > > dmesg attached. Unfortunately there's no PME. Hmm, attached to where? Forgot to attach? ;-)
On Tue, Sep 6, 2022 at 9:37 PM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > On Tue, Sep 06, 2022 at 08:57:20PM +0800, Kai-Heng Feng wrote: > > On Mon, Sep 5, 2022 at 11:34 PM Mika Westerberg > > <mika.westerberg@linux.intel.com> wrote: > > > > > > On Mon, Sep 05, 2022 at 11:21:36PM +0800, Kai-Heng Feng wrote: > > > > > Hmm, so you see the actual hotplug but the tunneled PCIe link may not be > > > > > detected? Does the PCIe "Card Present" (or Data Link Layer Active) > > > > > status change at all or is it always 0? > > > > > > > > It changes only after tb_switch_add() is called. > > > > > > I doubt tb_switch_add() does anything but instead it is the established > > > PCIe tunnel that then shows up as it toggles the Card Present bit or so. > > > But that should also trigger PME if the root port is in D3 so you should > > > see this wake if everything works accordingly (unless I'm missing > > > something). > > > > You are right. Sometimes it may still fail to detect hotplugged device > > right after tb_switch_add(). > > At which point PCIe tunnels are established? Is it after tb_scan_port()? > > They are established when userspace writes "1" to ../authorized of the > device (not automatically). > > On Ubuntu that's boltd that handles this so you may need to disable it > before you do the experiment. In the dmesg it was disabled and "authorized" was 0 originally. > > > I found that it's cleaner to wakeup hotplug ports via iterating device > > link consumers at the end of tb_scan_port(). > > > > According to your commit b2be2b05cf3b1c7b499d3b05decdcc524879fea7 > > ("thunderbolt: Create device links from ACPI description"), it states > > "The _DSD can be added to tunneled USB3 and PCIe ports, and is needed to > > make sure the USB4 NHI is resumed before any of the tunneled ports so > > the protocol tunnels get established properly before the actual port > > itself is resumed. Othwerwise the USB/PCI core find the link may not be > > established and starts tearing down the device stack." > > > > So isn't waking them up a logical thing to do here? > > No they should wake up themselves. OK. > > > > So if you do this: > > > > > > 1. Boot the system up, nothing connected > > > 2. Plug in the TBT/USB4 device but do not authorize the PCIe tunnel > > > 3. Wait for the TBT/USB4 domain to enter sleep (runtime suspend) > > > 4. Authorize the PCIe tunnel > > > > > > # echo 1 > .../authorized > > > > > > The established PCIe tunnel should trigger PME and the root port then > > > should be able to detect the PCIe link. Can you add full dmesg with > > > "thunderbolt.dyndbg=+p" in the command line to the bug? > > > > dmesg attached. Unfortunately there's no PME. > > Hmm, attached to where? Forgot to attach? ;-) Oops, it's attached to the Bugzilla now. Kai-Heng
On Tue, Sep 06, 2022 at 10:29:03PM +0800, Kai-Heng Feng wrote: > On Tue, Sep 6, 2022 at 9:37 PM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > On Tue, Sep 06, 2022 at 08:57:20PM +0800, Kai-Heng Feng wrote: > > > On Mon, Sep 5, 2022 at 11:34 PM Mika Westerberg > > > <mika.westerberg@linux.intel.com> wrote: > > > > > > > > On Mon, Sep 05, 2022 at 11:21:36PM +0800, Kai-Heng Feng wrote: > > > > > > Hmm, so you see the actual hotplug but the tunneled PCIe link may not be > > > > > > detected? Does the PCIe "Card Present" (or Data Link Layer Active) > > > > > > status change at all or is it always 0? > > > > > > > > > > It changes only after tb_switch_add() is called. > > > > > > > > I doubt tb_switch_add() does anything but instead it is the established > > > > PCIe tunnel that then shows up as it toggles the Card Present bit or so. > > > > But that should also trigger PME if the root port is in D3 so you should > > > > see this wake if everything works accordingly (unless I'm missing > > > > something). > > > > > > You are right. Sometimes it may still fail to detect hotplugged device > > > right after tb_switch_add(). > > > At which point PCIe tunnels are established? Is it after tb_scan_port()? > > > > They are established when userspace writes "1" to ../authorized of the > > device (not automatically). > > > > On Ubuntu that's boltd that handles this so you may need to disable it > > before you do the experiment. > > In the dmesg it was disabled and "authorized" was 0 originally. OK. > > > > > I found that it's cleaner to wakeup hotplug ports via iterating device > > > link consumers at the end of tb_scan_port(). > > > > > > According to your commit b2be2b05cf3b1c7b499d3b05decdcc524879fea7 > > > ("thunderbolt: Create device links from ACPI description"), it states > > > "The _DSD can be added to tunneled USB3 and PCIe ports, and is needed to > > > make sure the USB4 NHI is resumed before any of the tunneled ports so > > > the protocol tunnels get established properly before the actual port > > > itself is resumed. Othwerwise the USB/PCI core find the link may not be > > > established and starts tearing down the device stack." > > > > > > So isn't waking them up a logical thing to do here? > > > > No they should wake up themselves. > > OK. > > > > > > > So if you do this: > > > > > > > > 1. Boot the system up, nothing connected > > > > 2. Plug in the TBT/USB4 device but do not authorize the PCIe tunnel > > > > 3. Wait for the TBT/USB4 domain to enter sleep (runtime suspend) > > > > 4. Authorize the PCIe tunnel > > > > > > > > # echo 1 > .../authorized > > > > > > > > The established PCIe tunnel should trigger PME and the root port then > > > > should be able to detect the PCIe link. Can you add full dmesg with > > > > "thunderbolt.dyndbg=+p" in the command line to the bug? > > > > > > dmesg attached. Unfortunately there's no PME. > > > > Hmm, attached to where? Forgot to attach? ;-) > > Oops, it's attached to the Bugzilla now. Okay I see in the dmesg now that the root port enters D3hot: [ 40.363249] pcieport 0000:00:04.1: saving config space at offset 0x34 (reading 0x50) [ 40.363257] pcieport 0000:00:04.1: saving config space at offset 0x38 (reading 0x0) [ 40.363265] pcieport 0000:00:04.1: saving config space at offset 0x3c (reading 0x200ff) [ 40.365052] pcieport 0000:00:04.1: PME# enabled and then the PCIe tunnel is authorized: [ 58.936258] thunderbolt 0000:64:00.6: 0:5 <-> 2:6 (PCI): activating [ 58.936270] thunderbolt 0000:64:00.6: activating PCIe Down path from 0:5 to 2:6 [ 58.936434] thunderbolt 0000:64:00.6: 2:1: Writing hop 1 [ 58.936443] thunderbolt 0000:64:00.6: 2:1: In HopID: 8 => Out port: 6 Out HopID: 8 [ 58.936449] thunderbolt 0000:64:00.6: 2:1: Weight: 1 Priority: 3 Credits: 32 Drop: 0 [ 58.936453] thunderbolt 0000:64:00.6: 2:1: Counter enabled: 0 Counter index: 2047 [ 58.936457] thunderbolt 0000:64:00.6: 2:1: Flow Control (In/Eg): 1/0 Shared Buffer (In/Eg): 0/0 [ 58.936460] thunderbolt 0000:64:00.6: 2:1: Unknown1: 0x0 Unknown2: 0x0 Unknown3: 0x0 [ 58.936786] thunderbolt 0000:64:00.6: 0:5: Writing hop 0 [ 58.936795] thunderbolt 0000:64:00.6: 0:5: In HopID: 8 => Out port: 2 Out HopID: 8 [ 58.936800] thunderbolt 0000:64:00.6: 0:5: Weight: 1 Priority: 3 Credits: 7 Drop: 0 [ 58.936803] thunderbolt 0000:64:00.6: 0:5: Counter enabled: 0 Counter index: 2047 [ 58.936807] thunderbolt 0000:64:00.6: 0:5: Flow Control (In/Eg): 1/1 Shared Buffer (In/Eg): 0/0 [ 58.936810] thunderbolt 0000:64:00.6: 0:5: Unknown1: 0x0 Unknown2: 0x0 Unknown3: 0x0 [ 58.936971] thunderbolt 0000:64:00.6: path activation complete [ 58.936979] thunderbolt 0000:64:00.6: activating PCIe Up path from 2:6 to 0:5 [ 58.937129] thunderbolt 0000:64:00.6: 0:2: Writing hop 1 [ 58.937138] thunderbolt 0000:64:00.6: 0:2: In HopID: 8 => Out port: 5 Out HopID: 8 [ 58.937143] thunderbolt 0000:64:00.6: 0:2: Weight: 1 Priority: 3 Credits: 32 Drop: 0 [ 58.937147] thunderbolt 0000:64:00.6: 0:2: Counter enabled: 0 Counter index: 2047 [ 58.937150] thunderbolt 0000:64:00.6: 0:2: Flow Control (In/Eg): 1/0 Shared Buffer (In/Eg): 0/0 [ 58.937154] thunderbolt 0000:64:00.6: 0:2: Unknown1: 0x0 Unknown2: 0x0 Unknown3: 0x0 [ 58.937453] thunderbolt 0000:64:00.6: 2:6: Writing hop 0 [ 58.937462] thunderbolt 0000:64:00.6: 2:6: In HopID: 8 => Out port: 1 Out HopID: 8 [ 58.937467] thunderbolt 0000:64:00.6: 2:6: Weight: 1 Priority: 3 Credits: 7 Drop: 0 [ 58.937471] thunderbolt 0000:64:00.6: 2:6: Counter enabled: 0 Counter index: 2047 [ 58.937474] thunderbolt 0000:64:00.6: 2:6: Flow Control (In/Eg): 1/1 Shared Buffer (In/Eg): 0/0 [ 58.937478] thunderbolt 0000:64:00.6: 2:6: Unknown1: 0x0 Unknown2: 0x0 Unknown3: 0x0 [ 58.937633] thunderbolt 0000:64:00.6: path activation complete but this is not detected by the root port so after a while the domain is suspended as well: [ 84.327759] thunderbolt 0000:64:00.6: 0: suspending switch What should happen right after the PCIe tunnel activation is that the root port should get PME / hotplug and it should then find the devices but this for some reason is not happening. This reminded me that in Intel hardware there is an ACPI power resource that is shared between related devices. IIRC there is _PR0() method under the root port, xHCI and the TBT NHI that returns the same power resource. Now, when the power resource is turned on for any of the devices the kernel wakes up the rest too to make sure they get properly re-initialized if they went into D0unitialized or something like that. The commit that added this is 4533771c1e53 ("ACPI / PM: Introduce concept of a _PR0 dependent device"). I'm not entirely sure if that applies here because none of the devices seem to enter D3cold, though. Is this working on Windows side and do you know if it goes into D3cold too?
On Tue, Sep 06, 2022 at 05:59:49PM +0300, Mika Westerberg wrote: > This reminded me that in Intel hardware there is an ACPI power resource that is > shared between related devices. IIRC there is _PR0() method under the root > port, xHCI and the TBT NHI that returns the same power resource. Now, when the > power resource is turned on for any of the devices the kernel wakes up the rest > too to make sure they get properly re-initialized if they went into > D0unitialized or something like that. The commit that added this is > 4533771c1e53 ("ACPI / PM: Introduce concept of a _PR0 dependent device"). Probably has nothing to do with this actually.
[Public] Hi, > -----Original Message----- > From: Greg KH <gregkh@linuxfoundation.org> > Sent: Monday, September 5, 2022 02:30 > To: Kai-Heng Feng <kai.heng.feng@canonical.com> > Cc: mika.westerberg@linux.intel.com; andreas.noever@gmail.com; > michael.jamet@intel.com; YehezkelShB@gmail.com; Mehta, Sanju > <Sanju.Mehta@amd.com>; Limonciello, Mario > <Mario.Limonciello@amd.com>; linux-usb@vger.kernel.org; linux- > kernel@vger.kernel.org > Subject: Re: [PATCH] thunderbolt: Resume PCIe bridges after switch is found > on AMD USB4 controller > > On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote: > > AMD USB4 can not detect external PCIe devices like external NVMe when > > it's hotplugged, because card/link are not up: > > > > pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 > > That sounds like a hardware bug, how does this work in other operating > systems for this hardware? We happen to have this HP system in our lab. My colleague Anson (now on CC) flashed the same BIOS to it (01.02.01) using dediprog and loaded a 6.0-rc3 mainline kernel built from the Canonical mainline kernel PPA. He then tried to hotplug a TBT3 SSD a number of times but couldn't hit this issue. I attached his log to the kernel Bugzilla. > > > Use `lspci` to resume pciehp bridges can find external devices. > > That's not good :( > > > A long delay before checking card/link presence doesn't help, either. > > The only way to make the hotplug work is to enable pciehp interrupt and > > check card presence after the TB switch is added. > > > > Since the topology of USB4 and its PCIe bridges are siblings, hardcode > > the bridge ID so TBT driver can wake them up to check presence. > > As I mention below, this is not an acceptable solution. > > AMD developers, any ideas on how to get this fixed in the TB controller > firware instead? Anson also double checked on the AMD reference hardware that the HP system is built against and couldn't reproduce it there either. KH, I've got a few questions/comments to try to better explain why we're here. 1) How did you flash the 01.02.01 firmware? In Anson's check, he used dediprog. Is it possible there was some stateful stuff used by HP's BIOS still on the SPI from the upgrade that didn't get set/cleared properly from an earlier pre-release BIOS? 2) Did you change any BIOS settings? Particularly anything to do with Pre-OS CM? 3) If you explicitly reset to HP's "default BIOS settings" does it resolve? 4) Can you double check ADP_CS_5 bit 31? I attached is a patch to kernel Bugzilla to add dyndbg output for it. If it was for some reason set by Pre-OS CM in your BIOS/settings combination, we might need to undo it by the Linux CM. 5) Are you changing any of the default runtime PM policies for any of the USB4 routers or root ports used for tunneling using software like TLP? > > > > > Bugzilla: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz > illa.kernel.org%2Fshow_bug.cgi%3Fid%3D216448&data=05%7C01%7Cm > ario.limonciello%40amd.com%7C1e27b1d6f69e42796c7b08da8f107121%7C3d > d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637979598042186185%7CU > nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI > 6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=0lhcaKfUyoK > 0FXT9uDZ8a%2Fpxs9tHd8aoQcyPFdB%2F0eY%3D&reserved=0 > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > --- > > drivers/thunderbolt/nhi.c | 29 +++++++++++++++++++++++++++++ > > drivers/thunderbolt/switch.c | 6 ++++++ > > drivers/thunderbolt/tb.c | 1 + > > drivers/thunderbolt/tb.h | 5 +++++ > > include/linux/thunderbolt.h | 1 + > > 5 files changed, 42 insertions(+) > > > > diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c > > index cb8c9c4ae93a2..75f5ce5e22978 100644 > > --- a/drivers/thunderbolt/nhi.c > > +++ b/drivers/thunderbolt/nhi.c > > @@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const > struct pci_device_id *id) > > { > > struct tb_nhi *nhi; > > struct tb *tb; > > + struct pci_dev *p = NULL; > > + struct tb_pci_bridge *pci_bridge, *n; > > int res; > > > > if (!nhi_imr_valid(pdev)) { > > @@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const > struct pci_device_id *id) > > nhi_shutdown(nhi); > > return res; > > } > > + > > + if (pdev->vendor == PCI_VENDOR_ID_AMD) { > > + while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd, > p))) { > > + pci_bridge = kmalloc(sizeof(struct tb_pci_bridge), > GFP_KERNEL); > > + if (!pci_bridge) > > + goto cleanup; > > + > > + pci_bridge->bridge = p; > > + INIT_LIST_HEAD(&pci_bridge->list); > > + list_add(&pci_bridge->list, &tb->bridge_list); > > + } > > + } > > You can't walk the device tree and create a "shadow" list of devices > like this and expect any lifetime rules to work properly with them at > all. > > Please do not do this. > > greg k-h
" On Thu, Sep 8, 2022 at 12:30 AM Limonciello, Mario <Mario.Limonciello@amd.com> wrote: > > [Public] > > Hi, > > > -----Original Message----- > > From: Greg KH <gregkh@linuxfoundation.org> > > Sent: Monday, September 5, 2022 02:30 > > To: Kai-Heng Feng <kai.heng.feng@canonical.com> > > Cc: mika.westerberg@linux.intel.com; andreas.noever@gmail.com; > > michael.jamet@intel.com; YehezkelShB@gmail.com; Mehta, Sanju > > <Sanju.Mehta@amd.com>; Limonciello, Mario > > <Mario.Limonciello@amd.com>; linux-usb@vger.kernel.org; linux- > > kernel@vger.kernel.org > > Subject: Re: [PATCH] thunderbolt: Resume PCIe bridges after switch is found > > on AMD USB4 controller > > > > On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote: > > > AMD USB4 can not detect external PCIe devices like external NVMe when > > > it's hotplugged, because card/link are not up: > > > > > > pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 > > > > That sounds like a hardware bug, how does this work in other operating > > systems for this hardware? > > We happen to have this HP system in our lab. My colleague Anson (now on CC) flashed > the same BIOS to it (01.02.01) using dediprog and loaded a 6.0-rc3 mainline kernel built > from the Canonical mainline kernel PPA. > > He then tried to hotplug a TBT3 SSD a number of times but couldn't hit this issue. > I attached his log to the kernel Bugzilla. Nice to hear. Hopefully this can be fixed at firmware/hardware side. > > > > > > Use `lspci` to resume pciehp bridges can find external devices. > > > > That's not good :( > > > > > A long delay before checking card/link presence doesn't help, either. > > > The only way to make the hotplug work is to enable pciehp interrupt and > > > check card presence after the TB switch is added. > > > > > > Since the topology of USB4 and its PCIe bridges are siblings, hardcode > > > the bridge ID so TBT driver can wake them up to check presence. > > > > As I mention below, this is not an acceptable solution. > > > > AMD developers, any ideas on how to get this fixed in the TB controller > > firware instead? > > Anson also double checked on the AMD reference hardware that the HP system is built > against and couldn't reproduce it there either. > > KH, I've got a few questions/comments to try to better explain why we're here. > > 1) How did you flash the 01.02.01 firmware? In Anson's check, he used dediprog. > Is it possible there was some stateful stuff used by HP's BIOS still on the SPI from the > upgrade that didn't get set/cleared properly from an earlier pre-release BIOS? We used UEFI capsule to update the firmware, via fwupd. > > 2) Did you change any BIOS settings? Particularly anything to do with Pre-OS CM? No, nothing in BIOS was changed. > > 3) If you explicitly reset to HP's "default BIOS settings" does it resolve? Doesn't help. I put the device to ACPI G3 and it doesn't help, either. > > 4) Can you double check ADP_CS_5 bit 31? I attached is a patch to kernel Bugzilla to > add dyndbg output for it. If it was for some reason set by Pre-OS CM in your BIOS/settings > combination, we might need to undo it by the Linux CM. All ports say "Hotplug disabled: 0". dmesg attached to the bugzilla. > > 5) Are you changing any of the default runtime PM policies for any of the USB4 routers or > root ports used for tunneling using software like TLP? No. And they should be suspended by default. Kai-Heng > > > > > > > > > Bugzilla: > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz > > illa.kernel.org%2Fshow_bug.cgi%3Fid%3D216448&data=05%7C01%7Cm > > ario.limonciello%40amd.com%7C1e27b1d6f69e42796c7b08da8f107121%7C3d > > d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637979598042186185%7CU > > nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI > > 6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=0lhcaKfUyoK > > 0FXT9uDZ8a%2Fpxs9tHd8aoQcyPFdB%2F0eY%3D&reserved=0 > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > > --- > > > drivers/thunderbolt/nhi.c | 29 +++++++++++++++++++++++++++++ > > > drivers/thunderbolt/switch.c | 6 ++++++ > > > drivers/thunderbolt/tb.c | 1 + > > > drivers/thunderbolt/tb.h | 5 +++++ > > > include/linux/thunderbolt.h | 1 + > > > 5 files changed, 42 insertions(+) > > > > > > diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c > > > index cb8c9c4ae93a2..75f5ce5e22978 100644 > > > --- a/drivers/thunderbolt/nhi.c > > > +++ b/drivers/thunderbolt/nhi.c > > > @@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const > > struct pci_device_id *id) > > > { > > > struct tb_nhi *nhi; > > > struct tb *tb; > > > + struct pci_dev *p = NULL; > > > + struct tb_pci_bridge *pci_bridge, *n; > > > int res; > > > > > > if (!nhi_imr_valid(pdev)) { > > > @@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const > > struct pci_device_id *id) > > > nhi_shutdown(nhi); > > > return res; > > > } > > > + > > > + if (pdev->vendor == PCI_VENDOR_ID_AMD) { > > > + while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd, > > p))) { > > > + pci_bridge = kmalloc(sizeof(struct tb_pci_bridge), > > GFP_KERNEL); > > > + if (!pci_bridge) > > > + goto cleanup; > > > + > > > + pci_bridge->bridge = p; > > > + INIT_LIST_HEAD(&pci_bridge->list); > > > + list_add(&pci_bridge->list, &tb->bridge_list); > > > + } > > > + } > > > > You can't walk the device tree and create a "shadow" list of devices > > like this and expect any lifetime rules to work properly with them at > > all. > > > > Please do not do this. > > > > greg k-h
On 9/8/2022 09:02, Kai-Heng Feng wrote: > " > > On Thu, Sep 8, 2022 at 12:30 AM Limonciello, Mario > <Mario.Limonciello@amd.com> wrote: >> >> [Public] >> >> Hi, >> >>> -----Original Message----- >>> From: Greg KH <gregkh@linuxfoundation.org> >>> Sent: Monday, September 5, 2022 02:30 >>> To: Kai-Heng Feng <kai.heng.feng@canonical.com> >>> Cc: mika.westerberg@linux.intel.com; andreas.noever@gmail.com; >>> michael.jamet@intel.com; YehezkelShB@gmail.com; Mehta, Sanju >>> <Sanju.Mehta@amd.com>; Limonciello, Mario >>> <Mario.Limonciello@amd.com>; linux-usb@vger.kernel.org; linux- >>> kernel@vger.kernel.org >>> Subject: Re: [PATCH] thunderbolt: Resume PCIe bridges after switch is found >>> on AMD USB4 controller >>> >>> On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote: >>>> AMD USB4 can not detect external PCIe devices like external NVMe when >>>> it's hotplugged, because card/link are not up: >>>> >>>> pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 >>> >>> That sounds like a hardware bug, how does this work in other operating >>> systems for this hardware? >> >> We happen to have this HP system in our lab. My colleague Anson (now on CC) flashed >> the same BIOS to it (01.02.01) using dediprog and loaded a 6.0-rc3 mainline kernel built >> from the Canonical mainline kernel PPA. >> >> He then tried to hotplug a TBT3 SSD a number of times but couldn't hit this issue. >> I attached his log to the kernel Bugzilla. > > Nice to hear. Hopefully this can be fixed at firmware/hardware side. I guess you and Anson might want to sync up offline and compare whether you have the same hardware stepping. > >> >>> >>>> Use `lspci` to resume pciehp bridges can find external devices. >>> >>> That's not good :( >>> >>>> A long delay before checking card/link presence doesn't help, either. >>>> The only way to make the hotplug work is to enable pciehp interrupt and >>>> check card presence after the TB switch is added. >>>> >>>> Since the topology of USB4 and its PCIe bridges are siblings, hardcode >>>> the bridge ID so TBT driver can wake them up to check presence. >>> >>> As I mention below, this is not an acceptable solution. >>> >>> AMD developers, any ideas on how to get this fixed in the TB controller >>> firware instead? >> >> Anson also double checked on the AMD reference hardware that the HP system is built >> against and couldn't reproduce it there either. >> >> KH, I've got a few questions/comments to try to better explain why we're here. >> >> 1) How did you flash the 01.02.01 firmware? In Anson's check, he used dediprog. >> Is it possible there was some stateful stuff used by HP's BIOS still on the SPI from the >> upgrade that didn't get set/cleared properly from an earlier pre-release BIOS? > > We used UEFI capsule to update the firmware, via fwupd. So that's a difference from how Anson did it. Could you perhaps dump the BIOS SPI image? Anson can flash the exact same dump via dediprog and see if he can repro. It would let us confirm if it was caused by your upgrade path. > >> >> 2) Did you change any BIOS settings? Particularly anything to do with Pre-OS CM? > > No, nothing in BIOS was changed. > >> >> 3) If you explicitly reset to HP's "default BIOS settings" does it resolve? > > Doesn't help. I put the device to ACPI G3 and it doesn't help, either. OK. > >> >> 4) Can you double check ADP_CS_5 bit 31? I attached is a patch to kernel Bugzilla to >> add dyndbg output for it. If it was for some reason set by Pre-OS CM in your BIOS/settings >> combination, we might need to undo it by the Linux CM. > > All ports say "Hotplug disabled: 0". > > dmesg attached to the bugzilla. OK, that at least rules out DHP from Pre-OS CM. > >> >> 5) Are you changing any of the default runtime PM policies for any of the USB4 routers or >> root ports used for tunneling using software like TLP? > > No. And they should be suspended by default. > Thinking about this being possibly a firmware upgrade path problem, can you please check: # grep SMC /sys/kernel/debug/dri/0/amdgpu_firmware_info Anson's system was 0x04453200 (program 4, version 69.50.0). > Kai-Heng > >> >>> >>>> >>>> Bugzilla: >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz >>> illa.kernel.org%2Fshow_bug.cgi%3Fid%3D216448&data=05%7C01%7Cm >>> ario.limonciello%40amd.com%7C1e27b1d6f69e42796c7b08da8f107121%7C3d >>> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637979598042186185%7CU >>> nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI >>> 6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=0lhcaKfUyoK >>> 0FXT9uDZ8a%2Fpxs9tHd8aoQcyPFdB%2F0eY%3D&reserved=0 >>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> >>>> --- >>>> drivers/thunderbolt/nhi.c | 29 +++++++++++++++++++++++++++++ >>>> drivers/thunderbolt/switch.c | 6 ++++++ >>>> drivers/thunderbolt/tb.c | 1 + >>>> drivers/thunderbolt/tb.h | 5 +++++ >>>> include/linux/thunderbolt.h | 1 + >>>> 5 files changed, 42 insertions(+) >>>> >>>> diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c >>>> index cb8c9c4ae93a2..75f5ce5e22978 100644 >>>> --- a/drivers/thunderbolt/nhi.c >>>> +++ b/drivers/thunderbolt/nhi.c >>>> @@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const >>> struct pci_device_id *id) >>>> { >>>> struct tb_nhi *nhi; >>>> struct tb *tb; >>>> + struct pci_dev *p = NULL; >>>> + struct tb_pci_bridge *pci_bridge, *n; >>>> int res; >>>> >>>> if (!nhi_imr_valid(pdev)) { >>>> @@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const >>> struct pci_device_id *id) >>>> nhi_shutdown(nhi); >>>> return res; >>>> } >>>> + >>>> + if (pdev->vendor == PCI_VENDOR_ID_AMD) { >>>> + while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd, >>> p))) { >>>> + pci_bridge = kmalloc(sizeof(struct tb_pci_bridge), >>> GFP_KERNEL); >>>> + if (!pci_bridge) >>>> + goto cleanup; >>>> + >>>> + pci_bridge->bridge = p; >>>> + INIT_LIST_HEAD(&pci_bridge->list); >>>> + list_add(&pci_bridge->list, &tb->bridge_list); >>>> + } >>>> + } >>> >>> You can't walk the device tree and create a "shadow" list of devices >>> like this and expect any lifetime rules to work properly with them at >>> all. >>> >>> Please do not do this. >>> >>> greg k-h
Hi, On Thu, Sep 8, 2022 at 11:22 PM Limonciello, Mario <mario.limonciello@amd.com> wrote: [snipped] > > Nice to hear. Hopefully this can be fixed at firmware/hardware side. > > I guess you and Anson might want to sync up offline and compare whether > you have the same hardware stepping. Sure. [snipped] > > We used UEFI capsule to update the firmware, via fwupd. > > So that's a difference from how Anson did it. Could you perhaps dump > the BIOS SPI image? Anson can flash the exact same dump via dediprog > and see if he can repro. > > It would let us confirm if it was caused by your upgrade path. OK, will dissuss this with AMD/HP offline. > > > > >> > >> 2) Did you change any BIOS settings? Particularly anything to do with Pre-OS CM? > > > > No, nothing in BIOS was changed. > > >> > >> 3) If you explicitly reset to HP's "default BIOS settings" does it resolve? > > > > Doesn't help. I put the device to ACPI G3 and it doesn't help, either. > > OK. > > > > >> > >> 4) Can you double check ADP_CS_5 bit 31? I attached is a patch to kernel Bugzilla to > >> add dyndbg output for it. If it was for some reason set by Pre-OS CM in your BIOS/settings > >> combination, we might need to undo it by the Linux CM. > > > > All ports say "Hotplug disabled: 0". > > > > dmesg attached to the bugzilla. > > OK, that at least rules out DHP from Pre-OS CM. > > > > >> > >> 5) Are you changing any of the default runtime PM policies for any of the USB4 routers or > >> root ports used for tunneling using software like TLP? > > > > No. And they should be suspended by default. > > > > Thinking about this being possibly a firmware upgrade path problem, can > you please check: > > # grep SMC /sys/kernel/debug/dri/0/amdgpu_firmware_info > > Anson's system was 0x04453200 (program 4, version 69.50.0). Exactly the same here. Kai-Heng > > > Kai-Heng > > > >> > >>> > >>>> > >>>> Bugzilla: > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz > >>> illa.kernel.org%2Fshow_bug.cgi%3Fid%3D216448&data=05%7C01%7Cm > >>> ario.limonciello%40amd.com%7C1e27b1d6f69e42796c7b08da8f107121%7C3d > >>> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637979598042186185%7CU > >>> nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI > >>> 6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=0lhcaKfUyoK > >>> 0FXT9uDZ8a%2Fpxs9tHd8aoQcyPFdB%2F0eY%3D&reserved=0 > >>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > >>>> --- > >>>> drivers/thunderbolt/nhi.c | 29 +++++++++++++++++++++++++++++ > >>>> drivers/thunderbolt/switch.c | 6 ++++++ > >>>> drivers/thunderbolt/tb.c | 1 + > >>>> drivers/thunderbolt/tb.h | 5 +++++ > >>>> include/linux/thunderbolt.h | 1 + > >>>> 5 files changed, 42 insertions(+) > >>>> > >>>> diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c > >>>> index cb8c9c4ae93a2..75f5ce5e22978 100644 > >>>> --- a/drivers/thunderbolt/nhi.c > >>>> +++ b/drivers/thunderbolt/nhi.c > >>>> @@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const > >>> struct pci_device_id *id) > >>>> { > >>>> struct tb_nhi *nhi; > >>>> struct tb *tb; > >>>> + struct pci_dev *p = NULL; > >>>> + struct tb_pci_bridge *pci_bridge, *n; > >>>> int res; > >>>> > >>>> if (!nhi_imr_valid(pdev)) { > >>>> @@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const > >>> struct pci_device_id *id) > >>>> nhi_shutdown(nhi); > >>>> return res; > >>>> } > >>>> + > >>>> + if (pdev->vendor == PCI_VENDOR_ID_AMD) { > >>>> + while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd, > >>> p))) { > >>>> + pci_bridge = kmalloc(sizeof(struct tb_pci_bridge), > >>> GFP_KERNEL); > >>>> + if (!pci_bridge) > >>>> + goto cleanup; > >>>> + > >>>> + pci_bridge->bridge = p; > >>>> + INIT_LIST_HEAD(&pci_bridge->list); > >>>> + list_add(&pci_bridge->list, &tb->bridge_list); > >>>> + } > >>>> + } > >>> > >>> You can't walk the device tree and create a "shadow" list of devices > >>> like this and expect any lifetime rules to work properly with them at > >>> all. > >>> > >>> Please do not do this. > >>> > >>> greg k-h >
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c index cb8c9c4ae93a2..75f5ce5e22978 100644 --- a/drivers/thunderbolt/nhi.c +++ b/drivers/thunderbolt/nhi.c @@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct tb_nhi *nhi; struct tb *tb; + struct pci_dev *p = NULL; + struct tb_pci_bridge *pci_bridge, *n; int res; if (!nhi_imr_valid(pdev)) { @@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id) nhi_shutdown(nhi); return res; } + + if (pdev->vendor == PCI_VENDOR_ID_AMD) { + while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd, p))) { + pci_bridge = kmalloc(sizeof(struct tb_pci_bridge), GFP_KERNEL); + if (!pci_bridge) + goto cleanup; + + pci_bridge->bridge = p; + INIT_LIST_HEAD(&pci_bridge->list); + list_add(&pci_bridge->list, &tb->bridge_list); + } + } + pci_set_drvdata(pdev, tb); device_wakeup_enable(&pdev->dev); @@ -1316,12 +1331,26 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id) pm_runtime_put_autosuspend(&pdev->dev); return 0; + +cleanup: + list_for_each_entry_safe(pci_bridge, n, &tb->bridge_list, list) { + list_del(&pci_bridge->list); + kfree(pci_bridge); + } + + return -ENOMEM; } static void nhi_remove(struct pci_dev *pdev) { struct tb *tb = pci_get_drvdata(pdev); struct tb_nhi *nhi = tb->nhi; + struct tb_pci_bridge *pci_bridge, *n; + + list_for_each_entry_safe(pci_bridge, n, &tb->bridge_list, list) { + list_del(&pci_bridge->list); + kfree(pci_bridge); + } pm_runtime_get_sync(&pdev->dev); pm_runtime_dont_use_autosuspend(&pdev->dev); diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c index c63c1f4ff9dc7..aae898cc907d3 100644 --- a/drivers/thunderbolt/switch.c +++ b/drivers/thunderbolt/switch.c @@ -2836,6 +2836,8 @@ static void tb_switch_credits_init(struct tb_switch *sw) int tb_switch_add(struct tb_switch *sw) { int i, ret; + struct tb *tb = sw->tb; + struct tb_pci_bridge *pci_bridge; /* * Initialize DMA control port now before we read DROM. Recent @@ -2933,6 +2935,10 @@ int tb_switch_add(struct tb_switch *sw) } tb_switch_debugfs_init(sw); + + list_for_each_entry(pci_bridge, &tb->bridge_list, list) + pm_request_resume(&pci_bridge->bridge->dev); + return 0; err_ports: diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index 9853f6c7e81d7..07e97b77ac630 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -1771,6 +1771,7 @@ struct tb *tb_probe(struct tb_nhi *nhi) tb->security_level = TB_SECURITY_NOPCIE; tb->cm_ops = &tb_cm_ops; + INIT_LIST_HEAD(&tb->bridge_list); tcm = tb_priv(tb); INIT_LIST_HEAD(&tcm->tunnel_list); diff --git a/drivers/thunderbolt/tb.h b/drivers/thunderbolt/tb.h index 5db76de40cc1c..8efbd1afacad0 100644 --- a/drivers/thunderbolt/tb.h +++ b/drivers/thunderbolt/tb.h @@ -489,6 +489,11 @@ struct tb_cm_ops { u32 *status); }; +struct tb_pci_bridge { + struct pci_dev *bridge; + struct list_head list; +}; + static inline void *tb_priv(struct tb *tb) { return (void *)tb->privdata; diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h index 9f442d73f3df8..728bb36070e9d 100644 --- a/include/linux/thunderbolt.h +++ b/include/linux/thunderbolt.h @@ -83,6 +83,7 @@ struct tb { int index; enum tb_security_level security_level; size_t nboot_acl; + struct list_head bridge_list; unsigned long privdata[]; };
AMD USB4 can not detect external PCIe devices like external NVMe when it's hotplugged, because card/link are not up: pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 Use `lspci` to resume pciehp bridges can find external devices. A long delay before checking card/link presence doesn't help, either. The only way to make the hotplug work is to enable pciehp interrupt and check card presence after the TB switch is added. Since the topology of USB4 and its PCIe bridges are siblings, hardcode the bridge ID so TBT driver can wake them up to check presence. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216448 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> --- drivers/thunderbolt/nhi.c | 29 +++++++++++++++++++++++++++++ drivers/thunderbolt/switch.c | 6 ++++++ drivers/thunderbolt/tb.c | 1 + drivers/thunderbolt/tb.h | 5 +++++ include/linux/thunderbolt.h | 1 + 5 files changed, 42 insertions(+)