Message ID | 20240823154023.360234-1-superm1@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | Verify devices transition from D3cold to D0 | expand |
On 8/23/2024 10:40, Mario Limonciello wrote: > From: Mario Limonciello <mario.limonciello@amd.com> > > Gary has reported that when a dock is plugged into a system at the same > time the autosuspend delay has tripped that the USB4 stack malfunctions. > > Messages show up like this: > > ``` > thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled > ``` > > Furthermore the USB4 router is non-functional at this point. > > Those messages happen because the device is still in D3cold at the time > that the PCI core handed control back to the USB4 connection manager > (thunderbolt). > > The issue is that it takes time for a device to enter D3cold and do a > conventional reset, and then more time for it to exit D3cold. > > This appears not to be a new problem; previously there were very similar > reports from Ryzen XHCI controllers. Quirks were added for those. > Furthermore; adding extra logging it's apparent that other PCI devices > in the system can take more than 10ms to recover from D3cold as well. > > This series add a wait into pci_power_up() specifically for D3cold exit and > then drops the quirks that were previously used for the Ryzen XHCI controllers. > > Mario Limonciello (5): > PCI: Use an enum for reset type in pci_dev_wait() > PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait() > PCI: Verify functions currently in D3cold have entered D0 > PCI: Allow Ryzen XHCI controllers into D3cold and drop delays > PCI: Drop Radeon quirk for Macbook Pro 8.2 > > drivers/pci/pci-driver.c | 2 +- > drivers/pci/pci.c | 70 +++++++++++++++++++++++++++---------- > drivers/pci/pci.h | 13 ++++++- > drivers/pci/pcie/dpc.c | 2 +- > drivers/pci/quirks.c | 25 ------------- > drivers/usb/host/xhci-pci.c | 11 ------ > 6 files changed, 66 insertions(+), 57 deletions(-) > Bjorn, This series has stalled a while. Mika and I went back and forth and I think are generally in agreement so I think it's waiting on your feedback. Can you take another look? The alternative is to add some more piles of quirks, but I'm hoping that we can go this direction and drop a bunch of the old ones instead. LMK if you want me to rebase it on 6.13-rc1 and resend a v6.
On Wed, Dec 04, 2024 at 11:30:51AM -0600, Mario Limonciello wrote: > On 8/23/2024 10:40, Mario Limonciello wrote: > > From: Mario Limonciello <mario.limonciello@amd.com> > > > > Gary has reported that when a dock is plugged into a system at the same > > time the autosuspend delay has tripped that the USB4 stack malfunctions. > > > > Messages show up like this: > > > > ``` > > thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled > > ``` > > > > Furthermore the USB4 router is non-functional at this point. > > > > Those messages happen because the device is still in D3cold at the time > > that the PCI core handed control back to the USB4 connection manager > > (thunderbolt). > > > > The issue is that it takes time for a device to enter D3cold and do a > > conventional reset, and then more time for it to exit D3cold. > > > > This appears not to be a new problem; previously there were very similar > > reports from Ryzen XHCI controllers. Quirks were added for those. > > Furthermore; adding extra logging it's apparent that other PCI devices > > in the system can take more than 10ms to recover from D3cold as well. > > > > This series add a wait into pci_power_up() specifically for D3cold exit and > > then drops the quirks that were previously used for the Ryzen XHCI controllers. > > > > Mario Limonciello (5): > > PCI: Use an enum for reset type in pci_dev_wait() > > PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait() > > PCI: Verify functions currently in D3cold have entered D0 > > PCI: Allow Ryzen XHCI controllers into D3cold and drop delays > > PCI: Drop Radeon quirk for Macbook Pro 8.2 > > > > drivers/pci/pci-driver.c | 2 +- > > drivers/pci/pci.c | 70 +++++++++++++++++++++++++++---------- > > drivers/pci/pci.h | 13 ++++++- > > drivers/pci/pcie/dpc.c | 2 +- > > drivers/pci/quirks.c | 25 ------------- > > drivers/usb/host/xhci-pci.c | 11 ------ > > 6 files changed, 66 insertions(+), 57 deletions(-) > > Bjorn, > > This series has stalled a while. > > Mika and I went back and forth and I think are generally in agreement so I > think it's waiting on your feedback. > > Can you take another look? > > The alternative is to add some more piles of quirks, but I'm hoping that we > can go this direction and drop a bunch of the old ones instead. > > LMK if you want me to rebase it on 6.13-rc1 and resend a v6. I'm still stuck on patch 2/5 because I'm not aware of any spec language about polling PCI_PM_CTRL to wait for a power state transition, so it seems really ad hoc. If you do rebase to v6.13-rc1, in the 2/5 commit log, s/evices/devices/. I guess that whole patch and commit log needs updating since the RRS code was added to pci_dev_wait() in the interim, so the "device that has gone through a reset may return a value in PCI_COMMAND but that doesn't mean it's finished transitioning to D0" doesn't directly apply anymore. Bjorn
On 12/4/2024 17:45, Bjorn Helgaas wrote: > On Wed, Dec 04, 2024 at 11:30:51AM -0600, Mario Limonciello wrote: >> On 8/23/2024 10:40, Mario Limonciello wrote: >>> From: Mario Limonciello <mario.limonciello@amd.com> >>> >>> Gary has reported that when a dock is plugged into a system at the same >>> time the autosuspend delay has tripped that the USB4 stack malfunctions. >>> >>> Messages show up like this: >>> >>> ``` >>> thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled >>> ``` >>> >>> Furthermore the USB4 router is non-functional at this point. >>> >>> Those messages happen because the device is still in D3cold at the time >>> that the PCI core handed control back to the USB4 connection manager >>> (thunderbolt). >>> >>> The issue is that it takes time for a device to enter D3cold and do a >>> conventional reset, and then more time for it to exit D3cold. >>> >>> This appears not to be a new problem; previously there were very similar >>> reports from Ryzen XHCI controllers. Quirks were added for those. >>> Furthermore; adding extra logging it's apparent that other PCI devices >>> in the system can take more than 10ms to recover from D3cold as well. >>> >>> This series add a wait into pci_power_up() specifically for D3cold exit and >>> then drops the quirks that were previously used for the Ryzen XHCI controllers. >>> >>> Mario Limonciello (5): >>> PCI: Use an enum for reset type in pci_dev_wait() >>> PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait() >>> PCI: Verify functions currently in D3cold have entered D0 >>> PCI: Allow Ryzen XHCI controllers into D3cold and drop delays >>> PCI: Drop Radeon quirk for Macbook Pro 8.2 >>> >>> drivers/pci/pci-driver.c | 2 +- >>> drivers/pci/pci.c | 70 +++++++++++++++++++++++++++---------- >>> drivers/pci/pci.h | 13 ++++++- >>> drivers/pci/pcie/dpc.c | 2 +- >>> drivers/pci/quirks.c | 25 ------------- >>> drivers/usb/host/xhci-pci.c | 11 ------ >>> 6 files changed, 66 insertions(+), 57 deletions(-) >> >> Bjorn, >> >> This series has stalled a while. >> >> Mika and I went back and forth and I think are generally in agreement so I >> think it's waiting on your feedback. >> >> Can you take another look? >> >> The alternative is to add some more piles of quirks, but I'm hoping that we >> can go this direction and drop a bunch of the old ones instead. >> >> LMK if you want me to rebase it on 6.13-rc1 and resend a v6. > > I'm still stuck on patch 2/5 because I'm not aware of any spec > language about polling PCI_PM_CTRL to wait for a power state > transition, so it seems really ad hoc. I'm not really sure how to overcome this. If I rebase everything I'll give specs another read through in case I missed anything, but I suspect you know these specs better than anyoe on this list. Is it worth raising this to PCI-SIG to discuss? Did you perhaps already do that? > > If you do rebase to v6.13-rc1, in the 2/5 commit log, > s/evices/devices/. > > I guess that whole patch and commit log needs updating since the RRS > code was added to pci_dev_wait() in the interim, so the "device that > has gone through a reset may return a value in PCI_COMMAND but that > doesn't mean it's finished transitioning to D0" doesn't directly apply > anymore. > > Bjorn
On Wed, Dec 04, 2024 at 09:44:05PM -0600, Mario Limonciello wrote: > On 12/4/2024 17:45, Bjorn Helgaas wrote: > > On Wed, Dec 04, 2024 at 11:30:51AM -0600, Mario Limonciello wrote: > > > On 8/23/2024 10:40, Mario Limonciello wrote: > > > > From: Mario Limonciello <mario.limonciello@amd.com> > > > > > > > > Gary has reported that when a dock is plugged into a system at the same > > > > time the autosuspend delay has tripped that the USB4 stack malfunctions. > > > > > > > > Messages show up like this: > > > > > > > > ``` > > > > thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled > > > > ``` > > > > > > > > Furthermore the USB4 router is non-functional at this point. > > > > > > > > Those messages happen because the device is still in D3cold at the time > > > > that the PCI core handed control back to the USB4 connection manager > > > > (thunderbolt). > > > > > > > > The issue is that it takes time for a device to enter D3cold and do a > > > > conventional reset, and then more time for it to exit D3cold. > > > > > > > > This appears not to be a new problem; previously there were very similar > > > > reports from Ryzen XHCI controllers. Quirks were added for those. > > > > Furthermore; adding extra logging it's apparent that other PCI devices > > > > in the system can take more than 10ms to recover from D3cold as well. > > > > > > > > This series add a wait into pci_power_up() specifically for D3cold exit and > > > > then drops the quirks that were previously used for the Ryzen XHCI controllers. > > > > > > > > Mario Limonciello (5): > > > > PCI: Use an enum for reset type in pci_dev_wait() > > > > PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait() > > > > PCI: Verify functions currently in D3cold have entered D0 > > > > PCI: Allow Ryzen XHCI controllers into D3cold and drop delays > > > > PCI: Drop Radeon quirk for Macbook Pro 8.2 > > > > > > > > drivers/pci/pci-driver.c | 2 +- > > > > drivers/pci/pci.c | 70 +++++++++++++++++++++++++++---------- > > > > drivers/pci/pci.h | 13 ++++++- > > > > drivers/pci/pcie/dpc.c | 2 +- > > > > drivers/pci/quirks.c | 25 ------------- > > > > drivers/usb/host/xhci-pci.c | 11 ------ > > > > 6 files changed, 66 insertions(+), 57 deletions(-) > > > > > > Bjorn, > > > > > > This series has stalled a while. > > > > > > Mika and I went back and forth and I think are generally in agreement so I > > > think it's waiting on your feedback. > > > > > > Can you take another look? > > > > > > The alternative is to add some more piles of quirks, but I'm hoping that we > > > can go this direction and drop a bunch of the old ones instead. > > > > > > LMK if you want me to rebase it on 6.13-rc1 and resend a v6. > > > > I'm still stuck on patch 2/5 because I'm not aware of any spec > > language about polling PCI_PM_CTRL to wait for a power state > > transition, so it seems really ad hoc. > > I'm not really sure how to overcome this. If I rebase everything I'll give > specs another read through in case I missed anything, but I suspect you know > these specs better than anyoe on this list. > > Is it worth raising this to PCI-SIG to discuss? > Did you perhaps already do that? I haven't, but it seems like that might be appropriate. Can we formulate a question in terms of PCIe concepts? I guess some of the pieces are: - ACPI puts device in D3cold - ACPI restores device to D0 - Device is not yet in D0 when ACPI method has completed - OS read from device causes UR or RRS, so OS reads ~0 It's much more complicated if ACPI is involved because then we have question of whether the ACPI method should have waited. Patch 3/5 mentions "Full context restore or boot latency" language in PCI PM r1.2, but I couldn't find section. I don't know how a generic spec could constrain D0->D3cold or D3cold->D0 transition time because that depends on platform-specific power management hardware. Bjorn
From: Mario Limonciello <mario.limonciello@amd.com> Gary has reported that when a dock is plugged into a system at the same time the autosuspend delay has tripped that the USB4 stack malfunctions. Messages show up like this: ``` thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled ``` Furthermore the USB4 router is non-functional at this point. Those messages happen because the device is still in D3cold at the time that the PCI core handed control back to the USB4 connection manager (thunderbolt). The issue is that it takes time for a device to enter D3cold and do a conventional reset, and then more time for it to exit D3cold. This appears not to be a new problem; previously there were very similar reports from Ryzen XHCI controllers. Quirks were added for those. Furthermore; adding extra logging it's apparent that other PCI devices in the system can take more than 10ms to recover from D3cold as well. This series add a wait into pci_power_up() specifically for D3cold exit and then drops the quirks that were previously used for the Ryzen XHCI controllers. Mario Limonciello (5): PCI: Use an enum for reset type in pci_dev_wait() PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait() PCI: Verify functions currently in D3cold have entered D0 PCI: Allow Ryzen XHCI controllers into D3cold and drop delays PCI: Drop Radeon quirk for Macbook Pro 8.2 drivers/pci/pci-driver.c | 2 +- drivers/pci/pci.c | 70 +++++++++++++++++++++++++++---------- drivers/pci/pci.h | 13 ++++++- drivers/pci/pcie/dpc.c | 2 +- drivers/pci/quirks.c | 25 ------------- drivers/usb/host/xhci-pci.c | 11 ------ 6 files changed, 66 insertions(+), 57 deletions(-)