Message ID | alpine.DEB.2.21.2306111619570.64925@angie.orcam.me.uk (mailing list archive)
---|---
State | Not Applicable
Series | pci: Work around ASMedia ASM2824 PCIe link training failures
Context | Check | Description
---|---|---
netdev/series_format | success | Posting correctly formatted |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 49 this patch: 49 |
netdev/cc_maintainers | success | CCed 2 of 2 maintainers |
netdev/build_clang | success | Errors and warnings before: 10 this patch: 10 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/deprecated_api | success | None detected |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | No Fixes tag |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 49 this patch: 49 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 26 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
Sorry to resurrect this one, but I was wondering why the PCI device ID in
drivers/pci/quirks.c for the ASMedia ASM2824 isn't checked before forcing
the link down to Gen1.  We have had to revert this patch during our kernel
migration because it interacted poorly with at least one older Gen3 PLX
PCIe switch vendor/generation while using DPC.  In another context we have
found similar issues during system bring-up without DPC, while using a
more legacy hot-plug model (the BIOS defaults for us).  In both contexts
our devices are stuck at Gen1 after a physical hot-plug/insert or power
cycle.

I tried reading through the patch history/review, but it was still a
little unclear to me.  Can we add the device ID check as a precondition to
forcing the link to Gen1?
[+cc Ilpo for his previous involvement here]

On Mon, 22 Jul 2024, Matthew W Carlis wrote:

> Sorry to resurrect this one, but I was wondering why the PCI device ID
> in drivers/pci/quirks.c for the ASMedia ASM2824 isn't checked before
> forcing the link down to Gen1.  We have had to revert this patch during
> our kernel migration because it interacted poorly with at least one
> older Gen3 PLX PCIe switch vendor/generation while using DPC.  In
> another context we have found similar issues during system bring-up
> without DPC, while using a more legacy hot-plug model (the BIOS defaults
> for us).  In both contexts our devices are stuck at Gen1 after a
> physical hot-plug/insert or power cycle.

Sorry to hear about your problems.  However, the workaround is supposed to
trigger only if the link has already failed negotiation.  Could you please
be more specific as to the actual scenario where it triggers?

A scenario was mentioned earlier on where a downstream device had been
removed from a slot and left behind the LBMS bit set in the corresponding
downstream port of the upstream device.  That then triggered the
workaround when the port was rescanned with the slot still empty, which
left the link capped at 2.5GT/s for a device subsequently inserted.  Is
that what happens for you?

As I recall, Ilpo has been working on changes that, among other things,
should make sure no stale LBMS bit is left set, but I'm not sure what the
state of affairs is here.  Myself, I've been too swamped in recent months
and consequently haven't looked into any improvements in this area (and
unrelated issues involving the system in question in my remote lab have
further impeded me).

> I tried reading through the patch history/review, but it was still a
> little unclear to me.  Can we add the device ID check as a precondition
> to forcing the link to Gen1?

The main reason is that it is believed to be the downstream device causing
the issue, and obviously you can't fetch its ID if you can't negotiate the
link so as to talk to it in the first place.

  Maciej
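[Editorial note: for reference, a minimal sketch of the kind of check
being discussed here.  The quirk is meant to act only when the Link Status
register shows LBMS activity while the data link layer is down, which is
also why a stale LBMS bit left behind by a removed device can trip it on
an empty slot.  The helper name below is invented for illustration; only
the pcie_capability_*() accessor and the PCI_EXP_LNKSTA_* bits are
existing kernel interfaces.]

	#include <linux/pci.h>

	/*
	 * Illustrative only: decide whether link training looks failed by
	 * inspecting the LBMS and DLLLA bits of the Link Status register.
	 * A port whose downstream device has simply been removed can show
	 * the same signature, which is the hot-plug corner case above.
	 */
	static bool link_training_looks_failed(struct pci_dev *dev)
	{
		u16 lnksta;

		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
		return (lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
		       PCI_EXP_LNKSTA_LBMS;
	}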
Sorry for the belated response.  I wasn't really sure when you first asked
and I still only have a "hand wavy" theory here.  I think one thing that
is getting us in trouble is when we turn the endpoint device on, then off,
wait for a little while and then turn it back on.  It seems that in this
case the port is forced to Gen1 and there is no path for the kernel to try
another alternative again without an informed user writing the register.
I'm still trying to barter for the time to really dive deeply into this,
so I must apologize if this sounds crazy or turns out not to be correct.

- Matt
On Mon, 22 Jul 2024, Maciej W. Rozycki wrote:

> The main reason is that it is believed to be the downstream device
> causing the issue, and obviously you can't fetch its ID if you can't
> negotiate the link so as to talk to it in the first place.

I have had some more time to look into this issue.  I think the problem
with this change is that it is quite strict in its assumptions about what
it means when a device fails to train, but in an environment where
hot-plug is exercised frequently you are essentially bound to have
something interrupt the link training.  In the first case where we caught
this problem, our test automation was doing power-cycle tortures on our
endpoints.  If you catch the right timing, the link will be forced down to
Gen1 forever, without some other automation to recover you, unless your
device is the one single device in the allowlist which had the hardware
bug in the first place.

I wonder if we can come up with some kind of alternative.

- Matt
On Fri, 26 Jul 2024, Matthew W Carlis wrote:

> On Mon, 22 Jul 2024, Maciej W. Rozycki wrote:
>
> > The main reason is that it is believed to be the downstream device
> > causing the issue, and obviously you can't fetch its ID if you can't
> > negotiate the link so as to talk to it in the first place.
>
> I have had some more time to look into this issue.  I think the problem
> with this change is that it is quite strict in its assumptions about
> what it means when a device fails to train, but in an environment where
> hot-plug is exercised frequently you are essentially bound to have
> something interrupt the link training.  In the first case where we
> caught this problem, our test automation was doing power-cycle tortures
> on our endpoints.  If you catch the right timing, the link will be
> forced down to Gen1 forever, without some other automation to recover
> you, unless your device is the one single device in the allowlist which
> had the hardware bug in the first place.
>
> I wonder if we can come up with some kind of alternative.

The most obvious solution is not to leave the speed at Gen1 on failure in
the Target Speed quirk but to restore the original Target Speed value.
The downside with that is that if the current retraining interface
(function) is used, it adds delay.  But the retraining functions could be
reworked such that the retraining is only triggered in case the Target
Speed quirk fails, while we don't wait for its result (which will very
likely fail anyway).
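[Editorial note: for illustration, a minimal sketch of the "restore the
original Target Speed on failure" idea, assuming the caller has saved the
Target Link Speed field from the Link Control 2 register before clamping.
The function name and flow are hypothetical; only the pcie_capability_*()
accessors, the PCI_EXP_LNKCTL2_* macros and the PCI core's internal
pcie_retrain_link() helper are existing kernel interfaces.]

	#include <linux/pci.h>

	/*
	 * Hypothetical sketch: clamp the target speed to 2.5GT/s for the
	 * retrain attempt and, if that attempt still fails, put the saved
	 * Target Link Speed value back instead of leaving the port
	 * clamped at Gen1 for whatever is inserted next.
	 */
	static int try_gen1_then_restore(struct pci_dev *parent, u16 saved_tls)
	{
		pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
						   PCI_EXP_LNKCTL2_TLS,
						   PCI_EXP_LNKCTL2_TLS_2_5GT);
		if (pcie_retrain_link(parent, false) == 0)
			return 0;		/* link came up at 2.5GT/s */

		/* Still no link: undo the clamp so a later device isn't capped. */
		pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
						   PCI_EXP_LNKCTL2_TLS, saved_tls);
		return -ETIMEDOUT;
	}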
On Mon, 29 Jul 2024, Ilpo Järvinen wrote:

> > > The main reason is that it is believed to be the downstream device
> > > causing the issue, and obviously you can't fetch its ID if you can't
> > > negotiate the link so as to talk to it in the first place.
> >
> > I have had some more time to look into this issue.  I think the
> > problem with this change is that it is quite strict in its assumptions
> > about what it means when a device fails to train, but in an
> > environment where hot-plug is exercised frequently you are essentially
> > bound to have something interrupt the link training.  In the first
> > case where we caught this problem, our test automation was doing
> > power-cycle tortures on our endpoints.  If you catch the right timing,
> > the link will be forced down to Gen1 forever, without some other
> > automation to recover you, unless your device is the one single device
> > in the allowlist which had the hardware bug in the first place.
> >
> > I wonder if we can come up with some kind of alternative.
>
> The most obvious solution is not to leave the speed at Gen1 on failure
> in the Target Speed quirk but to restore the original Target Speed
> value.  The downside with that is that if the current retraining
> interface (function) is used, it adds delay.  But the retraining
> functions could be reworked such that the retraining is only triggered
> in case the Target Speed quirk fails, while we don't wait for its result
> (which will very likely fail anyway).

This is what I have also been thinking of.  After the many years it took
from the inception of this change until it landed upstream, I'm not sure
anymore what my original idea was behind leaving the link clamped on a
retrain failure.  I think it was either not to fiddle with the setting
beyond the absolute necessity at hand (which scenarios such as Matthew's
prove wrong), or to leave the setting in the hope that training will
eventually have succeeded (but that seems to make little sense, as there
will be nothing there to actually observe the success unless the bus gets
rescanned for another reason).

I'll be at my lab towards the end of the week for a maintenance visit, so
I'll allocate some time to fiddle with this issue on that occasion and
implement such an update.

  Maciej
On Mon, 29 Jul 2024, Ilpo Järvinen wrote:

> The most obvious solution is not to leave the speed at Gen1 on failure
> in the Target Speed quirk but to restore the original Target Speed
> value.  The downside with that is that if the current retraining
> interface (function) is used, it adds delay.

It tends to be that I care less about how long a device is gone and more
about how it will behave once it reappears.  For our purposes we don't
even tend to notice a few seconds of wiggle in this area, but we do notice
impact if the kernel creates the nvme device and it is degraded in some
way.  Even though we might have automation to recover the device, we will
have lost more time already than by the proposed delay, as far as I know.
Some of the time a human would have hot-inserted a new device, but much of
the time the device will be coming back from downstream port containment,
where there won't be a person to ensure the correctness of the link
speed/width.  In the DPC case the endpoint itself may have
reset/rebooted/crashed, where you already suffer a few hundred ms of delay
from the EP's boot time.  I would be interested to know what kind of
maximum delay we would all be willing to tolerate and what applications
might care.

On Mon, 29 Jul 2024, Maciej W. Rozycki wrote:

> After the many years it took from the inception of this change until it
> landed upstream, I'm not sure anymore what my original idea was behind
> leaving the link clamped

A familiar question I have been known to ask myself: "Why did I do this
again?"  The scary/funny thing is that there is almost always a reason.

I do think there might be some benefit to overall system stability in
having some kind of damping on the link retraining rate, because I have
also seen a device stuck in an infinite cycle of many retrains per second.
But each time we come through the hot-insert code path the kernel should
let the link partners try to get to their maximum speeds, because it could
in theory be a totally new EP.  In the handful of cases I have seen, there
was some kind of defect with a particular device, and replacement resolved
it.

- Matt
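[Editorial note: purely as an illustration of the damping idea mentioned
above (nothing like this exists in the tree as of this thread), a port
could remember when it last attempted a retrain and refuse further
attempts inside a back-off window.  The structure, field name and interval
below are invented for this sketch; only jiffies and time_before() are
existing kernel interfaces.]

	#include <linux/jiffies.h>
	#include <linux/types.h>

	#define RETRAIN_BACKOFF	(10 * HZ)	/* arbitrary: at most one attempt per 10s */

	struct port_retrain_state {
		unsigned long last_attempt;	/* jiffies of the last retrain, 0 = never */
	};

	/* Illustrative rate limit: allow a retrain only outside the back-off window. */
	static bool retrain_allowed(struct port_retrain_state *st)
	{
		if (st->last_attempt &&
		    time_before(jiffies, st->last_attempt + RETRAIN_BACKOFF))
			return false;

		st->last_attempt = jiffies;
		return true;
	}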
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4912,6 +4912,8 @@ static bool pcie_wait_for_link_delay(str
 	if (active)
 		msleep(20);
 	ret = pcie_wait_for_link_status(pdev, false, active);
+	if (active && !ret)
+		ret = pcie_failed_link_retrain(pdev);
 	if (active && ret)
 		msleep(delay);
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -554,6 +554,10 @@ static inline int pci_dev_specific_disab
 	return -ENOTTY;
 }
 #endif
+static inline bool pcie_failed_link_retrain(struct pci_dev *dev)
+{
+	return false;
+}
 
 /* PCI error reporting and recovery */
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
Index: linux-macro/drivers/pci/probe.c
===================================================================
--- linux-macro.orig/drivers/pci/probe.c
+++ linux-macro/drivers/pci/probe.c
@@ -2549,6 +2549,8 @@ void pci_device_add(struct pci_dev *dev,
 	dma_set_max_seg_size(&dev->dev, 65536);
 	dma_set_seg_boundary(&dev->dev, 0xffffffff);
 
+	pcie_failed_link_retrain(dev);
+
 	/* Fix up broken headers */
 	pci_fixup_device(pci_fixup_header, dev);
This now fails unconditionally and will always be optimised away, but it
provides for quirks to implement recovery for failed links detected in
device probing and device hot-plug events.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
New change in v9, factored out from 7/7:
- Rename `pcie_downstream_link_retrain' to `pcie_failed_link_retrain'.
- Add stub implementation in "pci.h".
---
 drivers/pci/pci.c   |    2 ++
 drivers/pci/pci.h   |    4 ++++
 drivers/pci/probe.c |    2 ++
 3 files changed, 8 insertions(+)

linux-pcie-failed-link-retrain.diff
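[Editorial note: for context, the stub is a hook for a quirk-side
implementation to replace.  A common kernel pattern for wiring such a hook
up (shown here as an assumption about how the later quirks patch might do
it, not as a quote of that patch) is to declare the real function only
when CONFIG_PCI_QUIRKS is enabled and keep the inline stub otherwise, so
the new calls in pci.c and probe.c still compile away on kernels without
quirks support.]

	/* Assumed follow-up shape in drivers/pci/pci.h, for illustration only. */
	#ifdef CONFIG_PCI_QUIRKS
	bool pcie_failed_link_retrain(struct pci_dev *dev);
	#else
	static inline bool pcie_failed_link_retrain(struct pci_dev *dev)
	{
		return false;
	}
	#endif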